Advanced search query syntax

TCSE's advanced search mode supports a rich query syntax for linguistic analysis. All annotations are produced by spaCy 3.8 (en_core_web_lg).

Notation

| Purpose | Notation | Example |
| --- | --- | --- |
| Lemma | [LEMMA] | [be], [help] |
| Part-of-speech | {POS} | {n}, {v}, {adj} |
| Surface + POS | SURFACE{POS} (no space) | help{v}, help{-v} |
| Lemma + POS | [LEMMA]{POS} (no space) | [help]{n}, [be]{v} |
| Negative POS | {-POS} | help{-v} (help that is NOT a verb) |
| Dependency / Tag | {@DEP} | {@nsubj}, {@auxpass} |
| Morphological feature | {#MORPH} | {#past}, {#mod} |
| Named entity | %ENTITY | %PERSON, %ORG, %GPE |
| Logical OR | A\|B | [news\|paper\|article] |
| AND condition (negative) | A&B | -word1&word2 |
| Segment onset | ^ | ^ having {v} |
| Negative match | -WORD | -rid |
| Prefix match | +PREFIX | +un {adj} |
| Literal surface form | ['SURFACE'] | ['s] |
| Noun chunk placeholder | _ | [give] _ _ |
| Wildcard (one word) | -_ | to -_ surprise |
| Wildcard (multiple words) | * | to * surprise |
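
To make the notation concrete, here is a minimal Python sketch that classifies a single query term according to the table above. It is not TCSE's parser: the function name `parse_term`, the returned dictionary keys, and the regular expression are all hypothetical, and only a subset of the notation is covered (the literal surface form and the `&` condition are left out).

```python
import re

# Illustrative sketch only; this is NOT TCSE's actual parser.
# Terms that stand for something other than a single annotated token.
SPECIAL = {
    "_": "noun_chunk",      # exactly one noun chunk
    "-_": "any_word",       # exactly one word
    "*": "any_words",       # zero or more words
    "^": "segment_onset",   # start of a segment
}

# Optional [LEMMA] or bare surface form, then an optional {...} filter.
TERM_RE = re.compile(
    r"^(?:\[(?P<lemma>[^\]]+)\]|(?P<surface>[^\[\]{}]+))?"
    r"(?:\{(?P<tag>[^{}]+)\})?$"
)


def parse_term(term: str) -> dict:
    """Classify one whitespace-delimited query term (hypothetical helper)."""
    if term in SPECIAL:
        return {"kind": SPECIAL[term]}
    if term.startswith("%"):
        return {"kind": "entity", "label": term[1:]}
    if term.startswith("+"):
        return {"kind": "prefix", "prefix": term[1:]}

    m = TERM_RE.match(term)
    if m is None or not (m["lemma"] or m["surface"] or m["tag"]):
        raise ValueError(f"unsupported term: {term!r}")

    result = {"kind": "token"}
    if m["lemma"]:
        # Alternatives inside brackets are separated by "|".
        result["lemma"] = m["lemma"].split("|")
    if m["surface"]:
        if m["surface"].startswith("-"):
            result["not_surface"] = m["surface"][1:]
        else:
            result["surface"] = m["surface"]
    if m["tag"]:
        tag = m["tag"]
        if tag.startswith("@"):
            result["dep"] = tag[1:]
        elif tag.startswith("#"):
            result["morph"] = tag[1:]
        elif tag.startswith("-"):
            result["not_pos"] = tag[1:]
        else:
            result["pos"] = tag
    return result
```

For example, `parse_term("help{-v}")` yields a token constraint with surface form "help" and an excluded POS of "v", while `parse_term("-_")` is recognized as the one-word wildcard before the negative-match rule is tried.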

Wildcards: _ vs -_ vs *

| Notation | Matches | Use case |
| --- | --- | --- |
| _ | Exactly one noun chunk (may span multiple words) | [give] _ _ matches "give the students a chance" |
| -_ | Exactly one word (any word) | to -_ surprise matches "to my surprise", "to our surprise" |
| * | Zero or more words (greedy) | to * surprise matches "to his great surprise" |
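
The difference between -_ and * can be sketched with a regular expression over a space-separated token string. This is only an illustration under stated assumptions: the helper names `wildcard_query_to_regex` and `query_matches` are hypothetical, matching is case-sensitive, and the noun chunk placeholder _ is left out because it needs a syntactic parse.

```python
import re


def wildcard_query_to_regex(query: str) -> re.Pattern:
    """Translate literal words, -_ and * into a regex (hypothetical helper).

    The regex is meant to be searched in a string of tokens joined by
    single spaces, with one extra trailing space.
    """
    parts = []
    for term in query.split():
        if term == "-_":
            parts.append(r"\S+ ")        # exactly one token
        elif term == "*":
            parts.append(r"(?:\S+ )*")   # zero or more tokens, greedy
        else:
            parts.append(re.escape(term) + " ")
    # (?<!\S) makes sure the match starts at a token boundary.
    return re.compile(r"(?<!\S)" + "".join(parts))


def query_matches(query: str, tokens: list[str]) -> bool:
    """Return True if the query matches somewhere in the token list."""
    pattern = wildcard_query_to_regex(query)
    return pattern.search(" ".join(tokens) + " ") is not None
```

Under these assumptions, `query_matches("to -_ surprise", ["to", "his", "great", "surprise"])` is false because two words sit between "to" and "surprise", whereas the same tokens match `to * surprise`.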

POS Tags

Common POS tags used in queries (case-insensitive). For the complete list of POS tags, fine-grained tags, dependency labels, and morphological features, see Linguistic Reference.

| Tag | Meaning | Tag | Meaning |
| --- | --- | --- | --- |
| {n} | Noun | {v} | Verb |
| {adj} | Adjective | {adv} | Adverb |
| {p} | Adposition (preposition) | {dt} | Determiner |
| {prp} | Pronoun | {conj} | Conjunction |
| {num} | Numeral | {part} | Particle |
| {intj} | Interjection | {aux} | Auxiliary |

Morphological Features

Use {#feature} to search by morphological properties (partial matching on the morph annotation):

| Feature | Matches |
| --- | --- |
| {#past} | Past tense forms (Tense: Past) |
| {#mod} | Modal verbs (VerbType: Mod) |
| {#ger} | Gerund forms (VerbForm: Ger) |
| {#plur} | Plural nouns/pronouns (Number: Plur) |
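
One way to picture partial matching on the morph annotation is the sketch below. It assumes the annotation is available as a spaCy-style string such as `Tense=Past|VerbForm=Fin`, and that "partial matching" means a case-insensitive substring test against each feature value. Both points are assumptions made for illustration, and `has_morph_feature` is a hypothetical helper, not part of TCSE.

```python
def has_morph_feature(morph: str, feature: str) -> bool:
    """Check whether any feature value contains `feature` (assumed behavior).

    `morph` is a string like "Tense=Past|VerbForm=Fin"; `feature` is the
    text after "#" in a query term such as {#past}.
    """
    # Keep only the value side of each Name=Value pair.
    values = [pair.partition("=")[2] for pair in morph.split("|") if pair]
    needle = feature.lower()
    return any(needle in value.lower() for value in values)
```

With this reading, `{#past}` would select a token annotated `Tense=Past|VerbForm=Fin`, and `{#plur}` would skip a token annotated `Number=Sing`.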

Passive voice

spaCy's en_core_web_lg model does not annotate passive voice in the morphological features for English. Use dependency labels instead: {@auxpass} (passive auxiliary) or {@nsubjpass} (passive nominal subject).

Named Entities

Use %ENTITY notation to search for named entities. See Named Entity Search for the full list of 18 entity types.

| Example | Matches |
| --- | --- |
| %PERSON said | Named persons followed by "said" |
| %ORG | Organization names |
| in %GPE | "in" followed by a geo-political entity |
| %DATE | Date expressions |

Contractions

TCSE's corpus is tokenized by spaCy, which splits contractions into separate tokens. For example, I'm is stored as two tokens: I + 'm. In advanced search, contractions are automatically split to match spaCy's tokenization, so you can type them naturally.

| Input | Interpreted as | Matches |
| --- | --- | --- |
| I'm going | I 'm going | I'm going to ... |
| don't | do n't | don't, Don't |
| let's | let 's | let's, Let's |
| Tom's fine | Tom 's fine | Tom's fine |
| can't [be] | ca n't [be] | can't be, can't have been |

To search for all forms of a verb including contractions, use lemma notation:

| Input | Matches |
| --- | --- |
| I [be] | I am, I'm, I was, I were |
| [do] n't | don't, doesn't, didn't |
| I [have] | I have, I've, I had |
| I [will] | I will, I'll |

To disambiguate 's (which can stand for be, have, or the possessive marker), add a POS filter:

| Input | Matches |
| --- | --- |
| it's | all uses of it's |
| it ['s]{aux} | it's = it is (be) |
| it ['s]{part} | it + possessive 's (it's is rarely used this way) |
| Tom [be] | Tom is, Tom's (be), Tom was |

Already-split input

If you type a contraction already split with a space (e.g., I 'm), TCSE will not split it again. Both I'm and I 'm produce the same results.
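
The contraction handling described in this section can be approximated with two regular expressions. This is a rough sketch, not TCSE's implementation: the function name `split_contractions` and the list of clitics are assumptions, and spaCy's real tokenizer handles more exceptions than this.

```python
import re

# Illustrative sketch only; TCSE's actual splitting rules may differ.
# "n't" is split off its host word: don't -> do n't, can't -> ca n't.
_NT = re.compile(r"\b(\w+)(n't)\b", re.IGNORECASE)
# Other clitics are split at the apostrophe: I'm -> I 'm, Tom's -> Tom 's.
_CLITIC = re.compile(r"(\w)('(?:m|s|re|ve|ll|d))\b", re.IGNORECASE)


def split_contractions(query: str) -> str:
    """Insert a space before contracted forms in a query string.

    Input that is already split (e.g. "I 'm") is left unchanged, because
    both patterns require a word character directly before the clitic.
    """
    query = _NT.sub(r"\1 \2", query)
    return _CLITIC.sub(r"\1 \2", query)
```

Because an already-split clitic is preceded by a space, not a word character, applying the function twice gives the same result as applying it once.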

Examples

| Example | Possible Matches |
| --- | --- |
| [excite] | excite, excites, excited, exciting |
| {n} | nouns of any kind (except for pronouns) |
| {v} | verbs of any kind |
| to * surprise | to our surprise, to his surprise, etc. |
| [read] {dt} [news\|paper\|article] | they read these articles, reading the paper, etc. |
| ^ having {v} | Having started the process, Having said that, etc. |
| [help]{n} | an aunt offered financial help, we called people for help, etc. |
| [help]{v} {prp} {v} | helped us build, help you keep away, etc. |
| [get] -rid of | get outside of, get ahead of, got tired of, etc. |
| help{-v} | help as a noun (not a verb) |
| +un {adj} | words starting with "un" followed by an adjective |
| [give] _ _ | ditransitive "give" with two noun chunks |
| ['s] | the literal surface form "'s" |
| %PERSON said | sentences where a named person said something |
| {@auxpass} | passive auxiliary verbs (was built, been given) |
| {@nsubj} [be] | nominal subjects followed by forms of "be" |
| {#past} {#past} | two consecutive past tense tokens |
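
To show how a multi-term query lines up with consecutive tokens, here is a small sketch that matches lemma and POS terms against hand-annotated tokens. Everything in it is hypothetical: the `(surface, lemma, pos)` tuples are written by hand, not produced by spaCy, the helper names are invented, and only bare words, [LEMMA], {POS}, and [LEMMA]{POS} terms are supported.

```python
import re

# Illustrative sketch only; not how TCSE evaluates queries.
TERM = re.compile(r"^(?:\[(?P<lemma>[^\]]+)\])?(?:\{(?P<pos>[a-z]+)\})?$")


def term_matches(term: str, token: tuple[str, str, str]) -> bool:
    """Test one query term against one (surface, lemma, pos) token."""
    surface, lemma, pos = token
    m = TERM.match(term)
    if m is None:
        # Anything else is treated as a bare surface form.
        return surface.lower() == term.lower()
    if m["lemma"] and lemma not in m["lemma"].split("|"):
        return False
    if m["pos"] and pos != m["pos"]:
        return False
    return True


def find_matches(query: str, tokens: list[tuple[str, str, str]]) -> list[str]:
    """Return every run of consecutive tokens that satisfies the query."""
    terms = query.split()
    hits = []
    for start in range(len(tokens) - len(terms) + 1):
        window = tokens[start:start + len(terms)]
        if all(term_matches(t, tok) for t, tok in zip(terms, window)):
            hits.append(" ".join(tok[0] for tok in window))
    return hits


# Hand-annotated example sentence using the short POS tags from this page.
tokens = [
    ("they", "they", "prp"),
    ("read", "read", "v"),
    ("these", "these", "dt"),
    ("articles", "article", "n"),
]
print(find_matches("[read] {dt} [news|paper|article]", tokens))
```

Each term constrains exactly one token, so the query only matches where a form of "read", a determiner, and one of the three listed lemmas appear in sequence.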