Collocation Network

The Network tab in Collocation mode provides an interactive visualization of word co-occurrence relationships using a force-directed graph. When you search for a word, the Network tab is shown by default if sufficient collocation data is available; otherwise, the display automatically falls back to the 2-gram tab.

How to access

Click on Collocation to switch to Collocation mode
Enter a search word and click Search
The Network tab is selected by default when sufficient collocation data is available; otherwise the 2-gram tab is shown

[Figure: Network tab showing a collocation graph]

Understanding the graph

Nodes represent lemmas (base forms of words). For example, searching for "makes" will show the lemma "make", aggregating all inflected forms (make, makes, made, making).
Edges connect words that frequently co-occur in the corpus. Thicker edges indicate higher MI (Mutual Information) scores.
Node size reflects the word's overall frequency in the corpus.
Node color indicates the part of speech:

Color	Part of speech
Blue	Noun
Red	Verb
Green	Adjective
Orange	Adverb
Gray	Other
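
This page does not define how the MI score behind edge thickness is computed. As a rough illustration only, the sketch below uses the common corpus-linguistics form of pointwise mutual information: the base-2 logarithm of observed over expected co-occurrence. The function name, its parameters, and the choice of this MI variant are assumptions rather than the tool's documented implementation, and any window-size correction is left out.

```python
import math


def mutual_information(cooc: int, freq_node: int, freq_collocate: int,
                       corpus_size: int) -> float:
    """Pointwise MI (assumed variant): log2(observed / expected co-occurrence)."""
    if min(cooc, freq_node, freq_collocate, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    # Expected co-occurrence if the two words were statistically independent.
    expected = freq_node * freq_collocate / corpus_size
    return math.log2(cooc / expected)


# Made-up counts: the pair co-occurs 16 times more often than chance predicts.
example_mi = mutual_information(cooc=8, freq_node=16, freq_collocate=32,
                                corpus_size=1024)
print(example_mi)  # 4.0
```

Under this definition, a score of 3 means the pair occurs together 8 times more often than independence would predict.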

Controls

MI threshold

The MI ≥ buttons (2, 3, 4, 5) control the minimum Mutual Information score required for an edge to be displayed. Higher thresholds show only the strongest collocations. When the default MI ≥ 3 yields fewer than 5 nodes, the system automatically lowers the threshold to MI ≥ 2.
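
The automatic fallback described above can be sketched as follows. The constants come from the text; the function name is hypothetical, and counting only collocate nodes (not the center node) toward the minimum of 5 is an assumption.

```python
DEFAULT_MI = 3.0   # default threshold, from the text
FALLBACK_MI = 2.0  # fallback threshold, from the text
MIN_NODES = 5      # fewer nodes than this at the default triggers the fallback


def choose_threshold(mi_scores: list[float]) -> float:
    """Return the MI threshold to apply, given the MI scores of candidate collocates."""
    # Assumption: only collocate nodes are counted, not the center node.
    nodes_at_default = sum(1 for mi in mi_scores if mi >= DEFAULT_MI)
    return DEFAULT_MI if nodes_at_default >= MIN_NODES else FALLBACK_MI


print(choose_threshold([3.1, 3.5, 4.0, 5.2, 6.0]))  # 3.0
print(choose_threshold([3.1, 2.5, 2.2]))            # 2.0
```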

Max nodes

The Max nodes buttons (15, 30, 50) control the maximum number of words displayed in the graph. Fewer nodes produce a cleaner, more focused visualization.

Spacing

The Spacing slider (100–800) adjusts the repulsion force between nodes. Higher values spread nodes apart, reducing label overlap. Lower values produce a more compact graph.

Zoom

The Zoom slider (50%–200%) adjusts the magnification level of the graph. Drag the slider to zoom in or out. The current zoom percentage is displayed next to the slider.

Interaction

Pan: Drag the background to move the entire graph
Drag: Drag an individual node to rearrange the layout
Hover: Hover over a node to highlight its connections and show MI scores on its edges
Click the center node (the search word): Navigate to a token search for [lemma], showing all corpus examples
Click a peripheral node: Open a modal showing co-occurrence patterns (2- to 4-grams) between the search word and the clicked word, sorted by frequency. Click a pattern row to search the corpus for those specific examples

Filtering criteria

The network applies the following filters to select meaningful collocations:

MI threshold: Minimum Mutual Information score (default ≥ 3, with automatic fallback to ≥ 2)
Frequency: Minimum co-occurrence frequency ≥ 3
Talk count: Minimum number of distinct talks ≥ 3; a collocation must appear in at least 3 independent TED Talks to be included

The talk count filter leverages the document-level structure of the TED corpus. Since each talk is an independent discourse event by a different speaker, collocations attested across multiple talks provide stronger evidence of genuine linguistic association than those concentrated in a single talk.
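
To make the three criteria concrete, here is a minimal sketch that applies them together. The Collocate record, its field names, and the sample values are invented for illustration; only the thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class Collocate:
    lemma: str
    mi: float        # Mutual Information score
    freq: int        # co-occurrence frequency with the search word
    talk_count: int  # number of distinct talks in which the pair occurs


def passes_filters(c: Collocate, min_mi: float = 3.0,
                   min_freq: int = 3, min_talks: int = 3) -> bool:
    """A collocate is kept only if it meets all three minimums."""
    return c.mi >= min_mi and c.freq >= min_freq and c.talk_count >= min_talks


# Made-up candidates, not real corpus data.
candidates = [
    Collocate("decision", mi=4.2, freq=12, talk_count=9),
    Collocate("widget", mi=5.0, freq=7, talk_count=1),   # strong, but confined to one talk
    Collocate("sense", mi=2.4, freq=30, talk_count=20),  # below the default MI threshold
]

kept = [c.lemma for c in candidates if passes_filters(c)]
kept_with_fallback = [c.lemma for c in candidates if passes_filters(c, min_mi=2.0)]
print(kept)                # ['decision']
print(kept_with_fallback)  # ['decision', 'sense']
```

The second candidate shows what the talk count filter is for: a high MI score and adequate frequency are not enough when all occurrences come from a single talk.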

Stop words

The following categories of function words are excluded from the network as collocates (but can still be searched as the center word):

Articles: the, a, an
Be verbs: is, was, are, were, be, been, being
Prepositions: of, in, to, for, on, with, at, by, from, about, into, over, after, before, between, through, during
Conjunctions: and, or, but, if, so, than
Negation: not
Auxiliaries: has, have, had, do, does, did, will, would, can, could, may, might, shall, should, must
Pronouns: it, its, this, that, there, their, they, them, he, she, him, her, his, we, our, you, your, who, which, what, how
Quantifiers/determiners: all, some, no, any, each, every, much, many, more, most, such, only
Numerals: one through ten, first, second, third, last, next
High-frequency adverbs: also, just, even, still, back, up, out, then, too, when, where, here

Content words (nouns, verbs, adjectives, most adverbs) are not excluded, as they may form meaningful discourse-level collocations.
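
The asymmetry between center word and collocates can be sketched as below. The stop list here is an abbreviated subset of the categories above, and the function name is hypothetical.

```python
# Abbreviated subset of the stop list above, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "was", "of", "in", "to",
              "and", "or", "not", "it", "this"}


def network_nodes(center: str, collocates: list[str]) -> list[str]:
    """Return the center word followed by the collocates that are not stop words."""
    kept = [w for w in collocates if w.lower() not in STOP_WORDS]
    # The center word is never checked against the stop list,
    # so a function word can still be searched.
    return [center] + kept


print(network_nodes("of", ["the", "course", "kind", "and"]))
# ['of', 'course', 'kind']
```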

Lemma-based aggregation

The network uses lemma-based aggregation: all inflected forms of a word are merged into a single node. For example:

"make", "makes", "made", "making" → single node "make"
"great", "greater", "greatest" → single node "great"

This produces cleaner networks: counts for related forms are combined rather than split across separate nodes, so each edge reflects the strength of the collocation across all surface variants.
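
As a minimal sketch of the aggregation step, the example below sums per-form counts into per-lemma totals. The lookup table is a hand-written stand-in for whatever lemmatizer the tool actually uses, and the counts are invented.

```python
from collections import Counter

# Hypothetical surface-form -> lemma table; a real system would use a lemmatizer.
LEMMAS = {
    "make": "make", "makes": "make", "made": "make", "making": "make",
    "great": "great", "greater": "great", "greatest": "great",
}


def aggregate_by_lemma(form_counts: dict[str, int]) -> Counter:
    """Sum the counts of all surface forms that share a lemma."""
    totals: Counter = Counter()
    for form, count in form_counts.items():
        key = form.lower()
        # Forms missing from the table are treated as their own lemma.
        totals[LEMMAS.get(key, key)] += count
    return totals


# Made-up counts for illustration.
counts = aggregate_by_lemma(
    {"make": 10, "makes": 4, "made": 7, "making": 3, "greater": 2}
)
print(counts["make"], counts["great"])  # 24 2
```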

Tips

Start with a content word (noun, verb, adjective) for the most informative networks
Use the Spacing slider to reduce label overlap in dense networks, and the Zoom slider to adjust magnification
Click peripheral nodes to explore co-occurrence patterns, then click a pattern to see corpus examples
The network complements the Colloc tabs by providing a visual overview of a word's collocational profile