Export search results
Beta feature
This feature is in beta. It is available on the TCSE website, but the interface and data fields may change based on user feedback. Feedback is welcome.
You can export token search results as structured data files for use in linguistic research, statistical analysis, or further processing.
How to export
- Perform a token search (regular, advanced, or translation search)
- When results are displayed, click TSV or JSON in the button bar above the results
- The currently visible page (up to 200 hits) will be downloaded
- To export additional pages, navigate with Prev/Next and click export again
Export formats
TSV (ZIP download)
Downloads a ZIP archive containing two files:
- data.tsv — Tab-separated values file (UTF-8 with BOM for Excel compatibility). Each row is one search hit with all data fields as columns.
- metadata.json — Export metadata including query, total hit count, sampling method, and license information.
This format is best for opening in Excel, Google Sheets, or other spreadsheet applications.
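For scripted use, the archive can also be read directly without unpacking it. The following is a minimal Python sketch, assuming only the two file names listed above; the `read_tsv_export` helper and the sample values are hypothetical, and the exact quoting rules of data.tsv are not documented here, so the default `csv` settings may need adjusting.

```python
import csv
import io
import json
import zipfile


def read_tsv_export(zip_source):
    """Return (metadata, rows) from a TSV export archive.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # "utf-8-sig" strips the BOM that data.tsv carries for Excel.
        with archive.open("data.tsv") as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            rows = list(csv.DictReader(text, delimiter="\t"))
        # "utf-8-sig" also decodes plain UTF-8, so it is safe either way.
        metadata = json.loads(archive.read("metadata.json").decode("utf-8-sig"))
    return metadata, rows


# Tiny in-memory archive standing in for a real download (made-up values).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.tsv", "\ufefftalk_id\tmatch\n1\texample\n".encode("utf-8"))
    archive.writestr("metadata.json", json.dumps({"total_hits": 1}))
buffer.seek(0)

metadata, rows = read_tsv_export(buffer)
print(metadata["total_hits"], rows[0]["match"])
```

Note that `csv.DictReader` returns every value as a string, so numeric fields such as talk_id need converting before analysis.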
JSON (single file download)
Downloads a single JSON file containing both metadata and data in a structured format:
{
"metadata": { "query": "...", "total_hits": 1234, ... },
"data": [ { "talk_id": 1, "match": "...", ... }, ... ]
}
This format is best for processing with Python, R, or other programming languages.
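As an illustration, a JSON export could be loaded in Python roughly as follows. This is a sketch that relies only on the top-level "metadata" and "data" keys shown above; the `load_json_export` helper and the sample values are hypothetical.

```python
import json
from collections import Counter


def load_json_export(text):
    """Split a JSON export into its metadata dict and its list of hits."""
    export = json.loads(text)
    return export["metadata"], export["data"]


# Made-up sample shaped like the structure above; real exports carry more fields.
sample = json.dumps({
    "metadata": {"query": "example", "total_hits": 3},
    "data": [
        {"talk_id": 1, "match": "example"},
        {"talk_id": 1, "match": "example"},
        {"talk_id": 2, "match": "example"},
    ],
})

metadata, hits = load_json_export(sample)
# Count how many hits on this page come from each talk.
hits_per_talk = Counter(hit["talk_id"] for hit in hits)
print(metadata["total_hits"], dict(hits_per_talk))
```

For a downloaded file, the same helper can be fed the file contents read with `encoding="utf-8"`.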
Data fields
Each exported hit includes:
| Field | Description |
|---|---|
| talk_id | Talk ID number |
| talk_title | Title of the TED Talk |
| speaker | Speaker name |
| year | Year of publication |
| video_type | Talk type (e.g., "TED Stage Talk", "TEDx Talk", "TED-Ed Original") |
| talk_duration | Total talk duration in seconds |
| talk_url | URL to the talk on ted.com |
| segment_id | Unique segment ID (for reproducibility) |
| match | Matched word/phrase (search query for regular search; actual surface form for advanced search) |
| segment_text | Full segment text containing the match |
| segment_position | Position in talk (e.g., "42/187") |
| start_time | Segment start time in seconds |
| duration | Segment duration in seconds |
| context_before_1 | One segment before the match |
| context_before_2 | Two segments before the match |
| context_after_1 | One segment after the match |
| context_after_2 | Two segments after the match |
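The segment_position value can be turned into a number for analysis. The sketch below assumes the "index/total" form shown in the table (e.g., "42/187") and that the index counts from 1; both helper functions are hypothetical names, and the three-way split is only one possible binning.

```python
def relative_position(segment_position):
    """Convert a value such as "42/187" into a fraction between 0 and 1."""
    index, total = (int(part) for part in segment_position.split("/"))
    if total <= 0:
        raise ValueError(f"unexpected segment_position: {segment_position!r}")
    return index / total


def talk_third(segment_position):
    """Label the hit as falling in the opening, middle, or closing third."""
    fraction = relative_position(segment_position)
    if fraction <= 1 / 3:
        return "opening"
    if fraction <= 2 / 3:
        return "middle"
    return "closing"


print(talk_third("42/187"))  # 42/187 is below one third, so "opening"
```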
Advanced search fields
When using Advanced Search, each hit additionally includes:
| Field | Description |
|---|---|
| pos | Part of speech of the matched word(s) |
| lemma | Lemma (base form) of the matched word(s) |
| dep | Dependency relation label |
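One possible use of these fields is to tally hits by lemma and part of speech. The Python sketch below treats each field value as an opaque string, because the tag set and the format used for multi-word matches are not specified here; the tag values in the sample are illustrative only, and `count_annotations` is a hypothetical helper.

```python
from collections import Counter


def count_annotations(hits):
    """Count hits per (lemma, pos) pair."""
    return Counter((hit["lemma"], hit["pos"]) for hit in hits)


# Made-up hits; the tag values are placeholders, not the site's tag set.
hits = [
    {"match": "ran", "lemma": "run", "pos": "VERB", "dep": "ROOT"},
    {"match": "runs", "lemma": "run", "pos": "VERB", "dep": "ROOT"},
    {"match": "run", "lemma": "run", "pos": "NOUN", "dep": "dobj"},
]

counts = count_annotations(hits)
for (lemma, pos), n in counts.most_common():
    print(lemma, pos, n)
```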
Translation fields
When a translation language is selected, each hit additionally includes:
| Field | Description |
|---|---|
| translation_lang | Translation language code |
| translation_segment | Translated text of the matching segment |
| translation_context_before_1 | Translation of one segment before |
| translation_context_before_2 | Translation of two segments before |
| translation_context_after_1 | Translation of one segment after |
| translation_context_after_2 | Translation of two segments after |
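The source and translation fields can be paired segment by segment. The following Python sketch uses the field names from the tables above (note that the matching segment is segment_text on the source side but translation_segment on the translation side); the `aligned_window` helper and the sample text are hypothetical, and empty context fields are assumed to be possible at the start or end of a talk.

```python
# Field names in reading order, taken from the tables above.
SOURCE_KEYS = [
    "context_before_2", "context_before_1", "segment_text",
    "context_after_1", "context_after_2",
]
TRANSLATION_KEYS = [
    "translation_context_before_2", "translation_context_before_1",
    "translation_segment",
    "translation_context_after_1", "translation_context_after_2",
]


def aligned_window(hit):
    """Return (source, translation) pairs in reading order, skipping empty slots."""
    pairs = []
    for source_key, translation_key in zip(SOURCE_KEYS, TRANSLATION_KEYS):
        source = hit.get(source_key) or ""
        translation = hit.get(translation_key) or ""
        if source or translation:
            pairs.append((source, translation))
    return pairs


# Made-up hit with only one segment of context on each side.
hit = {
    "translation_lang": "xx",
    "context_before_1": "Before.",
    "segment_text": "The match is here.",
    "context_after_1": "After.",
    "translation_context_before_1": "T-before.",
    "translation_segment": "T-match.",
    "translation_context_after_1": "T-after.",
}

for source, translation in aligned_window(hit):
    print(source, "|", translation)
```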
Pagination and export scope
Each export downloads the currently visible page of search results (up to 200 hits, matching the regular pagination size). This design has several benefits:
- Reproducibility: The same page always produces the same data as long as Randomize is off, so you can share the exact data with collaborators.
- WYSIWYG: You export exactly what you see on the screen.
- Full corpus access: To retrieve all hits for a query, navigate through pages (Prev/Next) and export each one. The page and total_pages fields in the metadata track your progress.
The metadata includes:
- total_hits: Total number of hits for the query
- exported_count: Hits in this page (≤ 200)
- page: Current page number
- total_pages: Total number of pages
- randomized: Whether results are in random order (true if the Randomize checkbox is on)
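When collecting several pages, these metadata fields can be used to check which pages are still missing. The Python sketch below assumes that page numbers start at 1 and that every export comes from the same query; `missing_pages` and `any_randomized` are hypothetical helpers, and the sample values are made up.

```python
def missing_pages(metadata_list):
    """Return the page numbers not yet covered by the given exports."""
    if not metadata_list:
        return []
    total_pages = metadata_list[0]["total_pages"]
    seen = {metadata["page"] for metadata in metadata_list}
    # Assumption: pages are numbered 1..total_pages.
    return [page for page in range(1, total_pages + 1) if page not in seen]


def any_randomized(metadata_list):
    """True if any export was taken with the Randomize checkbox on."""
    return any(metadata["randomized"] for metadata in metadata_list)


# Made-up metadata from two exports of a three-page result set.
exports = [
    {"total_hits": 450, "exported_count": 200, "page": 1,
     "total_pages": 3, "randomized": False},
    {"total_hits": 450, "exported_count": 50, "page": 3,
     "total_pages": 3, "randomized": False},
]

print(missing_pages(exports), any_randomized(exports))
```

Checking the randomized flag matters because pages exported in random order may not add up to a complete, non-overlapping set of hits.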
A short 5-second cooldown applies between consecutive exports to prevent accidental double-clicks. The TSV/JSON buttons show a countdown tooltip on hover and automatically re-enable when the timer reaches zero.
License
Exported data includes TED Talk transcripts used under the Creative Commons BY-NC-ND 4.0 license. The metadata file includes a license notice. Exported data is intended for research and educational purposes only.
Tips
- To export all results, navigate through pages and export each one. Filenames include the page number (e.g., tcse_export_20260411_p3.zip).
- The segment_position field helps analyze where in a talk a pattern tends to occur
- Combine Advanced Search annotations (POS, lemma, dep) with context for detailed discourse analysis
- For comprehensive corpus-level statistics, use the N-gram and Collocation tabs instead of exporting
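When several page exports sit in one folder, the page number in the filename can be used to process them in order. The Python sketch below infers the pattern from the single example filename above (a "_p" followed by digits before the extension), so the regular expression is an assumption rather than a documented naming rule; `page_number` is a hypothetical helper.

```python
import re

# Assumed pattern: "..._p<digits>.<extension>" at the end of the filename.
PAGE_PATTERN = re.compile(r"_p(\d+)\.[A-Za-z0-9]+$")


def page_number(filename):
    """Extract the page number from an export filename."""
    found = PAGE_PATTERN.search(filename)
    if found is None:
        raise ValueError(f"no page number in {filename!r}")
    return int(found.group(1))


names = [
    "tcse_export_20260411_p10.zip",
    "tcse_export_20260411_p3.zip",
    "tcse_export_20260411_p1.zip",
]

# Sort numerically so that page 10 comes after page 3, not before it.
ordered = sorted(names, key=page_number)
print(ordered)
```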