
Frequently Asked Questions

How can I refer to TCSE?

Please cite the following when you publish work that uses TCSE:

  • Hasebe, Yoichiro. (2015) Design and Implementation of an Online Corpus of Presentation Transcripts of TED Talks. Procedia: Social and Behavioral Sciences 198(24), 174-182.

What are the terms of use of TED data?

TCSE uses data provided by TED under the Creative Commons BY-NC-ND license.

TCSE is made available free for non-commercial educational and scientific use, but please use this system at your own risk. All materials and information are provided "as is," with no warranties or guarantees whatsoever.

TCSE was created by Yoichiro Hasebe (yohasebe@gmail.com) at Doshisha University, Kyoto, Japan.

What version of TCSE is currently running?

The current version is 12.0.0, containing 6,419 TED Talks. All transcript data is annotated using spaCy 3.8 (en_core_web_lg) for part-of-speech tagging, lemmatization, dependency parsing, morphological analysis, and named entity recognition.
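As a rough illustration of what these annotation layers look like, the following Python sketch reads them from a spaCy document. It is not TCSE's actual processing code: the helper names `load_pipeline`, `token_rows`, and `entity_rows` are hypothetical, and the sketch assumes spaCy and the en_core_web_lg model are installed locally.

```python
def load_pipeline(model_name="en_core_web_lg"):
    """Load a spaCy pipeline (assumes spaCy and the model are installed)."""
    import spacy  # imported lazily so the helpers below do not require it
    return spacy.load(model_name)


def token_rows(doc):
    """Return one dict per token with the annotation layers listed above."""
    return [
        {
            "text": tok.text,
            "pos": tok.pos_,          # part-of-speech tag
            "lemma": tok.lemma_,      # lemma
            "dep": tok.dep_,          # dependency relation to the head
            "head": tok.head.text,    # syntactic head of this token
            "morph": str(tok.morph),  # morphological features
        }
        for tok in doc
    ]


def entity_rows(doc):
    """Return (text, label) pairs for the named entities in the document."""
    return [(ent.text, ent.label_) for ent in doc.ents]
```

With the model installed, calling `doc = load_pipeline()("Some transcript sentence.")` and then `token_rows(doc)` and `entity_rows(doc)` would list the annotations for that sentence. The exact tags depend on the model version.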

How frequently is the TCSE database updated?

TCSE is updated periodically with newly added talks, transcriptions, and translations. As a result, the statistics of TCSE as a linguistic corpus change over time.

What are the criteria for choosing translation languages?

Transcripts of TED Talks are translated into many different languages, and the number of translated talks varies from language to language. TCSE offers data for languages into which more than 1,000 talks have been translated. Currently 34 languages are available (aside from English, the language of the original talks).

See the main page of TCSE for the number of talks translated into each language.
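The selection rule described above (more than 1,000 translated talks) can be sketched in a few lines of Python. The function name `available_languages` and the counts below are made up for illustration. They are not TCSE's code or real figures.

```python
MIN_TALKS = 1000  # threshold stated in this FAQ: "more than 1,000 talks"


def available_languages(talk_counts, min_talks=MIN_TALKS):
    """Return languages whose translated-talk count exceeds the threshold."""
    return sorted(lang for lang, n in talk_counts.items() if n > min_talks)


# Hypothetical counts, for illustration only.
example_counts = {"lang_a": 5200, "lang_b": 140, "lang_c": 1000}
print(available_languages(example_counts))  # prints ['lang_a']
```

A language with exactly 1,000 translated talks is excluded here, because the stated criterion is "more than 1,000".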

List of translation languages available on TCSE

  • Arabic
  • Bulgarian
  • Burmese
  • Chinese, Simplified
  • Chinese, Traditional
  • Croatian
  • Czech
  • Dutch
  • French
  • German
  • Greek
  • Hebrew
  • Hindi
  • Hungarian
  • Indonesian
  • Italian
  • Japanese
  • Korean
  • Kurdish, Central
  • Kurdish, Northern
  • Persian
  • Polish
  • Portuguese
  • Portuguese, Brazilian
  • Romanian
  • Russian
  • Serbian
  • Slovak
  • Spanish
  • Swedish
  • Thai
  • Turkish
  • Ukrainian
  • Vietnamese