The Telugu Web 2021 corpus, with 100+ million words and part-of-speech tagging, is now available in Sketch Engine! #corpuslinguistics, #digitalhumanities, #linguistics
www.sketchengine.eu/tetenten-tel...
@sketchengine.eu
Sketch Engine is a linguistic search engine and corpus query system with text analysis tools and corpora in 100+ languages. Concordance, n-grams, term extraction, co-occurrences (Word Sketch) are only some of its features.
Registration is open for Lexicom 2026 in Palermo 🇮🇹! Since 2001, this workshop in #lexicography and #corpuslinguistics has welcomed 700+ participants worldwide. Join the community and take part in the next edition, 14–18 September 2026.
lexicom.courses/lexicom-2026...
Our new Chinese corpus in Traditional Chinese (繁體字) is now available. It is part-of-speech tagged and partly annotated for topics and genres. A useful resource for research and language teaching. #corpuslinguistics #digitalhumanities
www.sketchengine.eu/zhtenten-chi...
We've published a new Chinese corpus in Simplified Chinese (简体字). It is part-of-speech tagged and partly annotated for topics and genres. A useful resource for research and language technology. #corpuslinguistics #linguistics #nlp
www.sketchengine.eu/zhtenten-chi...
An example of Sketch Engine used outside the field of pure linguistics. This study in media discourse analysis will be published in @nature.com www.nature.com/articles/s41... #MediaRepresentation #discourseanalysis #corpuslinguistics
We've published the Urdu Corpus 2021 in Sketch Engine, with 328 million words and topic and genre classification. Urdu is the 11th most spoken language worldwide (Ethnologue, 2025).
www.sketchengine.eu/urtenten-urd...
#corpuslinguistics #TextAnalysis #اردو
The new Latvian Corpus 2021 is now available in Sketch Engine. The corpus is enriched with part-of-speech tagging and lemmatization. Perfect for #corpuslinguistics, #digitalhumanities, #linguistics, #lexicography, and #nlp.
📢 Registration is open for Lexicom 2026 in Palermo 🇮🇹! Apply for this hands-on workshop on #lexicography, #corpuslinguistics, and #dictionaries. Learn from experts, explore new tools, and build your skills.
14–18 September 2026
lexicom.courses/lexicom-2026...
You can search for multiple variants at the same time in the Word Sketch tool. Just add a comma between them (Christmas, Xmas) to see the results for both: ske.li/bav0
The Word Sketch tool automatically separates senses and organizes collocations by their meaning, so you can analyze exactly what you want: ske.li/064
#collocations
Take the step to become a master in corpus analysis and corpus building with Sketch Engine. Choose from our online or face-to-face course options. Learn more at www.sketchengine.eu/bootcamp
#corpuslinguistics #TextAnalysis #appliedlinguistics
Find all n-grams containing the word "Christmas": ske.li/060
#ngrams
Parallel Concordance is an easy way to find translations of fixed expressions, such as idioms, in multiple languages. Check out how others wish Merry Christmas at: ske.li/066
Lexicom 2026 is open for registration! Held in Palermo 🇮🇹, 14–18 September. Join 700+ graduates worldwide who have already attended this workshop in #lexicography, #corpuslinguistics, #dictionaries, and lexical computing.
lexicom.courses/lexicom-2026...
Word Sketch can group collocations by meaning, so you can instantly tell which "bow" they belong to. The #collocations related to a decorative bow are highlighted in blue at ske.li/068
#wordsense
Curious how many words start with "snow"? Snowball, snowman, snowflake... Use the Wordlist tool to generate the full list from our billion-word corpora: ske.li/bam9
Among the 800+ corpora that we offer, you can also find many surprises, such as OpenSubtitles, corpora made up of translated movie subtitles. Can you guess the movie where Christmas is mentioned the most? ske.li/bam7
#corpuslinguistics #opensubtitles
The Word Sketch tool organizes every collocation into clear grammatical categories, making it easy to navigate through the data: ske.li/06q
Compare all your favorite Christmas treats in one go. Use the "From this list" feature in the Wordlist tool to analyze the whole dessert table simultaneously: ske.li/banf
#wordlist
With the English Trends corpus, you can compare the most prominent topics of each month. See what we typically talk about in December: ske.li/ban0
#trendingwords #trendingtopics
Translated into over 300 languages, Silent Night is one of the most famous carols. Check out its #translations to other languages in the parallel concordance: ske.li/065
#paralleltexts
What sets Christmas apart from other holidays? Find unique collocations for each of them in the Word Sketch Difference tool: ske.li/babp
Word Sketches can also be generated for multi-word expressions. Find the strongest collocations for "Christmas tree" at ske.li/06v
#collocation
www.sketchengine.eu/guide/word-s...
While you can't find two identical snowflakes, you can find words similar to "snowflake" in the thesaurus: ske.li/06s
#thesaurus #similarwords
No existing corpus fits your niche research topic? Build your own corpus! With seed words, the corpus theme can be anything, even Christmas. www.sketchengine.eu/guide/create...
#textdata #textcorpus
Our Advent 2025 series starts today!
See how often snow appeared in past years with the Timeline tool in our Trends corpora: ske.li/06x
More small insights with Sketch Engine coming soon.
#corpuslinguistics #languagedata
Sketch Engine towel on the shore of Lake Bled, Slovenia.
Sketch Engine in −2 °C air and +11 °C water. Still working. A good memory of our eLex 2025 days in Bled 🇸🇮
Photo by Madis Jürviste, the Institute of the Estonian Language.
It was a pleasure to meet friends, users and the wider lexicographic community. Thanks for visiting our booth.
elex.link/elex2025
#elex2025 #lexicography
The Sketch Engine team is at the eLex conference in Bled, Slovenia!
Thanks to Michael Rundell for his fascinating talk on the changes in lexicography.
If you want to hear more from us, be sure to catch Ondřej Herman's talk on the development of monitor corpora today at 17:30!
elex.link/elex2025/
A major milestone for our English Trends corpus: 100 billion tokens (≈ 86 billion words) since 2014, with 70 million words added each week. It's available with a free trial.
www.sketchengine.eu/english-tren...
#corpuslinguistics #bigdata #language