https://github.com/google/corpuscrawler
Crawler for linguistic corpora
corpus-builder corpus-linguistics crawling linguistics minority-language
- Host: GitHub
- URL: https://github.com/google/corpuscrawler
- Owner: google
- License: other
- Created: 2017-09-08T22:21:03.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2023-12-05T23:11:10.000Z (over 2 years ago)
- Last Synced: 2025-03-11T14:47:43.873Z (over 1 year ago)
- Topics: corpus-builder, corpus-linguistics, crawling, linguistics, minority-language
- Language: Python
- Size: 487 KB
- Stars: 204
- Watchers: 20
- Forks: 55
- Open Issues: 17
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
README
# Corpus Crawler
_Corpus Crawler_ is a tool for
[Corpus Linguistics](https://en.wikipedia.org/wiki/Corpus_linguistics).
Modern linguistic research works on language corpora, which are large samples of
“real world” text. This crawler helps to build such corpora: it follows links
to publicly accessible web pages known to be written in a certain language; it
removes boilerplate and HTML markup; finally, it writes its output into
plaintext files. The crawler implements the
[Robots Exclusion Standard](https://en.wikipedia.org/wiki/Robots_exclusion_standard),
and it is intentionally slow so it does not cause much load on the crawled
web sites.
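
The polite-crawling behavior described above can be sketched with Python's standard `urllib.robotparser` module. This is only an illustration, not the project's actual implementation: the user-agent string `ExampleCorpusBot`, the two-second delay, and the helper names `make_robots_checker` and `allowed_urls` are all assumptions made for the example.

```python
import time
import urllib.robotparser

USER_AGENT = "ExampleCorpusBot"   # hypothetical user-agent name
CRAWL_DELAY_SECONDS = 2.0         # assumed delay; the real crawler's value may differ


def make_robots_checker(robots_txt):
    """Build a parser from the text of a site's robots.txt file."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, urls, delay=CRAWL_DELAY_SECONDS, sleep=time.sleep):
    """Yield only the URLs that robots.txt permits, pausing after each one."""
    for url in urls:
        if parser.can_fetch(USER_AGENT, url):
            yield url
            # Wait before moving on so the crawled site sees little load.
            sleep(delay)


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    checker = make_robots_checker(robots)
    candidates = [
        "https://example.org/news/1",
        "https://example.org/private/2",
    ]
    for url in allowed_urls(checker, candidates, delay=0):
        print(url)
```

In a real crawl the `robots.txt` text would be fetched from each site before any of its pages, and the page download would happen where the example prints the URL.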

This is not an official Google product. But if you’re a linguistic researcher,
or if you’re writing a spell checker (or similar language-processing software)
for an “exotic” language, you might find _Corpus Crawler_ useful.

To build corpora for not-yet-supported languages, please read the
[contribution guidelines](./CONTRIBUTING.md) and send us
[GitHub pull requests](https://help.github.com/categories/collaborating-with-issues-and-pull-requests/).

The crawled corpora have been used to compute word frequencies in
Unicode’s [Unilex project](https://github.com/unicode-org/unilex).
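
As a rough idea of how word frequencies can be derived from the crawler's plaintext output, here is a minimal sketch. It is not the method used by Unilex: the function name `count_words` and the tokenization rule (lowercasing, then splitting on runs of `\w` characters) are assumptions made for illustration. This naive rule will split words incorrectly in scripts that rely on combining marks, so real use would need language-aware tokenization.

```python
import re
from collections import Counter


def count_words(lines):
    """Count lowercase word tokens across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Naive tokenization (an assumption): runs of Unicode word characters.
        counts.update(re.findall(r"\w+", line.lower()))
    return counts


if __name__ == "__main__":
    sample = ["Hello world", "hello again"]
    for word, count in count_words(sample).most_common():
        print(count, word)
```

Because the function accepts any iterable of strings, an open corpus file can be passed in directly and is then processed line by line.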
## Supported Languages
| IETF BCP47 Code | Language | Tokens¹ |
| :------------------ | :--------------------------- | ----------------------------------------------------------------------------------: |
| `aai` | Arifama-Miniafia | 181K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/aai.txt) |
| `aak` | Ankave | 194K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/aak.txt) |
| `aau` | Abau | 313K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/aau.txt) |
| `aaz` | Amarasi | 308K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/aaz.txt) |
| `abt` | Ambulas | 297K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/abt.txt) |
| `aby` | Aneme Wake | 233K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/aby.txt) |
| `acd` | Gikyode | 323K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/acd.txt) |
| `ace` | Aceh/Acehnese | 817K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ace.txt) |
| `acf` | Saint Lucian Creole French | 236K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/acf.txt) |
| `ach` | Acoli | 178K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ach.txt) |
| `acn` | Achang | 232K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/acn.txt) |
| `acr` | Achi | 239K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/acr.txt) |
| `acu` | Achuar-Shiwiar | 174K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/acu.txt) |
| `ade` | Adele | 267K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ade.txt) |
| `adh` | Adhola | 166K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/adh.txt) |
| `adj` | Adioukrou | 233K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/adj.txt) |
| `ae` | Avestan | 129K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ae.txt) |
| `ae-Latn` | Avestan (Latin) | 141K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ae-Latn.txt) |
| `aey` | Amele | 218K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/aey.txt) |
| `agd` | Agarabi | 256K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/agd.txt) |
| `agg` | Angor | 214K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/agg.txt) |
| `agm` | Angaataha | 238K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/agm.txt) |
| `agn` | Agutaynen | 234K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/agn.txt) |
| `agr` | Aguaruna | 149K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/agr.txt) |
| `ahk` | Akha | 367K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ahk.txt) |
| `aia` | Arosi | 223K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/aia.txt) |
| `akb` | Batak Angkola | 220K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/akb.txt) |
| `ake` | Akawaio | 190K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ake.txt) |
| `akh` | Angal Heneng | 408K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/akh.txt) |
| `akp` | Siwu | 191K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/akp.txt) |
| `alj` | Alangan | 185K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/alj.txt) |
| `alp` | Alune | 225K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/alp.txt) |
| `alt` | Southern Altai | 121K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/alt.txt) |
| `alz` | Alur | 160K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/alz.txt) |
| `am` | Amharic | 2,170K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/am.txt) |
| `ame` | Yanesha' | 221K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ame.txt) |
| `amf` | Hamer-Banna | 152K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/amf.txt) |
| `amk` | Ambai | 229K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/amk.txt) |
| `amm` | Ama (Papua New Guinea) | 246K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/amm.txt) |
| `amn` | Amanab | 207K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/amn.txt) |
| `amp` | Alamblak | 241K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/amp.txt) |
| `amr` | Amarakaeri | 151K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/amr.txt) |
| `amu` | Guerrero Amuzgo | 202K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/amu.txt) |
| `ann` | Obolo | 236K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ann.txt) |
| `anv` | Denya | 214K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/anv.txt) |
| `aoj` | Mufian | 217K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/aoj.txt) |
| `aom` | Ömie | 231K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/aom.txt) |
| `aon` | Bumbita Arapesh | 294K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/aon.txt) |
| `aoz` | Uab Meto | 197K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/aoz.txt) |
| `ape` | Bukiyip | 294K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ape.txt) |
| `apr` | Arop-Lokep | 373K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/apr.txt) |
| `apz` | Safeyoka | 235K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/apz.txt) |
| `ar` | Arabic | 19,593K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ar.txt) |
| `arl` | Arabela | 206K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/arl.txt) |
| `asg` | Cishingini | 270K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/asg.txt) |
| `aso` | Dano | 290K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/aso.txt) |
| `ata` | Pele-Ata | 248K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ata.txt) |
| `atb` | Zaiwa | 291K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/atb.txt) |
| `atg` | Ivbie North-Okpela-Arhe | 229K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/atg.txt) |
| `atq` | Aralle-Tabulahan | 202K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/atq.txt) |
| `auy` | Awiyaana | 164K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/auy.txt) |
| `av` | Avaric | 111K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/av.txt) |
| `avn` | Avatime | 229K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/avn.txt) |
| `avt` | Au | 263K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/avt.txt) |
| `avu` | Avokaya | 391K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/avu.txt) |
| `awa` | Awadhi | 211K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/awa.txt) |
| `awb` | Awa (Papua New Guinea) | 179K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/awb.txt) |
| `ay` | Aymara | 482K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ay.txt) |
| `ayo` | Ayoreo | 264K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ayo.txt) |
| `az` | Azerbaijani | 3,413K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/az.txt) |
| `azg` | San Pedro Amuzgos Amuzgo | 271K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/azg.txt) |
| `azz` | Highland Puebla Nahuatl | 265K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/azz.txt) |
| `ba` | Bashkir | 666K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ba.txt) |
| `ban` | Balinese | 211K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ban.txt) |
| `bao` | Waimaha | 232K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bao.txt) |
| `bav` | Vengo | 250K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bav.txt) |
| `bba` | Baatonum | 792K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bba.txt) |
| `bbb` | Barai | 289K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bbb.txt) |
| `bbo` | Northern Bobo Madaré | 211K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bbo.txt) |
| `bbr` | Girawa | 245K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bbr.txt) |
| `bch` | Bariai | 248K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bch.txt) |
| `bcw` | Bana | 304K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bcw.txt) |
| `bdd` | Bunama | 171K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bdd.txt) |
| `be` | Belarusian | 1,441K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/be.txt) |
| `be-tarask` | Belarusian (Taraškievica) | 108,431K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/be-tarask.txt) |
| `bef` | Benabena | 239K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bef.txt) |
| `bep` | Besoa | 204K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bep.txt) |
| `bex` | Jur Modo | 254K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bex.txt) |
| `bfd` | Bafut | 276K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bfd.txt) |
| `bfo` | Malba Birifor | 260K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bfo.txt) |
| `bg` | Bulgarian | 10,597K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bg.txt) |
| `bgr` | Bawm Chin | 213K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bgr.txt) |
| `bgz` | Banggai | 186K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bgz.txt) |
| `bhl` | Bimin | 324K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bhl.txt) |
| `bhw` | Biak | 164K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bhw.txt) |
| `bi` | Bislama | 315K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bi.txt) |
| `bib` | Bissa | 243K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bib.txt) |
| `big` | Biangai | 229K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/big.txt) |
| `bik` | Central Bikol | 183K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bik.txt) |
| `bim` | Bimoba | 215K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bim.txt) |
| `biv` | Southern Birifor | 221K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/biv.txt) |
| `bjr` | Binumarien | 226K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bjr.txt) |
| `bjv` | Bedjond | 268K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bjv.txt) |
| `bkl` | Berik | 306K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bkl.txt) |
| `bku` | Buhid | 204K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bku.txt) |
| `bkv` | Bekwarra | 244K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bkv.txt) |
| `blh` | Kuwaa | 259K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/blh.txt) |
| `blt-Latn` | Tai Dam (Latin) | 262K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/blt-Latn.txt) |
| `blz` | Balantak | 199K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/blz.txt) |
| `bm` | Bambara | 30K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bm.txt) |
| `bmh` | Kein | 253K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bmh.txt) |
| `bmq` | Bomu | 207K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bmq.txt) |
| `bmr` | Muinane | 122K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bmr.txt) |
| `bmu` | Somba-Siawari | 234K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bmu.txt) |
| `bmv` | Bum | 258K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bmv.txt) |
| `bn` | Bangla | 7,258K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bn.txt) |
| `bnj` | Eastern Tawbuid | 239K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bnj.txt) |
| `bnp` | Bola | 263K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bnp.txt) |
| `bo` | Tibetan | 5,642K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bo.txt) |
| `boa` | Bora | 133K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/boa.txt) |
| `boj` | Anjam | 255K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/boj.txt) |
| `bon` | Bine | 244K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bon.txt) |
| `bov` | Tuwuli | 203K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bov.txt) |
| `box` | Buamu | 274K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/box.txt) |
| `bpr` | Koronadal Blaan | 204K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bpr.txt) |
| `bps` | Sarangani Blaan | 214K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bps.txt) |
| `bqc` | Boko | 567K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bqc.txt) |
| `bqj` | Bandial | 175K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bqj.txt) |
| `bqp` | Busa | 162K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bqp.txt) |
| `bru` | Eastern Bru | 261K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bru.txt) |
| `bs` | Bosnian | 8,993K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bs.txt) |
| `bsn` | Barasana-Eduria | 225K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bsn.txt) |
| `bss` | Akoose | 199K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bss.txt) |
| `btd` | Batak Dairi | 192K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/btd.txt) |
| `bts` | Batak Simalungun | 175K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bts.txt) |
| `btt` | Bete-Bendi | 266K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/btt.txt) |
| `btx` | Batak Karo | 189K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/btx.txt) |
| `bua` | Buriat | 143K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bua.txt) |
| `bud` | Ntcham | 207K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bud.txt) |
| `buk` | Bugawac | 264K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/buk.txt) |
| `bus` | Bokobaru | 159K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bus.txt) |
| `bvc` | Baelelea | 308K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bvc.txt) |
| `bvz` | Bauzi | 509K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bvz.txt) |
| `bwq` | Southern Bobo Madaré | 214K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bwq.txt) |
| `bwu` | Buli | 285K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bwu.txt) |
| `byr` | Baruya | 182K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/byr.txt) |
| `byx` | Qaqet | 387K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/byx.txt) |
| `bzh` | Mapos Buang | 251K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bzh.txt) |
| `bzi` | Bisu | 381K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bzi.txt) |
| `bzj` | Belize Kriol English | 240K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/bzj.txt) |
| `ca-valencia` | Valencian | 24,295K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ca-valencia.txt) |
| `caa` | Chortí | 307K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/caa.txt) |
| `cab` | Garifuna | 154K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cab.txt) |
| `cac` | Chuj | 244K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cac.txt) |
| `cak` | Kaqchikel | 259K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cak.txt) |
| `cap` | Chipaya | 154K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cap.txt) |
| `car` | Galibi Carib | 160K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/car.txt) |
| `cax` | Chiquitano | 149K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cax.txt) |
| `cbc` | Carapana | 256K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cbc.txt) |
| `cbi` | Chachi | 187K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cbi.txt) |
| `cbl` | Bualkhaw Chin | 210K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cbl.txt) |
| `cbr` | Cashibo-Cacataibo | 236K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cbr.txt) |
| `cbs` | Cashinahua | 198K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cbs.txt) |
| `cbt` | Chayahuita | 150K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cbt.txt) |
| `cbv` | Cacua | 265K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cbv.txt) |
| `cce` | Chopi | 204K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cce.txt) |
| `ccp` | Chakma | 79K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ccp.txt) |
| `cdf` | Chiru | 193K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cdf.txt) |
| `ce` | Chechen | 669K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ce.txt) |
| `ceb` | Cebuano | 1,067K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ceb.txt) |
| `ceg` | Chamacoco | 232K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ceg.txt) |
| `cfm` | Falam Chin | 438K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cfm.txt) |
| `cgc` | Kagayanen | 299K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cgc.txt) |
| `chj` | Ojitlán Chinantec | 305K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/chj.txt) |
| `chm` | Mari | 132K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/chm.txt) |
| `chr` | Cherokee | 119K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/chr.txt) |
| `chz` | Ozumacín Chinantec | 205K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/chz.txt) |
| `cjo` | Ashéninka Pajonal | 141K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cjo.txt) |
| `cjp` | Cabécar | 199K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cjp.txt) |
| `cjv` | Chuave | 286K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cjv.txt) |
| `cko` | Anufo | 272K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cko.txt) |
| `cle` | Lealao Chinantec | 313K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cle.txt) |
| `cme` | Cerma | 230K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cme.txt) |
| `cmr` | Mro-Khimi Chin | 275K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cmr.txt) |
| `cnh` | Hakha Chin | 934K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cnh.txt) |
| `cni` | Asháninka | 122K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cni.txt) |
| `cnk` | Khumi Chin | 237K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cnk.txt) |
| `cnl` | Lalana Chinantec | 308K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cnl.txt) |
| `cnt` | Tepetotutla Chinantec | 261K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cnt.txt) |
| `coe` | Koreguaje | 181K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/coe.txt) |
| `cof` | Colorado | 183K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cof.txt) |
| `cok` | Santa Teresa Cora | 230K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cok.txt) |
| `con` | Cofán | 151K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/con.txt) |
| `cot` | Caquinte | 128K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cot.txt) |
| `crh` | Crimean Tatar | 505K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/crh.txt) |
| `cs` | Czech | 3,141K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cs.txt) |
| `csk` | Jola-Kasa | 177K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/csk.txt) |
| `cso` | Sochiapam Chinantec | 328K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cso.txt) |
| `ctd-Latn` | Tedim Chin (Latin) | 852K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ctd-Latn.txt) |
| `ctu` | Chol | 203K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ctu.txt) |
| `cub` | Cubeo | 220K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cub.txt) |
| `cuc` | Usila Chinantec | 278K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cuc.txt) |
| `cui` | Cuiba | 292K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cui.txt) |
| `cuk` | San Blas Kuna | 187K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cuk.txt) |
| `cul` | Culina | 221K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cul.txt) |
| `cv` | Chuvash | 111K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cv.txt) |
| `cwe` | Kwere | 144K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cwe.txt) |
| `cwt` | Kuwaataay | 168K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cwt.txt) |
| `cy` | Welsh | 11,519K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cy.txt) |
| `cya` | Nopala Chatino | 245K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/cya.txt) |
| `czt` | Zotung Chin | 227K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/czt.txt) |
| `da` | Danish | 655K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/da.txt) |
| `daa` | Dangaléat | 208K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/daa.txt) |
| `dad` | Marik | 197K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dad.txt) |
| `dah` | Gwahatike | 274K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dah.txt) |
| `ddn` | Dendi | 210K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ddn.txt) |
| `de` | German | 46,431K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/de.txt) |
| `ded` | Dedua | 146K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ded.txt) |
| `des` | Desano | 210K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/des.txt) |
| `dga` | Southern Dagaare | 458K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dga.txt) |
| `dgi` | Northern Dagara | 257K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dgi.txt) |
| `dgz` | Daga | 219K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dgz.txt) |
| `din` | Southwestern Dinka | 196K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/din.txt) |
| `dip` | Northeastern Dinka | 193K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dip.txt) |
| `djk` | Eastern Maroon Creole | 307K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/djk.txt) |
| `dln` | Darlong | 776K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dln.txt) |
| `dnw` | Western Dani | 254K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dnw.txt) |
| `dob` | Dobu | 179K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dob.txt) |
| `dop` | Lukpa | 226K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dop.txt) |
| `dsh` | Daasanach | 211K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dsh.txt) |
| `dtb` | Labuk-Kinabatangan Kadazan | 248K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dtb.txt) |
| `dtp` | Kadazan Dusun | 1,038K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dtp.txt) |
| `dts` | Toro So Dogon | 202K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dts.txt) |
| `due` | Umiray Dumaget Agta | 247K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/due.txt) |
| `dug` | Duruma | 172K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dug.txt) |
| `duo` | Dupaninan Agta | 266K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/duo.txt) |
| `dwr` | Dawro | 254K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dwr.txt) |
| `dww` | Dawawa | 208K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dww.txt) |
| `dyi` | Djimini Senoufo | 268K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dyi.txt) |
| `dyo` | Jola-Fonyi | 158K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dyo.txt) |
| `dyu` | Dyula | 1,156K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dyu.txt) |
| `dz` | Dzongkha | 61K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/dz.txt) |
| `ee` | Ewe | 421K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ee.txt) |
| `eka` | Ekajuk | 213K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/eka.txt) |
| `el` | Greek | 5,470K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/el.txt) |
| `emi` | Mussau-Emira | 176K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/emi.txt) |
| `emp` | Northern Emberá | 158K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/emp.txt) |
| `enb` | Markweeta | 147K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/enb.txt) |
| `enq` | Enga | 217K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/enq.txt) |
| `enx` | Enxet | 772K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/enx.txt) |
| `eri` | Ogea | 269K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/eri.txt) |
| `es` | Spanish | 32,670K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/es.txt) |
| `ese` | Ese Ejja | 226K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ese.txt) |
| `et` | Estonian | 3,658K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/et.txt) |
| `eu` | Basque | 130K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/eu.txt) |
| `ewo` | Ewondo | 158K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ewo.txt) |
| `eza` | Ezaa | 963K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/eza.txt) |
| `fa` | Persian | 9,114K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/fa.txt) |
| `fa-AF` | Dari | 7,363K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/fa-AF.txt) |
| `faa` | Fasu | 238K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/faa.txt) |
| `fai` | Faiwol | 256K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/fai.txt) |
| `fal` | South Fali | 198K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/fal.txt) |
| `far` | Fataleka | 286K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/far.txt) |
| `fi` | Finnish | 4,837K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/fi.txt) |
| `fil` | Tagalog | 184K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/fil.txt) |
| `fip` | Fipa | 134K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/fip.txt) |
| `fit` | Tornedalen Finnish | 292K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/fit.txt) |
| `fj` | Fijian | 257K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/fj.txt) |
| `fo` | Faroese | 851K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/fo.txt) |
| `fon` | Fon | 266K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/fon.txt) |
| `for` | Fore | 169K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/for.txt) |
| `fr` | French | 5,488K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/fr.txt) |
| `fue` | Borgu Fulfulde | 148K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/fue.txt) |
| `fuf` | Pular | 174K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/fuf.txt) |
| `fuq` | Central-Eastern Niger Fulfulde | 156K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/fuq.txt) |
| `fuv` | Nigerian Fulfulde | 13K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/fuv.txt) |
| `ga` | Irish | 7,587K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ga.txt) |
| `gag` | Gagauz | 245K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gag.txt) |
| `gah` | Alekano | 210K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gah.txt) |
| `gam` | Kandawo | 250K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gam.txt) |
| `gaw` | Nobonob | 246K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gaw.txt) |
| `gbi` | Galela | 288K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gbi.txt) |
| `gd` | Scottish Gaelic | 17,105K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gd.txt) |
| `gde` | Gude | 217K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gde.txt) |
| `gdn` | Umanakaina | 306K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gdn.txt) |
| `gdr` | Wipi | 271K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gdr.txt) |
| `gej` | Gen | 236K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gej.txt) |
| `gfk` | Patpatar | 294K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gfk.txt) |
| `ghs` | Guhu-Samane | 186K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ghs.txt) |
| `gil` | Gilbertese | 228K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gil.txt) |
| `gkn` | Gokana | 267K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gkn.txt) |
| `gmv-Latn` | Gamo (Latin) | 127K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gmv-Latn.txt) |
| `gn` | Guarani | 142K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gn.txt) |
| `gnd` | Zulgo-Gemzek | 364K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gnd.txt) |
| `gng` | Ngangam | 219K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gng.txt) |
| `gnw` | Western Bolivian Guaraní | 263K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gnw.txt) |
| `gof` | Gofa | 124K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gof.txt) |
| `gog` | Gogo | 173K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gog.txt) |
| `gor` | Gorontalo | 211K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gor.txt) |
| `gqr` | Gor | 218K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gqr.txt) |
| `grb` | Northern Grebo | 270K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/grb.txt) |
| `grt` | Garo | 141K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/grt.txt) |
| `gso` | Southwest Gbaya | 228K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gso.txt) |
| `gsw-u-sd-chag` | Swiss German (Aargau) | 99K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gsw-u-sd-chag.txt) |
| `gsw-u-sd-chbe` | Swiss German (Bern) | 73K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gsw-u-sd-chbe.txt) |
| `gsw-u-sd-chfr` | Swiss German (Fribourg) | 42K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gsw-u-sd-chfr.txt) |
| `gu` | Gujarati | 702K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gu.txt) |
| `gub` | Guajajára | 997K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gub.txt) |
| `guc` | Wayuu | 211K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/guc.txt) |
| `gud` | Yocoboué Dida | 216K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gud.txt) |
| `guh` | Guahibo | 204K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/guh.txt) |
| `gui` | Eastern Bolivian Guaraní | 197K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gui.txt) |
| `gum` | Guambiano | 186K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gum.txt) |
| `gun` | Mbyá Guaraní | 176K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gun.txt) |
| `guo` | Guayabero | 203K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/guo.txt) |
| `guq` | Aché | 184K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/guq.txt) |
| `gur` | Farefare | 240K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gur.txt) |
| `gux` | Gourmanchéma | 215K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gux.txt) |
| `gv` | Manx Gaelic | 152K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gv.txt) |
| `gvc` | Guanano | 241K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gvc.txt) |
| `gvf` | Golin | 276K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gvf.txt) |
| `gvl` | Gulay | 270K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gvl.txt) |
| `gwr` | Gwere | 157K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gwr.txt) |
| `gym` | Ngäbere | 294K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gym.txt) |
| `gyr` | Guarayu | 176K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/gyr.txt) |
| `ha` | Hausa | 1,775K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ha.txt) |
| `hae` | Eastern Oromo | 163K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hae.txt) |
| `hag` | Hanga | 202K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hag.txt) |
| `haw` | Hawaiian | 2,221K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/haw.txt) |
| `hay` | Haya | 112K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hay.txt) |
| `heh` | Hehe | 136K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/heh.txt) |
| `hi` | Hindi | 10,004K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hi.txt) |
| `hif` | Fiji Hindi | 204K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hif.txt) |
| `hig` | Kamwe | 261K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hig.txt) |
| `hil` | Hiligaynon | 208K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hil.txt) |
| `hla` | Halia | 273K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hla.txt) |
| `hne` | Chhattisgarhi | 207K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hne.txt) |
| `hnn` | Hanunoo | 212K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hnn.txt) |
| `hns` | Caribbean Hindustani | 312K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hns.txt) |
| `ho` | Hiri Motu | 240K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ho.txt) |
| `hot` | Hote | 222K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hot.txt) |
| `hr` | Croatian | 8,188K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hr.txt) |
| `ht` | Haitian | 1,101K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ht.txt) |
| `hto` | Minica Huitoto | 182K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hto.txt) |
| `hu` | Hungarian | 600K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hu.txt) |
| `hub` | Huambisa | 160K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hub.txt) |
| `hui` | Huli | 232K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hui.txt) |
| `hus` | Huastec | 236K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hus.txt) |
| `huu` | Murui Huitoto | 165K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/huu.txt) |
| `huv` | San Mateo Del Mar Huave | 197K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/huv.txt) |
| `hvn` | Sabu | 312K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hvn.txt) |
| `hy` | Armenian | 25,972K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/hy.txt) |
| `ian` | Iatmul | 224K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ian.txt) |
| `iba` | Iban | 179K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/iba.txt) |
| `icr` | Islander Creole English | 248K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/icr.txt) |
| `id` | Indonesian | 6,634K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/id.txt) |
| `ifa` | Amganad Ifugao | 810K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ifa.txt) |
| `ifb` | Batad Ifugao | 835K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ifb.txt) |
| `ife` | Ifè | 300K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ife.txt) |
| `ifk` | Tuwali Ifugao | 214K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ifk.txt) |
| `ifu` | Mayoyao Ifugao | 258K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ifu.txt) |
| `ify` | Keley-I Kallahan | 863K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ify.txt) |
| `ig` | Igbo | 13K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ig.txt) |
| `ign` | Ignaciano | 161K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ign.txt) |
| `ik` | Inupiaq | 96K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ik.txt) |
| `ilo` | Iloko | 169K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ilo.txt) |
| `imo` | Imbongu | 280K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/imo.txt) |
| `inb` | Inga | 151K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/inb.txt) |
| `ino` | Inoke-Yate | 236K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ino.txt) |
| `iou` | Tuma-Irumu | 225K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/iou.txt) |
| `ipi` | Ipili | 312K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ipi.txt) |
| `iri` | Irigwe | 243K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/iri.txt) |
| `irk` | Iraqw | 184K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/irk.txt) |
| `iry` | Iraya | 205K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/iry.txt) |
| `it` | Italian | 13,569K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/it.txt) |
| `itv` | Itawit | 242K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/itv.txt) |
| `iu` | Inuktitut | 98K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/iu.txt) |
| `iws` | Sepik Iwam | 307K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/iws.txt) |
| `izr` | Izere | 216K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/izr.txt) |
| `izz` | Izii | 908K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/izz.txt) |
| `ja` | Japanese | 2,116K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ja.txt) |
| `jac` | Popti' | 221K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/jac.txt) |
| `jae` | Yabem | 186K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/jae.txt) |
| `jam` | Jamaican Creole English | 254K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/jam.txt) |
| `jbu` | Jukun Takum | 264K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/jbu.txt) |
| `jic` | Tol | 285K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/jic.txt) |
| `jiv` | Shuar | 134K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/jiv.txt) |
| `jmc` | Machame | 150K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/jmc.txt) |
| `jun` | Juang | 178K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/jun.txt) |
| `jv` | Javanese | 177K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/jv.txt) |
| `jvn` | Caribbean Javanese | 211K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/jvn.txt) |
| `ka` | Georgian | 4,978K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ka.txt) |
| `kaa` | Kara-Kalpak | 135K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kaa.txt) |
| `kab-Arab` | Kabyle (Arabic) | 715K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kab-Arab.txt) |
| `kab-Tfng` | Kabyle (Tifinagh) | 1,338K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kab-Tfng.txt) |
| `kab` | Kabyle | 66K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kab.txt) |
| `kac` | Kachin | 1,057K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kac.txt) |
| `kao` | Xaasongaxango | 205K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kao.txt) |
| `kaq` | Capanahua | 164K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kaq.txt) |
| `kbh` | Camsá | 193K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kbh.txt) |
| `kbm` | Iwal | 298K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kbm.txt) |
| `kbp` | Kabiyè | 571K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kbp.txt) |
| `kbq` | Kamano | 156K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kbq.txt) |
| `kbr` | Kafa | 147K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kbr.txt) |
| `kcg` | Tyap | 279K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kcg.txt) |
| `kdc` | Kutu | 140K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kdc.txt) |
| `kdi` | Kumam | 195K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kdi.txt) |
| `kdj` | Karamojong | 163K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kdj.txt) |
| `kdn` | Kunda | 144K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kdn.txt) |
| `kek` | Kekchí | 406K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kek.txt) |
| `ken` | Kenyang | 200K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ken.txt) |
| `keo` | Kakwa | 215K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/keo.txt) |
| `ker` | Kera | 267K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ker.txt) |
| `kew` | West Kewa | 247K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kew.txt) |
| `kez` | Kukele | 173K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kez.txt) |
| `kgf` | Kube | 175K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kgf.txt) |
| `kgr` | Abun | 356K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kgr.txt) |
| `khz` | Keapara | 196K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/khz.txt) |
| `kia` | Kim | 525K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kia.txt) |
| `kij` | Kilivila | 155K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kij.txt) |
| `kj` | Kuanyama | 1,474K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kj.txt) |
| `kjb` | Q'anjob'al | 263K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kjb.txt) |
| `kje` | Kisar | 235K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kje.txt) |
| `kjh` | Khakas | 128K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kjh.txt) |
| `kjs` | East Kewa | 251K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kjs.txt) |
| `kk` | Kazakh | 642K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kk.txt) |
| `kki` | Kagulu | 125K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kki.txt) |
| `kkj` | Kako | 263K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kkj.txt) |
| `kln` | Kalenjin | 149K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kln.txt) |
| `km` | Khmer | 29,110K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/km.txt) |
| `kma` | Konni | 230K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kma.txt) |
| `kmg` | Kâte | 127K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kmg.txt) |
| `kmo` | Kwoma | 213K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kmo.txt) |
| `kms` | Kamasau | 293K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kms.txt) |
| `kmu` | Kanite | 214K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kmu.txt) |
| `kn` | Kannada | 126K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kn.txt) |
| `kne` | Kankanaey | 230K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kne.txt) |
| `knf` | Mankanya | 164K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/knf.txt) |
| `knj` | Western Kanjobal | 1,350K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/knj.txt) |
| `knk` | Kuranko | 228K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/knk.txt) |
| `kno` | Kono | 360K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kno.txt) |
| `knv` | Tabo | 243K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/knv.txt) |
| `kog` | Cogui | 189K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kog.txt) |
| `kpf` | Komba | 174K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kpf.txt) |
| `kpg` | Kapingamarangi | 967K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kpg.txt) |
| `kpr` | Korafe-Yegha | 262K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kpr.txt) |
| `kpw` | Kobon | 288K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kpw.txt) |
| `kpx` | Mountain Koiali | 190K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kpx.txt) |
| `kpz` | Kupsabiny | 166K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kpz.txt) |
| `kqc` | Doromu-Koki | 209K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kqc.txt) |
| `kqe` | Kalagan | 241K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kqe.txt) |
| `kqp` | Kimré | 254K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kqp.txt) |
| `kqw` | Kandas | 201K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kqw.txt) |
| `kqy` | Koorete | 156K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kqy.txt) |
| `krc` | Karachay-Balkar | 132K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/krc.txt) |
| `kri` | Krio | 256K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kri.txt) |
| `krj` | Kinaray-A | 228K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/krj.txt) |
| `kru` | Kurukh | 182K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kru.txt) |
| `ksd` | Kuanua | 228K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ksd.txt) |
| `ksr` | Borong | 233K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ksr.txt) |
| `ktb` | Kambaata | 113K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ktb.txt) |
| `ktj` | Plapo Krumen | 356K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ktj.txt) |
| `kto` | Kuot | 286K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kto.txt) |
| `ku` | Kurdish | 2,479K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ku.txt) |
| `kub` | Kutep | 281K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kub.txt) |
| `kud` | ‘Auhelawa | 167K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kud.txt) |
| `kue` | Kuman (Papua New Guinea) | 230K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kue.txt) |
| `kum` | Kumyk | 142K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kum.txt) |
| `kup` | Kunimaipa | 279K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kup.txt) |
| `kus` | Kusaal | 200K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kus.txt) |
| `kv` | Komi | 122K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kv.txt) |
| `kvn` | Border Kuna | 212K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kvn.txt) |
| `kwf` | Kwara'ae | 296K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kwf.txt) |
| `kwi` | Awa-Cuaiquer | 165K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kwi.txt) |
| `kwj` | Kwanga | 290K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kwj.txt) |
| `kxc` | Konso | 148K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kxc.txt) |
| `kxm` | Northern Khmer | 257K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kxm.txt) |
| `ky` | Kyrgyz | 18,597K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ky.txt) |
| `kyc` | Kyaka | 220K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kyc.txt) |
| `kyf` | Kouya | 215K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kyf.txt) |
| `kyg` | Keyagana | 190K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kyg.txt) |
| `kyq` | Kenga | 250K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kyq.txt) |
| `kyu` | Western Kayah | 466K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kyu.txt) |
| `kyz` | Kayabí | 324K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kyz.txt) |
| `kze` | Kosena | 164K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kze.txt) |
| `kzf` | Da'a Kaili | 213K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kzf.txt) |
| `kzj` | Coastal Kadazan | 215K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/kzj.txt) |
| `la` | Latin | 48K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/la.txt) |
| `laj` | Lango | 175K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/laj.txt) |
| `las` | Lama | 235K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/las.txt) |
| `law` | Lauje | 262K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/law.txt) |
| `lb` | Luxembourgish | 5,173K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lb.txt) |
| `lcm` | Tungag | 239K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lcm.txt) |
| `lee` | Lyélé | 257K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lee.txt) |
| `lef` | Lelemi | 211K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lef.txt) |
| `lem` | Nomaande | 249K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lem.txt) |
| `leu` | Kara (Papua New Guinea) | 255K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/leu.txt) |
| `lew` | Ledo Kaili | 198K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lew.txt) |
| `lex` | Luang | 271K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lex.txt) |
| `lgg` | Lugbara | 188K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lgg.txt) |
| `lhu` | Lahu | 352K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lhu.txt) |
| `lia` | West-Central Limba | 247K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lia.txt) |
| `lid` | Nyindrou | 308K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lid.txt) |
| `lif` | Limbu | 138K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lif.txt) |
| `lip` | Sekpele | 214K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lip.txt) |
| `lis` | Lisu | 304K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lis.txt) |
| `ljp` | Lampung Api | 188K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ljp.txt) |
| `lln` | Lele | 291K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lln.txt) |
| `lme` | Pévé | 245K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lme.txt) |
| `lmk` | Lamkang | 217K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lmk.txt) |
| `lnd` | Lundayeh | 670K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lnd.txt) |
| `lo` | Lao | 4,384K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lo.txt) |
| `lob` | Lobi | 192K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lob.txt) |
| `loe` | Saluan | 220K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/loe.txt) |
| `lok` | Loko | 264K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lok.txt) |
| `lon` | Malawi Lomwe | 137K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lon.txt) |
| `lsi` | Lashi | 1,077K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lsi.txt) |
| `lsm` | Saamia | 156K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lsm.txt) |
| `lt` | Lithuanian | 39,575K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lt.txt) |
| `luc` | Aringa | 242K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/luc.txt) |
| `lus` | Lushai | 204K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lus.txt) |
| `lv` | Latvian | 1,020K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lv.txt) |
| `lwo` | Luwo | 255K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/lwo.txt) |
| `maa` | San Jerónimo Tecóatl Mazatec | 487K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/maa.txt) |
| `mad` | Madurese | 706K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mad.txt) |
| `mag` | Magahi | 193K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mag.txt) |
| `mai` | Maithili | 211K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mai.txt) |
| `maj` | Jalapa De Díaz Mazatec | 188K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/maj.txt) |
| `mak` | Makasar | 179K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mak.txt) |
| `mam` | Mam | 834K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mam.txt) |
| `maw` | Mampruli | 251K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/maw.txt) |
| `maz` | Central Mazahua | 286K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/maz.txt) |
| `mbb` | Western Bukidnon Manobo | 278K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mbb.txt) |
| `mbc` | Macushi | 221K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mbc.txt) |
| `mbh` | Mangseng | 321K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mbh.txt) |
| `mbt` | Matigsalug Manobo | 226K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mbt.txt) |
| `mca` | Maca | 208K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mca.txt) |
| `mcb` | Machiguenga | 132K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mcb.txt) |
| `mcd` | Sharanahua | 200K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mcd.txt) |
| `mco` | Coatlán Mixe | 217K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mco.txt) |
| `mcp` | Makaa | 237K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mcp.txt) |
| `mcq` | Ese | 158K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mcq.txt) |
| `mcu` | Cameroon Mambila | 260K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mcu.txt) |
| `mda` | Mada | 312K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mda.txt) |
| `mdy` | Male | 589K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mdy.txt) |
| `med` | Melpa | 283K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/med.txt) |
| `mee` | Mengen | 301K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mee.txt) |
| `mej` | Meyah | 323K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mej.txt) |
| `mek` | Mekeo | 234K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mek.txt) |
| `men` | Mende | 210K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/men.txt) |
| `meq` | Merey | 291K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/meq.txt) |
| `meu` | Motu | 175K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/meu.txt) |
| `mfe` | Morisyen | 172K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mfe.txt) |
| `mfh` | Matal | 238K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mfh.txt) |
| `mfi` | Wandala | 265K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mfi.txt) |
| `mfk` | North Mofu | 248K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mfk.txt) |
| `mfq` | Moba | 232K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mfq.txt) |
| `mfy` | Mayo | 167K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mfy.txt) |
| `mfz` | Mabaan | 237K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mfz.txt) |
| `mg` | Malagasy | 1,623K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mg.txt) |
| `mgd` | Moru | 192K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mgd.txt) |
| `mgh` | Makhuwa-Meetto | 150K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mgh.txt) |
| `mgo` | Meta' | 251K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mgo.txt) |
| `mh` | Marshallese | 750K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mh.txt) |
| `mhi` | Ma'di | 192K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mhi.txt) |
| `mhl` | Mauwake | 235K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mhl.txt) |
| `mhx` | Maru | 291K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mhx.txt) |
| `mhy` | Ma'anyan | 190K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mhy.txt) |
| `mi` | Maori | 1,504K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mi.txt) |
| `mib` | Atatláhuca Mixtec | 263K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mib.txt) |
| `mif` | Mofu-Gudur | 283K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mif.txt) |
| `mil` | Peñoles Mixtec | 365K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mil.txt) |
| `min` | Minangkabau | 242K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/min.txt) |
| `mio` | Pinotepa Nacional Mixtec | 288K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mio.txt) |
| `miq` | Mískito | 214K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/miq.txt) |
| `mit` | Southern Puebla Mixtec | 273K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mit.txt) |
| `mk` | Macedonian | 10,422K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mk.txt) |
| `mkl` | Mokole | 230K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mkl.txt) |
| `ml` | Malayalam | 118K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ml.txt) |
| `mlh` | Mape | 235K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mlh.txt) |
| `mlp` | Bargam | 297K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mlp.txt) |
| `mmo` | Mangga Buang | 269K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mmo.txt) |
| `mmx` | Madak | 271K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mmx.txt) |
| `mna` | Mbula | 257K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mna.txt) |
| `mnb` | Muna | 151K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mnb.txt) |
| `mnf` | Mundani | 241K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mnf.txt) |
| `mnw` | Mon | 1,836K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mnw.txt) |
| `moa` | Mwan | 308K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/moa.txt) |
| `mog` | Mongondow | 220K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mog.txt) |
| `mop` | Mopán Maya | 296K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mop.txt) |
| `mor` | Moro | 152K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mor.txt) |
| `mox` | Molima | 222K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mox.txt) |
| `mpg` | Marba | 210K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mpg.txt) |
| `mpm` | Yosondúa Mixtec | 336K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mpm.txt) |
| `mps` | Dadibi | 1,270K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mps.txt) |
| `mpt` | Mian | 256K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mpt.txt) |
| `mpx` | Misima-Panaeati | 227K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mpx.txt) |
| `mqb` | Mbuko | 302K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mqb.txt) |
| `mqj` | Mamasa | 164K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mqj.txt) |
| `mqn` | Moronene | 164K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mqn.txt) |
| `mr` | Marathi | 16,594K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mr.txt) |
| `mrw` | Maranao | 912K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mrw.txt) |
| `ms` | Malay | 659K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ms.txt) |
| `msm` | Agusan Manobo | 225K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/msm.txt) |
| `msy` | Aruamu | 229K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/msy.txt) |
| `mt` | Maltese | 3,331K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mt.txt) |
| `mta` | Cotabato Manobo | 262K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mta.txt) |
| `mti` | Maiwa (Papua New Guinea) | 166K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mti.txt) |
| `mtj` | Moskona | 321K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mtj.txt) |
| `mto` | Totontepec Mixe | 233K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mto.txt) |
| `mtp` | Wichí Lhamtés Nocten | 183K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mtp.txt) |
| `muh` | Mündü | 392K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/muh.txt) |
| `mur` | Murle | 210K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mur.txt) |
| `mux` | Bo-Ung | 363K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mux.txt) |
| `muy` | Muyang | 265K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/muy.txt) |
| `mva` | Manam | 231K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mva.txt) |
| `mvp` | Duri | 174K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mvp.txt) |
| `mwv` | Mentawai | 141K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mwv.txt) |
| `mxb` | Tezoatlán Mixtec | 281K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mxb.txt) |
| `mxt` | Jamiltepec Mixtec | 267K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mxt.txt) |
| `my` | Burmese | 1,007K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/my.txt) |
| `my-t-d0-zawgyi` | Burmese (Zawgyi encoding) | 593K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/my-t-d0-zawgyi.txt) |
| `myb` | Mbay | 192K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/myb.txt) |
| `myk` | Mamara Senoufo | 272K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/myk.txt) |
| `myv` | Erzya | 143K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/myv.txt) |
| `myw` | Muyuw | 150K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/myw.txt) |
| `myx` | Masaaba | 164K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/myx.txt) |
| `myy` | Macuna | 245K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/myy.txt) |
| `mza` | Santa María Zacatepec Mixtec | 316K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mza.txt) |
| `mzi` | Ixcatlán Mazatec | 190K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mzi.txt) |
| `mzk` | Nigeria Mambila | 283K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mzk.txt) |
| `mzm` | Mumuye | 265K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/mzm.txt) |
| `naf` | Nabak | 220K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/naf.txt) |
| `nak` | Nakanai | 333K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nak.txt) |
| `nan-Latn` | Min Nan Chinese (Latin) | 231K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nan-Latn.txt) |
| `nas` | Naasioi | 168K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nas.txt) |
| `nca` | Iyo | 203K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nca.txt) |
| `nch` | Central Huasteca Nahuatl | 195K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nch.txt) |
| `ncj` | Northern Puebla Nahuatl | 164K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ncj.txt) |
| `ncu` | Chumburung | 312K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ncu.txt) |
| `ndj` | Ndamba | 141K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ndj.txt) |
| `ndy` | Lutos | 216K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ndy.txt) |
| `ndz` | Ndogo | 350K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ndz.txt) |
| `neb` | Toura | 326K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/neb.txt) |
| `new` | Newari | 150K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/new.txt) |
| `nfr` | Nafaanra | 233K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nfr.txt) |
| `ngp` | Ngulu | 149K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ngp.txt) |
| `nho` | Takuu | 309K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nho.txt) |
| `nhu` | Noone | 270K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nhu.txt) |
| `nhw` | Western Huasteca Nahuatl | 194K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nhw.txt) |
| `nhy` | Northern Oaxaca Nahuatl | 185K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nhy.txt) |
| `nia` | Nias | 182K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nia.txt) |
| `nii` | Nii | 316K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nii.txt) |
| `nij` | Ngaju | 194K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nij.txt) |
| `nim` | Nilamba | 117K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nim.txt) |
| `nin` | Ninzo | 267K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nin.txt) |
| `nkf` | Inpui Naga | 197K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nkf.txt) |
| `nko` | Nkonya | 168K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nko.txt) |
| `nl` | Dutch | 58,357K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nl.txt) |
| `nlc` | Nalca | 241K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nlc.txt) |
| `nmz` | Nawdm | 209K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nmz.txt) |
| `nnb` | Nande | 127K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nnb.txt) |
| `nnq` | Ngindo | 137K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nnq.txt) |
| `nnw` | Southern Nuni | 291K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nnw.txt) |
| `noa` | Woun Meu | 275K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/noa.txt) |
| `nog` | Nogai | 104K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nog.txt) |
| `nop` | Numanggang | 183K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nop.txt) |
| `not` | Nomatsiguenga | 141K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/not.txt) |
| `nou` | Ewage-Notu | 266K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nou.txt) |
| `npl` | Southeastern Puebla Nahuatl | 148K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/npl.txt) |
| `npy` | Napu | 192K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/npy.txt) |
| `nsn` | Nehan | 248K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nsn.txt) |
| `nsu` | Sierra Negra Nahuatl | 170K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nsu.txt) |
| `ntm` | Nateni | 229K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ntm.txt) |
| `ntp` | Northern Tepehuan | 173K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ntp.txt) |
| `ntr` | Delo | 272K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ntr.txt) |
| `nuj` | Nyole | 151K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nuj.txt) |
| `nus` | Nuer | 195K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nus.txt) |
| `nvm` | Namiae | 290K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nvm.txt) |
| `nwb` | Nyabwa | 316K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nwb.txt) |
| `nwi` | Southwest Tanna | 230K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nwi.txt) |
| `ny` | Nyanja | 356K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/ny.txt) |
| `nyf` | Giryama | 169K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nyf.txt) |
| `nyn` | Nyankole | 120K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nyn.txt) |
| `nyo` | Nyoro | 120K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nyo.txt) |
| `nyy` | Nyakyusa-Ngonde | 138K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nyy.txt) |
| `nzi` | Nzima | 201K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/nzi.txt) |
| `obo` | Obo Manobo | 266K [💾](http://www.gstatic.com/i18n/corpora/wordcounts/obo.txt) |
| `o