https://github.com/alaaalzahrani/character_set_55

Character sets for 55 different languages

# Character Sets for 55 Languages

This repository presents character sets for 55 different languages in a JSON file.
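As a rough illustration of how such a file might be consumed, the sketch below parses JSON text that maps language codes to characters and turns each entry into a Python set. The layout shown here (a top-level object whose values are either a string or a list of characters), the `SAMPLE_JSON` data, and the `parse_character_sets` helper are all assumptions made for this example; check the actual JSON file in the repository for its real name and structure.

```python
import json
from typing import Dict, Set

# Hypothetical sample data: the real file's layout and contents may differ.
SAMPLE_JSON = '{"en": "abcdefghijklmnopqrstuvwxyz", "da": ["a", "b", "æ", "ø", "å"]}'


def parse_character_sets(raw: str) -> Dict[str, Set[str]]:
    """Parse JSON text mapping language codes to characters.

    Each value may be a string or a list of strings; both are
    converted to a set so membership tests are cheap.
    """
    data = json.loads(raw)
    return {code: set(chars) for code, chars in data.items()}


char_sets = parse_character_sets(SAMPLE_JSON)

# List the language codes found in the sample.
print(sorted(char_sets))

# Check whether a character belongs to a language's set.
print("ø" in char_sets["da"])
```

To read from disk instead, pass the file's contents to the helper, for example `parse_character_sets(open(path, encoding="utf-8").read())`, where `path` points to the JSON file in your checkout.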

# Disclaimer

All character sets were collected from various sources, including GitHub repositories, Python libraries, and Wikipedia. I do not claim ownership of, or rights to, this material.

Below, each language entry includes a link to the original source where the character set was obtained.

# Included languages

| Code | Language | Source |
|------|----------|--------|
| af | Afrikaans | [Link](https://en.wikipedia.org/wiki/Afrikaans) |
| ar | Arabic | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| bg | Bulgarian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| bn | Bengali | [Link](https://en.wikipedia.org/wiki/Bengali_alphabet) |
| br | Breton | [Link](https://en.wikipedia.org/wiki/Breton_language) |
| bs | Bosnian | [Link](https://en.wikipedia.org/wiki/Bosnian_language) |
| ca | Catalan | [Link](https://en.wikipedia.org/wiki/Catalan_orthography) |
| cs | Czech | [Link](https://en.wikipedia.org/wiki/Czech_orthography) |
| da | Danish | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| de | German | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| el | Greek | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| en | English | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| eo | Esperanto | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| es | Spanish | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| et | Estonian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| eu | Basque | [Link](https://en.wikipedia.org/wiki/Basque_alphabet) |
| fa | Farsi (Persian) | [Link](https://en.wikipedia.org/wiki/Persian_alphabet) |
| fi | Finnish | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| fr | French | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| gl | Galician | [Link](https://en.wikipedia.org/wiki/Galician_alphabet) |
| he | Hebrew | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| hi | Hindi | [Link](https://en.wikipedia.org/wiki/Devanagari) |
| hr | Croatian | [Link](https://en.wikipedia.org/wiki/Gaj%27s_Latin_alphabet) |
| hu | Hungarian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| hy | Armenian | [Link](https://en.wikipedia.org/wiki/Armenian_alphabet) |
| id | Indonesian | [Link](https://en.wikipedia.org/wiki/Indonesian_language) |
| is | Icelandic | [Link](https://en.wikipedia.org/wiki/Icelandic_orthography) |
| it | Italian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| ka | Georgian | [Link](https://github.com/am-tropin/georgian-letters/blob/main/georgian%20alphabet.ipynb) |
| kk | Kazakh | [Link](https://en.wikipedia.org/wiki/Kazakh_alphabets) |
| ko | Korean | [Link](https://github.com/rjtngit/rawchars/blob/master/Korean-Hangul.txt) |
| lt | Lithuanian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| lv | Latvian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| mk | Macedonian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| ml | Malayalam | [Link](https://en.wikipedia.org/wiki/Malayalam_script) |
| ms | Malay | [Link](https://en.wikipedia.org/wiki/Malay_orthography) |
| nl | Dutch | [Link](https://en.wikipedia.org/wiki/Dutch_orthography) |
| no | Norwegian | [Link](https://en.wikipedia.org/wiki/Norwegian_orthography) |
| pl | Polish | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| pt | Portuguese | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| ro | Romanian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| ru | Russian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| si | Sinhala | [Link](https://en.wikipedia.org/wiki/Sinhala_script) |
| sk | Slovak | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| sl | Slovenian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| sq | Albanian | [Link](https://en.wikipedia.org/wiki/Albanian_alphabet) |
| sr | Serbian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| sv | Swedish | [Link](https://en.wikipedia.org/wiki/Swedish_alphabet) |
| ta | Tamil | [Link](https://en.wikipedia.org/wiki/Tamil_script) |
| te | Telugu | [Link](https://en.wikipedia.org/wiki/Telugu_script) |
| tl | Tagalog (Filipino) | [Link](https://en.wikipedia.org/wiki/Filipino_alphabet) |
| tr | Turkish | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| uk | Ukrainian | [Link](https://en.wikipedia.org/wiki/Ukrainian_alphabet) |
| ur | Urdu | [Link](https://github.com/urduhack/urdu-characters/blob/master/urdu_characters.py) |
| vi | Vietnamese | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |

# Similar projects

- [Chardet: The Universal Character Encoding Detector](https://github.com/chardet/chardet)
- [rawchars](https://github.com/rjtngit/rawchars/tree/master)