https://github.com/alaaalzahrani/character_set_55
Character sets for 55 different languages
- Host: GitHub
- URL: https://github.com/alaaalzahrani/character_set_55
- Owner: AlaaAlzahrani
- Created: 2024-08-25T14:29:48.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-08-25T14:50:23.000Z (almost 2 years ago)
- Last Synced: 2025-02-05T21:43:01.831Z (over 1 year ago)
- Topics: character-set, character-sets, language, letters, nlp, orthography
- Size: 10.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
README
# Character Sets for 55 Languages
This repository presents character sets for 55 different languages in a JSON file.
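As a rough illustration of how such a file might be consumed, here is a minimal Python sketch. The README does not name the JSON file or describe its layout, so both are assumptions: the sketch expects a top-level object keyed by language code (such as `en` or `ar`), with each value being either a string of characters or a list of characters. The function name `load_character_sets` is hypothetical.

```python
import json
from pathlib import Path


def load_character_sets(path):
    """Load a {language_code: characters} mapping and return {code: set}.

    Assumption: the file is a JSON object keyed by language code, and each
    value is either a string of characters or a list of characters. The
    actual layout of the repository's file may differ.
    """
    with Path(path).open(encoding="utf-8") as handle:
        raw = json.load(handle)
    # set() accepts both a string (one element per character) and a list
    # (one element per entry), so both assumed layouts are handled.
    return {code: set(chars) for code, chars in raw.items()}
```

Calling `load_character_sets("path/to/file.json")` with the real path of the JSON file would then return a dictionary of Python sets, which makes membership checks such as `"ø" in sets["da"]` cheap.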
# Disclaimer
All character sets were collected from various sources, including GitHub repositories, Python libraries, and Wikipedia. I do not claim ownership of or rights to this material.
Each language entry below links to the original source from which its character set was obtained.
# Included languages
| Code | Language | Source |
|------|----------|--------|
| af | Afrikaans | [Link](https://en.wikipedia.org/wiki/Afrikaans) |
| ar | Arabic | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| bg | Bulgarian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| bn | Bengali | [Link](https://en.wikipedia.org/wiki/Bengali_alphabet) |
| br | Breton | [Link](https://en.wikipedia.org/wiki/Breton_language) |
| bs | Bosnian | [Link](https://en.wikipedia.org/wiki/Bosnian_language) |
| ca | Catalan | [Link](https://en.wikipedia.org/wiki/Catalan_orthography) |
| cs | Czech | [Link](https://en.wikipedia.org/wiki/Czech_orthography) |
| da | Danish | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| de | German | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| el | Greek | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| en | English | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| eo | Esperanto | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| es | Spanish | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| et | Estonian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| eu | Basque | [Link](https://en.wikipedia.org/wiki/Basque_alphabet) |
| fa | Farsi (Persian) | [Link](https://en.wikipedia.org/wiki/Persian_alphabet) |
| fi | Finnish | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| fr | French | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| gl | Galician | [Link](https://en.wikipedia.org/wiki/Galician_alphabet) |
| he | Hebrew | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| hi | Hindi | [Link](https://en.wikipedia.org/wiki/Devanagari) |
| hr | Croatian | [Link](https://en.wikipedia.org/wiki/Gaj%27s_Latin_alphabet) |
| hu | Hungarian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| hy | Armenian | [Link](https://en.wikipedia.org/wiki/Armenian_alphabet) |
| id | Indonesian | [Link](https://en.wikipedia.org/wiki/Indonesian_language) |
| is | Icelandic | [Link](https://en.wikipedia.org/wiki/Icelandic_orthography) |
| it | Italian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| ka | Georgian | [Link](https://github.com/am-tropin/georgian-letters/blob/main/georgian%20alphabet.ipynb) |
| kk | Kazakh | [Link](https://en.wikipedia.org/wiki/Kazakh_alphabets) |
| ko | Korean | [Link](https://github.com/rjtngit/rawchars/blob/master/Korean-Hangul.txt) |
| lt | Lithuanian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| lv | Latvian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| mk | Macedonian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| ml | Malayalam | [Link](https://en.wikipedia.org/wiki/Malayalam_script) |
| ms | Malay | [Link](https://en.wikipedia.org/wiki/Malay_orthography) |
| nl | Dutch | [Link](https://en.wikipedia.org/wiki/Dutch_orthography) |
| no | Norwegian | [Link](https://en.wikipedia.org/wiki/Norwegian_orthography) |
| pl | Polish | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| pt | Portuguese | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| ro | Romanian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| ru | Russian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| si | Sinhala | [Link](https://en.wikipedia.org/wiki/Sinhala_script) |
| sk | Slovak | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| sl | Slovenian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| sq | Albanian | [Link](https://en.wikipedia.org/wiki/Albanian_alphabet) |
| sr | Serbian | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| sv | Swedish | [Link](https://en.wikipedia.org/wiki/Swedish_alphabet) |
| ta | Tamil | [Link](https://en.wikipedia.org/wiki/Tamil_script) |
| te | Telugu | [Link](https://en.wikipedia.org/wiki/Telugu_script) |
| tl | Tagalog (Filipino) | [Link](https://en.wikipedia.org/wiki/Filipino_alphabet) |
| tr | Turkish | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
| uk | Ukrainian | [Link](https://en.wikipedia.org/wiki/Ukrainian_alphabet) |
| ur | Urdu | [Link](https://github.com/urduhack/urdu-characters/blob/master/urdu_characters.py) |
| vi | Vietnamese | [Link](https://github.com/chardet/chardet/blob/main/chardet/metadata/languages.py) |
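One possible use of per-language character sets is a rough script or language filter: score a piece of text by the fraction of its letters that fall inside each set. The sketch below is only an illustration of that idea, not something the repository provides. The names `coverage` and `rank_languages` are hypothetical, and `toy_sets` is hand-written sample data rather than content taken from the JSON file.

```python
def coverage(text, charset):
    """Return the fraction of letters in `text` that appear in `charset`."""
    # str.isalpha() is False for combining marks, so scripts that rely on
    # them (for example Devanagari vowel signs) are only partly measured.
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in charset for ch in letters) / len(letters)


def rank_languages(text, character_sets):
    """Sort language codes by descending coverage of `text`."""
    scores = {code: coverage(text, chars) for code, chars in character_sets.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data for illustration only (assumed lowercase alphabets).
toy_sets = {
    "en": set("abcdefghijklmnopqrstuvwxyz"),
    "ru": set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя"),
}

ranking = rank_languages("Привет", toy_sets)
print(ranking[0][0])
```

Languages that share a script (for example English and Dutch) will often tie under this measure, so it is better suited to separating scripts than to identifying a specific language.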
# Similar projects
- [Chardet: The Universal Character Encoding Detector](https://github.com/chardet/chardet)
- [rawchars](https://github.com/rjtngit/rawchars/tree/master)