Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-kyrgyz-nlp
Kyrgyz language processing software, models and datasets.
https://github.com/alexeyev/awesome-kyrgyz-nlp
Datasets
- Manas-UdS - text metadata.
- kkWaC
- Kyrgyz in Leipzig Corpora Collection
- TilCorpusu
- Kyrgyz language hand-written letters (Kyrgyz MNIST) - a collection of handwritten Kyrgyz alphabet letters for machine learning applications; the original images (80213 in total) were resized to 50x50 and then converted to CSV format
- kloop corpus
- UD project comments
- KTMU's UD Treebank, 781 sentences
- Small UD Treebank: 145 sentences (incl. 20 Cairo sentences), and ~ 100 sentences suggested by UD Turkic Group; a part of UD Turkic Treebank
- Verbal paradigms for Kyrgyz (100 Kyrgyz verbs fully conjugated in all tenses)
- WikiANN
- KyrgyzNER
- Kyrgyz Multi-Label News Classification
- Kyrgyz Word Embedding Evaluation
- Machine-Translated Alpaca - Stanford Alpaca (`stanford_alpaca`) instructions translated into Kyrgyz using ChatGPT and Google Translate
- Country names table - Russian-English
- KyrSpell
- Tatu Ylonen's enwiktionary-based dictionary ([Ky Anki deck](https://ankiweb.net/shared/info/518863963) for language learners)
Pretrained models
- Polyglot morfessor
- fastText - 300-dimensional fastText vectors provided by the authors: [bin](https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.ky.300.bin.gz), [txt](https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.ky.300.vec.gz).
- compressed fastText - [fasttext-ky-mini](https://zenodo.org/record/4905385/files/fasttext-ky-mini?download=1) prepared by Liebl Bernhard in 2021.
- BERT-based NER - a `base-multilingual-cased` BERT-family model fine-tuned on WikiANN for NER on Kyrgyz. The author warns that this model is not usable and was built only as a proof of concept; it will be updated later.
- Manas-GPT
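
The text-format fastText vectors listed above can be inspected without extra libraries. The following is a minimal sketch of a reader, assuming the usual `.vec` layout (a header line with vocabulary size and dimension, then one word followed by its values per line). The function name `read_vec` and the two-word toy sample are illustrative assumptions, not part of fastText itself.

```python
import io


def read_vec(handle, limit=None):
    """Parse fastText text-format vectors from an open text handle.

    Returns (dimension, {word: [float, ...]}). `limit` caps the number
    of data lines read, which helps with very large files.
    """
    header = handle.readline().split()
    dim = int(header[1])  # header is "<vocabulary size> <dimension>"
    vectors = {}
    for i, line in enumerate(handle):
        if limit is not None and i >= limit:
            break
        # Split from the right so a token containing spaces stays intact.
        parts = line.rstrip().rsplit(" ", dim)
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip malformed lines
        vectors[word] = [float(v) for v in values]
    return dim, vectors


# Toy in-memory sample (hypothetical values) standing in for a real file.
sample = io.StringIO("2 3\nкитеп 0.1 0.2 0.3\nсөз 0.4 0.5 0.6\n")
dim, vectors = read_vec(sample)
print(dim, sorted(vectors))
```

For the real download, the same function should work on a handle such as `gzip.open("cc.ky.300.vec.gz", "rt", encoding="utf-8")`, ideally with a `limit` to keep memory use manageable.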
Methods/Software
- spaCy
- Kyrgyz for Apertium - tagging; installation script: [install_apertium_kir.sh](/install_apertium_kir.sh). A [much, much easier way](https://github.com/apertium/apertium-python/): `import apertium; apertium.installer.install_module("kir")`.
- DEPRECATED
Hate Speech detection
- Jupyter Notebook for hate speech detection
- Tilchi - Kyrgyz dictionary, open source desktop application
- ӨҮҢизатор - proof-of-concept letter replacement Telegram bot demo code; fixes incorrect usages of 'О', 'У', 'Н' => 'Ө', 'Ү', 'Ң'
- Number-to-words conversion
- Number-to-words conversion
- Telegram bot for Kyrgyz morphological analysis - by [sasha-kir](https://github.com/sasha-kir), based on [Apertium data for Kyrgyz](https://github.com/apertium/apertium-kir/)
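
The 'О', 'У', 'Н' => 'Ө', 'Ү', 'Ң' restoration mentioned in the ӨҮҢизатор entry can be illustrated with a small sketch. This is not the bot's actual code: it assumes a simple dictionary-lookup strategy, and the names `candidates` and `restore` and the tiny vocabulary are hypothetical.

```python
from itertools import product

# Plain Cyrillic letters often typed in place of the Kyrgyz-specific ones.
REPLACEMENTS = {"о": "ө", "у": "ү", "н": "ң", "О": "Ө", "У": "Ү", "Н": "Ң"}


def candidates(word):
    """Yield every spelling obtained by optionally replacing each о/у/н."""
    options = [
        (ch, REPLACEMENTS[ch]) if ch in REPLACEMENTS else (ch,)
        for ch in word
    ]
    for combo in product(*options):
        yield "".join(combo)


def restore(word, vocabulary):
    """Return the single in-vocabulary variant of `word`, else `word` unchanged.

    Assumption: `vocabulary` is a set of correctly spelled lowercase words.
    The candidate count doubles with each replaceable letter, so this is
    only practical for short words.
    """
    if word.lower() in vocabulary:
        return word
    matches = [c for c in candidates(word) if c.lower() in vocabulary]
    return matches[0] if len(matches) == 1 else word


# Toy vocabulary for illustration only.
vocabulary = {"өмүр", "күн", "жаңы"}
print(" ".join(restore(w, vocabulary) for w in "Жаны кун".split()))
```

Ambiguous or unknown words are returned untouched, which keeps the sketch conservative; a real tool would likely need context or frequency information to resolve words that have several valid spellings.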
Online Demos
- Cyrillic-to-Latin online converter
Miscellaneous