Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
turkish-nlp-resources
🔡 List of Tools, Libraries, Models, Datasets and other resources for Turkish NLP.
https://github.com/agmmnn/turkish-nlp-resources
Last synced: 5 days ago
Tools/Libraries
- ITU Turkish NLP
- VNLP
- TDD - Tools
- Zemberek-NLP - Provides natural language processing tools for Turkish.
- Zemberek-Python
- Zemberek-Server
- Mukayese - Benchmarking platform for Turkish NLP tools and tasks, ranging from spell-checking to NLU tasks.
- SadedeGel - Library for extraction-based news summarization using several old and new NLP techniques.
- sinKAF
- TrTokenizer
- Morphological Analysis, [Dependency Parser](https://github.com/StarlangSoftware/TurkishDependencyParser-Py), [Deasciifier](https://github.com/StarlangSoftware/TurkishDeasciifier-Py), [NER](https://github.com/StarlangSoftware/TurkishNamedEntityRecognition-Py) - Python libraries from StarlangSoftware.
- snnclsr/NER
Models
Word Embeddings
- BERTurk
- ELMo For ManyLangs - Pre-trained ELMo representations for many languages.
- fastText - Word Vectors - Pre-trained word vectors for 157 languages, trained on Common Crawl and Wikipedia using fastText.
- Loodos/Turkish Language Models
- Hugging Face - Models/Turkish
Datasets
Word Embeddings
Multilingual Datasets:
Treebank:
- Universal Dependencies - Cross-linguistically consistent treebank annotation of morphology and syntax for multiple languages. Repository: https://github.com/UniversalDependencies
- UD Turkish Kenet - The Turkish-Kenet UD Treebank consists of 18,700 manually annotated sentences and 178,700 tokens. Its corpus consists of dictionary examples from TDK. Repository: https://github.com/StarlangSoftware/TurkishWordNet
- UD Turkish BOUN
Other Data:
- Turkish Song Lyrics (Türkçe Şarkı Sözleri)
- Turkish Folk Song Lyrics (Türkçe Türkü Sözleri)
- Turkish Poems (Türkçe Şiirler)
- Turkish Idioms and Proverbs (Türkçe Atasözleri ve Deyimler)
- hermitdave/Frequency Word List
- Fırat University - Datasets (Veri Setleri)
- Bilkent Turkish Writings Dataset
- 170k Turkish Sentences from Wikipedia
- Wiktionary:Frequency Lists - Turkish
- ooguz/Bad Word Blacklist for Turkish
- ahmetax/Turkish Stop Words List
- NLTK - Stop Words
- Tatoeba: Multilingual Sentences.
- 466k English Words.
Other Sources:
Other Resources
Books:
Videos:
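- BOUN - Introduction to Machine Learning (Yapay Öğrenmeye Giriş) - İsmail Arı Summer School 2018
- BOUN - Natural Language Processing (Doğal Dil İşleme) - İsmail Arı Summer School 2018
- BOUN - Speech / Processing (Konuşma / İşleme) - İsmail Arı Summer School 2018
- BOUN - Machine Learning Summer School 2020 (Yapay Öğrenme Yaz Okulu 2020)
- Açık Seminer - NLP 101: Introduction to Natural Language Processing and Applied Text Mining (Doğal Dil İşlemeye Giriş ve Uygulamalı Metin Madenciliği)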
- Starlang Yazılım Channel
- NLP with Duygu
Articles:
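- Turkish and Natural Language Processing (Türkçe ve Doğal Dil İşleme)
- Automatic Question Detection on Turkish Tweets (Türkçe Tweetler Üzerinde Otomatik Soru Tespiti)
- Classification of News according to Age Groups Using NLP
- Open Source Natural Language Processing Libraries (Açık Kaynak Doğal Dil İşleme Kütüphaneleri)
- Why Was It Banned? An NLP Look at Ekşi Sözlük Comments on the Earthquake (Neden yasaklandı? Depremle ilgili Ekşi Sözlük yorumlarına NLP gözüyle bakış)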
- A collection of brand new datasets for Turkish NLP
Sample Notebooks/Snippets:
Blog Posts:
Keywords
- nlp (10)
- turkish (7)
- turkish-language (5)
- turkish-nlp (4)
- sentence-tokenizer (3)
- deep-learning (2)
- morphology (2)
- morphological-analysis (2)
- natural-language-processing (2)
- ner (2)
- spelling-correction (2)
- sentiment-analysis (2)
- spark (1)
- rest (1)
- part-of-speech-tagger (1)
- javascript (1)
- docker (1)
- zemberek (1)
- language-model (1)
- machine-translation (1)
- segmentation (1)
- summarization (1)
- text-classification (1)
- acikhack2 (1)
- zemberek-nlp (1)
- language (1)
- word2vec (1)
- word-embeddings (1)
- stopword-removal (1)
- stemming (1)
- sentence-splitting (1)
- part-of-speech-tagging (1)
- number-to-words (1)
- normalization (1)
- named-entity-recognition (1)
- morphological-disambiguation (1)
- fasttext (1)
- dependency-parsing (1)
- deasciifier (1)
- word-segmentation (1)
- word-tokenizing (1)
- finite-state-machine (1)
- morphological-analyser (1)
- elmo (1)
- multilingual (1)
- language-models (1)
- bilkent-university (1)
- creative-writing (1)
- dataset (1)
- nlp-datasets (1)