Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-urdu
📖 A curated list of resources dedicated to Urdu language.
https://github.com/urduhack/awesome-urdu
-
Urdu Datasets
-
General NLP Datasets
- Urdu Web News Data
- Urdu Paraphrase Plagiarism Corpus, 2016
- COrpus of Urdu News TExt Reuse (CoUNTeR), 2016
- Urdu Short Text Reuse Corpus (USTRC), 2018
- TaPaCo: A Corpus of Sentential Paraphrases for 73 Languages
- Roman Urdu Dataset - Data for sentiment analysis, along with misc compiled data for Roman Urdu
- Collection of Urdu Datasets - Datasets for POS, NER and NLP tasks
- Urdu Universal Dependency Treebank
- UrduSummary Corpus Benchmark, 2016
- Rekhta Ghazals
-
Urdu Text Classification
-
Urdu Named-Entity Recognition
-
Urdu Monolingual Corpora
- UFAL Corpus, 2014 - 5.4M sentences (with POS tags)
- OSCAR Corpus, 2020
- CC-100 Corpus, 2019 - CC crawls from Jan-Dec 2018
- WMT Raw 2017 - CC crawls from 2012-2016
- iNLTK Wiki Articles, 2020; Tatoeba-Challenge Backtranslations (NLP/Tatoeba-Challenge/blob/master/data/Backtranslations.md); 2016 UrduWikiCorpus (http://urdu-corpus.blogspot.com/p/published-packages.html)
- Leipzig Corpora
- UrduWaC-2010 and urTenTen-2018, SketchEngine
- A Gold Standard Urdu Raw Text Corpus, LDCIL
-
Urdu Sentiment Datasets
- Urdu IMDb Movie Reviews - IMDB Movie Reviews data in Urdu
- 2010 Disaster Response Messages
- Urdu Sentiment Lexicon
- Sentiment Polarity Lexicons, 2017
- UCI Roman-Urdu Sentiment Classification, 2018 - 20k records
- Did You Offend Me? Classification of Offensive Tweets, 2018 - 3k tweets
-
Urdu OCR Datasets
- U-HAT - Urdu Hand-Written Text Dataset
- IIIT-Hyderabad: Unconstrained OCR for Urdu using Deep CNN-RNN Hybrid Networks, 2017
- CLE Pakistan Urdu Image Corpora
- Cursive-Text: A Benchmark for Urdu Text Recognition in Natural Scene Images, 2020 - 2,500 images; dataset available on request by email
-
Urdu Parallel Corpora for Machine Translation
- OPUS Corpora
- CC-Aligned, OpenSubtitles (https://www.aclweb.org/anthology/L16-1147/), TED (https://www.ted.com/participate/translate), QED (https://www.aclweb.org/anthology/L14-1675/), etc.
- IIIT-Hyderabad MT Bhasha
- PM India Parallel Corpus
- English-Urdu Religious Parallel Corpus
- Urdu-Nepali-English Parallel Corpus
- Cross-Language English-Urdu (CLEU) Corpus, 2018
- Flickr 8k Benchmark - 2.7k sentences
- Universal Declaration of Human Rights (benchmark)
- EMILLE/CIIL Corpus - Contains monolingual data as well
- National Platform for Language Technology
- Technology Development for Indian Languages
-
Urdu Transliteration Datasets
-
Urdu Lexical Resources
- CLE Urdu WordNet
- Verb List
- MTurks-10k Multilingual Dictionary, 2014
- Microsoft IT Terminology
- Urdu N-grams, 2020 - Uni-Gram, Bi-Gram, Tri-Gram and Tetra-Gram
- CLE Urdu Books N-Grams
-
Urdu Speech Datasets
- Urdu 250 Isolated Words, 2018
- CLE Phonetically Rich Urdu Speech Corpus
- CMU Wilderness Speech Dataset, 2019
- FCBH Recordings
- LibriVox AudioBooks
- CLE Pakistan Urdu Speech Corpus
- LDC UPenn Datasets - Filter search by selecting language
- Urdu Raw Speech Corpus, LDCIL
- LDCIL ASR Corpus
- Urdu-Sindhi Speech Emotion Corpus, 2020
- Speech Emotion Recognition Benchmark, 2018
-
-
Urdu NLP Tools, Libraries and Models
-
Cross-lingual Datasets
-
Language Models
-
Word Embeddings
- UrduHack Word-Vectors, 2019 - Word2Vec and FastText models
- Wiki-2016; 2017 crawl vectors (https://fasttext.cc/docs/en/crawl-vectors.html); Multilingual Aligned, 2017 (https://github.com/babylonhealth/fastText_multilingual)
- BPEmb: Subword Embeddings, 2017
- Polyglot Embeddings, 2013
-
Translation Models
-
Transliteration Libraries
- PolyGlot
- AksharaMukhi - Devanagari (Hindi) to Urdu script converter
- Google Transliterate API - Roman Urdu to Perso-Arabic
-
-
Online Resources/Services
-
Transliteration Libraries
-
Urdu News websites
- JANG Group
- BBC Urdu
- Voice of America Urdu
- Nawa-i-Waqt Group
- Urdu Point Network
- More news websites
-
Dictionaries
- ur.oxforddictionaries.com - Oxford Dictionary
- English Urdu Dictionary
- Urdu English Dictionary 2