awesome-hungarian-nlp
A curated list of NLP resources for Hungarian
https://github.com/oroszgy/awesome-hungarian-nlp
-
Tools
-
Word tokenization, sentence splitting
-
Morphology
- emMorph (Humor)
- hunmorph - spell-checking, stemming and morphological analysis of agglutinative, German and other languages.
- hunspell - open-source spell-checker, stemmer and morphological analyzer
- Lemmagen
- emMorphPy
- hunmorph-foma
- lara-hungarian-nlp
- Simplemma
-
PoS / Morphological taggers
- hunpos - an open-source reimplementation of TnT, the part-of-speech tagger by Thorsten Brants.
- PurePos
- purepos.py
-
Taggers / Chunkers
- SzegedNER
- HunTag
- HunTag3
- DBpedia Spotlight
- emBERT - pre-trained Transformer-based models. It provides tagging models based on Hugging Face's transformers package.
-
Pipelines with Hungarian NLP components
- magyarlanc
- UDPipe - a trainable pipeline for tokenization, tagging, lemmatization and dependency parsing of CoNLL-U files
- polyglot
- Google Syntaxnet
-
Syntactic parsers
-
Semantic analysis
- mT5-small-HunSum-1, mT5-base-HunSum-1 (https://huggingface.co/SZTAKI-HLT/mT5-base-HunSum-1), Bert2Bert-HunSum-1 (https://huggingface.co/SZTAKI-HLT/Bert2Bert-HunSum-1)
-
-
Datasets
-
Corpora
- HunSum-1
- Hungarian Webcorpus
- Hungarian Webcorpus 2.0
- emLam
- web2corpus
- OpinHuBank - annotated corpus to aid the research of opinion mining and sentiment analysis in Hungarian
- The Hungarian forum corpus for Opinion Mining
- Hungarian sentiment corpus (HuSent)
- Szeged Treebank
- Szeged Dependency Treebank - dependency-tree format version of the Szeged Treebank.
- Hungarian Named Entity Corpora
- hunNERwiki
- Hungarian word sense disambiguated corpus
- HunLearner
- HuRC
- Hunglish Corpus - sentence-aligned Hungarian-English parallel corpus of about 120 million words in 4 million sentence pairs.
- SzegedParallel - English-Hungarian parallel corpus containing texts selected on the basis of grammatical and translational criteria.
- HunOr - Hungarian-Russian parallel corpus comprising approximately 800 thousand words.
- TED talks transcripts parallel corpus
- TaPaCo Corpus
- Duolingo STAPLE
- PPDB
- OpenSubtitles Corpus
- CoNLL 2017 Shared Task Hungarian data
-
Linguistic resources
- morphdb.hu - Hungarian lexical database and morphological grammar based on well-founded theoretical decisions.
- Hungarian Sentiment Lexicon - Affect lexicons.
- Named Entity lists for Hungarian
- Mazsola ISZ
-
Linked Open Data
-
Geo data
- OpenStreetMap (OSM)
- Hungary
- Who's On First - whosonfirst-data-admin-hu
-
Speech related data
-
-
Language models
-
Word embeddings
- fastText Wikipedia - pre-trained word vectors for 90 languages, trained on Wikipedia using fastText.
- fastText Common Crawl & Wikipedia - pre-trained word vectors for 157 languages, trained on Wikipedia and the Common Crawl using fastText's CBOW model.
- polyglot vectors
- hunembed0.0 - cut-off of 10 words.
- Szeged word vectors
- questions-words-hu
- Multi-sense word embeddings
- BytePair Embeddings
- HuSpaCy 300d
- HuSpaCy 100d
-
Transformer models
- `huBERT`
- HIL* Transformer models
- PULI-BERT-Large
- PULI-GPT-2 - Hungarian GPT-2 model
- PULI-GPT-3SX - Hungarian GPT-NeoX model (6.7 billion parameters)
-
-
Academy
-
Journals
-
Conferences
-
Institutes
- Department of Language Technology and Applied Linguistics, RIL-MTA
- Human Language Technology Research Group of the Budapest University of Technology and Economics
- Natural Language Processing Group of the University of Szeged
- BME - Laboratory of Speech Acoustics
- Natural Language Processing Group of the Pázmány Péter Catholic University Faculty of Information Technology and Bionics
-
-
Learning resources
-
Books
-
Courses
-
Tutorials
-
-
Communities
- Kereső világ
- Hungarian NLP Meetup
- Deep Learning Reading Seminar Meetup
- HuNLP Slack
-
-
Other Hungarian related resource collections