Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-hungarian-nlp
A curated list of NLP resources for Hungarian
https://github.com/oroszgy/awesome-hungarian-nlp
- huntoken
- quntoken
- emMorph (Humor)
- emMorphPy
- hunmorph - spell checking, stemming, and morphological analysis of agglutinative, German, and other languages.
- hunmorph-foma
- hunspell - open-source spell checker, stemmer, and morphological analyzer
- lara-hungarian-nlp
- Lemmagen
- Simplemma
- hunpos - an open-source reimplementation of TnT, the part-of-speech tagger by Thorsten Brants.
- PurePos
- purepos.py
- HunTag
- HunTag3
- SzegedNER
- DBpedia Spotlight
- emBERT - pre-trained Transformer-based models. It provides tagging models based on Hugging Face's transformers package.
- magyarlanc
- magyarlanc_spark
- eszterland
- HuSpaCy - industrial-strength Hungarian natural language processing
- huNLP
- hunlp-GATE
- Trendminer Hungarian Processing Pipeline
- Google Syntaxnet
- UDPipe - trainable pipeline for tokenization, tagging, lemmatization, and dependency parsing of CoNLL-U files
- polyglot
- emtsv - inter-module communication via TSV + REST API
- Stanza
- spaCy StanfordNLP
- trankit - a light-weight Transformer-based Python toolkit for multilingual natural language processing
- hunpars
- HunParse - based parser using KR-style morphological annotation
- Anagramma Parser
- benepar - a high-accuracy parser with models for 11 languages, implemented in Python. Based on "Constituency Parsing with a Self-Attentive Encoder" from ACL 2018.
- SentimentAnalysisHUN - open-source sentiment analysis tool for the Hungarian language, written in Python.
- hun-date-parser
- mT5-small-HunSum-1, mT5-base-HunSum-1 (https://huggingface.co/SZTAKI-HLT/mT5-base-HunSum-1), Bert2Bert-HunSum-1 (https://huggingface.co/SZTAKI-HLT/Bert2Bert-HunSum-1)
- emLam
- pywnxml
- Hun-appointment-chatbot
- neural-punctuator
- hunaccent
- Diacritics_restoration
- NYTK MT
- syntax-augmentation-nmt - syntax-based data augmentation for Hungarian-English machine translation
- anonymizer_hu
- fastText Wikipedia - pre-trained word vectors for 90 languages, trained on Wikipedia using fastText.
- fastText Common Crawl & Wikipedia - pre-trained word vectors for 157 languages, trained on Wikipedia and Common Crawl using fastText's CBOW model.
- FastText_multilingual
- polyglot vectors
- wordvectors - pre-trained word2vec and fastText word vectors for 30+ languages, trained on Wikipedia
- hunembed0.0 - off of 10 words.
- Szeged word vectors
- questions-words-hu
- Conceptnet Numberbatch - and cross-lingual semantic word embeddings
- Multi-sense word embeddings
- BytePair Embeddings
- HuSpaCy 300d
- HuSpaCy 100d
- ELMo Representations
- `huBERT`
- HIL* Transformer models
- PULI-BERT-Large
- PULI-GPT-2 - GPT-2 model
- PULI-GPT-3SX - GPT-NeoX model (6.7 billion parameters)
- Hungarian Webcorpus
- Hungarian Webcorpus 2.0
- OSCAR
- emLam
- Leipzig corpora
- web2corpus
- CC-100
- CoNLL 2017: Automatically Annotated Raw Texts and Word Embeddings
- OpinHuBank - annotated corpus to aid the research of opinion mining and sentiment analysis in Hungarian
- HunEmPoli - a corpus of pre-agenda speeches of the Hungarian National Assembly (2014-2018), consisting of 764,008 tokens in 36,475 sentences. It carries aspect-level emotion annotation with 39,840 identified emotions; the keywords that evoked each emotion are also marked.
- The Hungarian forum corpus for Opinion Mining
- Hungarian sentiment corpus (HuSent)
- Szeged Treebank
- Szeged Dependency Treebank - dependency-tree format version of the Szeged Treebank.
- Universal Dependencies
- Hungarian Named Entity Corpora
- KorKor Pilotcorpus
- NerKor
- NerKor 1.41e - token Hungarian named entity dataset with ~30 entity types derived from NYTK-NerKor
- hunNERwiki
- Mazsola database
- PrevCons
- Hungarian word sense disambiguated corpus
- HunLearner
- HuLU
- HuCOLA
- HuCoPA
- HuSST
- HuWNLI
- HuWS
- HuRC
- ELTE Poetry Corpus
- ELTE Novel Corpus
- ELTE Drama Corpus
- HunSum-1
- HAPP
- Hunglish Corpus - aligned Hungarian-English parallel corpus of about 120 million words in 4 million sentence pairs.
- SzegedParallel - English-Hungarian parallel corpus containing texts selected on the basis of grammatical and translational criteria.
- HunOr - Hungarian-Russian parallel corpus comprising approximately 800 thousand words.
- CoNLL 2017 Shared Task Hungarian data
- CSS10
- Hungarian-Russian Prisoner of War Database
- TED talks transcripts parallel corpus
- TaPaCo Corpus
- Duolingo STAPLE
- PPDB
- OpenSubtitles Corpus
- MASSIVE dataset
- PWS
- morphdb.hu - founded theoretical decisions.
- huwn
- Hungarian Sentiment Lexicon - Affect lexicons.
- poltextLAB's sentiment lexicons
- 4lang
- Named Entity lists for Hungarian
- Mazsola ISZ
- Manocska
- PrevLex
- panmorph
- hun_ner_checklist
- Wikipedia dumps
- Wikidata dumps
- DBPedia dumps
- huwn.rdf
- Conceptnet
- OpenStreetMap (OSM)
- Hungary
- Natural-earth-vector - localization (see packages/Natural_Earth_quick_start/LOCALIZATION.md) imported from Wikidata labels
- Who's On First (whosonfirst-data-admin-hu)
- Hungarian Single Speaker Speech Dataset
- Mozilla Common Voice
- Acta Cybernetica
- MSZNY
- Natural Language Processing Group of the Pázmány Péter Catholic University Faculty of Information Technology and Bionics
- Department of Language Technology and Applied Linguistics, RIL-MTA
- Human Language Technology Research Group of the Budapest University of Technology and Economics
- Natural Language Processing Group of the University of Szeged
- BME - Laboratory of Speech Acoustics
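- Szövegbányászat (Text Mining)
- Szövegbányászat és mesterséges intelligencia R-ben (Text Mining and Artificial Intelligence in R)
- Kvantitatív szövegelemzés és szövegbányászat a politikatudományban (Quantitative Text Analysis and Text Mining in Political Science)
- NLP Courses by the University of Szeged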
- NLP Courses by the HLT Group of the Budapest University of Technology
- Mini NLP Course by the Center of Digital Humanities
- Tutorial on Text Mining for Hungarian
- Kereső világ
- Hungarian NLP Meetup
- Deep Learning Reading Seminar Meetup
- HuNLP Slack
- EENLP
- European Language Grid
- Hugging Face Datasets (filtered for Hungarian)
Keywords
- nlp: 17
- natural-language-processing: 8
- machine-learning: 7
- hungarian: 6
- python: 5
- hungarian-language: 4
- spacy: 3
- lemmatization: 3
- morphological-analysis: 3
- universal-dependencies: 3
- pos-tagger: 3
- dependency-parsing: 3
- text-mining: 3
- named-entity-recognition: 3
- spacy-models: 2
- spacy-pipeline: 2
- artificial-intelligence: 2
- corenlp: 2
- pytorch: 2
- multilingual: 2
- ner: 2
- information-extraction: 2
- parser: 2
- tokenizer: 2
- tokenization: 2
- lemmatizer: 2
- chatbot: 2
- dataset: 2
- tagger: 1
- content-tagging: 1
- dbpedia-spotlight: 1
- entity-extraction: 1
- entity-linking: 1
- rdfa-annotation: 1
- semantic-web: 1
- text-annotation: 1
- gis: 1
- hunlp: 1
- huspacy: 1
- evaluation-framework: 1
- speech-to-text: 1
- speech: 1
- elmo: 1
- word2vec: 1
- vector: 1
- language: 1
- workshop: 1
- tutorial: 1
- textacy: 1
- text-mining-workshop: 1