Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-italian
A list of awesome NLP resources for the Italian language.
https://github.com/AlessandroGianfelici/awesome-italian
-
Corpora
-
Sentiment Analysis
- Italian review dataset - Trustpilot-crawled dataset with 146,910 reviews.
- Happy Parents - Annotated datasets of parent-to-parent and parent-to-child dialogues.
- Italian Sentiment Analysis - Smartphone review dataset.
- Sentipolc2016 - Dataset for the Evalita Sentipolc competition, 2016 edition.
- Absita2018 - Booking-crawled dataset for the Evalita Absita competition, 2018 edition.
- Distributional Polarity Lexicon - Annotated dataset of sentiment polarity for short expressions (i.e., a few words).
- SentiML - a collection of documents annotated to identify sentiment at the sentence level.
- Sentic - multi-lingual sentiment analysis dataset.
- TWITA - dataset of Italian tweets.
-
Hate speech recognition
- HaSpeeDe - Dataset for the Evalita Hate Speech Detection competition, 2018 and 2020 editions.
- IHSC - Twitter corpus built with the aim of representing and analyzing hate speech against some minority groups in Italy.
- WhatsApp Dataset - WhatsApp dataset to study cyberbullying among Italian students aged 12-13 in the context of the CREEP EIT project.
-
Irony detection
- Irony and Tweets - labeled dataset of ironic tweets in several languages.
- IronITA 2018 - dataset for the IronITA (Irony Detection in Italian Tweets) competition, organised within Evalita 2018.
-
Word collections
- paroleitaliane - Lists of Italian words about different topics and from several sources.
-
Part of speech tagging
- PoS-Tagging Evalita 2009 - Annotated PoS tagging dataset for the Evalita 2009 competition.
-
Named Entity Recognition
- I-CAB - Corpus of annotated articles from "L'Adige" for NER tasks.
- PAISA - Corpus of annotated articles scraped from the web.
- itWaC - a 2-billion-word corpus constructed from the Web by limiting the crawl to the .it domain and using medium-frequency words from the Repubblica corpus and basic Italian vocabulary lists as seeds.
-
Linguistic Complexity
- Italian Complexity Dataset - 1,123 Italian sentences rated by humans with a judgment of complexity.
-
Parallel corpora
-
Spoken language corpora
- kiparla - The largest corpus of spoken Italian available so far (for research purposes only).
-
-
Models
-
Sentiment Analysis
- SentITA - a Bidirectional LSTM-CNN that operates at word level for sentiment polarity classification.
- Feel-IT - a BERT-based sentiment and emotion classifier for Italian.
-
Language Models
-
Text summarization
- multilang-summarizer - A multilingual text summarization model partially supported by the National Council of Science and Technology (CONACYT) of Mexico.
-
-
Useful libraries
-
Only Italian
- italian-dictionary - a Python library to retrieve the meaning of Italian lemmas.
-
Multilingual (supporting also Italian)
-