Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bureaucratic-labs/dostoevsky
Sentiment analysis library for russian language
https://github.com/bureaucratic-labs/dostoevsky
natural-language-processing russian-specific sentiment-analysis
Last synced: 13 days ago
JSON representation
Sentiment analysis library for russian language
- Host: GitHub
- URL: https://github.com/bureaucratic-labs/dostoevsky
- Owner: bureaucratic-labs
- License: mit
- Archived: true
- Created: 2018-05-09T14:09:31.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-10-30T12:28:09.000Z (about 1 year ago)
- Last Synced: 2024-08-31T19:19:49.235Z (2 months ago)
- Topics: natural-language-processing, russian-specific, sentiment-analysis
- Language: Python
- Homepage:
- Size: 193 KB
- Stars: 312
- Watchers: 11
- Forks: 32
- Open Issues: 19
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG
- License: LICENSE
Awesome Lists containing this project
README
# Dostoevsky ![Test & Lint](https://github.com/bureaucratic-labs/dostoevsky/workflows/Test%20&%20Lint/badge.svg?branch=master)
Sentiment analysis library for russian language
## Install
Please note that `Dostoevsky` supports only Python 3.7+ on both Linux and Windows
```bash
$ pip install dostoevsky
```## Social network model [FastText]
This model was trained on [RuSentiment dataset](https://github.com/text-machine-lab/rusentiment) and achieves up to ~0.71 F1 score.
### Usage
First of all, you'll need to download binary model:
```bash
$ python -m dostoevsky download fasttext-social-network-model
```Then you can use sentiment analyzer:
```python
from dostoevsky.tokenization import RegexTokenizer
from dostoevsky.models import FastTextSocialNetworkModeltokenizer = RegexTokenizer()
tokens = tokenizer.split('всё очень плохо') # [('всё', None), ('очень', None), ('плохо', None)]model = FastTextSocialNetworkModel(tokenizer=tokenizer)
messages = [
'привет',
'я люблю тебя!!',
'малолетние дебилы'
]results = model.predict(messages, k=2)
for message, sentiment in zip(messages, results):
# привет -> {'speech': 1.0000100135803223, 'skip': 0.0020607432816177607}
# люблю тебя!! -> {'positive': 0.9886782765388489, 'skip': 0.005394937004894018}
# малолетние дебилы -> {'negative': 0.9525841474533081, 'neutral': 0.13661839067935944}]
print(message, '->', sentiment)
```## Articles
* 🇷🇺 [Dostoevsky — анализ тональности в Python за 5 минут](https://egorovegor.ru/analiz-tonalnosti-s-python-i-dostoevsky/)
* 🇷🇺 [Семантический анализ мнений о поправках к Конституции на основе данных ВКонтакте ](https://leftjoin.ru/all/constitution-sentiment-analysis/)
* 🇷🇺 [Как писать посты в стиле Артемия Лебедева? Подробный анализ телеграм-канала и кое-что еще](https://habr.com/ru/post/596035/)Feel free to extend this list with your article! ✨
## Related projects
* [David Dale](https://github.com/avidale) BERT-based models for [emotion detection](https://huggingface.co/cointegrated/rubert-tiny2-cedr-emotion-detection?text=%D0%93%D1%80%D1%83%D1%81%D1%82%D1%8C-%D1%82%D0%BE%D1%81%D0%BA%D0%B0+%D0%BC%D0%B5%D0%BD%D1%8F+%D1%81%D1%8A%D0%B5%D0%B4%D0%B0%D0%B5%D1%82) and [classification of toxicity](https://huggingface.co/cointegrated/rubert-tiny-toxicity)
## Citation
If you use the library in a research project, please include the following citation for the RuSentiment data:
```
@inproceedings{rogers-etal-2018-rusentiment,
title = "{R}u{S}entiment: An Enriched Sentiment Analysis Dataset for Social Media in {R}ussian",
author = "Rogers, Anna and
Romanov, Alexey and
Rumshisky, Anna and
Volkova, Svitlana and
Gronas, Mikhail and
Gribov, Alex",
booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
month = aug,
year = "2018",
address = "Santa Fe, New Mexico, USA",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/C18-1064",
pages = "755--763",
}```