https://github.com/esceptico/toxic
End-to-end toxic Russian comment classification
captum convolutional-neural-networks deep-learning hydra nlp pytorch streamlit text-classification visualization
- Host: GitHub
- URL: https://github.com/esceptico/toxic
- Owner: esceptico
- License: mit
- Created: 2020-12-19T22:56:24.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2022-08-23T09:05:19.000Z (over 2 years ago)
- Last Synced: 2024-10-19T17:30:36.328Z (4 months ago)
- Topics: captum, convolutional-neural-networks, deep-learning, hydra, nlp, pytorch, streamlit, text-classification, visualization
- Language: Python
- Homepage: https://share.streamlit.io/esceptico/toxic/app.py
- Size: 3.14 MB
- Stars: 5
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Toxic Comment Classification
Fullstack end-to-end toxic comment classification with result interpretation.

You can visit the online demo on [this](https://share.streamlit.io/esceptico/toxic/app.py) page.
## Data
The dataset is available on [Kaggle](https://www.kaggle.com/alexandersemiletov/toxic-russian-comments) and contains labelled comments from the popular Russian social network ok.ru.

## Train
```
python run_training.py
```

## Inference
```python
from src.toxic.inference import Toxic

model = Toxic.from_checkpoint('path_to_model_or_name')
model.infer('привет, придурок')
```

Result:
```
{
'predicted': [
{'class': 'insult', 'confidence': 0.99324},
{'class': 'threat', 'confidence': 0.002},
{'class': 'obscenity', 'confidence': 0.00225}
],
'interpretation': {
'spans': [(0, 7), (7, 16)],
'weights': {
'insult': [-0.34299, 0.93934],
'threat': [-0.97362, 0.22819],
'obscenity': [-0.99579, 0.09168]
}
}
}
```

### Pretrained model
A pretrained model is available on the release page.
To download it, run:
```
wget https://github.com/esceptico/toxic/releases/download/v0.1.0/model.pth.zip
unzip model.pth.zip
```

You can also download and cache a pretrained model by its name:
```python
model = Toxic.from_checkpoint('cnn')
```

**List of supported pretrained models**:
* `cnn`: Wide CNN encoder with a feed-forward classification head

## Serving
### Streamlit
```
streamlit run ui/app.py -- --model=models/model.pth
```
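
In the command above, Streamlit forwards everything after the bare `--` to the script itself, so the script can read `--model` from its own arguments. The snippet below is only a sketch of how such a flag could be parsed with `argparse`; the `parse_args` helper and the default path are assumptions for illustration, not the actual contents of `ui/app.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse script arguments that Streamlit forwards after the bare `--`.

    Hypothetical helper: the real ui/app.py may handle its arguments differently.
    """
    parser = argparse.ArgumentParser(description='Toxic comment demo (sketch)')
    # `--model` mirrors the flag used in the command above;
    # the default value here is an assumption.
    parser.add_argument('--model', default='models/model.pth',
                        help='path to a model checkpoint')
    # With argv=None, argparse reads sys.argv[1:], which is where
    # Streamlit places the forwarded script arguments.
    return parser.parse_args(argv)


# Example: the arguments Streamlit would forward for the command above.
args = parse_args(['--model=models/model.pth'])
print(args.model)
```

Passing `argv` explicitly keeps the helper easy to test outside of a Streamlit session.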
![Example](docs/images/ui.png)

## Powered By
* [captum](https://github.com/pytorch/captum)
* [hydra](https://github.com/facebookresearch/hydra)
* [tokenizers](https://github.com/huggingface/tokenizers)
* [torchmetrics](https://github.com/PyTorchLightning/metrics)
* [streamlit](https://github.com/streamlit/streamlit)

## License
The code is released under the MIT license.

The models are distributed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/), in accordance with the dataset license.