# Toxic Comment Classification
Full-stack, end-to-end classification of toxic Russian-language comments, with interpretation of the results.

Try the [online demo](https://share.streamlit.io/esceptico/toxic/app.py).

## Data
The dataset is available on [Kaggle](https://www.kaggle.com/alexandersemiletov/toxic-russian-comments) and contains labelled comments from the popular Russian social network ok.ru.

## Train
```bash
python run_training.py
```

## Inference
```python
from src.toxic.inference import Toxic

# Path to a local checkpoint or the name of a pretrained model
model = Toxic.from_checkpoint('path_to_model_or_name')
result = model.infer('привет, придурок')  # "hello, idiot"
```

`result` contains:
```python
{
    'predicted': [
        {'class': 'insult', 'confidence': 0.99324},
        {'class': 'threat', 'confidence': 0.002},
        {'class': 'obscenity', 'confidence': 0.00225}
    ],
    'interpretation': {
        'spans': [(0, 7), (7, 16)],
        'weights': {
            'insult': [-0.34299, 0.93934],
            'threat': [-0.97362, 0.22819],
            'obscenity': [-0.99579, 0.09168]
        }
    }
}
```

`predicted` holds a confidence score for each class; `interpretation` assigns each class a weight for every span of the input text.

### Pretrained model

A pretrained model is available on the [releases page](https://github.com/esceptico/toxic/releases). To download it, run:
```bash
wget https://github.com/esceptico/toxic/releases/download/v0.1.0/model.pth.zip
unzip model.pth.zip
```

You can also download and cache a pretrained model by name:
```python
model = Toxic.from_checkpoint('cnn')
```

**Supported pretrained models:**
* `cnn`: wide CNN encoder with a feed-forward classification head

## Serving
### Streamlit
```bash
streamlit run ui/app.py -- --model=models/model.pth
```
Arguments after `--` are passed to the app script rather than to Streamlit; `--model` sets the path to the model checkpoint.

![Example](docs/images/ui.png)

## Powered By
* [captum](https://github.com/pytorch/captum)
* [hydra](https://github.com/facebookresearch/hydra)
* [tokenizers](https://github.com/huggingface/tokenizers)
* [torchmetrics](https://github.com/PyTorchLightning/metrics)
* [streamlit](https://github.com/streamlit/streamlit)

## License
The code is released under the MIT license.

The models are distributed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) in accordance with the dataset license.
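## Working with the inference result
The following is a minimal, hypothetical sketch of post-processing the dictionary returned by `model.infer(...)` in the Inference section. To run without a model, it hard-codes the example output shown there. Several details are assumptions rather than documented behaviour. The helper names `flagged_classes` and `span_attributions` and the 0.5 cutoff are illustrative and not part of the project's API. `spans` is assumed to hold character offsets into the input string, which matches the example, where they line up with `'привет,'` and `' придурок'`. The `weights` are presumably attribution scores (captum is listed under Powered By), with larger values indicating stronger evidence for the class.

```python
# Hypothetical post-processing of the dict returned by Toxic.infer().
# Assumption: the structure matches the README example; helper names and the
# threshold are illustrative, not part of the project's API.

text = "привет, придурок"  # "hello, idiot"

# Copy of the example output from the Inference section
result = {
    "predicted": [
        {"class": "insult", "confidence": 0.99324},
        {"class": "threat", "confidence": 0.002},
        {"class": "obscenity", "confidence": 0.00225},
    ],
    "interpretation": {
        "spans": [(0, 7), (7, 16)],
        "weights": {
            "insult": [-0.34299, 0.93934],
            "threat": [-0.97362, 0.22819],
            "obscenity": [-0.99579, 0.09168],
        },
    },
}


def flagged_classes(result, threshold=0.5):
    """Return classes whose confidence is at or above a (hypothetical) threshold."""
    return [p["class"] for p in result["predicted"] if p["confidence"] >= threshold]


def span_attributions(text, result, label):
    """Pair each text fragment with its weight for `label`, highest weight first.

    Assumes `spans` are (start, end) character offsets into `text`.
    """
    spans = result["interpretation"]["spans"]
    weights = result["interpretation"]["weights"][label]
    pairs = [(text[start:end], w) for (start, end), w in zip(spans, weights)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(flagged_classes(result))
    for fragment, weight in span_attributions(text, result, "insult"):
        print(f"{weight:+.3f}  {fragment!r}")
```

Run as a script, this prints `['insult']` followed by the two fragments ranked by their `insult` weight, with `' придурок'` first. Keeping the threshold as a parameter makes it easy to trade precision for recall, since the cutoff is not specified by the project.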