{"id":18646909,"url":"https://github.com/jonad/toxicity_comments","last_synced_at":"2026-03-01T05:09:17.057Z","repository":{"id":39727055,"uuid":"269482398","full_name":"jonad/Toxicity_comments","owner":"jonad","description":null,"archived":false,"fork":false,"pushed_at":"2022-12-27T15:35:12.000Z","size":3196,"stargazers_count":3,"open_issues_count":11,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-08T21:13:44.331Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jonad.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-06-04T22:57:17.000Z","updated_at":"2023-12-07T13:40:10.000Z","dependencies_parsed_at":"2023-01-31T04:15:50.277Z","dependency_job_id":null,"html_url":"https://github.com/jonad/Toxicity_comments","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jonad/Toxicity_comments","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonad%2FToxicity_comments","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonad%2FToxicity_comments/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonad%2FToxicity_comments/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jonad%2FToxicity_comments/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jonad","download_url":"https://codeload.github.com/jonad/Toxicity_comments/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/jonad%2FToxicity_comments/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29960269,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T01:47:18.291Z","status":"online","status_checked_at":"2026-03-01T02:00:07.437Z","response_time":124,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T06:23:26.391Z","updated_at":"2026-03-01T05:09:17.040Z","avatar_url":"https://github.com/jonad.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Recurrent Neural Network for sentence-level Text Classification\n\nThis project is about building and evaluating recurrent neural network models\nfor sentence-level text classification. 
The final models detect toxicity in\nshort texts as well as the type of toxicity, which falls into the following\ncategories: severe toxicity, obscene, identity attack, insult, and threat.\nThe final models can be used for filtering online posts and comments,\nsocial media policing, and user education.\n\u003cbr\u003e\n### Links\n- [The deployed models](TODO)\n\n### Sections\n- [Dataset Summary](#dataset-summary)\n- [Exploratory Data Analysis](#exploratory-data-analysis)\n- [Models](#models)\n- [Training](#training)\n- [Evaluation](#evaluation)\n- [Testing](#testing)\n\n## Dataset Summary\n[back to top](#sections)\n\n- The dataset was downloaded from the Kaggle competition ['Jigsaw Unintended Bias in Toxicity Classification'](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification).\n- It consists of 1.8+ million user comments that have been hand-labeled by human raters for toxicity levels.\n- It also includes labels for the following toxicity types: severe toxicity, obscene, threat, insult, and identity attack.\n\u003cbr\u003e\n\u003cbr\u003e\n\n## Exploratory Data Analysis\n[back to top](#sections)\n\n### Toxicity class distribution\n![](./images/data_distribution.png)\n\n### Correlation heatmap of toxicity types\n\u003cbr\u003e\n\u003cbr\u003e\n\n![](./images/correlations.png)\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n## Models\n[back to top](#sections)\n\n### Long Short-Term Memory Model (LSTM)\n![](./images/lstm.jpg)\n\n\u003cbr \u003e\n\n### Bidirectional Long Short-Term Memory Model (BiLSTM)\n\n\u003cbr \u003e\n![](./images/bilstm.jpg)\n\n\u003cbr \u003e\n\n### BiLSTM with Attention Mechanism\n![](./images/attention.jpg)\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n## Training\n[back to top](#sections)\n\n### Learning Curves\n\n![](./images/training.png)\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n## Evaluation\n[back to top](#sections)\n\u003cbr \u003e\n\n### ROC-AUC Toxicity\n![](./images/toxicity.png)\n\n### ROC-AUC Severe 
Toxicity\n![](./images/severe_toxicity.png)\n### ROC-AUC Obscene\n![](./images/obscene.png)\n### ROC-AUC Identity Attack\n![](./images/identity_attack.png)\n### ROC-AUC Insult\n![](./images/insult.png)\n### ROC-AUC Threat\n![](./images/threat.png)\n\u003cbr \u003e\n## Testing\n[back to top](#sections)\n\u003cbr \u003e\n\n![](./images/test1.png)\n\n![](./images/t2.png)\n\n![](./images/t3.png)\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjonad%2Ftoxicity_comments","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjonad%2Ftoxicity_comments","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjonad%2Ftoxicity_comments/lists"}