{"id":30848277,"url":"https://github.com/sanyambk/hate-speech-detection","last_synced_at":"2026-05-09T02:33:39.553Z","repository":{"id":313548560,"uuid":"1051805456","full_name":"SanyamBK/Hate-Speech-Detection","owner":"SanyamBK","description":"Multiclass hate-speech detection pipeline: data cleaning, neural model training, and serialized inference artifact.","archived":false,"fork":false,"pushed_at":"2025-09-06T19:04:28.000Z","size":8527,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-06T21:21:45.730Z","etag":null,"topics":["deep-learning","machine-learning","nlp","pytorch","tensorflow","text-classification"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SanyamBK.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-06T18:54:40.000Z","updated_at":"2025-09-06T19:07:26.000Z","dependencies_parsed_at":"2025-09-06T21:32:00.218Z","dependency_job_id":null,"html_url":"https://github.com/SanyamBK/Hate-Speech-Detection","commit_stats":null,"previous_names":["sanyambk/hate-speech-detection"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/SanyamBK/Hate-Speech-Detection","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SanyamBK%2FHate-Speech-Detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SanyamBK%2FHate-Speech-Detection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SanyamBK%2FHate-Speech-Detection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SanyamBK%2FHate-Speech-Detection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SanyamBK","download_url":"https://codeload.github.com/SanyamBK/Hate-Speech-Detection/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SanyamBK%2FHate-Speech-Detection/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273990186,"owners_count":25203288,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-07T02:00:09.463Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","machine-learning","nlp","pytorch","tensorflow","text-classification"],"created_at":"2025-09-07T03:08:02.169Z","updated_at":"2026-05-09T02:33:39.510Z","avatar_url":"https://github.com/SanyamBK.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Hate Speech Detection\n\n### Overview\n\nThis project implements a text-classification pipeline to categorize short social media posts (tweets) into one of three classes:\n\n- Hate Speech (0)\n- Offensive Language (1)\n- Neither / Neutral (2)\n\nThis is a multiclass classification problem. The solution includes data exploration, cleaning and preprocessing, model training using a neural network, and a saved model artifact for inference.\n\n### Contest\n\nDeveloped for CodeChef Weekend Dev Challenge 14: \"DL Projects\" (attempted on 6 Sep 2025).\n\n## Repository layout\n\n- `main.ipynb` — exploratory analysis and experiment log.\n- Part 1/\n  - `main.py` — data loading and initial EDA.\n  - `hate_speech.csv` — original raw dataset (columns: `tweet`, `class`).\n- Part 2/\n  - `main.py` — data cleaning and preprocessing pipeline; outputs `cleaned_hate_dataset.csv`.\n  - `hate_dataset.csv` — intermediate dataset.\n- Part 3/\n  - `main.py` — model definition, training, evaluation, and inference utilities.\n  - `cleaned_hate_dataset.csv` — final cleaned dataset used for training.\n  - `hate_speech_model.pkl` — serialized trained model for deployment.\n\n## Key steps\n\n1. Data exploration (Part 1): inspect class balance, token distributions, and common tokens.\n2. Cleaning \u0026 preprocessing (Part 2): normalize text, remove noise, tokenize, and vectorize (TF-IDF or embeddings).\n3. Model training \u0026 evaluation (Part 3): train a neural classifier and evaluate using accuracy, precision/recall, F1, and confusion matrix.\n\n### Part 1 — Data Exploration \u0026 Analysis\n\nIn Part 1 we perform an exploratory data analysis to understand the dataset before cleaning and modeling. The dataset (`hate_dataset.csv` / `hate_speech.csv`) contains two columns:\n\n- `tweet`: raw tweet text\n- `class`: label (0 = Hate Speech, 1 = Offensive Language, 2 = Neither/Neutral)\n\nThe EDA includes:\n\n- Class distribution and imbalance checks\n- Token length and distribution plots\n- Frequent token and n-gram analysis per class\n\n## Quick start\n\n1. Create a virtual environment and install dependencies (from repo root):\n\n```powershell\npython -m venv venv\n.\\venv\\Scripts\\Activate.ps1\npip install -r requirements.txt\n```\n\n2. Run Part 1 (EDA):\n\n```powershell\nSet-Location -LiteralPath \"Hate Speech Detection project\\Part 1\"\npython main.py\n```\n\n3. Run final training \u0026 evaluation (Part 3):\n\n```powershell\nSet-Location -LiteralPath \"Hate Speech Detection project\\Part 3\"\npython main.py\n```\n\n### Inference example\n\n```python\nimport joblib\nmodel = joblib.load('Part 3/hate_speech_model.pkl')\ntext = \"This is a sample tweet to classify\"\npred = model.predict([text])\nprint(pred)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsanyambk%2Fhate-speech-detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsanyambk%2Fhate-speech-detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsanyambk%2Fhate-speech-detection/lists"}