{"id":19947846,"url":"https://github.com/aarryasutar/hate_speech_detection","last_synced_at":"2026-04-09T17:05:51.125Z","repository":{"id":250435982,"uuid":"834471919","full_name":"aarryasutar/Hate_Speech_Detection","owner":"aarryasutar","description":"This project aims to detect hate speech on Twitter using advanced NLP and machine learning techniques, exploring feature extraction methods like TF-IDF and sentiment analysis, and evaluating models such as Logistic Regression and SVM.","archived":false,"fork":false,"pushed_at":"2024-07-27T11:38:40.000Z","size":1924,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-12T05:40:59.563Z","etag":null,"topics":["confusion-matrix","doc2vec","gensim","logistic-regression","matplotlib","naive-bayes","nltk","numpy","pandas","python","random-forest","scikit-learn","seaborn","stemming","stopwords-removal","svm","tf-idf-vectorizer","tokenization","vader","word-cloud"],"latest_commit_sha":null,"homepage":"","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aarryasutar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-27T11:20:57.000Z","updated_at":"2024-07-27T11:49:10.000Z","dependencies_parsed_at":"2024-07-27T12:57:41.055Z","dependency_job_id":null,"html_url":"https://github.com/aarryasutar/Hate_Speech_Detection","commit_stats":null,"previous_names":["aarryasutar/hate_speech_detection"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aarryasutar%2FHate_Speech_Detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aarryasutar%2FHate_Speech_Detection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aarryasutar%2FHate_Speech_Detection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aarryasutar%2FHate_Speech_Detection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aarryasutar","download_url":"https://codeload.github.com/aarryasutar/Hate_Speech_Detection/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241375122,"owners_count":19952656,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["confusion-matrix","doc2vec","gensim","logistic-regression","matplotlib","naive-bayes","nltk","numpy","pandas","python","random-forest","scikit-learn","seaborn","stemming","stopwords-removal","svm","tf-idf-vectorizer","tokenization","vader","word-cloud"],"created_at":"2024-11-13T00:37:42.405Z","updated_at":"2025-12-30T20:12:15.443Z","avatar_url":"https://github.com/aarryasutar.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Hate Speech Detection on Twitter\n\nThis project focuses on the detection of hate speech on Twitter using various natural language processing (NLP) and machine learning techniques. The project explores different feature extraction methods, including TF-IDF, sentiment analysis, and Doc2Vec, and evaluates multiple machine learning models to determine the most effective approach.\n\n## Table of Contents\n- [Installation](#installation)\n- [Dataset](#dataset)\n- [Data Preprocessing](#data-preprocessing)\n- [Feature Extraction](#feature-extraction)\n- [Model Training and Evaluation](#model-training-and-evaluation)\n- [Results](#results)\n- [Visualization](#visualization)\n- [Confusion Matrix](#confusion-matrix)\n- [Conclusion](#conclusion)\n\n## Installation\nTo run this project, you need to have Python installed along with the necessary libraries. You can install the required libraries using the provided command.\n\n## Dataset\nThe dataset used in this project is `HateSpeechData.csv`, which contains tweets labeled as hate speech, offensive speech, or neither. The dataset is loaded and a new column `text length` is added to represent the length of each tweet.\n\n## Data Preprocessing\nThe preprocessing steps include:\n1. Removal of punctuation and capitalization\n2. Tokenizing\n3. Removal of stopwords\n4. 
Stemming\n\n## Feature Extraction\n### Word Cloud\nVisualizing the most commonly used words in the dataset through a word cloud.\n\n### TF-IDF\nTF-IDF feature extraction to transform the text data into numerical features.\n\n### Sentiment Analysis\nUsing VADER sentiment analysis to extract sentiment-related features from the tweets.\n\n### Doc2Vec\nTraining a Doc2Vec model and extracting document vectors to represent the tweets in vector space.\n\n## Model Training and Evaluation\n### Logistic Regression\nTraining and evaluating a logistic regression model for hate speech detection.\n\n### Random Forest\nTraining and evaluating a random forest classifier for hate speech detection.\n\n### Naive Bayes\nTraining and evaluating a Naive Bayes classifier for hate speech detection.\n\n### Support Vector Machine (SVM)\nTraining and evaluating a Support Vector Machine for hate speech detection.\n\n### Comparison of Models\nVisualizing the accuracy of different models to compare their performance.\n\n## Results\nThe accuracy and performance of each model are presented in a comparison chart. The logistic regression and support vector machine models performed better than the others.\n\n## Visualization\n### Word Cloud\nWord clouds for the entire dataset and for hate and offensive speech specifically.\n\n## Confusion Matrix\nThe confusion matrix helps to understand the misclassifications made by the model. It provides insights into the performance of the model by showing the true and predicted values for each class.\n\n## Conclusion\nThis project demonstrates the effectiveness of various NLP techniques and machine learning models in detecting hate speech on Twitter. 
The results highlight the importance of feature extraction methods and model selection in achieving high accuracy and reliable performance in hate speech detection tasks.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faarryasutar%2Fhate_speech_detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faarryasutar%2Fhate_speech_detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faarryasutar%2Fhate_speech_detection/lists"}