{"id":23592728,"url":"https://github.com/tushar2704/twitter-toxicity-classification","last_synced_at":"2025-11-04T10:30:32.153Z","repository":{"id":180534292,"uuid":"662692044","full_name":"tushar2704/Twitter-Toxicity-Classification","owner":"tushar2704","description":" This project aims to solve the problem of identifying the toxicity of tweets using Natural Language Processing (NLP) techniques. Given the dataset `twitter_parsed_tweets`, I will implement a step-by-step NLP approach to classify tweets into binary values based on their toxicity.","archived":false,"fork":false,"pushed_at":"2023-07-11T22:18:45.000Z","size":2007,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-27T08:16:36.491Z","etag":null,"topics":["artificial-intelligence","data-science","nlp","nlp-machine-learning","text-classification"],"latest_commit_sha":null,"homepage":"https://tushar-aggarwal.com","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tushar2704.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-05T17:17:12.000Z","updated_at":"2023-08-07T04:16:32.000Z","dependencies_parsed_at":"2023-07-15T05:00:58.143Z","dependency_job_id":null,"html_url":"https://github.com/tushar2704/Twitter-Toxicity-Classification","commit_stats":null,"previous_names":["tushar2704/twitter-toxicity-classification"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tushar2704%2FTwitter
-Toxicity-Classification","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tushar2704%2FTwitter-Toxicity-Classification/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tushar2704%2FTwitter-Toxicity-Classification/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tushar2704%2FTwitter-Toxicity-Classification/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tushar2704","download_url":"https://codeload.github.com/tushar2704/Twitter-Toxicity-Classification/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239432753,"owners_count":19637798,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","data-science","nlp","nlp-machine-learning","text-classification"],"created_at":"2024-12-27T08:15:31.976Z","updated_at":"2025-11-04T10:30:32.099Z","avatar_url":"https://github.com/tushar2704.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Twitter Toxicity Classification\n\nThis project aims to solve the problem of identifying the toxicity of tweets using Natural Language Processing (NLP) techniques. Given the dataset `twitter_parsed_tweets`, I will implement a step-by-step NLP approach to classify tweets into binary values based on their toxicity.\n\n## Problem Statement\n\nIn this project, we will tackle the task of toxicity classification for tweets. 
The main objective is to develop a model that accurately identifies toxic tweets by assigning each tweet a binary label (toxic or non-toxic). This classification lets us flag potentially harmful content on social media platforms and take appropriate action.\n\n## Dataset\n\nThe dataset `twitter_parsed_tweets` contains a collection of tweets along with their corresponding annotations. It consists of the following columns:\n\n- `index`: Index of the tweet\n- `id`: Unique identifier for each tweet\n- `Text`: The text content of the tweet\n- `Annotation`: Annotation indicating the toxicity level of the tweet\n- `oh_label`: Binary label indicating the toxicity (1 for toxic, 0 for non-toxic)\n\n## Approach\n\nThe NLP approach to toxicity classification involves several key steps:\n\n1. **Data Cleaning**: We will clean the data by handling missing values and removing duplicates, so that the text is ready for preprocessing.\n\n2. **Text Preprocessing**: We will apply tokenization, stopword removal, and text normalization to prepare the text for feature extraction.\n\n3. **Feature Extraction**: We will extract relevant features from the preprocessed text data. This may involve a bag-of-words representation, TF-IDF, or word embeddings.\n\n4. **Model Training and Evaluation**: We will train a classical machine learning model such as Naive Bayes or Logistic Regression, or a deep learning model such as LSTM or BERT, on the extracted features. The model will be evaluated with appropriate metrics and techniques such as cross-validation.\n\n5. **Model Deployment**: Once a satisfactory model is obtained, it can be deployed to classify new, unseen tweets and provide toxicity predictions.\n\n## Installation\n\nTo run this project locally, please follow these steps:\n\n1. 
Clone this repository:\n   ```\n   git clone https://github.com/tushar2704/twitter-toxicity-classification.git\n   ```\n\n2. Install the required dependencies:\n   ```\n   pip install -r requirements.txt\n   ```\n\n3. Run the main script to perform data cleaning, preprocessing, feature extraction, and model training:\n   ```\n   python main.py\n   ```\n\n4. Once the model is trained, you can use it to classify new tweets by calling the relevant functions or scripts.\n\n## Contributing\n\nContributions to this project are welcome. If you have any suggestions or would like to contribute to improving the toxicity classification model, feel free to open an issue or submit a pull request.\n\n## License\n\nThis project is licensed under the [MIT License](LICENSE). You are free to use, modify, and distribute the code for academic, commercial, or personal purposes.\n\n## Acknowledgments\n\nWe would like to express our gratitude to the creators of the `twitter_parsed_tweets` dataset for providing the data for this project. Their efforts enable us to tackle the important task of identifying toxic tweets and promote a safer online environment.\n\n## Contact\n\nFor any inquiries or further information about this project, please contact [Tushar Aggarwal](mailto:info@tushar-aggarwal.com).\n\nLet's work together to combat toxicity on Twitter and create a more positive and inclusive social media environment!","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftushar2704%2Ftwitter-toxicity-classification","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftushar2704%2Ftwitter-toxicity-classification","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftushar2704%2Ftwitter-toxicity-classification/lists"}