https://github.com/freref/toxic-comment-classification
- Host: GitHub
- URL: https://github.com/freref/toxic-comment-classification
- Owner: freref
- Created: 2025-12-10T23:12:57.000Z (6 months ago)
- Default Branch: master
- Last Pushed: 2025-12-11T00:17:04.000Z (6 months ago)
- Last Synced: 2025-12-30T18:54:41.783Z (5 months ago)
- Language: Jupyter Notebook
- Size: 146 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# toxic-comment-classification
This repository contains two notebooks that compare a Naive Bayes classifier and a Bidirectional GRU network on toxic comment classification.
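To make the Naive Bayes side of the comparison concrete, here is a minimal pure-Python sketch of a multinomial Naive Bayes classifier with Laplace smoothing for a single binary label. It is not the notebook's code: the regex tokenizer, the class name `BinaryMultinomialNB`, and the choice of the multinomial variant are assumptions. For the multi-label Jigsaw data, one such classifier would be trained per label column.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately simple: lowercase, then keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class BinaryMultinomialNB:
    """Multinomial Naive Bayes for one 0/1 label, with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, texts: list[str], labels: list[int]) -> "BinaryMultinomialNB":
        self.word_counts = {0: Counter(), 1: Counter()}
        for text, y in zip(texts, labels):
            self.word_counts[y].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        doc_counts = Counter(labels)
        # Log class priors; assumes both classes occur in the training labels.
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in (0, 1)}
        self.totals = {c: sum(self.word_counts[c].values()) for c in (0, 1)}
        return self

    def _log_joint(self, text: str, c: int) -> float:
        denom = self.totals[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for tok in tokenize(text):
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
        return score

    def predict_proba(self, text: str) -> float:
        # P(y=1 | text) = sigmoid(log P(1, text) - log P(0, text)), computed stably.
        d = self._log_joint(text, 1) - self._log_joint(text, 0)
        if d >= 0:
            return 1.0 / (1.0 + math.exp(-d))
        e = math.exp(d)
        return e / (1.0 + e)
```

Because the model only counts words, it trains in a single pass and is a cheap baseline; it cannot use word order or context, which is the gap a recurrent model such as the Bidirectional GRU is meant to close.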
## Data
The project uses the **Jigsaw Toxic Comment Classification Challenge** dataset from Kaggle.
* **Source**: [Kaggle Competition Data](https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/data)
* **Setup**: Place `train.csv` and `test.csv` in the `data/` directory.
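A minimal sketch for reading `data/train.csv` with the standard library, assuming the competition's column layout (`id`, `comment_text`, and the six 0/1 label columns `toxic`, `severe_toxic`, `obscene`, `threat`, `insult`, `identity_hate`). `load_train` is a hypothetical helper, not code from the notebooks.

```python
import csv
from pathlib import Path

# The six label columns of the Jigsaw training file.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def load_train(path: Path = Path("data/train.csv")) -> tuple[list[str], list[list[int]]]:
    """Return the comment texts and, for each comment, its six 0/1 labels."""
    texts, labels = [], []
    # newline="" lets the csv module handle line breaks inside quoted comments.
    with path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            texts.append(row["comment_text"])
            labels.append([int(row[name]) for name in LABELS])
    return texts, labels
```

`test.csv` contains only `id` and `comment_text`, so it would need a variant of this helper that skips the label columns.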
For the deep learning model (the Bidirectional GRU), I used pre-trained **GloVe embeddings** (the 100-dimensional vectors from the 6B-token release).
* **Source**: [GloVe 6B 100d on Kaggle](https://www.kaggle.com/datasets/danielwillgeorge/glove6b100dtxt)
* **Setup**: Place `glove.6B.100d.txt` in the `data/` directory.
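Each line of `glove.6B.100d.txt` is a word followed by 100 space-separated floats. The sketch below, using hypothetical helpers `load_glove` and `build_embedding_matrix`, reads that file into a dictionary and lays the vectors out as rows of an embedding matrix. It assumes a word index that starts at 1 with row 0 reserved for padding, as Keras-style tokenizers produce; the notebook's actual preprocessing may differ.

```python
from pathlib import Path

EMBED_DIM = 100  # glove.6B.100d.txt stores 100 floats per word


def load_glove(path: Path = Path("data/glove.6B.100d.txt"), dim: int = EMBED_DIM) -> dict[str, list[float]]:
    """Map each word in the GloVe file to its vector."""
    vectors = {}
    with path.open(encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if len(values) == dim:  # skip malformed lines
                vectors[word] = [float(v) for v in values]
    return vectors


def build_embedding_matrix(word_index: dict[str, int], vectors: dict[str, list[float]],
                           dim: int = EMBED_DIM) -> list[list[float]]:
    """Row i holds the vector of the word with index i; row 0 is the padding row."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, i in word_index.items():
        vec = vectors.get(word)
        if vec is not None:  # words missing from GloVe keep an all-zero row
            matrix[i] = list(vec)
    return matrix
```

Zero rows for out-of-vocabulary words are one simple choice; small random vectors are another common option.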
## Models
The trained Bidirectional GRU model is hosted on Hugging Face:
* [freref/toxic\_comments](https://huggingface.co/freref/toxic_comments/tree/main)
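The hosted model's exact architecture (layer sizes, pooling, output head) is not described here. As an illustration only of what "bidirectional" means, the sketch below runs two GRUs over a sequence of word vectors, one left to right and one right to left, and concatenates their final hidden states. It uses one common GRU formulation, random untrained weights, and hypothetical class names (`GRUCell`, `BiGRUEncoder`); it does not load the Hugging Face model.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class GRUCell:
    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: update "z", reset "r", candidate "h".
        self.params = {g: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
                       for g in ("z", "r", "h")}

    def step(self, x: list[float], h: list[float]) -> list[float]:
        (Wz, Uz, bz), (Wr, Ur, br), (Wh, Uh, bh) = (self.params[g] for g in ("z", "r", "h"))
        z = [sigmoid(a + b + c) for a, b, c in zip(matvec(Wz, x), matvec(Uz, h), bz)]
        r = [sigmoid(a + b + c) for a, b, c in zip(matvec(Wr, x), matvec(Ur, h), br)]
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the previous state
        cand = [math.tanh(a + b + c) for a, b, c in zip(matvec(Wh, x), matvec(Uh, rh), bh)]
        # New state interpolates between the previous state and the candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, sequence: list[list[float]]) -> list[float]:
        h = [0.0] * self.hidden_dim
        for x in sequence:
            h = self.step(x, h)
        return h


class BiGRUEncoder:
    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.forward = GRUCell(input_dim, hidden_dim, rng)
        self.backward = GRUCell(input_dim, hidden_dim, rng)

    def encode(self, sequence: list[list[float]]) -> list[float]:
        # Concatenate the final forward state and the final backward state.
        return self.forward.run(sequence) + self.backward.run(list(reversed(sequence)))
```

In a classifier, this concatenated vector would typically feed a dense layer with one sigmoid output per toxicity label.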
## Installation
This project is managed with **uv**. To install the required dependencies, run the following from the repository root:
```bash
uv sync
```