https://github.com/freref/toxic-comment-classification

Host: GitHub
URL: https://github.com/freref/toxic-comment-classification
Owner: freref
Created: 2025-12-10T23:12:57.000Z (6 months ago)
Default Branch: master
Last Pushed: 2025-12-11T00:17:04.000Z (6 months ago)
Last Synced: 2025-12-30T18:54:41.783Z (5 months ago)
Language: Jupyter Notebook
Size: 146 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# toxic-comment-classification

This repository contains two notebooks that compare a Naive Bayes classifier with a Bidirectional GRU network on toxic comment classification.

## Data

The project uses the **Jigsaw Toxic Comment Classification Challenge** dataset from Kaggle.

* **Source**: [Kaggle Competition Data](https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/data)
* **Setup**: Place `train.csv` and `test.csv` in the `data/` directory.
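
Once the files are in place, a quick sanity check of the training data can be sketched as follows. This is a minimal example and not the notebooks' loading code: the helper name `label_counts` is hypothetical, the default path assumes the `data/` layout described above, and the six label columns are those of the competition's `train.csv`.

```python
import csv
from collections import Counter
from pathlib import Path

# Label columns in the competition's train.csv (each value is 0 or 1).
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def label_counts(csv_path="data/train.csv"):
    """Return (row count, positives per label). Hypothetical helper."""
    counts = Counter()
    n_rows = 0
    # newline="" lets the csv module handle line breaks inside quoted comments.
    with Path(csv_path).open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            n_rows += 1
            for label in LABELS:
                counts[label] += int(row[label])
    return n_rows, dict(counts)
```

Calling `label_counts()` with no arguments reads `data/train.csv` and returns the number of comments together with the number of positive examples for each label, which gives a rough view of how imbalanced the classes are.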

For the deep learning model (the Bidirectional GRU), I used pre-trained **GloVe embeddings** (100 dimensions).

* **Source**: [GloVe 6B 100d on Kaggle](https://www.kaggle.com/datasets/danielwillgeorge/glove6b100dtxt)
* **Setup**: Place `glove.6B.100d.txt` in the `data/` directory.
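
For illustration, the GloVe text file can be parsed with a few lines of plain Python. Each line holds a token followed by its space-separated vector values. The sketch below is not taken from the notebooks: `load_glove` and `embedding_matrix` are hypothetical helper names, and `word_index` stands for whatever word-to-row mapping the tokenizer produces.

```python
from pathlib import Path


def load_glove(path="data/glove.6B.100d.txt", dim=100):
    """Parse a GloVe text file into {word: [float, ...]}. Hypothetical helper."""
    embeddings = {}
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            # Each line is a token followed by `dim` space-separated floats.
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if len(values) != dim:
                continue  # skip blank or malformed lines
            embeddings[word] = [float(v) for v in values]
    return embeddings


def embedding_matrix(word_index, embeddings, dim=100):
    """Build one row per vocabulary index; words missing from GloVe stay zero."""
    size = max(word_index.values(), default=0) + 1
    matrix = [[0.0] * dim for _ in range(size)]
    for word, idx in word_index.items():
        vector = embeddings.get(word)
        if vector is not None:
            matrix[idx] = vector
    return matrix
```

In practice the resulting matrix would usually be converted to a NumPy array or tensor and used to initialize the model's embedding layer. Plain lists are used here only to keep the sketch free of dependencies.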

## Models

The trained Bidirectional GRU model is hosted on Hugging Face:

* [freref/toxic_comments](https://huggingface.co/freref/toxic_comments/tree/main)

## Installation

This project is managed with **uv**. To install the required dependencies, run:

```bash
uv sync
```
