https://github.com/saarthakkj/detoxify_yt

Detoxify helps users focus on educational content by filtering YouTube feeds into specific categories (Chess, Coding, Mathematics) using AI-powered classification, allowing users to selectively view content that matches their learning interests.

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
# Detoxify

> Smart Content. Clear Categories. Better YouTube Feed.

Detoxify is an AI-powered Chrome extension and content classification system that declutters your YouTube feed by classifying videos as Chess, Coding, Mathematics, or Other with a fine-tuned BERT model.

## 📑 Table of Contents
- [Quick Start](#-quick-start)
- [Overview](#-overview)
- [System Architecture](#-system-architecture)
- [Features](#-features)
- [Performance Metrics](#-performance-metrics)
- [Technical Implementation](#-technical-implementation)
- [Workflow](#-workflow)
- [Security](#-security)
- [Future Enhancements](#-future-enhancements)
- [Contributing](#-contributing)
- [License](#-license)
- [Acknowledgments](#-acknowledgments)
- [Contact](#-contact)

## 🚀 Quick Start

### Local Setup

1. **Clone the Repository**
```bash
git clone https://github.com/Saarthakkj/detoxify_yt.git
cd detoxify_yt
```

2. **Install Dependencies**
```bash
npm install
```

## 🎯 Overview

Detoxify is a complete ecosystem that combines:
- Chrome Extension for user interaction
- FastAPI backend for processing
- BERT model for classification
- BrightData API for dataset generation

## 🏗️ System Architecture

### 1. Chrome Extension (Frontend)
- Real-time content scraping initiation
- Dynamic video filtering based on classifications

### 2. FastAPI Backend
- High-performance API endpoints
- Asynchronous processing
- Token-based authentication
- Real-time data handling

### 3. BERT Classification Model
- Fine-tuned on YouTube content
- Multi-category classification
- Real-time inference capabilities
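
As a rough illustration of the multi-category step, the sketch below turns raw model scores (logits) into a single label. It uses only the standard library and does not load the actual model. The label order and the example logits are assumptions, since the README names the four classes but not how they map to output indices.

```python
import math

# Assumed label order; the README does not specify the index mapping.
LABELS = ["Chess", "Coding", "Mathematics", "Other"]


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits for one video title.
label, confidence = classify([0.2, 3.1, 0.4, -1.0])
print(label, round(confidence, 3))
```

A real deployment would obtain the logits from the fine-tuned BERT model. Only the arg-max step is shown here.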

### 4. BrightData Scraping API
- Efficient data collection
- High-quality dataset generation

## 🚀 Features

- **Intelligent Classification**: BERT-powered content categorization
- **High Accuracy**: 87.8% classification accuracy
- **Feed Filtering**: Videos outside the selected category are hidden
- **Three Categories**: Chess, Coding, Mathematics

## 📊 Performance Metrics

### Model Performance
- Overall Accuracy: 87.8%

### ROC-AUC Scores
- Chess: 0.976
- Coding: 0.971
- Mathematics: 0.949
- Other: 0.941
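
Per-class ROC-AUC scores like these are typically computed one-vs-rest: each class is treated in turn as the positive class, and the score is the probability that a random positive example is ranked above a random negative one. The sketch below shows that calculation in plain Python. The scores and labels are made up for illustration and are not taken from the project's evaluation data.

```python
def roc_auc(scores, is_positive):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical "Chess" probabilities for six videos and their true labels.
chess_scores = [0.91, 0.80, 0.35, 0.60, 0.10, 0.05]
true_labels = ["Chess", "Chess", "Chess", "Coding", "Other", "Mathematics"]

auc = roc_auc(chess_scores, [t == "Chess" for t in true_labels])
print(round(auc, 3))  # 0.889 (8 of 9 positive/negative pairs ranked correctly)
```

The pairwise loop is quadratic and meant only to make the definition visible. A library routine would be the usual choice for real evaluation.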

## 🛠️ Technical Implementation

### Model Training Configuration
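```python
from transformers import TrainingArguments

# Hyperparameters used for fine-tuning the BERT classifier
training_args = TrainingArguments(
    output_dir="./results",  # assumed path; the README does not specify one
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=10,
    warmup_ratio=0.1,
    weight_decay=0.01,
)
```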
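
To show what `warmup_ratio=0.1` means in practice, here is a minimal sketch of a linear warmup followed by linear decay, assuming the Hugging Face Trainer's default linear scheduler is in use (the README does not say otherwise). The dataset size of 1,600 examples and the single-device setup are hypothetical, chosen only to give round numbers.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at `step` under linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical dataset size; the README does not state it.
num_examples = 1600
steps_per_epoch = math.ceil(num_examples / 16)  # batch size 16 on one device
total_steps = steps_per_epoch * 10              # 10 epochs

# The rate climbs during the first 10% of steps, peaks, then decays.
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, total_steps))
```

With these assumed numbers the rate reaches its peak of 2e-5 at step 100 and falls back to zero at step 1000.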

### System Requirements
- Python 3.8+
- Chrome Browser (latest version)
- Internet connection for API access

### Core Dependencies
```
fastapi==0.104.1
uvicorn==0.23.2
pydantic>=2.0.0
transformers>=4.30.0
torch>=2.0.0
python-dotenv==1.0.1
requests>=2.31.0
```

## 🔄 Workflow

1. **User Interaction**
   - Install Chrome extension
   - Select content category

2. **Data Processing**
   - Backend processes incoming data
   - BERT model classifies content

3. **Content Filtering**
   - Relevant videos are displayed
   - Non-matching content is hidden
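
The filtering step at the end of this workflow can be sketched as a simple partition of the feed. This is an illustration of the logic only, written in Python, and is not the extension's code. The `Video` type, its field names, and `split_feed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Video:
    video_id: str            # hypothetical identifier
    title: str
    predicted_category: str  # label returned by the classifier


def split_feed(videos, selected_category):
    """Partition the feed into videos to show and videos to hide."""
    shown = [v for v in videos if v.predicted_category == selected_category]
    hidden = [v for v in videos if v.predicted_category != selected_category]
    return shown, hidden


# Made-up feed entries for illustration.
feed = [
    Video("a1", "Sicilian Defense explained", "Chess"),
    Video("b2", "Build a REST API in 20 minutes", "Coding"),
    Video("c3", "Celebrity gossip roundup", "Other"),
]

shown, hidden = split_feed(feed, "Chess")
print([v.title for v in shown])
print([v.title for v in hidden])
```

Keeping the hidden list, rather than discarding it, would let a UI restore those videos if the user switches category.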

## 🔒 Security

- Token-based API authentication
- Secure data transmission
- Protected model endpoints
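
One common way to implement token-based authentication is to require an `Authorization: Bearer <token>` header and compare the token in constant time. The sketch below assumes that scheme. The README does not describe the actual header format or token storage, so the function names and the example token are hypothetical.

```python
import hmac


def extract_bearer_token(authorization_header):
    """Return the token from a 'Bearer <token>' header value, or None."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def is_authorized(authorization_header, expected_token):
    """Check the presented token against the expected one."""
    token = extract_bearer_token(authorization_header)
    if token is None:
        return False
    # compare_digest avoids leaking the mismatch position through timing
    return hmac.compare_digest(token.encode(), expected_token.encode())


# Hypothetical token; in practice it would come from configuration.
print(is_authorized("Bearer s3cret-token", "s3cret-token"))
print(is_authorized("Bearer wrong", "s3cret-token"))
```

In a deployed service the expected token would be loaded from an environment variable or secret store rather than written in code.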

## 🔮 Future Enhancements

- Faster models for lower-latency inference
- Better UI/UX for extension
- Additional content categories

## 🤝 Contributing

1. Fork the repository
2. Create feature branch (`git checkout -b feature/Enhancement`)
3. Commit changes (`git commit -m 'Add Enhancement'`)
4. Push to branch (`git push origin feature/Enhancement`)
5. Open Pull Request

## 📝 License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- BrightData API for YouTube content scraping
- FastAPI team for the web framework
- Hugging Face for transformer models
- Render for deployment hosting

## 📧 Contact

Prakhar Agrawal
- Email: [email protected]

Saarthak Saxena
- Twitter: [@curlydazai](https://x.com/curlydazai)
- Email: [email protected]

Project: [GitHub Repository](https://github.com/Saarthakkj/detoxify_yt)

![Roadmap](roadmap.png)

---

Made with ❤️ for a cleaner YouTube experience