Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/saarthakkj/detoxify_yt
Detoxify helps users focus on educational content by filtering YouTube feeds into specific categories (Chess, Coding, Mathematics) using AI-powered classification, allowing users to selectively view content that matches their learning interests.
chrome-extensions development llm-inference ml mlops
Last synced: 4 days ago
- Host: GitHub
- URL: https://github.com/saarthakkj/detoxify_yt
- Owner: Saarthakkj
- License: mit
- Created: 2024-12-14T16:24:04.000Z (about 2 months ago)
- Default Branch: master
- Last Pushed: 2025-02-01T00:01:44.000Z (5 days ago)
- Last Synced: 2025-02-01T00:25:48.079Z (5 days ago)
- Topics: chrome-extensions, development, llm-inference, ml, mlops
- Language: Python
- Homepage:
- Size: 2.32 MB
- Stars: 7
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.MD
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Roadmap: roadmap.png
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
# Detoxify

> Smart Content. Clear Categories. Better YouTube Feed.

Detoxify is an AI-powered Chrome extension and content classification system that declutters your YouTube feed by categorizing videos into Chess, Coding, Mathematics, and other categories using a fine-tuned BERT model.
## 📑 Table of Contents
- [Quick Start](#-quick-start)
- [Overview](#-overview)
- [System Architecture](#-system-architecture)
- [Features](#-features)
- [Performance Metrics](#-performance-metrics)
- [Technical Implementation](#-technical-implementation)
- [Workflow](#-workflow)
- [Security](#-security)
- [Future Enhancements](#-future-enhancements)
- [Contributing](#-contributing)
- [License](#-license)
- [Acknowledgments](#-acknowledgments)
- [Contact](#-contact)

## 🚀 Quick Start
### Local Setup
1. **Clone the Repository**
```bash
git clone https://github.com/Saarthakkj/detoxify_yt.git
cd detoxify_yt
```
2. **Install Dependencies**
```bash
npm install
```

## 🎯 Overview
Detoxify is a complete ecosystem that combines:
- Chrome Extension for user interaction
- FastAPI backend for processing
- BERT model for classification
- BrightData API for dataset generation

## 🏗️ System Architecture
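To make the division of labor between these components concrete, here is a minimal sketch of how a single request might flow through the system: the extension sends video titles, a chosen category, and a token; the backend checks the token, classifies each title, and returns only the matches. Everything in it is an assumption for illustration. The names `FeedRequest`, `classify_title`, and `filter_feed` are hypothetical, the token is a placeholder, and the keyword lookup merely stands in for the BERT model. It is not the project's actual code.

```python
import hmac
from dataclasses import dataclass
from typing import List

# Placeholder secret (assumption); a real deployment would load it from the environment.
EXPECTED_TOKEN = "example-token"

# Toy keyword table standing in for the fine-tuned BERT classifier (assumption).
KEYWORDS = {
    "Chess": ("chess", "gambit", "endgame"),
    "Coding": ("python", "javascript", "coding"),
    "Mathematics": ("calculus", "algebra", "theorem"),
}


@dataclass
class FeedRequest:
    """Hypothetical payload sent by the extension."""
    token: str
    category: str
    titles: List[str]


def classify_title(title: str) -> str:
    """Return a category for a title, or 'Other' when no keyword matches."""
    lowered = title.lower()
    for category, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return category
    return "Other"


def filter_feed(request: FeedRequest) -> List[str]:
    """Check the token, then keep only titles in the requested category."""
    # compare_digest avoids leaking token contents through timing differences.
    if not hmac.compare_digest(request.token.encode(), EXPECTED_TOKEN.encode()):
        raise PermissionError("invalid token")
    return [t for t in request.titles if classify_title(t) == request.category]
```

In a real deployment the token check would sit in the FastAPI layer and `classify_title` would call the model; the filtering step itself stays the same.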
### 1. Chrome Extension (Frontend)
- Real-time content scraping initiation
- Dynamic video filtering based on classifications

### 2. FastAPI Backend
- High-performance API endpoints
- Asynchronous processing
- Token-based authentication
- Real-time data handling

### 3. BERT Classification Model
- Fine-tuned on YouTube content
- Multi-category classification
- Real-time inference capabilities

### 4. BrightData Scraping API
- Efficient data collection
- High-quality dataset generation

## 🚀 Features
- **Intelligent Classification**: BERT-powered content categorization
- **High Accuracy**: 87.8% classification accuracy
- **Processing**: Content filtering
- **Three Categories**: Chess, Coding, Mathematics

## 📊 Performance Metrics
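The README does not say how the accuracy and per-category ROC-AUC figures in this section were computed. As a point of reference, the sketch below shows the standard definitions in plain Python: accuracy as the fraction of correct predictions, and one-vs-rest ROC-AUC as the probability that a randomly chosen video from the target category receives a higher score than a randomly chosen video from any other category. The function names and the toy data are assumptions for illustration, not the project's evaluation script.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def one_vs_rest_auc(y_true: Sequence[str], scores: Sequence[float], positive: str) -> float:
    """ROC-AUC for one category, treating every other category as negative.

    `scores` holds the model's score for the `positive` category per example.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, made up for this example.
labels = ["Chess", "Coding", "Chess", "Other"]
predictions = ["Chess", "Coding", "Other", "Other"]
chess_scores = [0.9, 0.2, 0.6, 0.7]

print(accuracy(labels, predictions))                   # 0.75
print(one_vs_rest_auc(labels, chess_scores, "Chess"))  # 0.75
```

The pairwise loop is quadratic and only suitable for small examples; with scikit-learn available, `sklearn.metrics.roc_auc_score` computes the same quantity efficiently.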
### Model Performance
- Overall Accuracy: 87.8%

### ROC-AUC Scores
- Chess: 0.976
- Coding: 0.971
- Mathematics: 0.949
- Other: 0.941

## 🛠️ Technical Implementation
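At inference time, a sequence-classification model emits one raw score (logit) per category, and the predicted category is the one with the highest softmax probability. The README does not show the project's inference code, so the following is only a minimal sketch of that step. The label order in `LABELS` and the function names are assumptions; a real model defines its label mapping in its own configuration.

```python
import math
from typing import List, Tuple

# Assumed label order for illustration; the real mapping comes from the model config.
LABELS = ["Chess", "Coding", "Mathematics", "Other"]


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_category(logits: List[float]) -> Tuple[str, float]:
    """Return the most probable category and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


label, confidence = predict_category([2.0, 0.1, -1.0, 0.3])
print(label, round(confidence, 3))
```

Returning the probability alongside the label lets the caller decide how to treat low-confidence predictions, for example by hiding a video only when the model is reasonably sure.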
### Model Training Configuration
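```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",  # assumed path; not given in the original snippet
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=10,
    warmup_ratio=0.1,
    weight_decay=0.01,
)
```

### System Requirements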
- Python 3.8+
- Chrome Browser (latest version)
- Internet connection for API access

### Core Dependencies
```
fastapi==0.104.1
uvicorn==0.23.2
pydantic>=2.0.0
transformers>=4.30.0
torch>=2.0.0
python-dotenv==1.0.1
requests>=2.31.0
```

## 🔄 Workflow
1. **User Interaction**
   - Install Chrome extension
   - Select content category
2. **Data Processing**
   - Backend processes incoming data
   - BERT model classifies content
3. **Content Filtering**
   - Relevant videos are displayed
   - Non-matching content is hidden

## 🔒 Security
- Token-based API authentication
- Secure data transmission
- Protected model endpoints

## 🔮 Future Enhancements
- Implementing faster models
- Better UI/UX for extension
- Additional content categories

## 🤝 Contributing
1. Fork the repository
2. Create feature branch (`git checkout -b feature/Enhancement`)
3. Commit changes (`git commit -m 'Add Enhancement'`)
4. Push to branch (`git push origin feature/Enhancement`)
5. Open Pull Request

## 📝 License
This project is licensed under the MIT License; see the [LICENSE](LICENSE) file.
## 🙏 Acknowledgments
- BrightData API for YouTube content scraping
- FastAPI team for the web framework
- Hugging Face for transformer models
- Render for deployment

## 📧 Contact
Prakhar Agrawal
- Email: [email protected]

Saarthak Saxena
- Twitter: [@curlydazai](https://x.com/curlydazai)
- Email: [email protected]

Project: [GitHub Repository](https://github.com/Saarthakkj/detoxify_yt)
![Roadmap](roadmap.png)
---
Made with ❤️ for a cleaner YouTube experience