# YouTube Watch History Analyser - V2

Analyse your YouTube watch history using Machine Learning, plot graphs, and more.


Badges: GitHub · Material for MkDocs · Mermaid



## How It Works

### API (Backend)

- Used **FastAPI** to create backend APIs that interact with the **MongoDB** database (a minimal sketch is shown after this list).
- Used the **YouTube Data API v3** to fetch details about the videos you have watched.
- Used **Docker** to containerize the FastAPI application.
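
A minimal sketch of what one such backend route could look like, assuming the async `motor` MongoDB driver and illustrative database, collection, and environment-variable names (the actual routes in this repo may differ):

```python
# Hypothetical backend route; database, collection, and env-var names are illustrative.
import os

from fastapi import FastAPI
from motor.motor_asyncio import AsyncIOMotorClient

app = FastAPI(title="yt-watch-history backend")
client = AsyncIOMotorClient(os.getenv("MONGODB_URL", "mongodb://localhost:27017"))
collection = client["yt_watch_history"]["videos"]  # assumed database / collection names


@app.get("/videos/{video_id}")
async def get_video(video_id: str) -> dict:
    # Look up one watched video's stored details by its YouTube video id.
    doc = await collection.find_one({"videoId": video_id}, {"_id": 0})
    return doc or {}
```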

### ML (Models and API)

**Models**

1. **Video's Content Type Predictor**
   - Multiclass classification problem
   - Uses the video's title and tags to predict its content type
   - Planning to add the video's `categoryId` and duration as features, but want to be sure they actually improve predictions
2. **Channel Recommender System**
   - Recommender system
   - Uses channels' video titles and tags to calculate similarity
   - Uses `TfidfVectorizer` for text-to-vector conversion
   - Uses the user's channel subscription data to recommend channels (see the sketch after this list)
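
A rough sketch of the channel recommender idea as described above: one TF-IDF "document" per channel built from its video titles and tags, cosine similarity between channels, and recommendations ranked by similarity to the channels the user already subscribes to. All channel names and text below are made up:

```python
# Illustrative sketch of the channel recommender; channel names and data are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# One "document" per channel: all of its videos' titles and tags joined together.
channel_docs = {
    "TechChannel": "python tutorial fastapi docker deploy rest api backend",
    "MLChannel": "machine learning sklearn pipeline tfidf model training",
    "VlogChannel": "travel vlog daily life camera gear morning routine",
}
subscriptions = ["MLChannel"]  # channels the user already subscribes to

channels = list(channel_docs)
matrix = TfidfVectorizer().fit_transform(channel_docs.values())
similarity = cosine_similarity(matrix)  # channel-to-channel similarity matrix

# Score unsubscribed channels by their average similarity to subscribed ones.
sub_idx = [channels.index(c) for c in subscriptions]
scores = similarity[sub_idx].mean(axis=0)
recommendations = sorted(
    (c for c in channels if c not in subscriptions),
    key=lambda c: scores[channels.index(c)],
    reverse=True,
)
print(recommendations)
```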

> [!IMPORTANT]
>
> By the way, I'm planning to upload the trained model to the internet so that it is downloaded from its URL into the Docker
> container once (if it doesn't already exist). A rough sketch of this idea follows below.
>
> The model URL is provided through an environment variable (`CTT_MODEL_URL`), so you can point it to your own model's URL if you want.
>
> _This solution may only work in the short term_ 🤞
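
A rough sketch of that download-once idea, assuming `httpx` and a hypothetical model path inside the container; only the `CTT_MODEL_URL` environment variable comes from the description above:

```python
# Sketch of "download the model once if missing"; the file path and helper name are assumptions.
import os
from pathlib import Path

import httpx

MODEL_PATH = Path("models/ctt_model.joblib")  # assumed location inside the container


def ensure_ctt_model() -> Path:
    """Download the CTT model from CTT_MODEL_URL only if it is not already present."""
    if MODEL_PATH.exists():
        return MODEL_PATH
    url = os.environ["CTT_MODEL_URL"]  # provided via environment variable
    MODEL_PATH.parent.mkdir(parents=True, exist_ok=True)
    response = httpx.get(url, follow_redirects=True)
    response.raise_for_status()
    MODEL_PATH.write_bytes(response.content)
    return MODEL_PATH
```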

**API**

- Used **FastAPI** to serve the models (a hedged sketch is shown below).
- Containerized the FastAPI application and models using **Docker**.
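
A hedged sketch of what the model-serving route might look like, assuming the CTT model is a scikit-learn pipeline loaded with `joblib` that takes raw "title + tags" text; the endpoint path, payload fields, and file location are assumptions:

```python
# Hypothetical model-serving route; model file, payload fields, and path are illustrative.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="yt-watch-history ml")
ctt_model = joblib.load("models/ctt_model.joblib")  # assumed sklearn pipeline on disk


class Video(BaseModel):
    title: str
    tags: list[str] = []


@app.post("/predict/content-type")
def predict_content_type(video: Video) -> dict:
    # The model is assumed to accept raw "title + tags" text, as described above.
    text = f"{video.title} {' '.join(video.tags)}"
    prediction = ctt_model.predict([text])[0]
    return {"contentType": str(prediction)}
```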

### Frontend

- Uses **Streamlit** to create a multipage web application where users can upload their required data and see the analysis (a rough sketch follows this list).
- Requires a **YouTube API Key** to fetch video details from the API for advanced analysis.
- Uses the **httpx** library to make requests to the "Backend APIs" and "ML APIs".
- Uses **Polars** for data manipulation.
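
A rough sketch of the frontend flow under those descriptions; the backend URL, endpoint path, and column names are assumptions, not the repo's actual code:

```python
# Illustrative frontend page; API URLs, endpoint paths, and column names are assumptions.
import httpx
import polars as pl
import streamlit as st

st.title("YouTube Watch History Analyser")

uploaded = st.file_uploader("Upload your watch-history.json", type="json")
if uploaded is not None:
    df = pl.read_json(uploaded.read())  # load the uploaded export with Polars
    st.write(f"Loaded {df.height} watched videos")

    # Ask the backend/ML APIs for extra details or predictions via httpx.
    response = httpx.post(
        "http://backend:8000/videos/details",  # assumed backend endpoint
        json={"videoIds": df["videoId"].to_list()},  # assumed column name
        timeout=30,
    )
    st.dataframe(response.json())
```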

### Apps Composition

- Wrote a [docker-compose.yaml] file to build and run all three containers in one go.
- Used the `mongodb` Docker image as the database; see [docker-compose.yaml].

## Setup

Clone this GitHub Repository

```bash
git clone https://github.com/arv-anshul/yt-watch-history-v2
cd yt-watch-history-v2
```

Open **Docker Desktop** and run the command below:

👀 See [docker-compose.yaml]

```bash
docker compose up --build  # Builds the images and then runs the containers (do this the first time)
```

## Roadmap

- [ ] 🪠 Create an ETL pipeline to train models
- [ ] 📌 Integrate `mlflow`[^1] for ML model monitoring
- [x] 🛠️ Build the basics from the [yt-watch-history] project
- [x] 🎨 Draw diagrams for reference
- [x] ⛓️ Figure out how to integrate a **pre-trained** ML model
- [x] 🤖 Build the **Channel Recommender System**
- [x] 👷 Build a better CTT model pipeline

[docker-compose.yaml]: docker-compose.yaml
[yt-watch-history]: https://github.com/arv-anshul/yt-watch-history

[^1]: CampusX is going to launch a free course on MLflow. Nitish Sir announced this in a recent video.