Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/arv-anshul/yt-watch-history-v2
Analyse your YouTube watch history using ML with Graphs.
data-analysis docker fastapi machine-learning python streamlit youtube
- Host: GitHub
- URL: https://github.com/arv-anshul/yt-watch-history-v2
- Owner: arv-anshul
- Created: 2024-05-16T13:50:53.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-07-07T06:56:24.000Z (6 months ago)
- Last Synced: 2024-07-07T07:51:54.264Z (6 months ago)
- Topics: data-analysis, docker, fastapi, machine-learning, python, streamlit, youtube
- Language: Python
- Homepage: https://arv-anshul.github.io/project/yt-watch-history/v2-architecture
- Size: 542 KB
- Stars: 6
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# YouTube Watch History Analyser - V2
Analyse your YouTube Watch History using Machine Learning, plot graphs, etc.
## Working
### API (Backend)
- Used **FastAPI** to create backend APIs that interact with a **MongoDB** database.
- Used **YouTube Data API v3** to fetch details about the videos you have watched.
- Used **Docker** to containerize the FastAPI application.

### ML (Models and API)
**Models**
1. **Video's Content Type Predictor**
   - Multiclass classification problem
   - Uses a video's title and tags to predict its content type
   - Planning to add the video's `categoryId` and duration as features, but first want to be sure they improve predictions
2. **Channel Recommender System**
   - Recommender system
   - Uses the titles and tags of a channel's videos to calculate similarity
   - Uses `TfidfVectorizer` for text-to-vector conversion
   - Uses the user's channel subscription data to recommend channels

> [!IMPORTANT]
>
> By the way, I'm planning to upload the trained model to the internet; the model is then downloaded from a URL into the Docker container
> once (if it does not already exist).
>
> The model URL is provided through an environment variable (`CTT_MODEL_URL`). If you want, you can provide your own model's URL.
>
> _This solution may work in the short term_ 🤞

**API**
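Before the API can serve predictions, the model file has to be present inside the container. The download-once behaviour described in the note above can be sketched roughly as follows. This is a minimal standard-library sketch, not the project's actual code: the `ensure_model` function and the `models/ctt_model.pkl` path are hypothetical names chosen for illustration; only the `CTT_MODEL_URL` environment variable comes from this README.

```python
import os
import shutil
import urllib.request
from pathlib import Path

# Hypothetical default location; the real project may store the model elsewhere.
DEFAULT_MODEL_PATH = Path("models/ctt_model.pkl")


def ensure_model(model_path: Path = DEFAULT_MODEL_PATH) -> Path:
    """Download the model once and reuse the local copy on later starts."""
    if model_path.exists():
        return model_path  # Already downloaded, nothing to do.

    # CTT_MODEL_URL is the environment variable named in this README.
    url = os.environ.get("CTT_MODEL_URL")
    if not url:
        raise RuntimeError("CTT_MODEL_URL is not set and no local model was found")

    model_path.parent.mkdir(parents=True, exist_ok=True)

    # Write to a temporary name first so an interrupted download
    # never leaves a half-written file that looks like a valid model.
    tmp_path = model_path.with_name(model_path.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp_path.replace(model_path)
    return model_path
```

Calling `ensure_model()` once at application startup, before the model is loaded, would avoid repeated downloads on restart, assuming the target path lives on a volume that survives container restarts.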
- Used **FastAPI** to serve the model.
- Containerized the FastAPI application and models using **Docker**.

### Frontend
- Uses **Streamlit** to create a multipage web application where users can upload their required data and see the analysis.
- Requires a **YouTube API Key** to fetch video details from the API for advanced analysis.
- Uses the **httpx** library to make requests to the "Backend APIs" and "ML APIs".
- Uses **Polars** for data manipulation.

### Apps Composition
- Wrote a [docker-compose.yaml] script to build and run all three containers in one go.
- Used the `mongodb` Docker image as the database, see [docker-compose.yaml].

## Setup
Clone this GitHub Repository
```bash
git clone https://github.com/arv-anshul/yt-watch-history-v2
cd yt-watch-history-v2
```

Open **Docker Desktop** and run the command below:
👀 See [docker-compose.yaml]
```bash
docker compose up --build # First build the container and then run it (for first time)
```

## Roadmap
- [ ] 🪠 Create an ETL pipeline to train models
- [ ] 📌 Integrate `mlflow`[^1] for ML Model monitoring
- [x] 🛠️ Build the basics from [yt-watch-history] project
- [x] 🎨 Draw diagrams for references
- [x] ⛓️ How to integrate a **pre-trained** ML Model
- [x] 🤖 Build **Channel Recommender System**
- [x] 👷 Better CTT Model pipeline

[docker-compose.yaml]: docker-compose.yaml
[yt-watch-history]: https://github.com/arv-anshul/yt-watch-history

[^1]: CampusX is going to launch a free course on MLflow. Nitish Sir announced this in his recent video.