https://github.com/mynul-islam-madhurjo/anime-genre-classification
An Anime Genre Classifier capable of categorizing 50 different anime genres worldwide.
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/mynul-islam-madhurjo/anime-genre-classification
- Owner: mynul-islam-madhurjo
- Created: 2024-01-14T12:32:47.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-01-28T15:08:53.000Z (11 months ago)
- Last Synced: 2024-01-29T15:12:11.916Z (11 months ago)
- Topics: anime, scraper
- Language: Jupyter Notebook
- Homepage: https://anime-genre-classification.onrender.com/
- Size: 56.7 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Anime-Genre-Classification
An end-to-end text classification project covering data collection, model training, and deployment.
The model can classify anime descriptions into 50 different genres.
The keys of `Deployment\genre_types_encoded.json` list the anime genres.

## Data Collection
Data was collected from an anime listing website: https://myanimelist.net/topanime.php

The data collection process is divided into 2 steps:

1. **Anime URL Scraping:** The anime URLs were scraped with `anime_url_scraper.py` and stored along with the anime titles in `Data\anime_urls.csv`.
2. **Anime Details Scraping:** Using the URLs, anime descriptions and genres were scraped with `anime_genre_scraper.py` and stored in `Data\anime_genre_details_merged.csv`.

In total, I scraped 8950 anime details.
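
The first scraping step can be sketched roughly as below. This is not the code in `anime_url_scraper.py`; it is a minimal illustration that assumes anime detail links follow the pattern `https://myanimelist.net/anime/<id>/<slug>` and that the output CSV has `title` and `url` columns. The class and function names, the column names, and the sample HTML are all hypothetical, and the sketch parses a made-up fragment instead of downloading a page.

```python
import csv
import io
import re
from html.parser import HTMLParser

# Assumption: anime detail pages live under /anime/<numeric id>/<slug>.
ANIME_URL = re.compile(r"^https://myanimelist\.net/anime/\d+/")


class AnimeLinkParser(HTMLParser):
    """Collect (title, url) pairs from the anchors of a listing page."""

    def __init__(self):
        super().__init__()
        self.links = {}           # url -> title; dict keeps first-seen order
        self._current_url = None  # set while inside a matching <a> element

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a" and ANIME_URL.match(href):
            self._current_url = href

    def handle_data(self, data):
        # The first non-empty link text seen for a URL becomes its title.
        text = data.strip()
        if self._current_url and text and self._current_url not in self.links:
            self.links[self._current_url] = text

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_url = None


def extract_anime_links(html):
    parser = AnimeLinkParser()
    parser.feed(html)
    return [(title, url) for url, title in parser.links.items()]


def to_csv(rows):
    """Serialize (title, url) rows with a header line (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["title", "url"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up listing fragment standing in for a downloaded page.
SAMPLE_HTML = """
<a href="https://myanimelist.net/anime/1/Example_A"><img src="a.jpg"></a>
<a href="https://myanimelist.net/anime/1/Example_A">Example A</a>
<a href="https://myanimelist.net/anime/2/Example_B">Example B</a>
<a href="https://myanimelist.net/topanime.php?limit=50">Next 50</a>
"""

rows = extract_anime_links(SAMPLE_HTML)
print(to_csv(rows))
```

Keeping URL collection separate from detail scraping, as the two-step process does, means the slower second step can be restarted from the saved URL list without crawling the listing pages again.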
## Data Preprocessing
Initially there were *74* different genres in the dataset. After some analysis, I found that *24* of them were rare (probably custom genres added by users), so I removed them, leaving *50* genres. After that, I removed the descriptions left without any genres, resulting in *8927* samples.
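
The filtering described above could look something like the following sketch. The README does not say how "rare" was defined, so the `min_count` threshold, the function name, and the sample data are assumptions made for illustration only.

```python
from collections import Counter


def filter_rare_genres(samples, min_count):
    """Drop genres seen fewer than `min_count` times, then drop empty samples.

    `samples` is a list of (description, genres) pairs.
    Returns the cleaned samples and the sorted list of kept genres.
    """
    counts = Counter(g for _, genres in samples for g in genres)
    kept = {g for g, n in counts.items() if n >= min_count}

    cleaned = []
    for description, genres in samples:
        remaining = [g for g in genres if g in kept]
        if remaining:  # skip descriptions left with no genre at all
            cleaned.append((description, remaining))
    return cleaned, sorted(kept)


# Made-up data: two common genres and two one-off tags.
samples = [
    ("desc A", ["Action", "Comedy"]),
    ("desc B", ["Action"]),
    ("desc C", ["Custom Tag"]),
    ("desc D", ["Comedy", "Custom Tag 2"]),
    ("desc E", []),
]

cleaned, kept = filter_rare_genres(samples, min_count=2)
print(kept)
print(len(cleaned), "samples remain")
```

Removing rare genres first and empty samples second matters: a description tagged only with rare genres becomes empty after the first pass and is then dropped by the second.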
## Model Training
Fine-tuned a `distilroberta-base` model from HuggingFace Transformers using Fastai and Blurr. The model training notebook can be viewed [here](https://github.com/mynul-islam-madhurjo/Anime-Genre-Classification/blob/main/Notebooks/anime_multilabel_text_classification.ipynb).
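
The notebook name suggests a multi-label setup, where one description can carry several genres at once. The sketch below shows how targets and predictions are commonly represented in that setting; it is not taken from the notebook. The `genre_to_index` mapping, the three example genres, and the 0.5 threshold are assumptions.

```python
import math


def multi_hot(genres, genre_to_index):
    """Encode a list of genre names as a 0/1 target vector."""
    target = [0.0] * len(genre_to_index)
    for genre in genres:
        target[genre_to_index[genre]] = 1.0
    return target


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode(logits, genre_to_index, threshold=0.5):
    """Return every genre whose independent probability clears the threshold."""
    index_to_genre = {i: g for g, i in genre_to_index.items()}
    return [
        index_to_genre[i]
        for i, logit in enumerate(logits)
        if sigmoid(logit) >= threshold
    ]


# Hypothetical mapping; the real project covers 50 genres.
genre_to_index = {"Action": 0, "Comedy": 1, "Drama": 2}

print(multi_hot(["Action", "Drama"], genre_to_index))
print(decode([2.0, -1.5, 0.3], genre_to_index))
```

Each genre gets its own sigmoid instead of sharing a softmax, so the probabilities need not sum to one and any number of genres can be predicted for a single description.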
## Model Compression and ONNX Inference
The trained model is 300+ MB in size. I compressed it using ONNX quantization and brought it under 80 MB.
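
To see why quantization shrinks a model this much, here is a toy illustration of 8-bit weight quantization in plain Python. It is not ONNX Runtime's implementation and not the code used in this project; it only assumes that weights stored as 32-bit floats are mapped to 8-bit integers with one scale and offset per tensor. All names are hypothetical.

```python
import array


def quantize_uint8(weights):
    """Map float weights onto the integer codes 0..255."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant tensors
    codes = array.array(
        "B",
        (min(255, max(0, round((w - lo) / scale))) for w in weights),
    )
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate float weights from the integer codes."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale, lo = quantize_uint8(weights)
restored = dequantize(codes, scale, lo)

float32_bytes = len(array.array("f", weights).tobytes())
uint8_bytes = len(codes.tobytes())
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("float32 bytes:", float32_bytes)
print("uint8 bytes:  ", uint8_bytes)
print("max error:    ", max_error)
```

Storing one byte per weight instead of four gives roughly a 4x reduction, which is broadly in line with going from 300+ MB to under 80 MB, at the cost of a small rounding error per weight.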
## Model Deployment
The compressed model is deployed as a Gradio app on HuggingFace Spaces. The implementation can be found in the `deployment` folder or [here](https://huggingface.co/spaces/mynul-islam-madhurjo/Anime-Genre-Classifier).
## Web Deployment
Deployed a Flask app that takes an anime description as input and shows the predicted genres as output. Check the `live` branch. The website is live [here](https://anime-genre-classification.onrender.com).