https://github.com/barrettotte/anilist-ml

Training a binary classifier model to predict if I would recommend an anime using my Anilist user data.
https://github.com/barrettotte/anilist-ml

anilist binary-classification data-visualization machine-learning scikit-learn

Last synced: about 2 months ago
JSON representation

Training a binary classifier model to predict if I would recommend an anime using my Anilist user data.

Host: GitHub
URL: https://github.com/barrettotte/anilist-ml
Owner: barrettotte
License: mit
Created: 2022-09-17T23:40:07.000Z (almost 4 years ago)
Default Branch: master
Last Pushed: 2022-10-05T18:25:26.000Z (almost 4 years ago)
Last Synced: 2025-07-16T22:18:59.496Z (12 months ago)
Topics: anilist, binary-classification, data-visualization, machine-learning, scikit-learn
Language: Jupyter Notebook
Homepage:
Size: 25.5 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # anilist-ml

Learning the basics of machine learning with Anilist data

## Summary

I originally set out to make a model trained on my Anilist data that was able

to predict the score (1-10) that I would probably give an anime.

At first I tried to make a regression model. I realized since I was only rating 1-10

I would probably have more accuracy using a classifier model. Classifier models were definitely more accurate.

But, the best I was able to pull off was a random forest classifier with an accuracy of ~38%.

I think my data is subpar since I only have ~500 data points and generally skewed data.

I tend to give an anime a score of 7 more than anything else and also tend to not be very critical.

So, entries with a score below 5 are very rare and I think it makes training more difficult.

After messing around for a while I decided to switch to a binary classification and change my target.

Would I recommend or not recommend an anime?

With this change, my final result was a random forest classifier with an accuracy of ~84-85%.

Someone knowledgeable in machine learning could probably point out what I was doing wrong immediately.

But, oh well this was my first ML project and I have a lot to learn. 

This wasn't exactly a win, but hopefully the next ML project goes better.

## Results

The final model is located in [model.ipynb](model.ipynb).

```txt

Log loss = 4.934196584711905

ROC AUC Score = 0.8524492234169653

Classification Report:

              precision    recall  f1-score   support

       False   0.826087  0.612903  0.703704        31

        True   0.865169  0.950617  0.905882        81

    accuracy                       0.857143       112

   macro avg   0.845628  0.781760  0.804793       112

weighted avg   0.854351  0.857143  0.849922       112

```

ROC Plot

![roc.png](roc.png)

### Notebooks

1. [fetch.ipynb](fetch.ipynb) - Fetches user and anime data using Anilist's GraphQL API.

2. [clean.ipynb](clean.ipynb) - Cleans fetched data to prepare for data visualization and model training.

3. [explore.ipynb](explore.ipynb) - Some data visualizations and general exploration of data.

4. [model-select-reg.ipynb](model-select-reg.ipynb) - Testing out different regression models.

5. [model-select-cls.ipynb](model-select-cls.ipynb) - Testing out different classifier models.

6. [model.ipynb](model-final.ipynb) - Final model training, verification, and evaluation.

## Data

- fetched

  - [data/anime-YYYYMMDD-raw.csv](data/anime-20220927-raw.csv) - raw Anilist anime data

  - [data/user-YYYYMMDD-raw.csv](data/user-20220927-raw.csv) - raw Anilist user data

- cleaned/enriched

  - [data/anime-YYYYMMDD-clean.csv](data/anime-20220927-clean.csv) - cleaned anime data; usesless columns dropped and missing data filled

  - [data/user-YYYYMMDD-clean.csv](data/user-20220927-clean.csv) - cleaned user data; useless columns dropped

  - [data/user-YYYYMMDD-enriched.csv](data/user-20220927-enriched.csv) - user data joined with anime data

- regression

  - [data/user-YYYYMMDD-reg-train.csv](data/user-20220927-reg-train.csv) - train data for regression models

  - [data/user-YYYYMMDD-reg-valid.csv](data/user-20220927-reg-valid.csv) - validation data for regression models

  - [data/user-YYYYMMDD-reg-test.csv](data/user-20220927-reg-test.csv) - test data for regression models

- classification

  - [data/user-YYYYMMDD-cls-train.csv](data/user-20220927-cls-train.csv) - train data for classifier models

  - [data/user-YYYYMMDD-cls-valid.csv](data/user-20220927-cls-valid.csv) - validation data for classifier models

  - [data/user-YYYYMMDD-cls-test.csv](data/user-20220927-cls-test.csv) - test data for classifier models

I also made the Anilist anime data (as of 09/27/2022) available on 

[Kaggle](https://www.kaggle.com/datasets/barrettotte/anilistanimedata).

## References

- [Anilist Interactive GraphQL Tool](https://anilist.co/graphiql)

- [Anilist GraphQL Documentation Explorer](https://anilist.github.io/ApiV2-GraphQL-Docs/)

- [Coursera Machine Learning Specialization (Andrew Ng)](https://www.coursera.org/specializations/machine-learning-introduction)

- [Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow. Geron](https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1492032646)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/barrettotte/anilist-ml

Awesome Lists containing this project

README