https://github.com/natlee/myanimelist-comment-crawler

Crawl all reviews and information of anime works on MyAnimeList. ;)

anime crawler data-analysis data-mining data-science kaggle kaggle-dataset myanimelist python requests scrapy-crawler sqlite

# MyAnimeList Crawler
This crawler collects the information and reviews of every anime work on `MyAnimeList` using `Scrapy`.

## Setup

1. Check that your settings in `./setting.ini` are correct.
2. Install the required dependencies.

```bash
pip install -r requirements.txt
```

## Usage

### Crawl Anime Information

```bash
scrapy runspider info_spider.py --nolog
```

- Example of crawled data
```json
{
  "workId": "11979",
  "url": "https://myanimelist.net/anime/11979/Mahou_Shoujo_Madoka%E2%98%85Magica_Movie_2__Eien_no_Monogatari",
  "jpName": "劇場版 魔法少女まどか☆マギカ 永遠の物語",
  "engName": "Puella Magi Madoka Magica the Movie Part 2: Eternal",
  "synonymsName": "Mahou Shoujo Madoka Magika Movie 2, Magical Girl Madoka Magica Movie 2",
  "workType": "Movie",
  "episodes": "1",
  "status": "Finished Airing",
  "aired": "Oct 13, 2012",
  "premiered": "",
  "producer": "Aniplex, Mainichi Broadcasting System, Movic, Nitroplus, Houbunsha",
  "broadcast": "",
  "licensors": "Aniplex of America",
  "studios": "Shaft",
  "genres": "Drama",
  "source": "Original",
  "duration": "1 hr. 49 min.",
  "rating": "PG-13 - Teens 13 or older",
  "score": "8.37",
  "allRank": "#197",
  "popularityRank": "#1132",
  "members": "195,001",
  "favorites": "1,026",
  "scoredByUser": "97097",
  "lastUpdate": "2023-05-31 13:39:02"
}
```
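
Every value in the example record is a string, so numeric analysis needs a small conversion step first. The following is a minimal sketch and not part of the crawler: `normalize_record` is a hypothetical helper, and its conversion rules are assumptions based only on how the values look in the example above.

```python
from datetime import datetime


def normalize_record(record: dict) -> dict:
    """Return a copy of a crawled record with number-like strings converted.

    Hypothetical helper: the field names come from the example record,
    the parsing rules are assumptions about its formatting.
    """
    out = dict(record)
    # Counts such as "195,001" use a comma as the thousands separator.
    for key in ("members", "favorites", "scoredByUser", "episodes"):
        value = record.get(key, "").replace(",", "")
        out[key] = int(value) if value.isdigit() else None
    # Ranks such as "#197" carry a leading "#".
    for key in ("allRank", "popularityRank"):
        value = record.get(key, "").lstrip("#")
        out[key] = int(value) if value.isdigit() else None
    # The score is a decimal string; anything non-numeric becomes None.
    try:
        out["score"] = float(record.get("score", ""))
    except ValueError:
        out["score"] = None
    # lastUpdate follows the "YYYY-MM-DD HH:MM:SS" pattern in the example.
    if record.get("lastUpdate"):
        out["lastUpdate"] = datetime.strptime(
            record["lastUpdate"], "%Y-%m-%d %H:%M:%S"
        )
    return out


# A subset of the example record shown above.
sample = {
    "workId": "11979",
    "episodes": "1",
    "score": "8.37",
    "allRank": "#197",
    "popularityRank": "#1132",
    "members": "195,001",
    "favorites": "1,026",
    "scoredByUser": "97097",
    "lastUpdate": "2023-05-31 13:39:02",
}
print(normalize_record(sample))
```

Fields that cannot be parsed are set to `None` rather than raising, so empty values such as `"premiered": ""` style gaps do not stop a batch conversion.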

### Crawl Reviews

> Crawl the work information first, because the review crawl needs the list of anime works in `myanimelist`.

```bash
scrapy runspider review_spider.py --nolog
```
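
Because the review crawl depends on the information crawl, the two commands can be chained so that the order is always respected. This is only a sketch: the `crawl_all` wrapper and its `dry_run` flag are hypothetical and not shipped with the project; the spider file names and the `--nolog` flag are taken from the commands above.

```python
import subprocess

# Order matters: work information first, reviews second.
SPIDERS = ("info_spider.py", "review_spider.py")


def build_command(spider: str) -> list:
    """Build the same command line that is shown in this README."""
    return ["scrapy", "runspider", spider, "--nolog"]


def crawl_all(dry_run: bool = True) -> list:
    """Run (or, with dry_run, only print) both spiders in order."""
    commands = [build_command(spider) for spider in SPIDERS]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises if a spider exits with an error, so the
            # review crawl never starts after a failed information crawl.
            subprocess.run(command, check=True)
    return commands


# Prints the two commands; pass dry_run=False to actually start crawling.
crawl_all(dry_run=True)
```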

## Link

The dataset is published on Kaggle.

- Version 1 (2006/11 to 2019/06)

[MALCoD](https://www.kaggle.com/natlee/myanimelist-comment-dataset)

Contains 130K comments.

After several years the site was updated, so I refactored this code to fit the new version of MyAnimeList.

- Version 2 (2006/11 to 2023/06)

[MALCoDv2](https://www.kaggle.com/natlee/myanimelist-comment-dataset-v2)

Contains 220K comments.

You can extract the data from your SQLite database with the following command.
```bash
python db_to_kaggle.py
```

## Misc

I recommend using [VisiData](https://www.visidata.org/) to see the SQLite database and the CSV files.

![](https://d33wubrfki0l68.cloudfront.net/a2039fda848c76b90ee0270854cd417a82bbd60e/0b350/img/woq9dm5llq-590.webp)

It lets you inspect structured data in detail from the command line.

Just use `sudo apt install visidata` to get the package.

In our case, you can use `vd anime.db` to open the SQLite database.
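
If VisiData is not available, the same file can be inspected with Python's standard `sqlite3` module. The sketch below assumes nothing about the schema: it only lists whatever tables exist and counts their rows. `table_row_counts` is a hypothetical helper, and `anime.db` is the file name used above.

```python
import os
import sqlite3
from contextlib import closing


def table_row_counts(db_path: str) -> dict:
    """Map every table name in the SQLite file to its number of rows."""
    with closing(sqlite3.connect(db_path)) as conn:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names are quoted because they come from the database itself.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }


# Only query the file if it exists, so no empty database gets created.
if os.path.exists("anime.db"):
    for table, count in table_row_counts("anime.db").items():
        print(f"{table}: {count} rows")
```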

## Contributor



Nat Lee


## LICENSE

[MIT](LICENSE)