Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/natlee/myanimelist-comment-crawler
Crawl all reviews and information of Anime works on MyAnimeList. ;)
anime crawler data-analysis data-mining data-science kaggle kaggle-dataset myanimelist python requests scrapy-crawler sqlite
- Host: GitHub
- URL: https://github.com/natlee/myanimelist-comment-crawler
- Owner: NatLee
- License: mit
- Created: 2019-08-03T19:51:46.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-06-21T02:23:46.000Z (over 1 year ago)
- Last Synced: 2024-11-21T03:39:41.642Z (2 months ago)
- Topics: anime, crawler, data-analysis, data-mining, data-science, kaggle, kaggle-dataset, myanimelist, python, requests, scrapy-crawler, sqlite
- Language: Python
- Homepage:
- Size: 50.8 KB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# MyAnimeList Crawler
This crawler collects all work information and reviews from `MyAnimeList` using `Scrapy`.

## Setup
1. Check your settings in `./setting.ini`.
2. Install the required dependencies.

```bash
pip install -r requirements.txt
```

## Usage
### Crawl Anime Information

```bash
scrapy runspider info_spider.py --nolog
```

- Example of crawled data

```json
{
  "workId": "11979",
  "url": "https://myanimelist.net/anime/11979/Mahou_Shoujo_Madoka%E2%98%85Magica_Movie_2__Eien_no_Monogatari",
  "jpName": "劇場版 魔法少女まどか☆マギカ 永遠の物語",
  "engName": "Puella Magi Madoka Magica the Movie Part 2: Eternal",
  "synonymsName": "Mahou Shoujo Madoka Magika Movie 2, Magical Girl Madoka Magica Movie 2",
  "workType": "Movie",
  "episodes": "1",
  "status": "Finished Airing",
  "aired": "Oct 13, 2012",
  "premiered": "",
  "producer": "Aniplex, Mainichi Broadcasting System, Movic, Nitroplus, Houbunsha",
  "broadcast": "",
  "licensors": "Aniplex of America",
  "studios": "Shaft",
  "genres": "Drama",
  "source": "Original",
  "duration": "1 hr. 49 min.",
  "rating": "PG-13 - Teens 13 or older",
  "score": "8.37",
  "allRank": "#197",
  "popularityRank": "#1132",
  "members": "195,001",
  "favorites": "1,026",
  "scoredByUser": "97097",
  "lastUpdate": "2023-05-31 13:39:02"
}
```

### Crawl Reviews

> Crawl the work information first, because the review spider needs the list of Anime works in `myanimelist`.

```bash
scrapy runspider review_spider.py --nolog
```

## Link

The dataset is available on Kaggle.
- Version 1 (2006/11 to 2019/06)
[MALCoD](https://www.kaggle.com/natlee/myanimelist-comment-dataset) contains 130K comments.
Some years later the site was updated, so I refactored this code to work with the new version of MyAnimeList.
- Version 2 (2006/11 to 2023/06)
[MALCoDv2](https://www.kaggle.com/natlee/myanimelist-comment-dataset-v2) contains 220K comments.

You can export the data from your SQLite database with the following command.
```bash
python db_to_kaggle.py
```

## Misc

I recommend using [VisiData](https://www.visidata.org/) to view the SQLite database and the CSV files.
![](https://d33wubrfki0l68.cloudfront.net/a2039fda848c76b90ee0270854cd417a82bbd60e/0b350/img/woq9dm5llq-590.webp)
It lets you inspect structured data from the command line.
On Debian or Ubuntu, install it with `sudo apt install visidata`.
For this project, run `vd anime.db` to open the SQLite database.
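
If you would rather check the database from a script, the following is a minimal sketch using Python's standard `sqlite3` module. The table names in `anime.db` are not documented in this README, so the sketch makes no assumption about them and reads them from `sqlite_master` instead. The helper name `summarize_tables` is hypothetical and is not part of this repository.

```python
import sqlite3


def summarize_tables(db_path):
    """Return {table_name: row_count} for every user table in the database.

    Note: sqlite3.connect() creates an empty file if db_path does not exist.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        # Skip SQLite's internal bookkeeping tables (their names start with "sqlite_").
        names = [row[0] for row in rows if not row[0].startswith("sqlite_")]

        counts = {}
        for name in names:
            # Quote the identifier so unusual table names do not break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    # "anime.db" is the database file name used elsewhere in this README.
    for table, row_count in summarize_tables("anime.db").items():
        print(f"{table}: {row_count} rows")
```

Run it from the directory that contains `anime.db` after a crawl has finished. An empty result means no tables were found at that path.
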
## Contributor
## LICENSE
[MIT](LICENSE)