https://github.com/ukrbublik/malscan

MyAnimeList scanner, for recommender systems

# MyAnimeList scanner #

### About
Scans the MAL site for data needed by the recommender system [You Can (Not) Recommend](https://github.com/ukrbublik/You-Can-Not-Recommend).

To speed things up, scanning runs in parallel:

- Each scanner instance can perform several HTTP requests at a time (queued).
- Several scanner instances can safely run together across many processes and PCs.
- Different "data providers" can be used: parsing the MAL site directly or through proxies, or unofficial MAL API servers (see `mal_api_server`). See classes `MalDataProvider` -> `MalParser`, `MalApiClient`.
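
The first point above, several queued HTTP requests per scanner instance, can be illustrated with a small concurrency limiter. This is only a minimal sketch and not the scanner's actual code: `createLimiter`, `limit` and `fakeRequest` are hypothetical names, and a timer stands in for a real HTTP request.

```javascript
// Minimal sketch of a request queue with a concurrency cap.
// `createLimiter` is a hypothetical helper, not part of malscan.
function createLimiter(maxConcurrent) {
  let active = 0;       // number of jobs currently running
  const waiting = [];   // jobs queued until a slot frees up

  const next = () => {
    if (active >= maxConcurrent || waiting.length === 0) return;
    active++;
    const { job, resolve, reject } = waiting.shift();
    // Run the job; whatever the outcome, free the slot and start the next one.
    Promise.resolve()
      .then(job)
      .then(resolve, reject)
      .finally(() => {
        active--;
        next();
      });
  };

  // Queue a job (a function returning a value or a promise).
  // The returned promise settles with the job's result.
  return function limit(job) {
    return new Promise((resolve, reject) => {
      waiting.push({ job, resolve, reject });
      next();
    });
  };
}

// Usage: a timer stands in for an HTTP request (assumption for illustration).
const fakeRequest = (id) =>
  new Promise((resolve) => setTimeout(() => resolve(`page ${id}`), 10));

const limit = createLimiter(3); // at most 3 requests in flight
Promise.all([1, 2, 3, 4, 5].map((id) => limit(() => fakeRequest(id))))
  .then((pages) => console.log(pages));
```

Results come back in submission order because `Promise.all` preserves the order of its input, even though the jobs may finish in a different order.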

Safe parallelization is implemented with the help of Redis transactions (see class `MalBaseScanner`).
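
How `MalBaseScanner` uses transactions is not shown in this README, so the following is only a generic sketch of the optimistic-locking idea behind Redis `WATCH` / `MULTI` / `EXEC`. It uses an in-memory stand-in rather than a Redis client; `TinyStore`, `claimNextId` and the keys `cursor` and `claimed:<id>` are all hypothetical.

```javascript
// In-memory stand-in that mimics optimistic locking.
// This is NOT a Redis client and NOT malscan's code; all names are hypothetical.
class TinyStore {
  constructor() {
    this.data = new Map();
    this.versions = new Map(); // key -> number of writes so far
  }

  get(key) {
    return this.data.get(key);
  }

  set(key, value) {
    this.data.set(key, value);
    this.versions.set(key, (this.versions.get(key) || 0) + 1);
  }

  // Like WATCH: remember the key's current version.
  watch(key) {
    return { key, version: this.versions.get(key) || 0 };
  }

  // Like MULTI ... EXEC: apply all writes only if the watched key is unchanged.
  // Returns false when the transaction is aborted (Redis replies with null).
  exec(watched, writes) {
    if ((this.versions.get(watched.key) || 0) !== watched.version) return false;
    for (const [key, value] of writes) this.set(key, value);
    return true;
  }
}

// Claim the next unclaimed id from a shared cursor, retrying on conflict.
function claimNextId(store, workerName, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const watched = store.watch('cursor');
    const id = (store.get('cursor') || 0) + 1;
    const ok = store.exec(watched, [
      ['cursor', id],
      [`claimed:${id}`, workerName],
    ]);
    if (ok) return id;
  }
  throw new Error('too many conflicts');
}

// Usage: two workers never receive the same id.
const store = new TinyStore();
const first = claimNextId(store, 'worker-a');
const second = claimNextId(store, 'worker-b');
console.log(first, second, store.get(`claimed:${second}`));
```

The point of the pattern is that a worker whose watched key changed under it gets an aborted transaction and retries, instead of silently overwriting another worker's claim.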

Scanned data is saved to a PostgreSQL database (see `data/db-schema.sql`, class `MalDataProcesser`).

### Using
Load the PostgreSQL database schema from `data/db-schema.sql`

Set options in `config/config-scanner.js`

Run `node index.js`

Add tasks to Redis manually: `rpush mal.queuedTasks `

Watch progress in the console logs

### Tasks
List of tasks to grab only new data:

- `GenresOnce` - grab genres, once
- `Animes_New` - grab new animes
- `AnimesUserrecs_New` - grab users' anime-to-anime recommendations
- `UserLogins_New` - grab user id <-> login pairs
- `UserLists_New` - grab user lists, only for users whose list has never been checked
- `UserProfiles_New` - grab user profile data, only for users whose profile has never been checked

List of tasks to check for updates:

- `UserListsUpdated_Active` - check updates of active user lists, run frequently
- `UserListsUpdated_WithoutList` - check for newly appearing user lists, run rarely
- `UserListsUpdated_NonActive` - check updates of nonactive user lists, run rarely
- `UserLists_Updated` - grab updated user lists, after `UserListsUpdated_*`
- `AnimesUserrecs_All` - regrab users' anime-to-anime recommendations, run it rarely, e.g. once a week
- `UserProfiles_All` - only to update favs, run it very rarely!
- `Animes_All` - only to check for possible updates of genres and relations, run it very rarely!

Special tasks that fix possible problems with login swaps; these are added automatically:

- `SpUserLogins_Re`
- `UserProfiles_Re`

### Todo
Add tasks on a timer