https://github.com/playshikiapp/parsers
Site parsers for PlayShikiApp
https://github.com/playshikiapp/parsers
Last synced: about 2 months ago
JSON representation
Site parsers for PlayShikiApp
- Host: GitHub
- URL: https://github.com/playshikiapp/parsers
- Owner: PlayShikiApp
- License: osl-3.0
- Created: 2019-06-07T18:49:16.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-04-30T19:59:31.000Z (about 2 years ago)
- Last Synced: 2025-01-14T08:52:26.960Z (4 months ago)
- Language: Python
- Homepage:
- Size: 321 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Parsers for PlayShikiApp
## Overview
This repository is aiming to provide an automated import of anime videos for [PlayShikiServer](https://github.com/PlayShikimoriApp/PlayShikiServer) or a compatible backend server. Everything below assumes you are already familiar with some basic REST stuff and running such server.## How exactly it's supposed to work
This repository tracks [shikimori](https://shikimori.one) ongoings and tries to find anime videos on other related resources.## Quick start
#### Clone this repo
This repo is configured as a submodule of PlayShikiServer:
```
cd PlayShikiServer
git clone https://github.com/PlayShikimoriApp/parsers
```
#### Install the dependencies:
```
cd parsers
pip3 install -r requirements.txt
pip3 install git+https://github.com/ChronoMonochrome/percache
cd ..
```Parsers use hardcoded pages containing ongoings:
```
mkdir -p ../ongoings
cp ongoings_07.06.2019.html ../ongoings/
```Above is the sample page for this guide.
It should be manually updated each day or each chosen period of time to track new ongoings.#### Start webscraping the stuff
In order not to hurt anyone, this repo tries to minimize amount of requests to the external anime-related sites by caching the scraped stuff. Cache is invalidated each day, but previously saved pages aren't removed automatically, so it can quickly grow in a size.From python3 interpreter shell run (assuming you're in a toplevel directory, e.g. PlayShikiServer):
```
>>> from parsers import playshikiapp
>>> playshikiapp.save(playshikiapp.find_animes(), format = "sql")
```This can take a while before all ongoings are fetched.
At finish, this script will produce "ongoings.sql" file in the current directory. Another supported format is pkl (a raw dump of Python object to a file).#### Retrieve some info about an ongoing:
```
>>> from parsers import ongoings
>>> ongoings.main()>>> ongoings.ONGOING_IDS[0]
38524
>>> ongoings.get_ongoing_info(38524)
{'Тип:': 'TV Сериал', 'Эпизоды:': '7 / 10', 'Следующий эпизод:': '16 июня 18:10', 'Длительность эпизода:': ['23 мин.'], 'Статус:': ['\xa0с 29 апр. 2019 г.'], 'Жанры:': ['Action', 'Military', 'Mystery', 'Super Power', 'Drama', 'Fantasy', 'Shounen'], 'Рейтинг:': ['R-17'], 'Альтернативные названия:': ['···'], 'episodes_available': 7, 'episodes_total': 10, 'next_episode': '16.06.2019', 'type': 'tv', 'date_created': '29.04.2019', 'anime_english': 'Shingeki no Kyojin Season 3 Part 2', 'anime_russian': 'Вторжение гигантов 3. Вторая часть'}
```### Supported external sites
For now this script only supports fetching episodes from smotret-anime-365.ru . It shouldn't be hard to add some other sites like sovetromantica.com . Some work on this already done by [AltWatcher](https://openuserjs.org/scripts/Lolec/Alt_Watcher_v3) extension (which makes a GET request to the internal sites' search engines).