Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/khanhtc1202/animu-crawling-system
animu crawling system
https://github.com/khanhtc1202/animu-crawling-system
anime crawling media-server rss-listener
Last synced: 11 days ago
JSON representation
animu crawling system
- Host: GitHub
- URL: https://github.com/khanhtc1202/animu-crawling-system
- Owner: khanhtc1202
- Created: 2018-01-12T09:50:33.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-06-02T07:29:57.000Z (over 6 years ago)
- Last Synced: 2024-10-28T07:31:28.359Z (about 2 months ago)
- Topics: anime, crawling, media-server, rss-listener
- Language: JavaScript
- Homepage:
- Size: 343 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Animu Crawling System
## System's features
1. Looking for new animu videos from rss
2. Auto clone animu videos via torrent
3. Media serve on browser## System's struct
```bash
.
├── README.md
├── dummy_files
├── pacman
├── logs
├── resources
├── scripts
├── server
└── media
```1. `pacman` => rss listener, config rss list on resources/feeds.json
2. `logs` => rss listener's log
3. `resources` => store rss list
4. `scripts` => system's scripts (such as download script)
5. `server` => media server
6. `media` => store videos
7. `dummy_files` => store *## How to run
### Require packages
#### For `pacman` rss listener
This service performed by python2.7 using libs:
1. requests (install via `pip`)
2. python-bs4 (install via package manger, such as `apt`)
3. logging (install via `pip`)Install python libs via old version of `pip` may harm security problems. Please update your `pip` to newest version to avoid that kind of errors.
#### For `server` media
This service performed by node js. Just run `npm install` on the first time run this project for installing dependencies packages.
For using `npm start` command, please install [nodemon](https://www.npmjs.com/package/nodemon) package globally to your host.
#### For `scripts` download videos via torrent
Install `aria2` package via your package manager such as `apt`. More informations go [here](https://aria2.github.io/)
### Add new rss
Rss list stored at `resources/feeds.json`.
Sample config:
```json
{
"title": "FEED LIST",
"data": [
{
"team": "fuyu",
"rss": "https://www.fuyufs.com/episode/feed",
"anchor": {
"tag": "a",
"css_selector": {"data-key":"quality_720p_torrent"}
}
},
{
"team": "HorribleSubs",
"rss": "https://nyaa.si/?page=rss&u=HorribleSubs",
"anchor": false
}
]
}
```For each rss, if `link` field from `xml` code not contain url for .torrent file, you must specific `anchor` field on rss config to point at tag that have .torrent file download link.
### Run system
Run 3 services one by one
> Pacman service - RSS listener
```bash
$ cd pacman
$ python main.py
```> Media service
If you installed `nodemon`
```bash
$ cd server
$ npm start
```else
```bash
$ cd server
$ node app.js
```> Monitoring service (optional)
```bash
$ cd scripts
$ watch -n 10 "./status.sh"
```### Endpoint list
With `xxx.yyy` is your host
> http://xxx.yyy:3000/
=> Greeting endpoint
> http://xxx.yyy:3000/videos
=> Cloned videos list
> http://xxx.yyy:3000/logs
=> Crawling service log
> http://xxx.yyy:3000/monitor
=> Crawling system status (for checking hard disk capacity, system status, etc)