https://github.com/tuvimen/wordpress-madara-scraper
A bash script for scraping image focused wordpress madara extension sites
https://github.com/tuvimen/wordpress-madara-scraper
bash image-hoarding json reliq scraper wordpress-madara
Last synced: about 2 months ago
JSON representation
A bash script for scraping image focused wordpress madara extension sites
- Host: GitHub
- URL: https://github.com/tuvimen/wordpress-madara-scraper
- Owner: TUVIMEN
- License: gpl-3.0
- Created: 2024-02-22T20:41:13.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2025-06-04T17:11:49.000Z (about 1 year ago)
- Last Synced: 2025-07-23T06:45:00.462Z (11 months ago)
- Topics: bash, image-hoarding, json, reliq, scraper, wordpress-madara
- Language: Shell
- Homepage:
- Size: 43.9 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# wordpress-madara-scraper
A bash script for scraping image focused madara wordpress in json.
## Requirements
- [reliq](https://github.com/TUVIMEN/reliq)
- [jq](https://github.com/stedolan/jq)
## Installation
install -m 755 wordpress-madara-scraper /usr/bin
## Json format
Here's example of [comics](comics-example.json).
## Structure
There are two sets of options that define what will be downloaded, and are divided into those that download metadata and those thad download images.
### Metadata
Is downloaded by `-p`, `-c`, `-l`, `--full-comic` and `--full-pages`. Files created by them are named by the md5 hash of their urls.
`-p` takes `LINK` argument and outputs a list of urls to comics. This might be used to get all of the comics from the website, category or an artist.
`-c` takes `FILE` argument from which it reads urls to comics and saves them in json files.
`-l` takes `FILE` argument from which it reads urls to chapters and saves the list of urls to their images to files.
`--full-comic` takes `LINK` argument and downloads comic and its chapters creating a directory for its chapters named with its name with '_' character at the end.
`--full-pages` takes `LINK` argument and downloads all comics from pages using `--full-comic`.
Example structure created by `--full-pages`:
0001c692d6cadaa3c692412bc0ac51fe
0001c692d6cadaa3c692412bc0ac51fe_/
02c8e3f630d0cd48f13515f65a91fe3e
0ba18e4d9db640693a8584b01983b451
0df4a828f07137e21f585aa29375b223
008216d512f75bcb86e2a08c4df7ae8c
008216d512f75bcb86e2a08c4df7ae8c_/
091bf018a3e41cb974c20be4901ba89a
4e35d40ad644114a17e2995b30aa52fb
### Images
These options are meant for consumption purposes only, and are just a practical simplification of Metadata. Files created by them are named by their names with `/` character translated to `|`.
`--download-chapter` takes `LINK` as argument and downloads the images of the chapter
`--download-comic` takes `LINK` as argument and downloads the comic, its chapters and their images.
`--download-pages` takes `LINK` as argument and downloads all comics from pages using `--download-comic`
Example structure created by `--download-pages`:
+99 Wooden stick manhwa
+99 Wooden stick manhwa_/
Chapter 1/
ch_0_1.jpg
ch_0_2.jpg
ch_0_3.jpg
Chapter 89.5/
45.webp
46.webp
My School Life Pretending To Be a Worthless Person
My School Life Pretending To Be a Worthless Person_/
Chapter 1/
ch_0_1.jpg
ch_0_2.jpg
ch_0_3.jpg
Chapter 59/
13.webp
14.webp
15.webp
## Tested sites
https://manhwatop.com/
https://www.nightcomic.com/
https://shibamanga.com/
https://topmanhua.com/
## Usage
wordpress-madara-scraper [OPTIONS]...
Download the images of the chapter, comic, genre and the whole site
wordpress-madara-scraper --download-chapter 'https://manhwatop.com/manga/love-hug/chapter-233/'
wordpress-madara-scraper --download-comic 'https://manhwatop.com/manga/love-hug/'
wordpress-madara-scraper --download-pages 'https://manhwatop.com/manga-genre/magical-genre/'
wordpress-madara-scraper --download-pages 'https://manhwatop.com/'
Download the metadata of comic and the whole page
wordpress-madara-scraper --full-comic 'https://nightcomic.com/manga/versatile-mage/'
wordpress-madara-scraper --full-pages 'https://nightcomic.com/new/'
Download links to comics into FILE
wordpress-madara-scraper -p 'https://www.topmanhua.com' > FILE
Download comics from links in FILE using 4 threads into DIR, it will create json files named by md5 hash of their links
wordpress-madara-scraper -d DIR -t 4 -c FILE
Download images links from chapters in comics FILE into FILES named by md5 hash of their links
wordpress-madara-scraper -l FILE
Get some help
wordpress-madara-scraper -h