https://github.com/tuvimen/wordpress-madara-scraper

A bash script for scraping image focused wordpress madara extension sites

Host: GitHub
URL: https://github.com/tuvimen/wordpress-madara-scraper
Owner: TUVIMEN
License: gpl-3.0
Created: 2024-02-22T20:41:13.000Z (over 2 years ago)
Default Branch: master
Last Pushed: 2025-06-04T17:11:49.000Z (about 1 year ago)
Last Synced: 2025-07-23T06:45:00.462Z (11 months ago)
Topics: bash, image-hoarding, json, reliq, scraper, wordpress-madara
Language: Shell
Homepage:
Size: 43.9 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# wordpress-madara-scraper

A bash script that scrapes image-focused Madara WordPress sites and saves the results as JSON.

## Requirements

- [reliq](https://github.com/TUVIMEN/reliq)
- [jq](https://github.com/stedolan/jq)
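
Before installing, it can help to confirm that both requirements are on `PATH`. The check below is a minimal sketch and not part of the script itself; `missing_deps` is a hypothetical helper name.

```shell
# Print the name of every command in the argument list that is not on PATH.
# missing_deps is a hypothetical helper, not something the scraper provides.
missing_deps() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# reliq and jq are the two requirements listed above.
missing=$(missing_deps reliq jq)
if [ -n "$missing" ]; then
    # Unquoted on purpose: one line is printed per missing command.
    printf 'missing dependency: %s\n' $missing >&2
fi
```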

## Installation

    install -m 755 wordpress-madara-scraper /usr/bin

## JSON format

Here's an example of the output for [comics](comics-example.json).

## Structure

Two sets of options define what is downloaded: those that download metadata and those that download images.

### Metadata

Metadata is downloaded by `-p`, `-c`, `-l`, `--full-comic` and `--full-pages`. Files created by these options are named by the md5 hash of their URLs.
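
To find the file that belongs to a given URL, the hash can be recomputed by hand. This is a sketch under two assumptions: the URL is hashed exactly as passed on the command line, with no trailing newline, and GNU `md5sum` is available. `url_md5` is a hypothetical helper name.

```shell
# Print the md5 hex digest of a URL, following the naming scheme described above.
# Assumption: the URL is hashed as given, without a trailing newline.
url_md5() {
    printf '%s' "$1" | md5sum | cut -d ' ' -f 1
}

# Example: the expected file name for a comic URL.
url_md5 'https://nightcomic.com/manga/versatile-mage/'
```

If the printed name does not match any file in the output directory, the script probably normalizes the URL differently from what this sketch assumes.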

`-p` takes a `LINK` argument and outputs a list of URLs to comics. This can be used to get all of the comics from a website, a category or an artist.

`-c` takes a `FILE` argument, reads URLs to comics from it and saves each comic in a JSON file.

`-l` takes a `FILE` argument, reads URLs to chapters from it and saves the list of image URLs for each chapter to a file.

`--full-comic` takes a `LINK` argument and downloads the comic and its chapters. The chapters go into a directory named after the comic's file, with a `_` character appended.

`--full-pages` takes a `LINK` argument and downloads all comics from the pages using `--full-comic`.

Example structure created by `--full-pages`:

    0001c692d6cadaa3c692412bc0ac51fe
    0001c692d6cadaa3c692412bc0ac51fe_/
        02c8e3f630d0cd48f13515f65a91fe3e
        0ba18e4d9db640693a8584b01983b451
        0df4a828f07137e21f585aa29375b223
    008216d512f75bcb86e2a08c4df7ae8c
    008216d512f75bcb86e2a08c4df7ae8c_/
        091bf018a3e41cb974c20be4901ba89a
        4e35d40ad644114a17e2995b30aa52fb
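
The pairing between a comic file and its `_` directory can be walked with a short loop. The sketch below assumes the layout is exactly as in the example: comic files at the top level and chapter files inside the directory with the same name plus `_`. `count_chapters` is a hypothetical helper name.

```shell
# For each comic file in DIR, print its name and the number of chapter
# files found in the matching directory (same name with '_' appended).
count_chapters() {
    local dir="$1" f n
    for f in "$dir"/*; do
        [ -f "$f" ] || continue      # skip the '_' directories themselves
        n=0
        if [ -d "${f}_" ]; then
            n=$(find "${f}_" -type f | wc -l)
        fi
        # $((n)) strips any padding that wc may add around the number.
        printf '%s %s\n' "${f##*/}" "$((n))"
    done
}
```

Calling `count_chapters DIR` on the directory used with `--full-pages` prints one line per comic, which is a quick way to spot comics whose chapters were not fetched.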

### Images

These options are meant for consumption purposes only and are just a practical simplification of the Metadata options. Files created by them keep their original names, with the `/` character translated to `|`.
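
The `/` to `|` translation can be reproduced with `tr`, which is useful when looking up a downloaded comic by its title. This is only a sketch of the rule stated above; `title_to_filename` is a hypothetical helper and the title is a made-up example.

```shell
# Turn a title into a file name by replacing every '/' with '|'.
title_to_filename() {
    printf '%s\n' "$1" | tr '/' '|'
}

title_to_filename 'Fate/Stay Night'   # prints: Fate|Stay Night
```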

`--download-chapter` takes a `LINK` argument and downloads the images of the chapter.

`--download-comic` takes a `LINK` argument and downloads the comic, its chapters and their images.

`--download-pages` takes a `LINK` argument and downloads all comics from the pages using `--download-comic`.

Example structure created by `--download-pages`:

    +99 Wooden stick manhwa
    +99 Wooden stick manhwa_/
        Chapter 1/
            ch_0_1.jpg
            ch_0_2.jpg
            ch_0_3.jpg
        Chapter 89.5/
            45.webp
            46.webp
    My School Life Pretending To Be a Worthless Person
    My School Life Pretending To Be a Worthless Person_/
        Chapter 1/
            ch_0_1.jpg
            ch_0_2.jpg
            ch_0_3.jpg
        Chapter 59/
            13.webp
            14.webp
            15.webp

## Tested sites

- https://manhwatop.com/
- https://www.nightcomic.com/
- https://shibamanga.com/
- https://topmanhua.com/

## Usage

    wordpress-madara-scraper [OPTIONS]...

Download the images of a chapter, a comic, a genre and the whole site

    wordpress-madara-scraper --download-chapter 'https://manhwatop.com/manga/love-hug/chapter-233/'
    wordpress-madara-scraper --download-comic 'https://manhwatop.com/manga/love-hug/'
    wordpress-madara-scraper --download-pages 'https://manhwatop.com/manga-genre/magical-genre/'
    wordpress-madara-scraper --download-pages 'https://manhwatop.com/'
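
To download a hand-picked set of comics rather than a whole genre or site, `--download-comic` can be wrapped in a loop. The following is a sketch, not a feature of the script: the `SCRAPER` variable, the `download_comics` function and the convention of skipping blank and `#` lines are all assumptions made for this example.

```shell
# Command to run; overridable so the loop can be dry-run with a stub.
SCRAPER=${SCRAPER:-wordpress-madara-scraper}

# Read comic URLs from a file, one per line, and pass each to --download-comic.
# Blank lines and lines starting with '#' are skipped (a convention of this sketch).
download_comics() {
    local url
    while IFS= read -r url || [ -n "$url" ]; do
        case $url in
            ''|'#'*) continue ;;
        esac
        # stdin is redirected so the command cannot consume the URL list.
        "$SCRAPER" --download-comic "$url" </dev/null ||
            printf 'failed: %s\n' "$url" >&2
    done < "$1"
}
```

With a file `comics.txt` holding one comic URL per line, `download_comics comics.txt` processes them in order and reports any URL for which the command exits with a non-zero status.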

Download the metadata of a comic and of a whole page

    wordpress-madara-scraper --full-comic 'https://nightcomic.com/manga/versatile-mage/'
    wordpress-madara-scraper --full-pages 'https://nightcomic.com/new/'

Download links to comics into FILE

    wordpress-madara-scraper -p 'https://www.topmanhua.com' > FILE

Download comics from the links in FILE into DIR using 4 threads. This creates JSON files named by the md5 hash of their links

    wordpress-madara-scraper -d DIR -t 4 -c FILE

Download the image links for the chapter URLs listed in FILE into files named by the md5 hash of those URLs

    wordpress-madara-scraper -l FILE

Get some help

    wordpress-madara-scraper -h
