https://github.com/secretdeveloperisme/comic-crawler

comic crawler

Host: GitHub
URL: https://github.com/secretdeveloperisme/comic-crawler
Owner: secretdeveloperisme
Created: 2024-12-29T06:58:03.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-01-01T14:02:57.000Z (over 1 year ago)
Last Synced: 2025-05-07T11:33:10.005Z (about 1 year ago)
Language: Java
Size: 15.6 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Comic Website Crawler

## Motivation

When I read comics on a website, I run into numerous visible and hidden advertisements.
In addition, each chapter contains only a few images. To address these inconveniences, I wrote a small program that crawls the content of multiple chapters and collects the output in a single HTML file.

## How to use the program
### 1. Start the image proxy server
The target server restricts requests from untrusted hosts by validating the `Referer` header of each request. Browsers, however, enforce a policy that prevents a page from modifying the `Referer` header before a request is sent.
Therefore, I use an HTTP proxy server that modifies the request and forwards it to the target server.
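
The idea can be illustrated with a minimal sketch built on the JDK's `com.sun.net.httpserver.HttpServer`. This is not the code shipped in `imageproxy-1.0.0.jar`. The class name `RefererProxySketch`, the `/image` path, the `url` query parameter, and the `TRUSTED_REFERER` value are all hypothetical and only show how a proxy can attach a `Referer` header before forwarding a request.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class RefererProxySketch {
    // Assumption: placeholder for a Referer value the target site accepts.
    static final String TRUSTED_REFERER = "https://comic-site.example/";

    /** Reads the hypothetical "url" query parameter; returns null if it is missing or invalid. */
    static URI targetUri(String rawQuery) {
        if (rawQuery == null) {
            return null;
        }
        for (String pair : rawQuery.split("&")) {
            if (!pair.startsWith("url=")) {
                continue;
            }
            String decoded = URLDecoder.decode(pair.substring(4), StandardCharsets.UTF_8);
            if (!decoded.startsWith("http://") && !decoded.startsWith("https://")) {
                return null; // only plain HTTP(S) targets are forwarded
            }
            try {
                return URI.create(decoded);
            } catch (IllegalArgumentException e) {
                return null; // not a syntactically valid URI
            }
        }
        return null;
    }

    /** Forwards one image request upstream with the Referer header attached. */
    static void handle(HttpExchange exchange) throws IOException {
        URI target = targetUri(exchange.getRequestURI().getRawQuery());
        if (target == null) {
            exchange.sendResponseHeaders(400, -1); // -1 = no response body
            exchange.close();
            return;
        }
        HttpURLConnection upstream = (HttpURLConnection) target.toURL().openConnection();
        upstream.setRequestProperty("Referer", TRUSTED_REFERER);

        int status = upstream.getResponseCode();
        String contentType = upstream.getContentType();
        if (contentType != null) {
            exchange.getResponseHeaders().set("Content-Type", contentType);
        }
        exchange.sendResponseHeaders(status, 0); // 0 = stream the body in chunks
        try (InputStream in = status >= 400 ? upstream.getErrorStream() : upstream.getInputStream();
             OutputStream out = exchange.getResponseBody()) {
            if (in != null) {
                in.transferTo(out);
            }
        }
    }

    /** Starts the sketch proxy on the loopback interface. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/image", RefererProxySketch::handle);
        server.start();
        return server;
    }
}
```

With this sketch running on, say, port 8080, a generated HTML page would reference images as `http://localhost:8080/image?url=<percent-encoded image URL>` instead of linking to the target server directly. A real proxy would also want to restrict which hosts it is willing to forward to.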

#### Run the proxy server
```bash
java -jar imageproxy-1.0.0.jar
```

### 2. Crawl the chapter content
The target server applies rate limiting to prevent large-scale crawling of comic content.
Therefore, the program limits the number of requests it sends and retries a request when the server reports too many requests.
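
The retry behaviour can be sketched as follows. This is an illustration under stated assumptions, not the crawler's actual code: it assumes the server signals rate limiting with HTTP status 429 (Too Many Requests), and the class `RetrySketch`, the exponential backoff, and the 30-second cap are hypothetical choices.

```java
import java.util.function.IntSupplier;
import java.util.function.LongConsumer;

class RetrySketch {
    // Assumption: the target server answers rate-limited requests with HTTP 429.
    static final int TOO_MANY_REQUESTS = 429;
    // Assumption: never wait longer than 30 seconds between two attempts.
    static final long MAX_DELAY_MILLIS = 30_000;

    /** Exponential backoff: base, 2*base, 4*base, ... capped at MAX_DELAY_MILLIS. */
    static long backoffMillis(int attempt, long baseMillis) {
        long delay = baseMillis << Math.min(attempt, 20);
        return Math.min(delay, MAX_DELAY_MILLIS);
    }

    /**
     * Sends a request and repeats it while the server answers 429.
     *
     * @param request     performs one HTTP request and returns its status code
     * @param maxAttempts total number of attempts, including the first one
     * @param baseMillis  delay before the first retry
     * @param pause       waits for the given number of milliseconds
     * @return the last status code received
     */
    static int sendWithRetry(IntSupplier request, int maxAttempts, long baseMillis,
                             LongConsumer pause) {
        int status = request.getAsInt();
        for (int attempt = 0;
             status == TOO_MANY_REQUESTS && attempt < maxAttempts - 1;
             attempt++) {
            pause.accept(backoffMillis(attempt, baseMillis));
            status = request.getAsInt();
        }
        return status;
    }

    /** Default pause based on Thread.sleep; gives up if the thread is interrupted. */
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

A caller would pass `RetrySketch::sleep` as the `pause` argument and could call the same method between chapters to keep the overall request rate low. Taking the pause as a parameter keeps the retry logic testable without real waiting.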
#### Run the crawler
```bash
java -jar comic-crawler-1.0-jar-with-dependencies.jar --start 1 --end 10
```
**How to use the crawler program**
```bash
Usage: ComicCrawler
 -e,--end      end chapter number
 -s,--start    start chapter number
```
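
For readers curious how the two options map to a chapter range, here is a dependency-free sketch. It is a stand-in and not the crawler's real argument handling: the class `ChapterRangeSketch` is hypothetical, and the rules that both options are required and that `--start` must not exceed `--end` are assumptions.

```java
class ChapterRangeSketch {
    final int start;
    final int end;

    ChapterRangeSketch(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /** Parses "-s/--start N" and "-e/--end N" in any order. */
    static ChapterRangeSketch parse(String[] args) {
        Integer start = null;
        Integer end = null;
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-s":
                case "--start":
                    start = intValue(args, ++i);
                    break;
                case "-e":
                case "--end":
                    end = intValue(args, ++i);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
        }
        // Assumption: both options are mandatory.
        if (start == null || end == null) {
            throw new IllegalArgumentException("Both --start and --end are required");
        }
        // Assumption: an inverted range is treated as an error.
        if (start > end) {
            throw new IllegalArgumentException("--start must not be greater than --end");
        }
        return new ChapterRangeSketch(start, end);
    }

    /** Reads the integer that follows an option, failing clearly if it is absent. */
    private static int intValue(String[] args, int index) {
        if (index >= args.length) {
            throw new IllegalArgumentException("Missing value after " + args[index - 1]);
        }
        return Integer.parseInt(args[index]); // NumberFormatException is an IllegalArgumentException
    }
}
```

Under these assumptions, `ChapterRangeSketch.parse(new String[] {"--start", "1", "--end", "10"})` yields a range covering chapters 1 through 10, matching the example command above.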
