https://github.com/harr1424/scrape_blogger

Recursively crawls and scrapes a Blogger site to archive post content
blogger rust rust-scraping web-scraper

Last synced: 4 months ago

Host: GitHub
URL: https://github.com/harr1424/scrape_blogger
Owner: harr1424
Created: 2024-08-30T00:08:55.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-04-04T22:15:09.000Z (6 months ago)
Last Synced: 2025-06-01T01:08:59.774Z (4 months ago)
Topics: blogger, rust, rust-scraping, web-scraper
Language: Rust
Homepage:
Size: 182 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

## scrape_blogger

```text
Usage: scrape_blogger [OPTIONS]

Options:
  -t, --threads      Sets the number of threads to use when scraping all post links [default: 4]
  -r, --recent-only  Scrapes only recent posts from the blog homepage without clicking 'Older Posts'
  -h, --help         Print help
  -V, --version      Print version
```

Recursively crawls and scrapes a specific Blogger site to archive post content. The project is hardcoded for that one site and may not generalize to all Blogger sites, but the source code can be modified to work with any English-language Blogger site whose homepage has a link to older posts.
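
The crawl described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names `older_posts_link`, `crawl`, and `demo_fetch`, the `example.blogspot.com` URLs, and the assumption that the pager link is an anchor whose text is exactly "Older Posts" with a double-quoted `href` are all hypothetical. It uses a plain loop in place of recursion and naive string matching in place of an HTML parser, and the fetch step is passed in as a function so the sketch runs without network access.

```rust
use std::collections::HashSet;

/// Naive, hypothetical link finder: returns the href of the anchor whose
/// text is exactly "Older Posts". A real scraper would use an HTML parser.
fn older_posts_link(html: &str) -> Option<String> {
    let marker = html.find(">Older Posts</a>")?;
    let head = &html[..marker];
    // Take the last href attribute that appears before the link text.
    let start = head.rfind("href=\"")? + "href=\"".len();
    let end = head[start..].find('"')? + start;
    Some(head[start..end].to_string())
}

/// Visits the homepage, then keeps following "Older Posts" links until no
/// link is found, a fetch fails, or a page repeats. With `recent_only` set,
/// only the homepage is visited. Returns the visited page URLs in order.
fn crawl<F>(home: &str, recent_only: bool, fetch: F) -> Vec<String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut visited = Vec::new();
    let mut seen = HashSet::new();
    let mut next = Some(home.to_string());
    while let Some(url) = next {
        // Guard against pager loops.
        if !seen.insert(url.clone()) {
            break;
        }
        let Some(html) = fetch(&url) else { break };
        visited.push(url);
        next = if recent_only { None } else { older_posts_link(&html) };
    }
    visited
}

/// Stand-in for an HTTP GET: serves a made-up two-page blog from memory.
fn demo_fetch(url: &str) -> Option<String> {
    match url {
        "https://example.blogspot.com/" => Some(
            r#"<a href="https://example.blogspot.com/page2">Older Posts</a>"#.to_string(),
        ),
        "https://example.blogspot.com/page2" => {
            Some("<p>oldest posts, no pager link</p>".to_string())
        }
        _ => None,
    }
}

fn main() {
    for page in crawl("https://example.blogspot.com/", false, demo_fetch) {
        println!("{page}");
    }
}
```

Passing `true` for `recent_only` mirrors the `--recent-only` flag in spirit: the loop stops after the homepage instead of following the pager link.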
