https://github.com/harr1424/scrape_blogger
Recursively crawls and scrapes a Blogger site to archive post content
blogger rust rust-scraping web-scraper
- Host: GitHub
- URL: https://github.com/harr1424/scrape_blogger
- Owner: harr1424
- Created: 2024-08-30T00:08:55.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-04T22:15:09.000Z (6 months ago)
- Last Synced: 2025-06-01T01:08:59.774Z (4 months ago)
- Topics: blogger, rust, rust-scraping, web-scraper
- Language: Rust
- Homepage:
- Size: 182 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## scrape_blogger
```text
Usage: scrape_blogger [OPTIONS]

Options:
  -t, --threads      Sets the number of threads to use when scraping all post links [default: 4]
  -r, --recent-only  Scrapes only recent posts from the blog homepage without clicking 'Older Posts'
  -h, --help         Print help
  -V, --version      Print version
```

Recursively crawls and scrapes a specific Blogger site to archive post content. This project may not generalize to every Blogger site: it is hardcoded for one specific site, but the source code can be modified to work with any English-language Blogger site whose homepage has a link to older posts.
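The crawl described above, together with the `--recent-only` and `--threads` options, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's actual code: `Page`, `collect_post_links`, and `scrape_all` are hypothetical names, and downloading and HTML parsing are hidden behind the `fetch` and `scrape` closures supplied by the caller.

```rust
use std::collections::HashSet;
use std::thread;

/// Hypothetical, already-parsed view of one blog listing page.
#[derive(Clone, Debug, Default)]
pub struct Page {
    /// URLs of the posts listed on this page.
    pub post_links: Vec<String>,
    /// Target of the "Older Posts" link, if the page has one.
    pub older: Option<String>,
}

/// Collects post links starting at `start`, following "Older Posts" links
/// until none remain. With `recent_only` set, only the first page is read.
pub fn collect_post_links<F>(start: &str, recent_only: bool, fetch: F) -> Vec<String>
where
    F: Fn(&str) -> Option<Page>,
{
    let mut visited = HashSet::new();
    let mut links = Vec::new();
    let mut next = Some(start.to_string());

    while let Some(url) = next.take() {
        // Guard against pager loops by never visiting a page twice.
        if !visited.insert(url.clone()) {
            break;
        }
        // Stop quietly if the page cannot be fetched (assumed behaviour).
        let Some(page) = fetch(&url) else { break };
        for link in page.post_links {
            if !links.contains(&link) {
                links.push(link);
            }
        }
        if recent_only {
            break;
        }
        next = page.older;
    }
    links
}

/// Applies `scrape` to every link using at most `threads` worker threads,
/// returning results in the same order as `links`.
pub fn scrape_all<T, F>(links: &[String], threads: usize, scrape: F) -> Vec<T>
where
    T: Send,
    F: Fn(&str) -> T + Sync,
{
    if links.is_empty() {
        return Vec::new();
    }
    let workers = threads.max(1);
    // Ceiling division so that every link lands in exactly one chunk.
    let chunk_size = (links.len() + workers - 1) / workers;
    let scrape = &scrape;

    thread::scope(|s| {
        // One scoped thread per chunk; scoped threads may borrow `links`.
        let handles: Vec<_> = links
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|link| scrape(link.as_str()))
                        .collect::<Vec<T>>()
                })
            })
            .collect();

        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

In this sketch, `collect_post_links(home_url, false, fetch)` walks every listing page, while passing `true` mirrors `--recent-only` by stopping after the homepage. The resulting links can then be handed to `scrape_all(&links, 4, scrape)`, where `4` corresponds to the default of `--threads`. Splitting the links into contiguous chunks keeps the output order stable, at the cost of uneven load when some posts take much longer to scrape than others.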