Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/benfoxall/scrape
Git Scraping Hacker News
Last synced: 11 days ago
- Host: GitHub
- URL: https://github.com/benfoxall/scrape
- Owner: benfoxall
- License: mit
- Created: 2024-04-30T14:43:42.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-01-19T08:25:36.000Z (18 days ago)
- Last Synced: 2025-01-19T08:31:19.173Z (18 days ago)
- Language: HTML
- Homepage: http://benjaminbenben.com/scrape/
- Size: 24.4 MB
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Recent News
This pulls the [HN](https://news.ycombinator.com/) front page to [hacker-news.html](hacker-news.html) and uses git log/show to access a history of changes.
See [git scraping](https://simonwillison.net/2020/Oct/9/git-scraping/) & [Flat Data](https://githubnext.com/projects/flat-data) for more info about the approach.
### Updating the data
```bash
export TARGET="hacker-news.html"
curl https://news.ycombinator.com > $TARGET
git add $TARGET
git commit -m ":robot: scraped to $TARGET"
```
This is run automatically by [.github/workflows/scrape.yml](.github/workflows/scrape.yml)
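
When the front page has not changed between runs, `git commit` has nothing to record and exits with an error. A minimal sketch of one way to guard against that, assuming a helper named `commit_if_changed` (hypothetical, not part of this repository or its workflow):

```bash
# Hypothetical helper (an assumption, not the repository's own script):
# stage the file given as $1 and commit only if the staged content changed.
commit_if_changed() {
  target=$1
  git add -- "$target"
  # `git diff --cached --quiet` exits 0 when nothing is staged for this path
  if git diff --cached --quiet -- "$target"; then
    echo "no change in $target"
  else
    git commit -q -m ":robot: scraped to $target"
  fi
}
```

It would be called after the `curl` step, for example `commit_if_changed "$TARGET"`.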
### Extracting file history
```bash
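# tformat: ends every record with a newline, so `read` also sees the oldest commit
git log --pretty=tformat:"%H %at" -- "$TARGET" | while read -r commit timestr
do
  # write that commit's version of the file to tmp_<unix time>_<hash>.html
  git show "$commit:$TARGET" > "tmp_${timestr}_${commit}.html"
done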
```
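
The same `git log` loop can also summarise the history without writing a file per commit. The sketch below assumes a helper named `history_summary` (hypothetical, not part of this repository) and prints the commit date, short hash, and stored size of the file for each commit that touched it, newest first. It uses `tformat:` so that the last record ends with a newline and is not skipped by `read`.

```bash
# Hypothetical helper (an assumption, not the repository's own script):
# print "<ISO date> <short hash> <bytes>" for every commit touching $1.
# Assumes the file exists in each listed commit (it was never deleted).
history_summary() {
  target=$1
  git log --pretty=tformat:"%H %aI" -- "$target" | while read -r commit when
  do
    # size in bytes of the file as stored in that commit
    size=$(git cat-file -s "$commit:$target")
    printf '%s %s %s\n' "$when" "$(git rev-parse --short "$commit")" "$size"
  done
}
```

For example, `history_summary "$TARGET"` run from the repository root lists one line per scrape that changed the file.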