https://github.com/agilecreativity/hn-scrapper
Collect the last 20 pages of Hacker News into one page
- Host: GitHub
- URL: https://github.com/agilecreativity/hn-scrapper
- Owner: agilecreativity
- License: epl-1.0
- Created: 2016-07-17T19:13:34.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2017-03-02T15:51:50.000Z (over 9 years ago)
- Last Synced: 2025-01-19T13:26:01.333Z (over 1 year ago)
- Topics: clojure, hacker-news, news, scraper
- Language: Clojure
- Homepage: https://github.com/agilecreativity/hn-scrapper
- Size: 1.69 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
## hn-scrapper
[Clojars Project](https://clojars.org/scrapper)
[Dependencies Status](https://jarkeeper.com/agilecreativity/hn-scrapper)
Get all of the latest links from [Hacker News](https://news.ycombinator.com/) into a single page.
### Installation and basic usage as CLI
#### Prerequisites
- [Java SDK](http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)
- [Leiningen](http://leiningen.org/#install)
#### Installation
```sh
# Clone this repository locally
mkdir -p ~/projects
git clone https://github.com/agilecreativity/hn-scrapper.git ~/projects/hn-scrapper
cd ~/projects/hn-scrapper
# Create the `~/bin` folder to hold the executable
mkdir -p ~/bin
# Build the standalone executable (run below as ~/bin/hn-scrapper) with `lein bin`
lein bin
```
#### Usage
To see the help, run:
```sh
~/bin/hn-scrapper
```
This prints a help message like:
```
Extract the lastest Hacker News index to a single file
Usage: hn-scrapper [options]
-p, --page-count PAGE-COUNT 20
-o, --output-file OUTPUT-FILE hacker-news.md
-h, --help
Options:
--p PAGE-COUNT the number of pages to be extracted default to 20
--o OUTPUT-FILE the output file name default to 'hacker-news.md'
```
To collect the stories from [Hacker News](https://news.ycombinator.com/news):
```sh
# Get only the front page
~/bin/hn-scrapper --page-count 1 --output-file hacker-news-front-page.md
# Get 20 pages (the default count) using the short options
~/bin/hn-scrapper -p 20 -o hacker-news-top-20-pages.md
```
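As a rough illustration of what `--page-count` covers, the sketch below builds the list of Hacker News page URLs for a given page count. It assumes the site paginates its front page as `https://news.ycombinator.com/news?p=1`, `?p=2`, and so on; the namespace `hn-sketch.pages` and the function `page-urls` are hypothetical and not taken from this project's source.

```clojure
(ns hn-sketch.pages)

;; Assumption: Hacker News paginates as /news?p=1, /news?p=2, ...
;; This is an illustrative sketch, not hn-scrapper's actual code.
(def base-url "https://news.ycombinator.com/news?p=")

(defn page-urls
  "Return the URLs of the first `page-count` Hacker News pages, in order."
  [page-count]
  (map #(str base-url %) (range 1 (inc page-count))))
```

For example, `(page-urls 2)` returns `("https://news.ycombinator.com/news?p=1" "https://news.ycombinator.com/news?p=2")`, so the default of 20 would correspond to twenty such pages.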
## Example Sessions and Outputs
Screenshots (not reproduced here): a sample session, the sample Markdown output, and the same output viewed in a GitHub Gist.
### The actual result in Markdown format
[Sample Markdown output](doc/04-sample-markdown.md)
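The exact layout of the generated file is the one in the sample linked above. As a hypothetical sketch only, assuming each scraped story is a map with `:title` and `:url` keys, rendering the stories as a Markdown list and writing them to the `--output-file` path could look like this. The names `hn-sketch.markdown`, `item->markdown`, `items->markdown`, and `write-markdown!` are illustrative, not this project's API.

```clojure
(ns hn-sketch.markdown
  (:require [clojure.string :as str]))

;; Assumption: each story is a map like {:title "..." :url "..."}.
;; Illustrative sketch only; not hn-scrapper's actual output code.
(defn item->markdown
  "Render one story as a Markdown list entry linking to its URL."
  [{:keys [title url]}]
  (str "- [" title "](" url ")"))

(defn items->markdown
  "Render all stories as one Markdown document under a single heading."
  [items]
  (str/join "\n" (cons "# Hacker News" (map item->markdown items))))

(defn write-markdown!
  "Write the rendered document, plus a trailing newline, to output-file."
  [output-file items]
  (spit output-file (str (items->markdown items) "\n")))
```

A fuller version would also need to escape characters such as `]` that can appear in story titles.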
## Feature ideas
- Export or print the first-level content of Hacker News to PDF or EPUB
- Group the results in some way (by topic, by keyword, or by links to YouTube?)
- Persist the results as HTML pages, storing each link only once
## Useful Links
- [reaver](https://github.com/mischov/reaver)
- [jsoup](https://github.com/jhy/jsoup/)
- [jsoup selector syntax](https://jsoup.org/cookbook/extracting-data/selector-syntax)
- [Record the screen as an animated GIF image (Ubuntu)](https://www.maketecheasier.com/record-screen-as-animated-gif-ubuntu/)
## License
Copyright © 2016 Burin Choomnuan
Distributed under the Eclipse Public License either version 1.0 or (at
your option) any later version.