{"id":15010658,"url":"https://github.com/agilecreativity/hn-scrapper","last_synced_at":"2026-03-14T08:34:41.832Z","repository":{"id":62432896,"uuid":"63548239","full_name":"agilecreativity/hn-scrapper","owner":"agilecreativity","description":"Collect the last 20 pages of Hacker News into one page","archived":false,"fork":false,"pushed_at":"2017-03-02T15:51:50.000Z","size":1775,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-19T13:26:01.333Z","etag":null,"topics":["clojure","hacker-news","news","scraper"],"latest_commit_sha":null,"homepage":"https://github.com/agilecreativity/hn-scrapper","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"epl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/agilecreativity.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-07-17T19:13:34.000Z","updated_at":"2017-03-02T15:51:51.000Z","dependencies_parsed_at":"2022-11-01T21:01:00.620Z","dependency_job_id":null,"html_url":"https://github.com/agilecreativity/hn-scrapper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agilecreativity%2Fhn-scrapper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agilecreativity%2Fhn-scrapper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agilecreativity%2Fhn-scrapper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agilecreativity%2Fhn-scrapper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/agilecrea
tivity","download_url":"https://codeload.github.com/agilecreativity/hn-scrapper/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243290993,"owners_count":20267825,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clojure","hacker-news","news","scraper"],"created_at":"2024-09-24T19:35:14.045Z","updated_at":"2025-12-25T08:22:49.650Z","avatar_url":"https://github.com/agilecreativity.png","language":"Clojure","funding_links":[],"categories":[],"sub_categories":[],"readme":"## hn-scrapper\n\n[![Clojars Project](https://img.shields.io/clojars/v/scrapper.svg)](https://clojars.org/scrapper)\n[![Dependencies Status](https://jarkeeper.com/agilecreativity/hn-scrapper/status.svg)](https://jarkeeper.com/agilecreativity/hn-scrapper)\n\nGet all of the latest links from [Hacker News](https://news.ycombinator.com/) into a single page.\n\n### Installation and basic usage as CLI\n\n#### Pre-requisites\n\n- [Java SDK](http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)\n- [Leiningen](http://leiningen.org/#install)\n\n#### Installation\n\n```sh\n# Clone this repository locally\nmkdir -p ~/projects\n\ngit clone https://github.com/agilecreativity/hn-scrapper.git ~/projects/hn-scrapper\n\ncd ~/projects/hn-scrapper\n\n# Create the `~/bin` folder to hold the executable\nmkdir -p ~/bin\n\n# Generate the standalone using `lein bin`\nlein bin\n```\n\n#### Usage\n\nTo see the help just type\n\n```sh\n~/bin/hn-scrapper\n```\n\nThis should give you the help like\n\n```\nExtract the lastest Hacker News index to 
a single file\n\nUsage: hn-scrapper [options]\n  -p, --page-count PAGE-COUNT    20\n  -o, --output-file OUTPUT-FILE  hacker-news.md\n  -h, --help\nOptions:\n\n--p PAGE-COUNT  the number of pages to be extracted default to 20\n--o OUTPUT-FILE the output file name default to 'hacker-news.md'\n```\n\nNow get the list of all news from [Hacker News](https://news.ycombinator.com/news)\n\n```sh\n# Get only the first page from the site\n~/bin/hn-scrapper --page-count 1 --output-file hacker-news-front-page.md\n\n# Get all of the news (20 pages) using the shorter options\n~/bin/hn-scrapper -p 20 -o hacker-news-top-20-pages.md\n```\n\n## Example Sessions and Outputs\n\n### Sample sessions\n\n![](https://github.com/agilecreativity/hn-scrapper/raw/master/doc/01-sample-session.gif)\n\n### Sample Markdown Output\n\n![](https://github.com/agilecreativity/hn-scrapper/raw/master/doc/02-markdown-output.png)\n\n### Sample Markdown Output viewed in GitHub's Gist\n\n![](https://github.com/agilecreativity/hn-scrapper/raw/master/doc/03-markdown-as-gist.png)\n\n### The actual result in Markdown format\n\n[Sample-markdown-output](doc/04-sample-markdown.md)\n\n## Feature ideas\n\n- Export/print first-level content of Hacker News to PDFs or EPUBs\n- Group the results in some ways (topics, keywords, link to YouTube?)\n- Persist the result to HTML pages and store the link just once!\n\n## Useful Links\n\n- [reaver](https://github.com/mischov/reaver)\n- [jsoup](https://github.com/jhy/jsoup/)\n- [jsoup - selector syntax](https://jsoup.org/cookbook/extracting-data/selector-syntax)\n- [record screen as animated gif image](https://www.maketecheasier.com/record-screen-as-animated-gif-ubuntu/)\n\n## License\n\nCopyright © 2016 Burin Choomnuan\n\nDistributed under the Eclipse Public License either version 1.0 or (at\nyour option) any later 
version.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagilecreativity%2Fhn-scrapper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fagilecreativity%2Fhn-scrapper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagilecreativity%2Fhn-scrapper/lists"}