https://github.com/nighttrax/hackernews-scraper
Scrape hacker news comments and posts
Last synced: 6 months ago
- Host: GitHub
- URL: https://github.com/nighttrax/hackernews-scraper
- Owner: NiGhTTraX
- License: bsd-2-clause
- Created: 2014-03-13T10:14:15.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2019-10-10T05:17:12.000Z (almost 6 years ago)
- Last Synced: 2025-04-11T03:08:02.933Z (6 months ago)
- Language: Python
- Size: 57.6 KB
- Stars: 14
- Watchers: 3
- Forks: 6
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
hackernews-scraper
==================

Scrape [hacker news](https://news.ycombinator.com) comments and posts
using the [Algolia API](http://hn.algolia.com/api/).

Usage
=====

```python
from hackernews_scraper import CommentScraper

CommentScraper.getComments(since=1394039447)
```

The above will return a generator that will yield one comment at a time.
It will keep on going until there are no more comments to fetch, or until
it reaches the 50-page limit set by hacker news. In the latter case, a
`TooManyItemsException` will be raised.

If the hacker news API response is missing any required fields, the scraper
will raise `KeyError`.

Response format
===============

Comments:
```
{
'author': u'dhmholley',
'comment_id': u'7531026',
'comment_text': u'Are people still blowing this whistle?...',
'created_at': u'2014-04-04T12:57:38.000Z',
'parent_id': 7530853,
'points': 1,
'story_id': None,
'story_title': None,
'story_url': None,
'timestamp': 1396616258,
'title': None,
'url': None
}
```

Stories:
```
{
'author': u'sethco',
'created_at': u'2014-04-04T12:56:23.000Z',
'objectID': None,
'points': 1,
'story_text': 1,
'timestamp': 1396616183,
'title': u'Opower IPO today',
'url': u'http://www.businesswire.com/news/home/20140403006541/en#.Uz4cbq1dVih'
}
```

Testing
=======

You need to have [httpretty](https://github.com/gabrielfalcao/HTTPretty)
and [factory-boy](https://github.com/rbarrois/factory_boy) installed.

Run `nosetests` in the root folder or the `tests` folder.
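
Usage sketch
============

The generator behaviour described under Usage and the comment fields listed
under Response format can be sketched together as follows. This is a minimal
illustration, not the library's code: it does not import `hackernews_scraper`
at all. `fake_get_comments`, `collect` and `created_at_to_epoch` are
hypothetical helpers, and the `TooManyItemsException` class here is a local
stand-in for the one the README names. The sample values are copied from the
comment shown above, and the sketch assumes Python 3.

```python
from datetime import datetime, timezone


class TooManyItemsException(Exception):
    """Local stand-in for the exception named in the README (assumption)."""


def fake_get_comments(comments, page_limit_hit=False):
    """Hypothetical stand-in for CommentScraper.getComments.

    Yields one comment dict at a time and, if asked to, raises
    TooManyItemsException once the items run out, mimicking the page limit.
    """
    for comment in comments:
        yield comment
    if page_limit_hit:
        raise TooManyItemsException("page limit reached")


def collect(comment_iter):
    """Drain a comment generator, keeping whatever was yielded before the
    page limit was hit. Returns (comments, truncated)."""
    collected = []
    try:
        for comment in comment_iter:
            collected.append(comment)
    except TooManyItemsException:
        return collected, True
    return collected, False


def created_at_to_epoch(created_at):
    """Convert a 'created_at' string such as '2014-04-04T12:57:38.000Z'
    to UTC epoch seconds."""
    parsed = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%S.%fZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


# Subset of the sample comment from the Response format section.
sample = {
    "author": "dhmholley",
    "comment_id": "7531026",
    "created_at": "2014-04-04T12:57:38.000Z",
    "timestamp": 1396616258,
}

comments, truncated = collect(fake_get_comments([sample], page_limit_hit=True))
print(len(comments), truncated)  # prints: 1 True

# In the sample data, 'timestamp' is 'created_at' expressed as epoch seconds.
print(created_at_to_epoch(sample["created_at"]) == sample["timestamp"])  # prints: True
```

With the real scraper, the same `try`/`except` shape would wrap the loop over
`CommentScraper.getComments(since=...)`. Where `TooManyItemsException` should
be imported from is not stated in this README, so check the package source
before relying on a particular import path.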