https://github.com/polyrabbit/hacker-news-digest

:newspaper: Let ChatGPT Summarize Hacker News for You

chatgpt chatgpt-api crawler data-extraction extract-summaries hacker-news hacker-news-digest hacker-news-reader machine-learning news-aggregator openai openai-api python rss spider


[Let ChatGPT Summarize Hacker News for You](https://hackernews.betacat.io/)
==================

[![Github Pages](https://github.com/polyrabbit/hacker-news-digest/actions/workflows/static.yml/badge.svg)](https://github.com/polyrabbit/hacker-news-digest/actions/workflows/static.yml)
[![license](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://github.com/polyrabbit/hacker-news-digest/blob/master/LICENSE)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/polyrabbit/hacker-news-digest/pulls)
[![Hacker News](https://camo.githubusercontent.com/73322cbcbf1c517bb5d3d8d4e724f81091fc767ccc278b44f1ee1a1179e9ad38/68747470733a2f2f736869656c64732e696f2f62616467652f4861636b65722532304e6577732d6630363532663f6c6f676f3d79253230636f6d62696e61746f72267374796c653d666c61742d737175617265266c6f676f436f6c6f723d7768697465)](https://hackernews.betacat.io/)

> [中文版](https://blog.betacat.io/post/2023/06/summarize-hacker-news-by-chatgpt/)

[Hacker News Summary](https://hackernews.betacat.io/) leverages AI technology to extract summaries
and illustrations from [Hacker News](https://news.ycombinator.com/)
articles, providing a seamless news scanning experience.

Summaries are primarily generated by
the ChatGPT [gpt-3.5-turbo](https://platform.openai.com/docs/models/gpt-3-5) model, falling back to
a local [GoogleT5](https://huggingface.co/t5-large) model when ChatGPT is unavailable.
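
The fallback order described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `summarize_with_fallback` and the stand-in summarizers are hypothetical names, and a real setup would wrap an OpenAI API call and a local T5 model behind the same callable interface.

```python
from typing import Callable, Sequence, Tuple

# A summarizer is any callable that takes article text and returns a summary.
Summarizer = Callable[[str], str]


def summarize_with_fallback(
    text: str, summarizers: Sequence[Tuple[str, Summarizer]]
) -> Tuple[str, str]:
    """Try each (name, summarizer) pair in order.

    Returns (name, summary) from the first summarizer that succeeds with a
    non-empty result. Raises RuntimeError if every summarizer fails.
    """
    errors = []
    for name, summarize in summarizers:
        try:
            summary = summarize(text)
        except Exception as exc:  # e.g. network error, rate limit, model missing
            errors.append(f"{name}: {exc}")
            continue
        if summary and summary.strip():
            return name, summary.strip()
        errors.append(f"{name}: empty summary")
    raise RuntimeError("all summarizers failed: " + "; ".join(errors))


# Hypothetical stand-ins for illustration only.
def remote_model_down(text: str) -> str:
    raise ConnectionError("API unreachable")


def local_model(text: str) -> str:
    return text.split(".")[0] + "."


model, summary = summarize_with_fallback(
    "First sentence. Second sentence.",
    [("OpenAI", remote_model_down), ("GoogleT5", local_model)],
)
print(model, summary)
```

Returning the name of the summarizer that succeeded makes it straightforward to show which model produced each summary.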

## Features

* Clear and easily understandable summaries generated by our advanced AI assistant
* Relevant illustrations make articles easily scannable and visually engaging
* Common video sites, PDFs, and GitHub gists are seamlessly embedded
* Flexibility to sort articles based on their points, comment count, or publication time
* Filter the top N articles by points
* RSS feeds fully supported ([#14](https://github.com/polyrabbit/hacker-news-digest/issues/14), [#19](https://github.com/polyrabbit/hacker-news-digest/issues/19))
* Local translation (Chinese)
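
As a rough illustration of the sorting and top N filtering listed above, here is a minimal sketch. The `Story` fields and function names are assumptions made for this example; the site itself may implement these features differently (for instance, in the browser).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Story:
    # Hypothetical record; field names are illustrative only.
    title: str
    points: int
    comments: int
    submitted_at: int  # Unix timestamp


SORT_KEYS = ("points", "comments", "submitted_at")


def sort_stories(stories: List[Story], key: str = "points") -> List[Story]:
    """Return a new list sorted by the given key, highest (or newest) first."""
    if key not in SORT_KEYS:
        raise ValueError(f"unknown sort key: {key!r}")
    return sorted(stories, key=lambda s: getattr(s, key), reverse=True)


def top_n_by_points(stories: List[Story], n: int) -> List[Story]:
    """Keep only the n stories with the most points."""
    return sort_stories(stories, "points")[:n]


stories = [
    Story("a", points=10, comments=5, submitted_at=100),
    Story("b", points=30, comments=1, submitted_at=300),
    Story("c", points=20, comments=9, submitted_at=200),
]
print([s.title for s in top_n_by_points(stories, 2)])
```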

## Talk is cheap, show me the screenshot!

![hn-summary](https://github.com/polyrabbit/hacker-news-digest/assets/2657334/cc08f770-5154-4c7e-8ba8-13c89f394b1f)

Emoji explained:

* ❤️: points - upvotes received from the Hacker News community
* 👤: user - Hacker News user who submitted this post
* 🕘: submission time - a human-readable time indicating when the post was submitted
* 💬: comment count - number of comments posted by the community; click to visit the comment page
* 🔗: source of the news - where the news originated
* 📰: summary model - the model used to generate the summary; options
are `OpenAI`, `GoogleT5` and `Prefix`

## How it works

[Hacker News Summary](https://hackernews.betacat.io/) is a static site hosted on GitHub Pages. It
performs the following periodic actions:

1. Parsing the Hacker News page to obtain a list of news articles
2. Extracting the main content from each news article using
a [scoring algorithm](%5Btutorial%5D%20How-to-extract-main-content-from-web-pages-using-Machine-Learning.ipynb)
3. Finding the most suitable illustration for each article and making a local copy
4. Generating summaries of each article's content using the OpenAI API, or invoking a local model as a
fallback when the API is unavailable
5. Rendering a template that incorporates the illustrations and summaries, and deploying it to
GitHub Pages
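
To make step 1 more concrete, here is a minimal sketch of pulling story titles and links out of a listing page using only the standard library. It assumes each story link sits inside a `<span class="titleline">` element, as in the sample HTML below; that markup, the `TitleLinkParser` class, and `parse_stories` are assumptions for illustration and may not match the project's parser or the live page.

```python
from html.parser import HTMLParser
from typing import Dict, List


class TitleLinkParser(HTMLParser):
    """Collect the first link inside each <span class="titleline">."""

    def __init__(self) -> None:
        super().__init__()
        self.stories: List[Dict[str, str]] = []
        self._in_titleline = False
        self._current = None  # story currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "titleline" in classes:
            self._in_titleline = True
        elif tag == "a" and self._in_titleline and self._current is None:
            self._current = {"url": attrs.get("href") or "", "title": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.stories.append(self._current)
            self._current = None
            # Ignore later links in the same span (e.g. the site name).
            self._in_titleline = False


def parse_stories(html: str) -> List[Dict[str, str]]:
    parser = TitleLinkParser()
    parser.feed(html)
    return parser.stories


# Hand-written sample, not a capture of the real page.
SAMPLE_HTML = """
<tr class="athing"><td class="title"><span class="titleline">
<a href="https://example.com/a">First story</a>
<span class="sitebit comhead"> (<a href="from?site=example.com">example.com</a>)</span>
</span></td></tr>
<tr class="athing"><td class="title"><span class="titleline">
<a href="https://example.org/b">Second story</a></span></td></tr>
"""

for story in parse_stories(SAMPLE_HTML):
    print(story["title"], story["url"])
```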

## Localization

Translation is also performed by ChatGPT, with a single extra step in the prompt. Currently supported languages:

* [中文翻译](https://hackernews.betacat.io/zh.html)
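
One way to picture the "single extra step in the prompt" is a prompt builder that appends a translation instruction only when a target language is given. The wording and the `build_summary_prompt` helper below are hypothetical and are not the project's actual prompt.

```python
from typing import Optional


def build_summary_prompt(article_text: str, target_language: Optional[str] = None) -> str:
    """Build a numbered instruction list followed by the article text.

    The instruction wording is illustrative only.
    """
    steps = ["Summarize the article below in two or three sentences."]
    if target_language:
        # The extra step: same request, plus one translation instruction.
        steps.append(f"Translate the summary into {target_language}.")
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return f"{numbered}\n\nArticle:\n{article_text}"


print(build_summary_prompt("Some article text.", target_language="Chinese"))
```

Keeping translation in the same prompt means a localized summary costs one model call rather than two.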

## TODO

- [ ] A better way to scrape websites (maybe PhantomJS & Selenium)
- [ ] Also summarize comments ([see discussions on Hacker News](https://news.ycombinator.com/item?id=36260140))
- [ ] Switch to [Hacker News API](https://github.com/HackerNews/API)
- [ ] A more beautiful home page (maybe in HTML9)
- [ ] Discover alternative local models for generating summaries
- [X] Sort articles by points/comments/time
- [X] Filter topN articles by points
- [X] RSS
- [X] Deploy on GitHub Pages
- [X] Have a good sleep !important