https://github.com/ryanlewis/hn-summaries
AI-summarized RSS feed of Hacker News's best stories: each entry summarizes the article plus the HN discussion, with links to both. Live at hn.rlew.io.
- Host: GitHub
- URL: https://github.com/ryanlewis/hn-summaries
- Owner: ryanlewis
- Created: 2026-06-20T23:33:04.000Z
- Default Branch: main
- Last Pushed: 2026-06-21T00:27:55.000Z
- Last Synced: 2026-06-21T02:09:54.777Z
- Topics: ai, anthropic, claude, hacker-news, hackernews, llm, news, nodejs, rss, rss-feed, summarization, typescript
- Language: TypeScript
- Homepage: https://hn.rlew.io
- Size: 49.8 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
# hn-summaries
An AI-summarized RSS feed of [Hacker News's "best"](https://news.ycombinator.com/best) stories. Every entry is a short summary of the **article *and* the HN discussion**, with links to both. It's a drop-in upgrade over [`hnrss.org/best`](https://hnrss.org/best) that tells you what a story is about before you click.
**Live:** **[hn.rlew.io/feed](https://hn.rlew.io/feed)** (paste into your RSS reader) · landing page at **[hn.rlew.io](https://hn.rlew.io)**
---
## Query parameters
| Param | Default | Notes |
|---|---|---|
| `?sort=date\|points` | `date` | `date` = newest summary first, a rolling stream that keeps stories for a few days after they leave the best list. `points` = the live HN best-list rank (on-list only); a story drops out the moment it leaves the list, and each item is labelled with its rank and flagged when near the bottom. |
| `?count=N` | `30` | How many stories to include (max `200`). |
| `?min_points=N` | `0` | Only include stories with at least N points. |
Examples: [`/feed?sort=points`](https://hn.rlew.io/feed?sort=points), [`/feed?count=10`](https://hn.rlew.io/feed?count=10), [`/feed?min_points=300`](https://hn.rlew.io/feed?min_points=300), `/feed?sort=points&count=15&min_points=200`.
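The query parameters above can be read as a small parse-then-select step. The sketch below is a minimal illustration, not the code in `src/feed.ts`: the `Story` fields, the helper names `parseFeedQuery` and `selectStories`, and the choice to fall back to defaults on malformed values are all assumptions.

```typescript
// Hypothetical shapes; the real field names in src/feed.ts may differ.
type Sort = "date" | "points";

interface FeedQuery {
  sort: Sort;
  count: number;
  minPoints: number;
}

interface Story {
  id: number;
  points: number;
  summarizedAt: number; // epoch ms when the summary was generated
  bestRank: number | null; // 1-based rank on the live best list, null if off-list
}

const DEFAULT_COUNT = 30;
const MAX_COUNT = 200;

// Parse a non-negative integer; missing or malformed values fall back (assumed behavior).
function intParam(value: string | null, fallback: number): number {
  if (value === null || !/^\d+$/.test(value)) return fallback;
  return Number(value);
}

function parseFeedQuery(url: URL): FeedQuery {
  const p = url.searchParams;
  const sort: Sort = p.get("sort") === "points" ? "points" : "date";
  const count = Math.min(intParam(p.get("count"), DEFAULT_COUNT), MAX_COUNT);
  const minPoints = intParam(p.get("min_points"), 0);
  return { sort, count, minPoints };
}

function selectStories(stories: Story[], q: FeedQuery): Story[] {
  // filter() returns a fresh array, so sorting it in place is safe.
  let pool = stories.filter((s) => s.points >= q.minPoints);
  if (q.sort === "points") {
    // Live best-list order: only stories currently on the list, by rank.
    pool = pool
      .filter((s) => s.bestRank !== null)
      .sort((a, b) => (a.bestRank as number) - (b.bestRank as number));
  } else {
    // Rolling stream: newest summary first, on- and off-list alike.
    pool.sort((a, b) => b.summarizedAt - a.summarizedAt);
  }
  return pool.slice(0, q.count);
}
```

For example, `/feed?sort=points&count=15&min_points=200` parses to `{ sort: "points", count: 15, minPoints: 200 }`, and a `count` above 200 is clamped rather than rejected (again an assumption).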
## How it works
```mermaid
flowchart TD
HN["HN Firebase API"] --> Fetch["fetch best IDs + stories + top comments"]
Fetch --> Extract["fetch & extract article text
(Readability/jsdom)"]
Extract -->|"non-HTML / paywall / no URL"| Fallback["fall back to the discussion"]
Extract --> Summarize["summarize
(exe.dev ChatGPT/Codex proxy · gpt-5.5)"]
Fallback --> Summarize
Summarize --> Cache["JSON cache
(data/cache.json)"]
Cache --> Feed["/feed (RSS 2.0)"]
Cache --> Landing["/ (HTML landing)"]
```
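The last step in the diagram, turning a cached summary into an RSS 2.0 `<item>`, might look roughly like this. It is a hypothetical sketch: the `CachedStory` shape, the `guid` format, and using the summary time as `pubDate` are assumptions. RSS 2.0 itself does define a `<comments>` element, which is a natural home for the HN discussion link.

```typescript
// Hypothetical cached-story shape; the real cache schema may differ.
interface CachedStory {
  id: number;
  title: string;
  url: string | null; // null for Ask HN / text posts
  summary: string;
  summarizedAt: number; // epoch ms
}

// Escape the five XML special characters (ampersand first, so entities aren't double-escaped).
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function renderItem(story: CachedStory): string {
  const hnUrl = `https://news.ycombinator.com/item?id=${story.id}`;
  const link = story.url ?? hnUrl; // text posts link straight to the discussion
  return [
    "<item>",
    `  <title>${escapeXml(story.title)}</title>`,
    `  <link>${escapeXml(link)}</link>`,
    `  <comments>${escapeXml(hnUrl)}</comments>`,
    `  <guid isPermaLink="false">hn-${story.id}</guid>`,
    // toUTCString() yields an RFC 822-style date, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    `  <pubDate>${new Date(story.summarizedAt).toUTCString()}</pubDate>`,
    `  <description>${escapeXml(story.summary)}</description>`,
    "</item>",
  ].join("\n");
}
```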
A single long-running Bun process refreshes the best list **hourly**, summarizing only stories it hasn't seen before, and serves the feed from an in-memory and on-disk cache. A story that temporarily drops off the best list keeps its summary, so it isn't re-summarized when it bounces back; it's dropped once it's been off the list past the retention window (`OFFLIST_RETENTION_MS`). A hard ceiling (`MAX_CACHE_STORIES`) caps total cache size as a backstop: on-list stories are never evicted, and the oldest off-list summaries go first.
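The retention-then-cap policy described above can be sketched as a pure function over cache entries. The `Entry` fields and the `prune` name are hypothetical; the two knobs correspond to `OFFLIST_RETENTION_MS` and `MAX_CACHE_STORIES`, and ordering eviction by summary age is one reading of "oldest off-list summaries go first".

```typescript
// Hypothetical cache entry; field names are illustrative.
interface Entry {
  id: number;
  onList: boolean;
  lastSeenOnList: number; // epoch ms of the last refresh that saw it on the best list
  summarizedAt: number; // epoch ms when the summary was generated
}

interface PruneResult {
  kept: Entry[];
  pruned: number; // dropped for exceeding the off-list retention window
  evicted: number; // dropped by the hard size cap
}

function prune(entries: Entry[], now: number, retentionMs: number, maxStories: number): PruneResult {
  // 1. Retention: off-list stories past the window go; on-list stories always stay.
  const retained = entries.filter((e) => e.onList || now - e.lastSeenOnList <= retentionMs);
  const pruned = entries.length - retained.length;

  // 2. Hard cap: if still too large, evict the oldest off-list summaries first.
  const overflow = retained.length - maxStories;
  if (overflow <= 0) return { kept: retained, pruned, evicted: 0 };

  const evictIds = new Set(
    retained
      .filter((e) => !e.onList)
      .sort((a, b) => a.summarizedAt - b.summarizedAt) // oldest summary first
      .slice(0, overflow)
      .map((e) => e.id),
  );
  const kept = retained.filter((e) => !evictIds.has(e.id));
  return { kept, pruned, evicted: evictIds.size };
}
```

Because on-list stories are exempt, the cache can still exceed the cap when the best list alone is larger than `maxStories`; the ceiling is a backstop, not a guarantee.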
Article text is extracted in tiers: a plain fetch + [Readability](https://github.com/mozilla/readability), then, only on a recoverable failure, a headless-browser render (Chromium via `Bun.WebView`) for JS-heavy pages, and finally a discussion-only fallback. Stories stuck on the fallback are re-extracted on later cycles (a bounded self-healing pass), so a page that was transiently down or needs JS recovers without a manual nudge.
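The tier order above is essentially a short-circuiting chain. Below is a minimal sketch with the tiers injected as functions so the control flow stands on its own; the `TierResult` shape, the `recoverable` flag, and `pickRetries` (standing in for the bounded self-healing pass) are assumptions, and which failures count as recoverable is left to each tier.

```typescript
// Hypothetical result type: a tier either returns text or a classified failure.
type TierResult =
  | { ok: true; text: string }
  | { ok: false; reason: string; recoverable: boolean };

type Tier = (url: string) => Promise<TierResult>;

interface Extraction {
  source: "readability" | "browser" | "discussion";
  text?: string;
  reason?: string; // why we fell back to the discussion, if we did
}

async function extractArticle(url: string | null, readability: Tier, browser: Tier | null): Promise<Extraction> {
  if (url === null) return { source: "discussion", reason: "no URL" };

  const first = await readability(url);
  if (first.ok) return { source: "readability", text: first.text };

  // Escalate to the (expensive) browser tier only if the failure might be fixed by
  // rendering; the tier itself decides what "recoverable" means (assumed).
  if (first.recoverable && browser !== null) {
    const second = await browser(url);
    if (second.ok) return { source: "browser", text: second.text };
    return { source: "discussion", reason: second.reason };
  }
  return { source: "discussion", reason: first.reason };
}

// Bounded self-healing pass: retry at most `limit` fallback stories per cycle.
function pickRetries<T extends { source: string }>(stories: T[], limit: number): T[] {
  return stories.filter((s) => s.source === "discussion").slice(0, limit);
}
```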
Summaries are generated through the exe.dev internal proxies, which authenticate the VM automatically, so **no API key is stored anywhere**. Two backends are selectable via `SUMMARY_PROVIDER`: the [ChatGPT/Codex proxy](https://exe.dev/docs/integrations-github) (`gpt-5.5`, the default; it draws on the ChatGPT subscription rather than the metered token allowance) or the [LLM gateway](https://exe.dev/docs/shelley/llm-gateway) (`claude-sonnet-4-6`).
### Endpoints
| Path | Description |
|---|---|
| `/feed` | RSS 2.0 feed (`?sort`, `?count`, `?min_points`). Also `/feed.xml`. |
| `/` | HTML landing page: usage + latest 5 stories, with a Newest/Top-by-points toggle (`?sort`). |
| `/healthz` | Liveness + cached story count. |
| `/status` | Last refresh time + duration, next-refresh ETA, cache size (total / on-list / off-list / cap), last prune + eviction counts, last error, and a fallback breakdown (count/percent + tally by reason). |
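The fallback breakdown reported by `/status` (count, percent, and a tally by reason) amounts to a single pass over the cache. A hypothetical sketch, assuming each cached story records a `fallbackReason` that is `null` when the article itself was extracted; the real field names and rounding may differ.

```typescript
// Hypothetical: each cached story records how its article text was obtained.
interface StoryStatus {
  fallbackReason: string | null; // null when the article itself was extracted
}

interface FallbackBreakdown {
  count: number;
  percent: number; // share of all cached stories, rounded to one decimal place
  byReason: Record<string, number>;
}

function fallbackBreakdown(stories: StoryStatus[]): FallbackBreakdown {
  const byReason: Record<string, number> = {};
  let count = 0;
  for (const s of stories) {
    if (s.fallbackReason === null) continue;
    count += 1;
    byReason[s.fallbackReason] = (byReason[s.fallbackReason] ?? 0) + 1;
  }
  // Guard against an empty cache (e.g. before the first refresh finishes).
  const percent = stories.length === 0 ? 0 : Math.round((count / stories.length) * 1000) / 10;
  return { count, percent, byReason };
}
```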
## Running locally
Requires [Bun](https://bun.sh) ≥1.3.12 (pinned to 1.3.14; `Bun.WebView` powers the browser extraction tier). Bun runs the TypeScript directly: no build step, no bundler, no `tsx`. Summarization needs to run on an exe.dev VM (for the keyless proxies); alternatively, point the endpoints at your own OpenAI/Anthropic-compatible services. The browser tier additionally needs a Chrome/Chromium binary: install one with `bun run install-browser` (Playwright's Chromium), put one on `$PATH`, or point `BUN_CHROME_PATH` at it; the app auto-resolves whichever it finds at startup. Disable the tier with `BROWSER_FALLBACK_ENABLED=false`.
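The startup browser lookup described above could work along these lines. This is a sketch under assumptions: the precedence (`BUN_CHROME_PATH` first, then `$PATH`), the candidate binary names, and the `resolveChrome` helper are hypothetical, and it leaves out searching Playwright's download cache, where `bun run install-browser` would place its Chromium.

```typescript
import { existsSync } from "node:fs";
import { delimiter, join } from "node:path";

// Hypothetical candidate names; the app's actual search list is not documented.
const CANDIDATES = ["chromium", "chromium-browser", "google-chrome", "google-chrome-stable"];

function resolveChrome(
  env: Record<string, string | undefined>,
  exists: (path: string) => boolean = existsSync,
): string | null {
  if (env.BROWSER_FALLBACK_ENABLED === "false") return null; // tier disabled

  // An explicit BUN_CHROME_PATH wins (assumed precedence).
  const explicit = env.BUN_CHROME_PATH;
  if (explicit && exists(explicit)) return explicit;

  // Otherwise look for a known binary name in each $PATH directory, in order.
  for (const dir of (env.PATH ?? "").split(delimiter)) {
    if (!dir) continue;
    for (const name of CANDIDATES) {
      const full = join(dir, name);
      if (exists(full)) return full;
    }
  }
  return null; // no browser found: the browser tier would be skipped
}
```

Injecting `exists` keeps the lookup testable without touching the real filesystem; at startup it would simply be called as `resolveChrome(process.env)`.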
```bash
bun install
bun start # bun index.ts: serves on :8000 and runs the first refresh on boot
bun run typecheck # tsc --noEmit
```
The first boot summarizes the full best list (~200 stories, a few minutes); `/feed` returns `503` until the cache has entries. The cache persists to `data/cache.json` (gitignored), so restarts are instant.
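`src/cache.ts` is described as doing an atomic write. A common way to get that on a single filesystem is write-to-temp-then-rename, sketched below with Node's `fs` API (which Bun implements). The `saveCache`/`loadCache` names and the temp-file naming are assumptions, not the project's code.

```typescript
import { mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Write JSON to a temp file in the same directory, then rename it over the target.
// rename() within one filesystem replaces the file atomically on POSIX, so a process
// crash mid-write leaves either the old cache or the new one, never a truncated file.
function saveCache(path: string, data: unknown): void {
  mkdirSync(dirname(path), { recursive: true });
  const tmp = `${path}.${process.pid}.tmp`;
  writeFileSync(tmp, JSON.stringify(data));
  renameSync(tmp, path);
}

// Missing or unreadable cache: start empty and let the next refresh rebuild it.
function loadCache<T>(path: string, empty: T): T {
  try {
    return JSON.parse(readFileSync(path, "utf8")) as T;
  } catch {
    return empty;
  }
}
```

Falling back to an empty cache on a missing or corrupt file fits the boot behavior above: `/feed` answers `503` until the first refresh fills it.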
### Configuration
Environment variables:
| Var | Default | Purpose |
|---|---|---|
| `PORT` | `8000` | Listen port. |
| `PUBLIC_URL` | `https://hn.rlew.io` | Canonical origin used in the feed's self-link and the landing page. |
| `SUMMARY_PROVIDER` | `openai-responses` | Backend: `openai-responses` (ChatGPT/Codex proxy) or `anthropic` (LLM gateway). |
| `OPENAI_ENDPOINT` / `OPENAI_MODEL` | ChatGPT proxy · `gpt-5.5` | Used when provider is `openai-responses`. |
| `LLM_ENDPOINT` / `LLM_MODEL` | LLM gateway · `claude-sonnet-4-6` | Used when provider is `anthropic`. |
Everything else (refresh interval, concurrency, article-size caps, per-refresh cost cap, off-list retention, cache size cap, comment count) lives in [`src/config.ts`](src/config.ts).
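The provider variables in the configuration table boil down to a small lookup from the environment. A hypothetical sketch: the `resolveSummaryConfig` name and failing fast on an unknown provider are assumptions, and the default proxy endpoints are deliberately left as `undefined` rather than guessed.

```typescript
type Provider = "openai-responses" | "anthropic";

interface SummaryConfig {
  provider: Provider;
  endpoint: string | undefined; // undefined = use the built-in exe.dev proxy default (not shown)
  model: string;
}

function resolveSummaryConfig(env: Record<string, string | undefined>): SummaryConfig {
  const provider = env.SUMMARY_PROVIDER ?? "openai-responses";
  if (provider === "openai-responses") {
    return { provider: "openai-responses", endpoint: env.OPENAI_ENDPOINT, model: env.OPENAI_MODEL ?? "gpt-5.5" };
  }
  if (provider === "anthropic") {
    return { provider: "anthropic", endpoint: env.LLM_ENDPOINT, model: env.LLM_MODEL ?? "claude-sonnet-4-6" };
  }
  // Failing fast on a typo beats silently summarizing with the wrong backend (assumed behavior).
  throw new Error(`unknown SUMMARY_PROVIDER: ${provider}`);
}
```

In the app this would run once at startup, e.g. `resolveSummaryConfig(process.env)`.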
## Project layout
```
index.ts                entrypoint: start server, refresh on boot, schedule hourly
src/config.ts           all checked-in tunables
src/options.ts          local (gitignored) per-deployment options, e.g. injection
src/hn.ts               Hacker News Firebase API client
src/extract.ts          article fetch (content-type/size guards) + Readability; HTML→text
src/extract-browser.ts  headless-browser (Bun.WebView) extraction fallback tier
src/summarize.ts        summarization backends (ChatGPT proxy + LLM gateway), prompts, retry
src/cache.ts            JSON cache (in-memory singleton, atomic write, prune)
src/refresh.ts          refresh pipeline (bounded concurrency, fallback-retry pass)
src/feed.ts             RSS 2.0 rendering
src/page.ts             HTML landing page
src/html.ts             shared rendering helpers (escaping, domain, stats)
src/server.ts           node:http server + static favicon assets
public/                 favicons (orange "AI" mark)
hn-summaries.service    systemd unit
```
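`src/refresh.ts` mentions bounded concurrency. One common way to cap in-flight summarization calls is a fixed pool of async runners pulling from a shared index; the helper below (`mapWithConcurrency`, a hypothetical name) sketches that technique, independent of how the project actually implements it.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight, preserving input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner claims the next unclaimed index until the list is exhausted.
  // JS runs callbacks on one thread, so `next++` cannot race between runners.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }

  const poolSize = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: poolSize }, () => runner()));
  return results;
}
```

If a worker rejects, `Promise.all` rejects immediately while the other runners keep going, so a refresh loop would typically catch per-story errors inside `worker` so that one bad story doesn't abort the whole cycle (an assumption about the desired behavior).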
## Deployment
Runs as a `systemd` service (`hn-summaries.service`) on an exe.dev VM, listening on `:8000`, published through the exe.dev HTTPS proxy with a `CNAME` for `hn.rlew.io` (TLS auto-issued). The hourly refresh runs in-process, with no external cron.
```bash
sudo cp hn-summaries.service /etc/systemd/system/
sudo systemctl enable --now hn-summaries
journalctl -u hn-summaries -f
```
---
Story content © its respective authors; summaries are AI-generated and may contain errors.