{"id":50319150,"url":"https://github.com/solcreek/sunbreak","last_synced_at":"2026-05-29T02:06:28.498Z","repository":{"id":358586578,"uuid":"1241995162","full_name":"solcreek/sunbreak","owner":"solcreek","description":"Local-first keyword monitoring and Hacker News research service","archived":false,"fork":false,"pushed_at":"2026-05-18T03:20:49.000Z","size":113,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-18T05:39:47.478Z","etag":null,"topics":["agent-friendly","cli","docker","fts5","golang","hacker-news","keyword-monitoring","local-first","monitoring","research-tool","rss","self-hosted","social-listening","sqlite","systemd"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/solcreek.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-18T03:18:33.000Z","updated_at":"2026-05-18T03:21:01.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/solcreek/sunbreak","commit_stats":null,"previous_names":["solcreek/sunbreak"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/solcreek/sunbreak","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solcreek%2Fsunbreak","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solcreek%2Fsunbreak/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/solcreek%2Fsunbreak/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solcreek%2Fsunbreak/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/solcreek","download_url":"https://codeload.github.com/solcreek/sunbreak/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solcreek%2Fsunbreak/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33633468,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-29T02:00:06.066Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-friendly","cli","docker","fts5","golang","hacker-news","keyword-monitoring","local-first","monitoring","research-tool","rss","self-hosted","social-listening","sqlite","systemd"],"created_at":"2026-05-29T02:06:25.638Z","updated_at":"2026-05-29T02:06:28.490Z","avatar_url":"https://github.com/solcreek.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sunbreak\n\nSunbreak is a local-first keyword monitoring and research service inspired by\ntools like F5Bot. 
It collects supported public sources, matches keywords and\nregular expressions, stores the results locally, and exposes both an HTTP API\nand an agent-friendly CLI.\n\nThe first target is a cheap single VPS, such as Hetzner. The implementation\ntherefore favors simple primitives:\n\n- Go single binary\n- SQLite WAL for persistence\n- SQLite FTS5 for local search\n- config-file administration\n- local notification outbox\n- systemd-friendly long-running process\n- optional Docker Compose deployment\n\nSunbreak is an MVP. It is useful for local monitoring, Hacker News research,\nand validating source adapters. It is not a full social listening platform.\n\n## Why Sunbreak?\n\nSome sources already expose useful search APIs. Hacker News is the clearest\nexample: the public Algolia API is excellent for quick discovery, historical\nsearches, and ad hoc analysis.\n\nUsing a source API directly is often the right choice when you need a one-off\nanswer:\n\n- fewer moving parts\n- no local database to operate\n- fast experiments with `curl`, `jq`, or notebooks\n- direct access to the source's current search behavior\n\nSunbreak adds value when the work needs to become repeatable, local, and\noperational:\n\n- query plans are explicit and reproducible\n- historical probes can split large ranges before pagination caps hide data\n- results can be cached in SQLite and searched locally with FTS5\n- Hacker News stories can be enriched with complete nested comment trees\n- links, relations, matches, digests, and source checkpoints are persisted\n- monitoring can continue forward from the backfilled or probed baseline\n- automation can use stable JSON output instead of source-specific API shapes\n\nIn short:\n\n```text\nHN Algolia API = excellent raw discovery/search API\nSunbreak       = local monitoring and research memory built on top of sources\n```\n\nSunbreak should not hide the existence of good source APIs. 
It should make\nthem safer to use repeatedly, easier to audit, and more useful for long-running\nmonitoring and analysis.\n\n## What Works\n\n- RSS and Atom collection with `ETag` and `Last-Modified` support\n- Hacker News collection through Algolia discovery and the HN Firebase item API\n- Full Hacker News comment tree ingestion with nested relations preserved\n- Link extraction from Hacker News items and comments\n- Reddit adapter interface with mock mode when credentials are absent\n- Keyword and regex rules\n- Match persistence\n- Digest generation\n- Notification outbox with stdout dispatch\n- Basic HTTP API\n- Vite + React + Tailwind dashboard\n- Agent-friendly JSON CLI output\n- Read-only Hacker News backfill probe for historical range planning\n\n## Source Policy\n\nSunbreak should be used with official APIs, RSS/Atom feeds, webhooks, public\narchives, or other publisher-supported access paths.\n\nDo not use Sunbreak to bypass rate limits, paywalls, login walls, robots\npolicies, access controls, platform restrictions, or source terms. Do not add\nproxy rotation, credential sharing, stealth browser automation, or similar\nevasive behavior.\n\nFor every enabled source:\n\n- read the source's current terms, developer policy, and API documentation\n- identify the app honestly when credentials or user agents are required\n- respect rate limits, `Retry-After`, `429`, `403`, and transient `5xx`\n  responses\n- prefer API, feed, or push-based access before HTML crawling\n- poll conservatively and add jitter when a source is checked repeatedly\n- store only what is needed for monitoring, search, and auditability\n- avoid collecting private, deleted, gated, or otherwise restricted content\n- do not use collected user content for model training unless you have the\n  necessary rights and permissions\n\n### Reddit\n\nReddit support is intentionally conservative. Live Reddit ingestion should use\napproved OAuth credentials and Reddit's current Data API rules. 
The MVP keeps a\nReddit adapter interface and mock path so the rest of the pipeline can be\ntested without scraping Reddit.\n\nFor Reddit, Sunbreak should prefer:\n\n- explicit subreddit watchlists\n- read-only ingestion\n- conservative polling\n- rate-limit-aware backoff\n- official API access once approved\n- RSS only as low-frequency discovery, not as a complete historical source\n\nSunbreak should not use unofficial Reddit scrapers, browser-cookie sidecars, or\nHTML scraping as default data sources.\n\n## Quick Start\n\n```sh\ncp config.example.yaml config.yaml\nmkdir -p data\ngo mod download\ngo run -tags sqlite_fts5 ./cmd/sunbreak -config config.yaml\n```\n\nHealth check:\n\n```sh\ncurl http://localhost:8080/healthz\n```\n\nRun one collection pass:\n\n```sh\ngo run -tags sqlite_fts5 ./cmd/sunbreak -config config.yaml -collect-once\n```\n\n## Dashboard\n\nThe dashboard lives in `web/` and uses Vite, React, Tailwind CSS, and\nshadcn-style base components.\n\nRun the API:\n\n```sh\ngo run -tags sqlite_fts5 ./cmd/sunbreak -config config.yaml\n```\n\nRun the dashboard:\n\n```sh\ncd web\nnpm install\nnpm run dev\n```\n\nOpen `http://localhost:5173/`. The Vite dev server proxies `/api` and\n`/healthz` to `http://localhost:8080`.\n\n## CLI\n\n```sh\nsunbreak -config config.yaml\nsunbreak -config config.yaml -migrate\nsunbreak -config config.yaml -collect-once\nsunbreak -config config.yaml -digest-once\nsunbreak -config config.yaml -dispatch-outbox\nsunbreak -describe\nsunbreak -config config.yaml -collect-once -output json\n```\n\n### Hacker News Backfill Probe\n\nForward collection is intentionally conservative. 
If local data is too sparse\nfor historical analysis, use the read-only backfill probe before calling source\nAPIs directly:\n\n```sh\nsunbreak backfill probe hackernews --query cloudflare --since 1y --output json\nsunbreak backfill probe hackernews --keywords cloudflare,workers,pages --from 2024-01-01 --to 2026-05-17 --output json\n```\n\nThe probe estimates Hacker News Algolia hit counts and returns a time-slice\nplan. It does not write local state. A future `backfill run` should reuse the\nsame slicing strategy, support `--dry-run`, write through SQLite\nde-duplication, and optionally enrich results with full HN thread data.\n\n### Agent-Friendly Operation\n\nSunbreak is designed to be usable by automation and AI agents:\n\n- commands are non-interactive and flag-driven\n- `-output json` keeps stdout machine-readable for one-shot commands\n- diagnostics and logs go to stderr when JSON output is requested\n- `-describe` returns a runtime JSON schema for supported flags and examples\n- `backfill probe hackernews` covers the \"local data is too sparse\" path\n- future mutating commands should support `--dry-run`\n- large result commands should support limits, pagination, or NDJSON streaming\n\nSee [CONTEXT.md](CONTEXT.md) for the short agent operating contract.\n\n## HTTP API\n\n- `GET /healthz`\n- `GET /api/sources`\n- `GET /api/rules`\n- `POST /api/rules`\n- `GET /api/items?query=\u0026limit=`\n- `GET /api/matches?hours=\u0026limit=`\n- `GET /api/digests?limit=`\n- `POST /api/collect`\n- `POST /api/digest`\n- `POST /api/outbox/dispatch`\n\nCreate or update a rule:\n\n```sh\ncurl -X POST http://localhost:8080/api/rules \\\n  -H 'Content-Type: application/json' \\\n  -d '{\"name\":\"My Product\",\"type\":\"keyword\",\"pattern\":\"my product\",\"enabled\":true}'\n```\n\nSearch ingested items:\n\n```sh\ncurl 'http://localhost:8080/api/items?query=sqlite\u0026limit=20'\n```\n\nList recent matches:\n\n```sh\ncurl 
'http://localhost:8080/api/matches?hours=24\u0026limit=20'\n```\n\n## Configuration\n\nStart from [config.example.yaml](config.example.yaml). The example includes:\n\n- one Hacker News source\n- one RSS source\n- one Reddit adapter source in mock mode\n- sample keyword and regex rules\n- stdout notification dispatch\n\nLocal database files live under `data/` by default and are ignored by git.\n\n## Testing\n\nRun Go tests with SQLite FTS5 enabled:\n\n```sh\ngo test -tags sqlite_fts5 ./...\n```\n\nRun the coverage gate:\n\n```sh\nscripts/test/coverage.sh\n```\n\nRun dashboard checks:\n\n```sh\ncd web\nnpm run lint\nnpm run build\n```\n\nRun a local Docker smoke test:\n\n```sh\nscripts/smoke/docker-local.sh\n```\n\nThe current test suite covers collectors, source checkpointing, Hacker News\nthread expansion, RSS parsing, SQLite persistence, FTS search, matching,\ndigests, outbox dispatch, HTTP endpoints, and CLI backfill probing.\n\n## Benchmarks\n\n```sh\ngo test -tags sqlite_fts5 -run '^$' -bench . 
-benchmem ./internal/matcher ./internal/storage\n```\n\nBenchmarks currently cover matcher compilation, keyword/regex matching,\nSQLite item insertion, FTS search, and recent match reads.\n\n## Deployment\n\nBuild a Linux binary:\n\n```sh\ngo build -tags sqlite_fts5 -o sunbreak ./cmd/sunbreak\n```\n\nRun with systemd using\n[deployments/systemd/sunbreak.service](deployments/systemd/sunbreak.service).\n\nOr use Docker Compose:\n\n```sh\ndocker compose up --build\n```\n\n## Roadmap\n\n- `backfill run hackernews` for historical import with dry-run support\n- topic aggregation API and dashboard view\n- HN opportunity-analysis recipes for recurring pain and market research\n- source presets for company blogs and changelogs\n- credentialed Reddit API adapter after approval-oriented design work\n- richer notification channels\n\n## License\n\nSunbreak is licensed under Apache-2.0. See [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsolcreek%2Fsunbreak","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsolcreek%2Fsunbreak","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsolcreek%2Fsunbreak/lists"}