{"id":50586807,"url":"https://github.com/peakstone-labs/sembr","last_synced_at":"2026-06-05T07:00:40.640Z","repository":{"id":360635299,"uuid":"1222063724","full_name":"Peakstone-Labs/sembr","owner":"Peakstone-Labs","description":"Self-hosted intent radar — Reverse RAG for any input stream.","archived":false,"fork":false,"pushed_at":"2026-05-27T07:38:39.000Z","size":5270,"stargazers_count":28,"open_issues_count":1,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-27T09:12:00.840Z","etag":null,"topics":["bge-m3","embeddings","fastapi","llm-tools","news-monitoring","qdrant","reverse-rag","rss","self-hosted","semantic-search"],"latest_commit_sha":null,"homepage":"https://peakstone-labs.github.io/sembr","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Peakstone-Labs.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-27T02:24:17.000Z","updated_at":"2026-05-27T07:15:22.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Peakstone-Labs/sembr","commit_stats":null,"previous_names":["peakstone-labs/sembr"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/Peakstone-Labs/sembr","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Peakstone-Labs%2Fsembr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Peakstone-Lab
s%2Fsembr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Peakstone-Labs%2Fsembr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Peakstone-Labs%2Fsembr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Peakstone-Labs","download_url":"https://codeload.github.com/Peakstone-Labs/sembr/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Peakstone-Labs%2Fsembr/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33932048,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-05T02:00:06.157Z","response_time":120,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bge-m3","embeddings","fastapi","llm-tools","news-monitoring","qdrant","reverse-rag","rss","self-hosted","semantic-search"],"created_at":"2026-06-05T07:00:30.513Z","updated_at":"2026-06-05T07:00:40.633Z","avatar_url":"https://github.com/Peakstone-Labs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/brand/logo-lockup.png\" alt=\"sembr\" width=\"320\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cb\u003eYour private intelligence analyst.\u003c/b\u003e\u003cbr\u003e\n  \u003ci\u003eSay what to watch — and how to analyze 
it. sembr scans your chosen feeds continuously, matches by meaning (not keywords), and delivers analyst-shaped digests on your terms.\u003c/i\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/Peakstone-Labs/sembr/actions/workflows/ci.yml\"\u003e\u003cimg src=\"https://github.com/Peakstone-Labs/sembr/actions/workflows/ci.yml/badge.svg\" alt=\"CI\"\u003e\u003c/a\u003e\n  \u003ca href=\"LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/license-Apache--2.0-blue.svg\" alt=\"License: Apache-2.0\"\u003e\u003c/a\u003e\n  \u003ca href=\"pyproject.toml\"\u003e\u003cimg src=\"https://img.shields.io/badge/python-3.12-blue?logo=python\u0026logoColor=white\" alt=\"Python 3.12\"\u003e\u003c/a\u003e\n  \u003ca href=\"Dockerfile\"\u003e\u003cimg src=\"https://img.shields.io/badge/docker-compose-2496ED?logo=docker\u0026logoColor=white\" alt=\"Docker\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://panel.peakstone-labs.com/static/img/sembr-preview-qrcode.png\"\u003e\u003cimg src=\"https://img.shields.io/badge/WeChat-交流群-07C160?logo=wechat\u0026logoColor=white\" alt=\"WeChat group\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://panel.peakstone-labs.com/#news\"\u003e\u003cb\u003eLive demo\u003c/b\u003e\u003c/a\u003e ·\n  \u003ca href=\"README.zh-CN.md\"\u003e中文\u003c/a\u003e ·\n  \u003ca href=\"https://peakstone-labs.github.io/sembr\"\u003eDocumentation\u003c/a\u003e ·\n  \u003ca href=\"#alternatives-and-why-sembr-exists\"\u003eAlternatives\u003c/a\u003e ·\n  \u003ca href=\"#quickstart\"\u003eQuickstart\u003c/a\u003e ·\n  \u003ca href=\"#for-ai-agents\"\u003eFor AI agents\u003c/a\u003e ·\n  \u003ca href=\"https://github.com/Peakstone-Labs/sembr/discussions\"\u003eDiscussions\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\n**sembr** is a **self-hosted intent radar**. 
You describe what you care about once — _\"monitor Fed policy impact on emerging-market currencies\"_ — and it continuously scans RSS feeds, news APIs, and social streams, matches articles to your intent via semantic vectors, and generates reports through the analytical lens you configure.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/brand/hero.png\" alt=\"sembr — Reverse RAG\" width=\"720\"\u003e\n  \u003cbr\u003e\n  \u003csub\u003e\u003ci\u003eLive demo: sembr powers the News tab inside \u003ca href=\"https://panel.peakstone-labs.com\"\u003ePeakstone Labs' A-share panel\u003c/a\u003e.\u003c/i\u003e\u003c/sub\u003e\n\u003c/p\u003e\n\n\u003c!-- TODO: add product UI strip (intent editor / dashboard / digest email) once captured --\u003e\n\n## Why sembr\n\n- **Semantic, not keyword.** Your intent is an embedding, not an `OR`-list. *\"EM currency contagion\"* matches *\"Turkish lira plunges as Fed eyes another hike\"* with zero shared words.\n- **Bilingual out of the box.** [BGE-M3](https://huggingface.co/BAAI/bge-m3) was picked specifically for mixed CJK + English content. Write your intent in one language, and mixed-language sources (Bloomberg, Reuters, Nature, 财联社, 华尔街见闻, 36氪) all match against it.\n- **Per-intent analyst lens.** Each intent can bind its own analyst template (system + instruction, edited from the dashboard). The same article yields cross-asset rotation signals and rebalancing setups under a *\"macro asset allocator\"* lens, and supply-demand pivots and near-term catalysts under a *\"short-term commodity desk\"* lens. sembr isn't just *finding* matches; it's *analyzing them your way*. Templates are highly customizable, and more bundled ones are landing post-1.0.\n- **Free embeddings, pennies per digest.** The default embedder (BGE-M3 on [SiliconFlow](https://siliconflow.cn)) is free at any volume. The default LLM (DeepSeek-V4-Flash) is billed per token (**input $0.14 / output $0.28 per 1M tokens**). 
A typical daily digest — dozens of full articles in plus the analysis out — usually runs around a cent. OpenAI-compatible protocol means you can swap to OpenAI / Together / Groq / Ollama / mlx-lm any time.\n- **Data sovereignty stays with you.** Your intents and match history live in local Qdrant — no third party sees them. The default embedder + LLM hit cloud APIs (SiliconFlow / DeepSeek) for zero-friction startup, but both are ABC seams — swap in Ollama / mlx-lm and the data never leaves the box.\n- **Cron or event.** Per-intent schedule: a fixed digest time (*\"every weekday 09:00 in Asia/Shanghai\"*) or event-mode (*\"fire when something moves\"*).\n- **Pluggable everywhere.** Source / channel / embedder / LLM are all ABC seams. Telegram, Discord, Slack channels, local LLM backends (mlx-lm, Ollama), and more source plugins (Reddit, HN, Mastodon) are scaffolded for post-1.0.\n- **Agent-friendly by design.** One-shot install by an AI agent, Agent Skills integration, and a synchronous fire endpoint built for orchestrators. See [For AI agents](#for-ai-agents).\n\n## How \"Reverse RAG\" works\n\n\u003e *Attention is all you need.* — Vaswani et al., 2017\n\u003e\n\u003e *AI is your attention.* — sembr\n\nClassic RAG: user types a query → app retrieves matching documents → LLM answers.\n\n**Reverse RAG (sembr):** user defines an intent → sembr embeds it once → every new article runs against every standing intent vector → matches get summarized and pushed.\n\nThe flip is small but its implications are big. Queries become first-class entities you can name, edit, schedule, and version. Retrieval becomes a long-running job, not a request-response round-trip. 
*\"Answer quality\"* becomes *\"how relevant were the last 10 things I was told about.\"*\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/screenshots/intents.jpeg\" alt=\"sembr Intents tab — five active intents written in natural Chinese, each with its own daily/weekly cron schedule, similarity threshold, language, and tags\" width=\"900\"\u003e\n  \u003cbr\u003e\n  \u003csub\u003eFive live intents from a working deployment. Each is a natural-language brief; the cron preset + threshold + tags fully define the matcher's behaviour. Live digests at \u003ca href=\"https://panel.peakstone-labs.com/#news\"\u003epanel.peakstone-labs.com\u003c/a\u003e.\u003c/sub\u003e\n\u003c/p\u003e\n\n→ Full architecture write-up: [docs/architecture.md](docs/architecture.md)\n\n## Alternatives, and why sembr exists\n\nHow the closest tools in the market compare on the dimensions that matter for sembr's use case:\n\n| | Price | Semantic | Custom sources | Self-host | Bilingual CN+EN | Per-intent lens | Agent API |\n| --- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n| **Feedly Pro+ AI** | ~$99 / yr | ✅ | ⚠️ ¹ | ❌ | ⚠️ ² | ⚠️ ³ | ❌ |\n| **Inoreader Pro** | $90 / yr | ❌ | ✅ | ❌ | ⚠️ | ⚠️ ⁴ | ⚠️ |\n| **Brand24 / Mention** | $199–$499 / mo | ❌ | ❌ ⁵ | ❌ | ⚠️ | ❌ | ✅ |\n| **Bloomberg Terminal** | ~$32k / yr / seat | ✅ ⁶ | ❌ | ❌ | ✅ | ❌ | ⚠️ ⁷ |\n| **FreshRSS / miniflux** | $0 (self-host) | ❌ | ✅ | ✅ | ❌ | ❌ | ⚠️ |\n| **Google Alerts** | $0 | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |\n| **Perplexity Pro** | $20 / mo | ✅ | ❌ | ❌ | ⚠️ | ⚠️ ⁸ | ✅ |\n| **sembr** | **Self-host + ~$0.014 / intent / day** | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |\n\n✅ comparable to sembr or better · ⚠️ partial / with caveats · ❌ not supported\n\n\u003csub\u003e\n¹ Limited to Feedly's curated index — you don't point it at arbitrary RSS / NewsAPI.\u003cbr\u003e\n² Translates non-English articles to English first; not native cross-lingual vectors.\u003cbr\u003e\n³ Natural-language filter, not a per-feed system+instruction prompt 
template.\u003cbr\u003e\n⁴ On-demand custom queries per article (GPT-4o-mini, 1M tokens / month); not a standing per-intent prompt.\u003cbr\u003e\n⁵ Vendor scans the public web for you; you don't get to point it at specific sources.\u003cbr\u003e\n⁶ ASKB conversational AI (beta in 2026); proprietary, terminal-only.\u003cbr\u003e\n⁷ B-Pipe data licensing priced separately (institutional only).\u003cbr\u003e\n⁸ Spaces persistent custom instructions are real, but apply per-query (pull); sembr applies them push-style on every match.\n\u003c/sub\u003e\n\n**DIY paths** — n8n / Huginn + LangChain + a vector DB + your own scheduler — could check ✅ on every column above. You'd be assembling 5+ moving parts and owning the long tail of feed parsing, embedding rate-limits, dedup, prompt management, and notification reliability yourself. sembr is the turnkey version of that stack.\n\nIf you're an institution with budget, run Bloomberg or Brand24. If you're happy with a hosted plan and your watchlist isn't sensitive, Feedly Pro+ is great. sembr is for the slice where you want all four of **semantic + bilingual + custom sources + self-host** at once, **and** the per-intent analyst lens applied push-style on every match. **No tool we've found today sits at that intersection.**\n\n### \"What if I just wrap Perplexity's API in a cron loop?\"\n\nThe table above covers head-to-head capabilities. Wrapping the API in a script is the alternative readers most often think lets them skip sembr. It works for one or two low-frequency topics, but three structural gaps remain at scale:\n\n1. **Cost shape** — every poll costs ~$0.005–0.02 vs sembr's \"free until matched\". 10 intents × 24 polls/day × 365 days ≈ 87.6k API calls a year, or roughly $440–$1,750 at those per-call prices; the math gets ugly fast.\n2. **Matching quality** — you'd hand-craft search queries every time, instead of writing one natural-language intent that BGE-M3 vectorises once. 
*\"EM contagion\"* won't return *\"Turkish lira plunges as Fed eyes another hike\"* through keyword ranking; semantic vectors do.\n3. **Watchlist leak** — every poll sends what you're monitoring to a third party. *What you're watching is itself signal* — sembr keeps it on your hardware.\n\n## Quickstart\n\n**Got an AI coding agent on this machine?** Jump to [For AI agents](#for-ai-agents) below — one-shot install + an Agent Skills bundle for the post-install API.\n\n**Manual install** (everything below, ~15 min). Requires Docker + Docker Compose. First run pulls Qdrant + RSSHub and builds the API image (Python 3.12 base + Docker CLI + pip wheels) — **about 1 GB total network download, 10–15 minutes on a typical home connection**. `/health` returns `503` until the embedder probe completes.\n\n```bash\ngit clone https://github.com/Peakstone-Labs/sembr.git\ncd sembr\ncp .env.example .env                 # 1. seed config\n# open .env, set EMBEDDER_API_KEY (free key at https://siliconflow.cn)\ndocker compose up --build            # 2. start everything\n\n# in another shell, 1–2 minutes later:\ncurl -i http://localhost:8000/health         # 200 once embedder probe completes\nopen http://localhost:8000/dashboard          # web UI (macOS; use xdg-open on Linux)\n```\n\nOut of the box: 53 pre-loaded sources across RSS / NewsAPI / Twitter (EN + CN), a live dashboard, and a working `/intents` API. Create your first intent:\n\n```bash\ncurl -X POST http://localhost:8000/intents \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"name\": \"fed-emerging-markets\",\n    \"text\": \"Fed policy impact on emerging-market currencies and capital flows\",\n    \"timezone\": \"America/New_York\",\n    \"schedule\": {\"mode\": \"cron\", \"preset\": \"daily\", \"hour\": 8, \"minute\": 0},\n    \"channels\": [{\"type\": \"email\", \"to\": [\"you@example.com\"]}]\n  }'\n```\n\nNext digest fires on schedule. 
Done.\n\n→ Step-by-step walkthrough: [docs/getting-started.md](docs/getting-started.md)\n→ Putting sembr on a public IP? Read [docs/deployment/public.md](docs/deployment/public.md) first — TL;DR keep the default `127.0.0.1` bind, put sembr behind a reverse proxy with TLS, and set a strong `DASHBOARD_TOKEN`.\n\n## What's in the box\n\n**53 pre-loaded sources across three source types** — curated for substantive body text or information-dense headlines:\n\n| Source type | Pre-loaded | Examples |\n| --- | --- | --- |\n| RSS feeds | 22 | The Guardian, SCMP, NPR, Washington Post, Bloomberg Markets, 华尔街见闻, 第一财经, 36氪, 虎嗅, 财联社电报, 澎湃, 国家统计局, Nature ×3, HelloGitHub |\n| Twitter | 1 | Elon Musk — extend with your own users / keyword searches via a `TWITTER_AUTH_TOKEN` cookie |\n| [NewsAPI.ai](https://newsapi.ai) aggregator | 30 | Reuters, BBC, NYT, WSJ, FT, Economist, Bloomberg, The Atlantic, NPR, TechCrunch, Wired, Ars Technica, Vox, … |\n\nRSS routes that need a JS-rendering origin (most CN sources, Twitter) go through the bundled **[RSSHub](https://rsshub.app)** sidecar — no extra setup. NewsAPI.ai's free signup token covers roughly 30 days of normal polling; get one at [newsapi.ai](https://newsapi.ai) and drop it into `.env`. Full per-feed list: [docs/getting-started.md](docs/getting-started.md).\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/screenshots/feeds.jpeg\" alt=\"sembr Feeds tab — Reuters row expanded showing real article titles + URLs, with the rest of the 70-feed list visible below\" width=\"900\"\u003e\n  \u003cbr\u003e\n  \u003csub\u003eFeeds tab. 
Each row is a live source; expand to inspect the most recent ingests with their source URLs and timestamps.\u003c/sub\u003e\n\u003c/p\u003e\n\n- **BGE-M3 embeddings** via SiliconFlow (free), or any OpenAI-compatible `/v1/embeddings` endpoint\n- **[Qdrant](https://qdrant.tech) vector store** with scalar int8 quantization (each 1024-dim BGE-M3 vector shrinks from 4 KB of float32 to ~1 KB, a 4× memory saving)\n- **LLM digest generation** via any OpenAI-compatible `/v1/chat/completions` — defaults to DeepSeek-V4-Flash on SiliconFlow\n- **Email delivery** (SMTP, multipart/related, per-intent timezone, matcher-score badges)\n- **Monitoring dashboard**: live feed health, embedder latency, container CPU / mem / uptime, Qdrant article browser with date / source / title filters, log SSE, one-click restart\n- **Runtime settings editor** that writes the host `.env` and recreates the affected containers in place, so every setting can be changed from the UI\n- **Custom prompt templates** — system + instruction, with strict-placeholder validation and dashboard CRUD\n\n→ Module-by-module deep dives: [docs/modules/](docs/modules/index.md)\n\n## Configuration\n\n`pydantic-settings` with a four-level precedence chain (highest wins):\n\n1. Shell env vars\n2. `.env` file (project root)\n3. `sembr.yaml` (project root)\n4. Built-in defaults\n\nSensitive values (`EMBEDDER_API_KEY`, `LLM_API_KEY`, `DASHBOARD_TOKEN`, SMTP credentials) belong in env vars or a properly-permissioned `.env` — never in committed files. Full surface: [docs/configuration.md](docs/configuration.md).\n\n\u003e ⚠️ **Set `DASHBOARD_TOKEN` whenever the host is reachable beyond `localhost`.** Without it, `/api/dashboard/*` and the settings editor are unauthenticated. The Settings editor also bind-mounts the host docker socket so it can recreate containers — that's a deliberate single-tenant trade-off (same model as Watchtower / Portainer); anyone with API access is effectively docker-root on the host. Don't run sembr on a multi-tenant host without accepting that. 
See [docs/deployment/public.md](docs/deployment/public.md) for the full hardening checklist.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/screenshots/settings.jpeg\" alt=\"sembr Settings tab — collapsible groups for Embedder / LLM / NewsAPI / RSSHub / Email / Dashboard / Maintenance / etc., with the LLM group expanded showing in-app .env edits with inline documentation\" width=\"900\"\u003e\n  \u003cbr\u003e\n  \u003csub\u003eSettings tab. Edit the host \u003ccode\u003e.env\u003c/code\u003e in the browser; secret fields are masked; saves are dry-run validated, then a \u003ccode\u003eRestartController\u003c/code\u003e recreates the affected container in place.\u003c/sub\u003e\n\u003c/p\u003e\n\n## For AI agents\n\nsembr is designed to be **installed**, **driven**, and **embedded** by AI coding agents. Three pieces of scaffolding ship in the repo:\n\n### 1. One-shot install\n\nIf you have an AI coding agent with shell access on the target machine (Claude Code, Cursor, Cline, Aider, Continue, Roo, OpenClaw, Hermes, …), paste this:\n\n\u003e Read https://github.com/Peakstone-Labs/sembr/blob/main/agent/INSTALL.md and follow it to install sembr on this machine.\n\n[`agent/INSTALL.md`](agent/INSTALL.md) is a 6-phase script the agent works through: hardware self-check → Docker setup → repo clone → key validation → access-mode choice (localhost / LAN / public) → bring up → first health round-trip. Image pulls run in the background while it asks you for API keys, so total wall-clock time is ~15 min, of which ~10 are unattended.\n\nFor public-internet deployment the agent branches into [`agent/PUBLIC_INSTALL.md`](agent/PUBLIC_INSTALL.md) — DNS check, mandatory side-service port lockdown (qdrant/rsshub), reverse proxy + TLS via Caddy / nginx + certbot / Cloudflare Tunnel / trycloudflare, ufw, and an explicit docker.sock decision — then returns to Phase 5 to bring the stack up and run external verification.\n\n### 2. 
Skill bundle for post-install operation\n\nOnce sembr is running, [`agent/sembr/`](agent/sembr/) is an [Agent Skills](https://agentskills.io) bundle that teaches any agent how to drive the HTTP API:\n\n| File | Content |\n| --- | --- |\n| `SKILL.md` | Auth model, decision matrix for which `fire` endpoint to use, guardrails |\n| `references/endpoints.md` | Full surface — 31 endpoints across feeds / intents / fire / external-fire / settings / prompts / translate |\n| `references/schemas.md` | `IntentCreate` / `FeedCreate` / `ExternalFireRequest` body shapes, including the cron/event discriminated union and channel discriminator |\n| `references/recipes.md` | Copy-pasteable curl + Python `httpx` workflows |\n| `references/errors.md` | Status code table and scrubbed-detail error contract |\n\n**Claude Code**: `cp -r agent/sembr ~/.claude/skills/sembr` for auto-loading. **Other platforms**: hand your agent `agent/sembr/SKILL.md` directly, or consult your platform's skill-loading docs.\n\n### 3. The agent-callable fire endpoint\n\n`POST /api/external/intents/{id}/fire` is the orchestrator-facing diagnostic endpoint:\n\n- **Synchronous** — matches + LLM summary in the response, no polling, no `task_id` hand-off\n- **No notifier** — the intent's email recipients are not pinged; safe for \"what would this intent match right now?\" without spamming\n- **No state writes** — doesn't touch `match_seen`, idempotent under repeated calls\n- **Per-call overrides** — `lookback_seconds` (`300`–`2_592_000`), `threshold` (`0.20`–`0.95`, wider than the `0.60`–`0.95` at intent-create time, so you can sweep low during diagnostics), `feed_ids` (subset or `null` for all)\n\nDrop sembr into any orchestrator (Hermes, OpenClaw, LangGraph, your own) and let it decide when to look at the world. 
The response shape, error contract, rate-limit (1/intent/60 s), and the cron-mode-only constraint are documented in [`agent/sembr/references/endpoints.md`](agent/sembr/references/endpoints.md).\n\n## Tech stack\n\nPython 3.12 · FastAPI 0.115 · Pydantic v2 · APScheduler 3.11 · aiosqlite (WAL) · Qdrant 1.17 · httpx · BGE-M3 · DeepSeek-V4-Flash · Apache-2.0\n\nRuns comfortably on **4 GB RAM** (homelab / Mac mini / NAS / $10 VPS) — measured baseline is ~1 GB across the three containers at the default 53-source workload. Scale up `qdrant.mem_limit` to 4G+ if you ingest at the millions-of-articles tier.\n\n## Status\n\n**v1.0** — first stable release. Ships RSS ingestion, BGE-M3 embeddings, Qdrant dual-collection, intent CRUD (cron + event), LLM-summarized digests, email channel, monitoring dashboard, runtime settings editor, and a hardened public-deployment guide.\n\n**Post-1.0:** Telegram / Discord / Slack channels, local LLM backends (mlx-lm, Ollama), Reddit / HN / Mastodon source plugins, entry-points plugin discovery, notification retry / DLQ, multi-worker deployment.\n\n→ Versioning policy and changelog: [CHANGELOG.md](CHANGELOG.md)\n\n## Built by\n\n[Peakstone Labs](https://github.com/Peakstone-Labs) — AI-native quantitative research. sembr started as the news side of an internal alpha-research pipeline; opening it up makes it useful to a much wider set of people watching the same world we are.\n\nIf you have feedback, found a bug, or want a source / channel plugin: [Discussions](https://github.com/Peakstone-Labs/sembr/discussions) for ideas and questions, [Issues](https://github.com/Peakstone-Labs/sembr/issues) for bugs and concrete feature requests, [SECURITY.md](SECURITY.md) for vulnerability reports. Contributions welcome — see [CONTRIBUTING.md](CONTRIBUTING.md).\n\n## License\n\n[Apache-2.0](LICENSE). 
© 2025–2026 Peakstone Labs and sembr contributors.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpeakstone-labs%2Fsembr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpeakstone-labs%2Fsembr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpeakstone-labs%2Fsembr/lists"}