{"id":49895714,"url":"https://github.com/lonelycode/newsstack","last_synced_at":"2026-05-15T23:42:03.008Z","repository":{"id":353305155,"uuid":"1181267465","full_name":"lonelycode/newsstack","owner":"lonelycode","description":"News MCP","archived":false,"fork":false,"pushed_at":"2026-04-23T09:26:55.000Z","size":130,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-23T11:26:07.962Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lonelycode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-13T23:43:44.000Z","updated_at":"2026-04-23T09:26:59.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/lonelycode/newsstack","commit_stats":null,"previous_names":["lonelycode/newsstack"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/lonelycode/newsstack","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lonelycode%2Fnewsstack","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lonelycode%2Fnewsstack/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lonelycode%2Fnewsstack/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lonelycode%2Fnewsstack/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/owners/lonelycode","download_url":"https://codeload.github.com/lonelycode/newsstack/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lonelycode%2Fnewsstack/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33083989,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-15T20:25:35.270Z","status":"ssl_error","status_checked_at":"2026-05-15T20:25:34.732Z","response_time":103,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-15T23:41:59.270Z","updated_at":"2026-05-15T23:42:03.003Z","avatar_url":"https://github.com/lonelycode.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Newsstack\n\nMCP server for news intelligence. 
Aggregates articles from RSS feeds and GDELT, deduplicates, clusters related stories, and exposes structured news data to AI agents via MCP tools.\n\nAll ML inference runs locally — no external API keys needed for core functionality.\n\n## Architecture\n\n```\nRSS Feeds ──┐                                  ┌── get_latest_headlines\nGDELT API ──┤► Normalize ► Dedup ► Embed ► NER ├── search_news\n            │    (URL/SimHash/Vector)           ├── get_news_by_region\n            └──────────────────────────────────►├── get_topic_briefing\n                  ↕            ↕                └── get_trending_topics\n               SQLite       Qdrant\n              (articles,   (embeddings,\n              entities,     768-dim\n              clusters)     cosine)\n```\n\n- **Transport:** Streamable HTTP on port 8080\n- **Embedding:** `nomic-embed-text-v1.5` via local OpenAI-compatible server (port 9097)\n- **Summarization:** Local LLM via OpenAI-compatible server (port 9096)\n- **NER:** GLiNER (in-process, no external service)\n- **Clustering:** HDBSCAN over article embeddings\n\n## Prerequisites\n\n- Python 3.12+\n- [uv](https://docs.astral.sh/uv/)\n- Docker (for Qdrant, or run it natively)\n- Local embedding server on port 9097 (e.g., llama.cpp, Ollama, vLLM)\n- Local LLM server on port 9096 (e.g., llama.cpp, Ollama, vLLM)\n\n## Quickstart\n\n### With Docker (recommended)\n\n```bash\ndocker compose up\n```\n\nThis starts both the MCP server and Qdrant. 
The server is available at `http://localhost:8080/mcp`.\n\nYour embedding and LLM servers should be running on the host at ports 9097 and 9096 respectively.\n\n### Local development\n\n```bash\n# Install dependencies\nuv sync\n\n# Start Qdrant separately\ndocker run -p 6333:6333 -p 6334:6334 qdrant/qdrant\n\n# Run the server\nuv run python -m newsstack\n```\n\n## MCP Client Configuration\n\nAdd to your MCP client config (e.g., Claude Desktop `claude_desktop_config.json`):\n\n```json\n{\n  \"mcpServers\": {\n    \"newsstack\": {\n      \"url\": \"http://localhost:8080/mcp\"\n    }\n  }\n}\n```\n\n## Tools\n\n| Tool | Description |\n|------|-------------|\n| `get_latest_headlines` | Top story clusters from the last N hours, optionally filtered by category |\n| `search_news` | Semantic vector search with optional region/time filters |\n| `get_news_by_region` | Articles filtered by region code |\n| `get_topic_briefing` | LLM-generated intelligence briefing on a topic |\n| `get_trending_topics` | Trending story clusters ranked by article count |\n\n## Data Sources\n\n### RSS Feeds (fetched every 5 minutes)\n\nFeeds are defined in a YAML file. By default, newsstack ships a global news bundle (AP, NPR, NYT, BBC World, BBC Tech, Guardian World, Al Jazeera) at `src/newsstack/feeds.default.yaml`. 
To override, set `NEWSSTACK_FEEDS_FILE` to the path of your own YAML file with this schema:\n\n```yaml\nfeeds:\n  - id: bbc-world              # required, slug ([a-z0-9][a-z0-9_-]*)\n    name: BBC World            # required\n    url: https://feeds.bbci.co.uk/news/world/rss.xml  # required\n    region: global             # optional, default \"global\", free-form string\n    category: world            # optional, default \"general\", free-form string\n    enabled: true              # optional, default true\n```\n\nThe file is authoritative on every startup:\n\n- new ids are inserted\n- existing ids are updated (name/region/category/enabled)\n- ids no longer in the file are disabled (not deleted — preserves article references)\n- per-row ETag / Last-Modified cache is preserved when the URL is unchanged, and cleared when the URL changes\n\nBad YAML, malformed URLs, duplicate ids or URLs, or missing required fields all cause startup to fail with a clear error — prefer loud failure over silent partial-load.\n\n### GDELT (fetched every 10 minutes)\n\nThe [GDELT DOC API](https://blog.gdeltproject.org/gdelt-doc-2-0-api-debuts/) is queried for recent English-language articles. No API key required. Set `NEWSSTACK_GDELT_ENABLED=false` to disable entirely (useful for single-region tenants).\n\n## Deduplication\n\nArticles pass through three dedup layers:\n\n1. **URL hash** — exact match against SQLite unique constraint\n2. **SimHash** — near-duplicate detection (Hamming distance \u003c= 3)\n3. **Vector cosine** — semantic dedup (cosine similarity \u003e 0.95)\n\n## Scheduling\n\n| Job | Interval |\n|-----|----------|\n| RSS ingestion | 5 min |\n| GDELT ingestion | 10 min |\n| Clustering (HDBSCAN) | 15 min |\n| Retention cleanup | Daily 3:00 AM |\n\nData retention window is 180 days (configurable).\n\n## Configuration\n\nAll settings are configurable via environment variables with the `NEWSSTACK_` prefix. 
See `.env.example` for the full list.\n\n| Variable | Default | Description |\n|----------|---------|-------------|\n| `NEWSSTACK_EMBEDDING_URL` | `http://localhost:9097/v1/embeddings` | Embedding server endpoint |\n| `NEWSSTACK_EMBEDDING_MODEL` | `nomic-embed-text-v1.5` | Embedding model name |\n| `NEWSSTACK_LLM_URL` | `http://localhost:9096/v1/chat/completions` | LLM server endpoint |\n| `NEWSSTACK_LLM_MODEL` | `qwen3.5` | LLM model name |\n| `NEWSSTACK_QDRANT_URL` | `http://localhost:6333` | Qdrant server URL |\n| `NEWSSTACK_DB_PATH` | `newsstack.db` | SQLite database path |\n| `NEWSSTACK_FEEDS_FILE` | _(packaged default)_ | Path to YAML feed config |\n| `NEWSSTACK_GDELT_ENABLED` | `true` | Toggle GDELT ingestion |\n| `NEWSSTACK_HOST` | `0.0.0.0` | Server bind host |\n| `NEWSSTACK_PORT` | `8080` | Server bind port |\n| `NEWSSTACK_RETENTION_DAYS` | `180` | Data retention window |\n\n### Running a different tenant\n\nMount your feed config and isolate state:\n\n```bash\nNEWSSTACK_FEEDS_FILE=/etc/newsstack/nz-feeds.yaml \\\nNEWSSTACK_GDELT_ENABLED=false \\\nNEWSSTACK_DB_PATH=nz.db \\\nNEWSSTACK_QDRANT_URL=http://qdrant-nz:6333 \\\nuv run python -m newsstack\n```\n\nTwo tenants must use different `NEWSSTACK_DB_PATH` and `NEWSSTACK_QDRANT_URL` so their articles and clusters don't bleed together.\n\n## Resetting data\n\nOn startup, newsstack checks that SQLite and Qdrant are in sync. 
If one has data and the other is empty (e.g., after a volume was deleted), it automatically resets the stale side so ingestion starts clean.\n\nTo fully reset all data:\n\n```bash\n# Docker\ndocker compose down -v   # removes both data volumes\ndocker compose up --build\n\n# Local development\nrm newsstack.db\ncurl -X DELETE http://localhost:6333/collections/news_articles\nuv run python -m newsstack\n```\n\n## Development\n\n```bash\n# Lint\nuv run ruff check src/\n\n# Test\nuv run pytest\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flonelycode%2Fnewsstack","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flonelycode%2Fnewsstack","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flonelycode%2Fnewsstack/lists"}