{"id":47827906,"url":"https://github.com/josepavese/needlex","last_synced_at":"2026-04-25T00:02:10.812Z","repository":{"id":348506659,"uuid":"1194006173","full_name":"Josepavese/needlex","owner":"Josepavese","description":"Local-first runtime that compiles noisy web pages into verified high-signal context for AI agents","archived":false,"fork":false,"pushed_at":"2026-04-01T12:53:25.000Z","size":32075,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-01T13:34:38.248Z","etag":null,"topics":["ai-agents","context-engineering","golang","local-first","mcp","retrieval","sqlite","web-scraping"],"latest_commit_sha":null,"homepage":"https://github.com/Josepavese/needlex/releases/latest","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Josepavese.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-27T20:17:19.000Z","updated_at":"2026-04-01T12:53:28.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Josepavese/needlex","commit_stats":null,"previous_names":["josepavese/needlex"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/Josepavese/needlex","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Josepavese%2Fneedlex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Josepavese%2Fneedlex/tags","releases_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/Josepavese%2Fneedlex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Josepavese%2Fneedlex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Josepavese","download_url":"https://codeload.github.com/Josepavese/needlex/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Josepavese%2Fneedlex/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31374056,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-03T17:53:18.093Z","status":"ssl_error","status_checked_at":"2026-04-03T17:53:17.617Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","context-engineering","golang","local-first","mcp","retrieval","sqlite","web-scraping"],"created_at":"2026-04-03T20:02:48.618Z","updated_at":"2026-04-25T00:02:10.800Z","avatar_url":"https://github.com/Josepavese.png","language":"HTML","readme":"# 
Needle-X\n\n[![dist](https://github.com/Josepavese/needlex/actions/workflows/dist.yml/badge.svg)](https://github.com/Josepavese/needlex/actions/workflows/dist.yml)\n[![installer-smoke](https://github.com/Josepavese/needlex/actions/workflows/installer-smoke.yml/badge.svg)](https://github.com/Josepavese/needlex/actions/workflows/installer-smoke.yml)\n[![release](https://img.shields.io/github/v/release/Josepavese/needlex?display_name=tag)](https://github.com/Josepavese/needlex/releases/latest)\n\n\u003e [!WARNING]\n\u003e Alpha software. Needle-X is still in active development and testing. Install flow, local state layout, CLI details, and output shape may still change.\n\n**Turn messy web pages into compact, proof-carrying context for AI agents.**\n\n**Smaller packets. Fewer hops. Real provenance.**\n\n![Needle-X Hero](docs/assets/readme-hero.png)\n\n## Why It Wins\n\n1. **Smaller output**\n   Needle-X returns much less context than extraction-heavy tools: 4436 average packet bytes versus 6975 to 72166 in the comparison below.\n2. **Source-backed**\n   It carries proof, not just extracted text.\n3. **Less cleanup**\n   A downstream agent does less work before it can act.\n\n## Live Comparison\n\n| Metric | Needle-X | Tavily | Jina | Firecrawl |\n| --- | ---: | ---: | ---: | ---: |\n| Avg packet bytes | **4436** | 6975 | 30565 | 72166 |\n| Claim-to-source steps | **1** | 2 | 2 | 2 |\n| Post-processing burden | **0.25** | 1.92 | 1.86 | 2.50 |\n| Proof usability | **1.0** | 0 | 0 | 0 |\n\nNeedle-X vs `Jina`:\n- about **85.5% smaller** packets\n\nThis is the current sweet spot:\n1. compact context\n2. direct verification\n3. low-friction agent consumption\n\n![Needle-X Metrics](docs/assets/readme-metrics-2.png)\n\n## Discovery Memory\n\nNeedle-X includes local `Discovery Memory` backed by SQLite.\n\nThe story is simple:\n1. first run observes and compiles\n2. later runs reuse local verified evidence\n3. repeated use improves local retrieval without hosted infrastructure\n\nDiscovery Memory is enabled by default and stored in the PAL state root. 
If an external embeddings service is unavailable, Needle-X falls back to a native local semantic vectorizer so memory still accumulates and remains searchable.\n\nCurrent verified seeded result on `seeded-corpus-v2`:\n1. **100/100** selected-URL correctness\n2. **100/100** proof usability\n3. **100/100** runtime success\n\nGuardrail:\n1. this is a seeded-runtime result\n2. it is not a blanket claim about cold-state, seedless open-web retrieval\n3. Discovery Memory warm-state stress is tracked separately from the seeded-runtime score\n\n![Needle-X Discovery Memory](docs/assets/readme-memory.png)\n\n## What It Does\n\n1. `read`\n2. `query`\n3. `crawl`\n4. `proof`\n5. `replay`\n6. `diff`\n7. `memory stats/search/prune/export/import/rebuild-index`\n8. `analytics stats/recent/value-report/hosts/providers/failures/daily/export`\n9. `logs path/stats/tail`\n10. `support bundle`\n11. `doctor`\n\nDefault output is AI-first:\n1. compact packet first\n2. proof inline when useful\n3. full diagnostics only on demand\n4. browser-like fetch by default for real-world targets\n5. local memory is populated automatically by successful `read`, `query`, and `crawl` runs\n6. MCP server accepts both standard `Content-Length` framing and raw newline-delimited JSON\n\nMCP advertises 9 tools: 7 core `web_*` tools plus `memory` and `analytics`.\nRather than exposing each maintenance and observability operation as its own tool, the non-core `memory` and `analytics` tools take an explicit `action` parameter, which keeps agent tool lists short.\n\n## Tiny Demo\n\n```bash\nneedlex read https://example.com --json\nneedlex query https://example.com --goal \"pricing\" --json\nneedlex proof proof_1 --json\nneedlex analytics stats\nneedlex analytics value-report\nneedlex logs stats\nneedlex support bundle --out /tmp/needlex-support\nneedlex doctor\n```\n\n`analytics stats` gives quick operational counters plus saved characters and tokens. 
`analytics value-report` gives a fuller value view, including estimated cost scenarios.\n`logs stats` shows the PAL runtime log state used for clean CLI/MCP diagnostics.\n`support bundle` exports a maintainer-friendly diagnostic directory with doctor, analytics, and runtime logs.\n\n## Install\n\nLinux and macOS:\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/Josepavese/needlex/main/install/install.sh | bash\n```\n\nWindows:\n\n```powershell\nirm https://raw.githubusercontent.com/Josepavese/needlex/main/install/install.ps1 | iex\n```\n\nInstalled command:\n1. `needlex`\n\nThe installer downloads the matching release binary. Full details:\n1. [Install](docs/wiki/Install.md)\n\n## Agent Skill\n\nNeedle-X also ships an optional Codex skill that tells agents when to use Needle-X for web retrieval, when to escalate to browser/raw fetch tools, and how to avoid treating compact context as full DOM coverage.\n\nSkill path:\n1. [skills/needlex-web-retrieval](skills/needlex-web-retrieval)\n\nCodex install helper:\n\n```bash\npython3 ~/.codex/skills/.system/skill-installer/scripts/install-skill-from-github.py --repo Josepavese/needlex --path skills/needlex-web-retrieval\n```\n\nAfter installing the skill, restart Codex so it can discover it.\n\n## What It Is Not\n\n1. browser agent\n2. search engine\n3. generic scraper\n4. LLM-first reader\n\n## Read More\n\n1. [Wiki Home](docs/wiki/README.md)\n2. [Install](docs/wiki/Install.md)\n3. [CLI](docs/wiki/CLI.md)\n4. [MCP And Tool Calling](docs/wiki/MCP-And-Tool-Calling.md)\n5. [Discovery Memory](docs/wiki/Discovery-Memory.md)\n6. [Benchmarks](docs/wiki/Benchmarks.md)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjosepavese%2Fneedlex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjosepavese%2Fneedlex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjosepavese%2Fneedlex/lists"}