{"id":47621348,"url":"https://github.com/barkain/agentlib","last_synced_at":"2026-04-05T17:02:12.704Z","repository":{"id":345943024,"uuid":"1187525357","full_name":"barkain/agentlib","owner":"barkain","description":"Agentic Knowledge Navigation — preprocessed books + a skill that teaches AI agents to navigate efficiently. No server, no vector DB, just files + instructions. Claude Code plugin.","archived":false,"fork":false,"pushed_at":"2026-03-28T16:55:43.000Z","size":22654,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-02T06:14:14.631Z","etag":null,"topics":["ai-agents","claude","claude-code","knowledge-management","llm","mcp","rag"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/barkain.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-20T20:36:40.000Z","updated_at":"2026-03-28T16:55:37.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/barkain/agentlib","commit_stats":null,"previous_names":["barkain/agentlib"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/barkain/agentlib","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/barkain%2Fagentlib","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/barkain%2Fagentlib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/barkain%2Fagentlib/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/barkain%2Fagentlib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/barkain","download_url":"https://codeload.github.com/barkain/agentlib/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/barkain%2Fagentlib/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31442924,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T15:22:31.103Z","status":"ssl_error","status_checked_at":"2026-04-05T15:22:00.205Z","response_time":75,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","claude","claude-code","knowledge-management","llm","mcp","rag"],"created_at":"2026-04-01T22:14:31.483Z","updated_at":"2026-04-05T17:02:12.698Z","avatar_url":"https://github.com/barkain.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/agentlib_hero.gif\" alt=\"AgentLib Demo\" width=\"800\"\u003e\n\u003c/p\u003e\n\n# AgentLib\n\n**Curate your knowledge library. Your agent works from sources you trust.**\n\nAI agents search the internet or re-read documents from scratch on every question. 
They have no persistent knowledge, no domain expertise, and no way to distinguish trusted sources from noise.\n\nAgentLib changes this. Ingest the books, papers, and documents that matter for your work — once. Your agent gets a structured, indexed library it can navigate autonomously: finding relevant content in seconds, citing exact sources, and proactively consulting your library while coding.\n\n- **Your sources, your curation** — choose which books, papers, standards, and internal docs your agent should know\n- **Always available** — ingested once, accessible across every session with no re-uploading\n- **Proactive, not reactive** — the agent checks the library automatically when working on domain-specific tasks\n- **Citable answers** — every response traces back to a specific book, chapter, and section\n\n## How it works\n\nAgentLib has three parts:\n\n1. **Ingestion pipelines** — preprocess books, scientific paper corpora, and databases into small, self-contained chunks with lightweight metadata at multiple layers.\n2. **MCP tools** — the plugin registers an MCP server with 6 tools: `browse_library`, `open_book`, `search_library`, `search_concepts`, `preview_chunks`, `read_chunks`. The agent calls these directly — no sub-agent needed.\n3. 
**Universal navigation skill** (`agentlib-knowledge`) — teaches the agent to search cheap metadata first, then drill into specific chunks via `search_library` → `preview_chunks` → `read_chunks`.\n\nThe agent navigates via MCP tool calls against preprocessed files in `~/.claude/plugins/agentlib/library/`.\n\n### How agents navigate the library\n\n```mermaid\ngraph LR\n    Q[\"User question\"] --\u003e SL[\"search_library\u003cbr/\u003econcepts + patterns\u003cbr/\u003elibrary_index.json\"]\n    SL --\u003e PC[\"preview_chunks\u003cbr/\u003echunk metadata\u003cbr/\u003enav.json\"]\n    PC --\u003e RC[\"read_chunks\u003cbr/\u003e2-3 best chunks\u003cbr/\u003e300-500 tok each\"]\n    RC --\u003e A[\"Answer with citations\"]\n```\n\n**Fast path (concept hit):** `search_library` → `preview_chunks` → `read_chunks` — **3 tool calls, ~1.5k tokens**\n\n**Pattern path (cross-domain):** `search_library` (pattern tags) → `preview_chunks` → `read_chunks` — **3 tool calls, ~2.5k tokens**\n\n**Recovery on miss:** related concepts → pattern traversal → `search_concepts` per book → Grep fallback\n\n#### Unified library index\n\n`library_index.json` is the single entry point for the entire library. One file, all books and corpora — queried via `search_library`. Each concept carries:\n\n- **aliases** — abbreviations, acronyms, synonyms (searching \"CDX\" matches \"CycloneDX\")\n- **related** — directly connected concepts in the same domain (\"OAuth 2.0\" → \"JWT\", \"access tokens\")\n- **patterns** — abstract structural fingerprints for cross-domain discovery (see below)\n- **sources** — which books/papers contain the concept and their chunk IDs\n\n#### Pattern fingerprints — associative recall\n\nEvery concept is tagged with 2-3 **pattern fingerprints**: abstract, domain-independent descriptors of its structural nature. 
These enable a \"this reminds me of...\" capability that keyword search can never provide.\n\nFor example, \"OAuth token rotation\", \"TLS certificate renewal\", and \"SSH key rotation\" all share the pattern `credential-cycling`. An agent reading about token rotation can discover structurally analogous solutions in completely different books — without any keyword overlap.\n\nPattern tags are integrated directly into `library_index.json` and searchable via `search_library`. A seed vocabulary of ~40 common patterns ensures consistency across books; fuzzy matching merges near-duplicates.\n\n#### Chunk preview via nav.json\n\nEach book's `nav.json` lets agents see what's inside each chunk *before* reading it: section title, concepts covered, token count, and prev/next chains. Queried via `preview_chunks`, this eliminates blind reads — the agent picks the 2-3 best chunks from a set of candidates instead of reading 5 and hoping.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/demo_proactive_query.png\" alt=\"AgentLib proactive library query\" width=\"800\"\u003e\n\u003c/p\u003e\n\n*The agent automatically consults the knowledge library when it detects a domain-specific question — no explicit command needed.*\n\n\u003cdetails\u003e\n\u003csummary\u003eExpanded: how the library-researcher navigates\u003c/summary\u003e\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/demo_library_researcher.png\" alt=\"Library researcher agent navigation\" width=\"800\"\u003e\n\u003c/p\u003e\n\u003c/details\u003e\n\n### Metadata layers\n\n```\nLx  \"What do I know?\"     →  library_index.json: concepts, patterns, sources  (search_library)\nLn  \"What's in a book?\"   →  nav.json: structure + chunk metadata + concepts  (preview_chunks)\nL2  \"Give me the content\" →  chunks: 300-500 tok each                         (read_chunks)\nLf  \"Full rebuild\"        →  manifest.json: complete archive per book         (offline)\n```\n\nThree files instead of six — 
`library_index.json` (1 file, entire library), `nav.json` (per book), and `manifest.json` (per book, full archive for rebuild).\n\nChunks are **content-aware**: tables and code fences are kept atomic (soft cap 500, hard cap 1 000 tokens). PDF tables are extracted via PyMuPDF and rendered as markdown pipe tables. Figures are extracted from PDFs with vision-based summarization, appearing as placeholders in chunks.\n\nThe concept index includes LLM-generated **aliases**, **related concepts**, and **pattern fingerprints** — turning keyword misses into graph traversals and enabling cross-domain discovery.\n\n### Library structure\n\n```\nlibrary/\n├── library_index.json                     ← Lx: unified concept + pattern discovery\n├── books/\n│   ├── catalog.json\n│   └── {book-id}/\n│       ├── nav.json                       ← Ln: structure + chunk metadata + concepts\n│       ├── manifest.json                  ← Lf: full archive for rebuild\n│       └── chunks/\n│           └── {chunk-id}.md              ← L2\n└── corpus/\n    └── {corpus-id}/\n        ├── corpus_catalog.json\n        ├── concept_index.json\n        ├── clusters/{cluster-id}.json\n        └── papers/{paper-id}/\n            ├── nav.json                   ← Ln\n            ├── manifest.json              ← Lf\n            └── chunks/{chunk-id}.md       ← L2\n```\n\n## Benchmarks\n\n### Agent delegation — context-efficient research\n\nThe `library-researcher` agent runs navigation in an isolated context window. 
Only the synthesized answer returns to the main conversation, keeping it clean for follow-up questions.\n\n**Query: \"What is the dimensionless constant η in Davidson's Planck area formula?\"**\n\n| Metric | AgentLib (agent) | AgentLib (direct) | Raw PDFs |\n|--------|-----------------|-------------------|----------|\n| Main context | **19k (9%)** | 30k (15%) | 19k (9%) |\n| Hidden agent tokens | 13.6k | — | 60.2k |\n| **Total tokens** | **~33k** | ~30k | **~79k** |\n| Time | **32s** | 38s | 1m 9s |\n| Correct answer | Yes | Yes | Yes |\n\nThe agent approach uses **58% fewer total tokens** than raw PDF reading and adds only about **3.1k tokens** of messages to the main context, so you can ask many research questions in a single session without filling up the context window.\n\n**Multi-query session (2 questions in one session):**\n\n| Query | Agent tokens | Main context added |\n|-------|-------------|-------------------|\n| Davidson η constant (corpus) | 13.6k | ~3.1k |\n| Prompt injection defenses (book) | 20.5k | ~4.1k |\n| **Total** | **34.1k** | **7.2k** |\n\nWithout the agent, two direct queries would consume ~30k+ in messages. 
With it, only 7.2k.\n\n### Book queries — 47-82% token reduction\n\n**Question:** \"What specific actor frameworks does the book mention for multiagent communication?\"\n\n| Metric | AgentLib | Raw PDF | Reduction |\n|--------|----------|---------|-----------|\n| Content tokens | 6.9k | 38.6k | **82%** |\n| Answer quality | Correct — Ray, Orleans, Akka | Correct — Ray, Orleans, Akka | Same |\n| Source citations | Yes (chapter + chunk IDs) | No | — |\n\n**Question:** \"What are the maturity levels for SBOM according to the CycloneDX standard?\"\n\n| Metric | AgentLib | Raw PDF | Reduction |\n|--------|----------|---------|-----------|\n| Content tokens | 7.8k | 14.7k | **47%** |\n| Answer quality | Correct (5 dimensions table) | Correct (5 dimensions table) | Same |\n\n### Corpus queries — 57% token reduction\n\n**Question:** \"How does Davidson connect quantum mechanics to general relativity?\"\n\n| Metric | AgentLib | Raw PDFs | Reduction |\n|--------|----------|----------|-----------|\n| Total tokens | 36k | ~83k | **57%** |\n| Time | 43s | 1m 56s | **2.7x faster** |\n| Answer quality | 3 approaches with citations | 4 approaches | Same |\n\n### Cost simulations\n\nSimulated on realistic workloads (15-book library, 487-paper corpus, 80-table database):\n\n|                      | Books |       | Papers |       | Database |       |\n|----------------------|-------|-------|--------|-------|----------|-------|\n| **Metric**           | Base  | AL    | Base   | AL    | Base     | AL    |\n| Tool calls           | 5     | 2     | 6      | 5     | 7        | 4     |\n| Cumul. input tokens  | 25.9K | 4.5K  | 51.7K  | 23.4K | 23.3K    | 10.4K |\n| Wrong reads/queries  | 1     | 0     | 1      | 0     | 2        | 0     |\n| **Token reduction**  |       | **82%** |      | **55%** |        | **55%** |\n\nThe core principle: *no vector databases — just smart, interconnected metadata structures. 
Concepts link to related concepts, abstract patterns connect ideas across domains, and chunk previews eliminate blind reads.*\n\n## Install\n\n```bash\n# From GitHub\ngit clone https://github.com/barkain/agentlib.git\nclaude --plugin-dir ./agentlib\n\n# Or add as a marketplace plugin\n/plugin marketplace add barkain/agentlib\n/plugin install agentlib\n```\n\n## Usage\n\n### Ingest a book\n```bash\n/agentlib:agentlib-ingest-book ~/books/owasp-guide.pdf\n```\n\nIngestion runs chapter summarization in parallel and batches concept extraction in groups of 50 for large books. If ingestion fails partway through, re-run the same command — completed stages are skipped automatically. Stage 5 (concept extraction) retries up to 3 times on API failures.\n\n### Ingest a paper corpus\n```bash\n/agentlib:agentlib-ingest-corpus ~/papers/my-research-papers/\n```\n\n### Configure API key\n```bash\n/agentlib:agentlib-configure set-key \u003cyour-api-key\u003e\n```\n\n### Browse the library\n```bash\n/agentlib:agentlib-library\n```\n\n### Querying\n\n**Auto-trigger** — just ask naturally. The skill activates when it detects research/knowledge questions:\n\u003e \"What specific actor frameworks does the book mention for multiagent communication?\"\n\n**Explicit invocation** — prefix with `/agentlib-knowledge` when you want the library's answer, not Claude's training data:\n\u003e /agentlib-knowledge What defensive techniques protect against prompt injection?\n\nThe skill uses MCP tools directly: `search_library` → `preview_chunks` → `read_chunks`. Only the synthesized answer with citations returns to your conversation. 
Pattern tags integrated into `search_library` enable cross-domain analogies automatically.\n\n## LLM Providers\n\nAgentLib supports 5 LLM providers for ingestion and summarization (auto-detected from environment):\n\n| Provider | Model | Env var |\n|----------|-------|---------|\n| Anthropic | Claude Haiku 4.5 | `ANTHROPIC_API_KEY` |\n| OpenAI | GPT-4o Mini | `OPENAI_API_KEY` |\n| xAI | Grok-3 Mini | `XAI_API_KEY` |\n| Google | Gemini 2.0 Flash | `GOOGLE_API_KEY` |\n| DeepSeek | DeepSeek Chat | `DEEPSEEK_API_KEY` |\n\nSet `AGENTLIB_PROVIDER` to override auto-detection. Set `AGENTLIB_CONCURRENCY` to control parallel ingestion workers (default 10).\n\n## Examples\n\n- [Book walkthrough](examples/sbom-walkthrough.md) — ingesting the OWASP CycloneDX SBOM guide and querying it\n- [Corpus walkthrough](examples/corpus-walkthrough.md) — ingesting 8 physics papers by Prof. Aharon Davidson and querying specific formulas\n\n## Development\n\n```bash\nuv sync --dev        # Install dependencies\nuv run pytest        # Run tests\n```\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbarkain%2Fagentlib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbarkain%2Fagentlib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbarkain%2Fagentlib/lists"}