{"id":48011479,"url":"https://github.com/xmpuspus/kb-arena","last_synced_at":"2026-05-02T03:14:13.825Z","repository":{"id":345530257,"uuid":"1182030516","full_name":"xmpuspus/kb-arena","owner":"xmpuspus","description":"Benchmark 7 retrieval strategies on your own docs — naive vector, contextual, QnA pairs, knowledge graph, RAPTOR, PageIndex, and hybrid. Find which KB architecture fits your data.","archived":false,"fork":false,"pushed_at":"2026-04-26T14:05:20.000Z","size":59498,"stargazers_count":7,"open_issues_count":0,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-26T16:09:53.459Z","etag":null,"topics":["benchmark","chromadb","cli","document-retrieval","evaluation","graphrag","hybrid-search","knowledge-graph","llm","neo4j","python","rag","rag-evaluation","retrieval","retrieval-augmented-generation","vector-search"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xmpuspus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-15T00:12:20.000Z","updated_at":"2026-04-26T14:05:05.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/xmpuspus/kb-arena","commit_stats":null,"previous_names":["xmpuspus/kb-arena"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/xmpuspus/kb-arena","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/xmpuspus%2Fkb-arena","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xmpuspus%2Fkb-arena/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xmpuspus%2Fkb-arena/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xmpuspus%2Fkb-arena/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xmpuspus","download_url":"https://codeload.github.com/xmpuspus/kb-arena/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xmpuspus%2Fkb-arena/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32521285,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-02T01:12:54.858Z","status":"online","status_checked_at":"2026-05-02T02:00:05.923Z","response_time":132,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","chromadb","cli","document-retrieval","evaluation","graphrag","hybrid-search","knowledge-graph","llm","neo4j","python","rag","rag-evaluation","retrieval","retrieval-augmented-generation","vector-search"],"created_at":"2026-04-04T13:39:28.894Z","updated_at":"2026-05-02T03:14:13.805Z","avatar_url":"https://github.com/xmpuspus.png","language":"Python","funding_links":[],"categories":["Evaluation \u0026 Benchmarking"],"sub_categories":["Frameworks \u0026 Tools"],"readme":"# KB Arena\n\n\u003e **Should you use Graph RAG, Vector RAG, or 
Hybrid?**\n\u003e KB Arena tells you — empirically, on your own docs.\n\nNine retrieval architectures. Your documentation. One winner.\n\nKB Arena is the only open-source benchmark that runs **architecturally distinct** retrieval strategies — naive vector, contextual vector, Q\u0026A pairs, knowledge graph, hybrid (RRF-fused), RAPTOR, PageIndex, BM25, and **rerank-vector** (cross-encoder reranking) — head-to-head on your own corpus, with auto-generated questions across 5 difficulty tiers, IR metrics (Recall@k, MRR, NDCG@k), RAGAS metrics, ELO arena voting, a CI gate, and a strategy plugin system.\n\nEmbeddings: pluggable across **OpenAI, Voyage-3, Cohere, Gemini, BGE (local), Ollama (local)** via `KB_ARENA_EMBEDDING_PROVIDER`. Rerankers: **BGE-v2-m3 (local), Cohere Rerank, Voyage Rerank** via `KB_ARENA_RERANKER_BACKEND`.\n\n![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue) ![Pydantic v2](https://img.shields.io/badge/pydantic-v2-green) ![Tests](https://img.shields.io/badge/tests-558-brightgreen) ![License](https://img.shields.io/badge/license-MIT-blue) ![PyPI](https://img.shields.io/pypi/v/kb-arena) ![CI](https://github.com/xmpuspus/kb-arena/actions/workflows/ci.yml/badge.svg)\n\n![KB Arena Demo](docs/demo.gif)\n\n---\n\n## Try It in 10 Seconds (no API keys)\n\n```bash\npip install kb-arena\nkb-arena demo\n```\n\nThis launches the dashboard with pre-computed results from the AWS Compute corpus (75 questions, 9 strategies, 5 difficulty tiers). The demo runs in **read-only mode** — chat, arena, and tools endpoints stay disabled until you set an API key. 
No Docker, no Neo4j, no surprises.\n\n![Dashboard walkthrough](docs/demo-ui-walkthrough.gif)\n\nTo enable live chat / arena voting / tools, set `KB_ARENA_ANTHROPIC_API_KEY` (or `KB_ARENA_OPENAI_API_KEY`, or use `KB_ARENA_LLM_PROVIDER=ollama` for free local inference).\n\n## No-API-Keys Quick Start (Ollama)\n\n```bash\n# Free local inference — no Anthropic/OpenAI keys needed\nollama pull llama3.1:8b\nexport KB_ARENA_LLM_PROVIDER=ollama\nkb-arena init-corpus my-docs \u0026\u0026 cp ~/my-docs/*.md datasets/my-docs/raw/\nkb-arena run --corpus my-docs        # one command, all stages, resumable\nkb-arena serve\n```\n\n---\n\n## How KB Arena Differs from Other RAG Evaluation Tools\n\nMost RAG evaluation tools answer \"how well does my pipeline work?\" KB Arena answers a different question: \"which retrieval architecture works best for my docs?\"\n\n| | KB Arena | RAGAS | MTEB / BEIR | GraphRAG | DeepEval |\n|---|---|---|---|---|---|\n| Compares multiple architectures | Yes - 9 strategies | No - evaluates your existing pipeline | No - compares embedding models | No - only their own approach | No |\n| Works on your own docs | Yes | Yes | No - fixed public datasets | No - fixed datasets | Yes |\n| Includes graph + vector + hybrid | Yes | Vector/hybrid only | Embeddings only | Graph only | Any |\n| Auto-generates benchmark questions | Yes - 5 difficulty tiers | Manual | Fixed | Fixed | Manual |\n| Interactive comparison UI | Yes - chatbot + benchmark explorer | No | Leaderboard only | No | Dashboard |\n| Chatbot per strategy | Yes | No | No | No | No |\n| Standard IR metrics (NDCG, MRR) | Yes - v0.5.0 Retriever Lab | Yes | Yes | Partial | No |\n\nIf you want to know whether a knowledge graph, Q\u0026A pairs, or plain vector search is the right architecture for your documentation, that's what KB Arena is for.\n\n---\n\n## What's New in v0.6.0 — Hardening, 9th strategy, embedding providers, public leaderboard\n\nA focused release that closes the four ship-blocker classes from 
a multi-dimension audit and adds three differentiated capabilities. Headline numbers in the README are now backed by code that does what it says.\n\n### New: 9th strategy — `rerank_vector`\n\nNaive Vector retrieval at top-`k`×4, rescored by a cross-encoder, regenerated on the post-rerank top-k. Three backends, all selected via `KB_ARENA_RERANKER_BACKEND`:\n\n- `bge` — BAAI/bge-reranker-v2-m3, **local, free, no key**, default\n- `cohere` — Cohere Rerank v3.5 / v4\n- `voyage` — Voyage Rerank 2.5\n\nThe 2026 RAG consensus is that a reranker is the highest-leverage production accuracy lever. KB Arena now lets you benchmark every architecture *with and without* one, on your own corpus.\n\n### New: embedding provider abstraction\n\n`KB_ARENA_EMBEDDING_PROVIDER` selects the embedding backend used by every vector strategy:\n\n| Provider | Why pick it |\n|---|---|\n| `openai` (default) | text-embedding-3-large |\n| `voyage` | Current MTEB retrieval leader (+10.58% over OpenAI at matched dims) |\n| `cohere` | Cohere embed-v4 |\n| `bge` | BAAI/bge-large-en-v1.5 — **local, no key**, on-prem-friendly |\n| `ollama` | Local via Ollama, no key |\n| `gemini` | text-embedding-004 |\n\nUnblocks privacy / on-prem teams (federal, healthcare, finance) and Gemini-shop / Bedrock-shop deployments.\n\n### New: `kb-arena run --resume`\n\nReplaces the seven-step pipeline with one resumable command. Each stage writes a checkpoint to `datasets/{corpus}/.pipeline_state.json`; a flaky LLM call no longer means starting over.\n\n```bash\nkb-arena run --corpus my-docs --docs ~/my-docs/   # one shot\nkb-arena run --corpus my-docs --resume            # pick up where it stopped\n```\n\n### New: public leaderboard\n\n`/api/leaderboard` aggregates every benchmark run in `results/run_*` per (corpus, strategy) with mean accuracy, Recall@5, NDCG@5, cost, and latency. Plus a Next.js `/leaderboard` page that consumes it. 
**No auth** — safe for hosted deploys; the static dashboard, leaderboard, benchmark results, and corpora endpoints stay available even when chat is locked down.\n\n### Hardened: never drains your credits\n\nThe hosted-demo cost-bomb path is closed. New defaults:\n\n- `KB_ARENA_API_TOKEN` — when set, every LLM endpoint requires `Authorization: Bearer …` (constant-time compared)\n- `KB_ARENA_DEMO_MODE` — auto-enabled when no API key is configured; chat / arena / tools return 503 while the static surfaces keep working\n- `Field(max_length=4000)` on every user input + Pydantic models for the arena endpoints\n- Default `KB_ARENA_BENCHMARK_COST_CAP_USD` flipped from 0 (unlimited) to **10.0**\n- Bounded-deque rate limiter, optional `KB_ARENA_TRUSTED_PROXY_HEADER` for nginx / Cloudflare deployments\n\n### Hardened: Cypher safety + SSRF\n\n- Every Neo4j read path now opens a session with `default_access_mode=READ_ACCESS` — defense in depth at the Bolt protocol level\n- Write regex tightened to also reject `apoc.create | merge | refactor | delete | remove | set | drop | iterate | cypher.runWrite | export | trigger`\n- `kb-arena ingest \u003curl\u003e` rejects `file://`, private/loopback/link-local IPs (post-DNS), and AWS / GCE metadata hostnames; auto-redirect off, per-hop validation\n- Dockerfile non-root user, HEALTHCHECK, `KB_ARENA_DEMO_MODE=true` baked in so a freshly built image cannot accidentally enable chat\n\n### Fixed: Hybrid actually fuses passages, not answers\n\nThe procedural branch used to rerank already-generated answer strings — that explained the embarrassing 8% Recall@5 in the v0.5.0 table. v0.6.0 retrieves real `RetrievedChunk` content from each sub-strategy at top-`k`×2, fuses with **Reciprocal Rank Fusion (k=60)** (which the README has always claimed), regenerates over the fused context, and runs the vector + graph queries with `asyncio.gather`. 
IntentRouter is now wired into `get_strategy(\"hybrid\")` so the advertised three-stage classification actually fires.\n\n### Fixed: cross-section graph edges\n\nKnowledge-graph extraction used to drop every relationship pointing outside its section's batch — multi-hop queries were structurally impossible. v0.6.0 keeps cross-section edges and validates them against the global FQN union *after* every section has been extracted, restoring multi-hop reasoning.\n\n### Fixed: ground-truth labelling no longer circular\n\n`expected_chunks.yaml` candidates are now drawn from the union of BM25, naive vector, and contextual vector top-N (when those indexes exist). The previous BM25-only pool biased Recall@5 in favour of keyword-overlap strategies — closes the methodological critique.\n\n### Fixed: cross-tenant data leak\n\nStrategy `last_*` instance fields were stomped by concurrent SSE consumers — two simultaneous chat requests could see each other's sources. Per-call metrics now ride the streamed token sequence as a `_kb_arena_meta` packet; the legacy fields stay only for plugin back-compat.\n\n### Demo polish\n\n- `kb-arena demo` truly zero-config — `LLMClient` init is tolerant of missing keys, `demo_mode` auto-enables, dashboard loads instantly\n- `aws-compute_bm25.json` is bundled (was missing in v0.5.0 — the 8th strategy showed empty in fresh installs)\n- README hero rewritten with the question-frame pitch, plus a No-API-Keys Quick Start using Ollama\n- Re-recorded hero GIF + UI walkthrough GIF + retriever-lab CLI GIF, all driven by checked-in `vhs` tape scripts in `docs/tapes/`\n- `kb-arena --version` flag\n\n### Roadmap (Phase 3 — prepped, not yet published)\n\n- Public Arena Mode with ELO at `kb-arena.dev/arena` — \"Chatbot Arena, but for RAG architectures\"\n- Hosted demo at `kb-arena.dev` (Vercel + fly.io configs in `deploy/`)\n- BEIR / MTEB native dataset adapter, ColBERTv2 strategy via RAGatouille, Self-RAG / CRAG agentic strategies, OpenTelemetry 
tracing\n\n---\n\n## What's New in v0.5.0 — Retriever Lab\n\nClassical IR metrics computed at the chunk level. See exactly which chunks each strategy surfaced, which it missed, and why one strategy beats another at a metric level — not just at the answer level.\n\n![Retriever Lab Demo](docs/demo-retriever-lab.gif)\n\n### Metrics\n\n`Recall@k`, `Precision@k`, `Hit@k`, `MRR`, `NDCG@k` — computed for every benchmark query, aggregated per strategy, rendered in the Markdown report.\n\n### `kb-arena retriever-lab`\n\nRetrieval-only benchmark. Skips LLM generation, runs ~10x cheaper than `kb-arena benchmark`. Streams a live Rich table of metrics as each strategy completes. Writes per-question chunk-level results to `results/run_{id}/retriever_lab.json`.\n\n```bash\nkb-arena label-chunks --corpus aws-compute     # Generate ground truth (BM25 + Haiku judge)\nkb-arena retriever-lab --corpus aws-compute    # Live IR metrics, no LLM cost\n```\n\n### `/retriever-lab` web page\n\nAggregate metrics card per strategy plus per-question drill-down. 
Click a question, see the chunks each strategy surfaced with rank, score, and HIT/MISS badges so you can tell at a glance where retrieval breaks down.\n\n![Retriever Lab UI](docs/retriever-lab-ui.png)\n\n### Real numbers — aws-compute corpus, run `855aac4e`\n\n35 of 75 questions have chunk-level ground truth (the corpus only covers Lambda, API Gateway, ECS Fargate; the other 40 questions reference services not in the demo corpus, so their metrics fall to 0 — a useful coverage signal in itself).\n\n| Strategy | Recall@5 | Precision@5 | Hit@5 | MRR | NDCG@5 |\n|---|---|---|---|---|---|\n| **contextual_vector** | **35.5%** | **24.5%** | 46.7% | **0.433** | **0.388** |\n| naive_vector | 35.2% | 23.2% | 46.7% | 0.414 | 0.367 |\n| raptor | 35.2% | 23.2% | 46.7% | 0.414 | 0.367 |\n| bm25 | 27.5% | 17.1% | 44.0% | 0.352 | 0.278 |\n| hybrid | 8.0% | 4.8% | 9.3% | 0.093 | 0.086 |\n| pageindex | 6.1% | 5.0% | 14.7% | 0.111 | 0.076 |\n| qna_pairs | 0.0% | 0.0% | 0.0% | 0.000 | 0.000 |\n| knowledge_graph | 0.0% | 0.0% | 0.0% | 0.000 | 0.000 |\n\nContextual Vector edges out Naive Vector on ranking quality (MRR / NDCG) thanks to heading-path prefixes; Hybrid drops because the knowledge_graph leg is mocked when Neo4j isn't connected; QnA Pairs operates on Q-A identity, not section identity, so it needs doc-level labels (see `docs/retriever-lab.md` for interpretation).\n\n### Roadmap\n\n- v1.1: reranker comparison (cross-encoder vs. cohere-rerank vs. bge-reranker)\n\n---\n\n## What's New in v0.4.0\n\n### RAGAS Metrics\n\nIndustry-standard evaluation metrics alongside the existing LLM judge. 
Enable with `--ragas` or `KB_ARENA_BENCHMARK_ENABLE_RAGAS=true`.\n\n![RAGAS Metrics](docs/demo-ragas.png)\n\nAdds four metrics per question: **faithfulness** (answer grounded in context), **context precision** (retrieved chunks are relevant), **context recall** (context covers the reference), and **answer relevancy** (answer addresses the question).\n\n### Reference-Free Evaluation\n\nBenchmark without pre-written ground truth -- useful for quick evaluation of new corpora before investing in question generation.\n\n![Reference-Free Evaluation](docs/demo-reference-free.png)\n\nScores on faithfulness and answer relevancy only (no accuracy/completeness since there's no reference to compare against).\n\n### Strategy Plugin System\n\nBring your own retrieval strategy without forking. Your module exports a single `Strategy` subclass with `build_index()` and `query()` methods.\n\n![Strategy Plugin](docs/demo-plugin.png)\n\n### CI/CD Eval Command\n\nGate merges on retrieval quality. Exits non-zero if any strategy falls below thresholds. Pair with `--format json` for machine-readable output.\n\n![CI/CD Eval](docs/demo-eval-ci.png)\n\n### Cost Cap\n\nHalt a benchmark run automatically if cumulative cost exceeds your budget. Set via `KB_ARENA_BENCHMARK_COST_CAP_USD`.\n\n![Cost Cap](docs/demo-cost-cap.png)\n\n### Dry-Run Cost Estimates\n\nPreview query counts, estimated cost, and estimated time before committing to a full benchmark run.\n\n![Dry-Run Estimates](docs/demo-dry-run.png)\n\n### Debug Endpoint\n\nTrace the full retrieval pipeline -- intent classification, retrieved sources, latency breakdown, and cost -- without generating a final answer.\n\n![Debug Endpoint](docs/demo-debug.png)\n\n### Readiness Probe\n\nThe `/ready` endpoint returns 503 if Neo4j is configured but unreachable. 
Use as a k8s readiness probe or Docker healthcheck.\n\n![Ready Endpoint](docs/demo-ready.png)\n\n### Side-by-Side Strategy Comparison\n\nNew \"Compare\" view in the benchmark UI lets you pick two strategies and see tier-by-tier accuracy, latency, and cost differences side by side.\n\n![Compare View](docs/compare-view.png)\n\n### Other Reliability Improvements\n\n- **Exponential backoff** -- benchmark retries use `1s, 2s, 4s` instead of linear `1s, 2s, 3s`\n- **Embedding retry** -- OpenAI embedding API calls retry 3x with exponential backoff and 30s timeout\n- **Eval memoization** -- identical answer+reference pairs are scored once and cached\n- **Arena JSONL** -- append-only vote log at `results/arena_votes.jsonl` survives state resets\n- **Corpus validation** -- tightened from denylist to regex allowlist `^[a-zA-Z0-9_-]+$`\n\n---\n\n## What's New in v0.3.1\n\n### Production Hardening\n\nSession management, error handling, and API configuration improvements for real deployments:\n\n- **Session ID support** -- pass `X-Session-ID` header instead of relying on IP-based sessions. Fixes shared proxy and network-switching issues.\n- **Session TTL** -- idle sessions are automatically evicted (default 30 min, configurable via `KB_ARENA_SESSION_TTL_MINUTES`)\n- **CORS configuration** -- set allowed origins via `KB_ARENA_CORS_ORIGINS` env var instead of hardcoded localhost\n- **Corpus validation** -- graph build API validates corpus exists with processed documents before starting\n- **Specific exception handling** -- Neo4j connection errors, graph extraction failures, and stream errors now catch specific types instead of bare `except Exception`\n\n### Streaming Cost Tracking\n\nOpenAI and Ollama providers now capture token usage after streaming completes -- previously only Anthropic tracked streaming costs. 
The chatbot demo now reports accurate `cost_usd` for all three providers.\n\n### Faster QnA Index Building\n\nQ\u0026A pair generation during `build-vectors` is now parallelized with `asyncio.gather()` (5 concurrent). Building QnA indexes on large corpora is up to 5x faster.\n\n### Custom Exception Hierarchy\n\nNew `kb_arena.exceptions` module with typed exceptions (`IngestError`, `GraphError`, `StrategyError`, `EvaluationError`, `LLMError`) for better error handling and debugging.\n\n### Frontend Error Boundary\n\nReact error boundary wraps all page content -- API failures and render errors now show a recovery UI instead of a blank page.\n\n### Graph Schema Cleanup\n\nRemoved dead Cypher templates that referenced non-existent relationship types (`DEPRECATED_BY`, `INHERITS`, `REQUIRES`, `EXAMPLE_OF`). Remaining templates now use only valid universal schema types.\n\n---\n\n## What's New in v0.3.0\n\n### Multi-LLM Provider Support\n\nNo longer locked to Anthropic. Choose your LLM backend:\n\n```bash\n# Anthropic (default)\nexport KB_ARENA_LLM_PROVIDER=anthropic\nexport KB_ARENA_ANTHROPIC_API_KEY=sk-ant-...\n\n# OpenAI\nexport KB_ARENA_LLM_PROVIDER=openai\nexport KB_ARENA_OPENAI_API_KEY=sk-...\n\n# Ollama (free local inference)\nexport KB_ARENA_LLM_PROVIDER=ollama\n```\n\nEach provider has its own model mapping -- GPT-4o for generation, GPT-4o-mini for classification when using OpenAI; any local model when using Ollama.\n\n### Strategy Arena - Blind A/B Comparison\n\nA new **Arena mode** for blind head-to-head strategy battles. Ask a question, two random strategies answer it, you vote for the better response. ELO ratings emerge over time.\n\n```bash\nkb-arena serve  # then open /arena in your browser\n```\n\n![Arena Mode](docs/demo-arena.gif)\n\n### BM25 Baseline Strategy\n\nStrategy #8: classic BM25 keyword matching. 
The pre-neural baseline that answers \"do I even need embeddings for my docs?\" Uses BM25Okapi scoring with LLM answer generation.\n\n### Parallel Benchmark Execution\n\nStrategies now run concurrently instead of sequentially. A full 8-strategy benchmark that took 60-90 minutes now completes in 15-25 minutes.\n\n```bash\nkb-arena benchmark --corpus my-docs              # parallel by default\nkb-arena benchmark --corpus my-docs --no-parallel # sequential if needed\n```\n\n### Accurate Token Counting\n\nReplaced whitespace tokenization with tiktoken (cl100k_base BPE). Chunk sizes are now measured in real tokens, not word counts. Previous \"512-token chunks\" were actually ~370 real tokens - now they're exactly 512.\n\n### Cost Tracking Fixed\n\nFixed cost propagation across all multi-call strategies:\n- Knowledge graph: Text-to-Cypher generation cost now tracked\n- PageIndex: beam traversal LLM cost now accumulated\n- Streaming: token usage captured via `get_final_message()`\n- Latency decomposition: retrieval vs generation timing now measured separately\n\n### Run Comparison\n\nBenchmark runs now have unique IDs and timestamps. Results are preserved across runs instead of overwritten:\n\n```bash\nkb-arena benchmark --corpus my-docs\n# Run ID: a1b2c3d4\n# Results: results/run_a1b2c3d4/my-docs_naive_vector.json\n\nkb-arena report --run-id a1b2c3d4\n```\n\n### CI/CD Integration\n\nFail your pipeline if retrieval quality drops:\n\n```bash\nkb-arena benchmark --corpus my-docs --fail-below 0.7\n# Exit code 1 if any strategy's accuracy falls below 70%\n```\n\n### Export Formats\n\nGenerate reports in CSV or self-contained HTML:\n\n```bash\nkb-arena report --format csv    # spreadsheet-ready\nkb-arena report --format html   # shareable dashboard\n```\n\n### Bundled Frontend\n\n`kb-arena demo` now serves a complete dashboard -- no separate Next.js dev server needed. 
The static frontend is bundled with the pip package.\n\n### Changelog\n\n| Version | Date | Changes |\n|---------|------|---------|\n| 0.6.1 | 2026-05-02 | Docs-only — adds the missing \"What's New in v0.6.0\" section, bumps strategy count to 9 across the README, lists `rerank_vector` in the strategy table. No code changes from v0.6.0. |\n| 0.6.0 | 2026-05-02 | Hardening + 9th strategy + embedding providers + public leaderboard. New `rerank_vector` strategy with BGE/Cohere/Voyage backends. Embedding provider abstraction (OpenAI/Voyage/Cohere/BGE/Ollama/Gemini). One-shot `kb-arena run --resume` orchestrator. Public read-only `/api/leaderboard` + `/leaderboard` page. Bearer-token auth + demo-mode + 4000-char input cap on every LLM endpoint. Default cost cap 0 → 10 USD. Cypher sessions opened READ_ACCESS, APOC write regex tightened. WebParser SSRF guard. Hybrid procedural rewritten to fuse real passages via RRF (was reranking answer strings). IntentRouter actually wired in. Cross-section graph edges no longer dropped. Ground-truth pool widened from BM25-only. Cross-tenant strategy state leak fixed. BM25 result file bundled. Re-recorded demo GIFs. 558 tests. 
|\n| 0.5.0 | 2026-04-26 | Retriever Lab — classical IR metrics (Recall@k, Precision@k, Hit@k, MRR, NDCG@k) computed per query, `RetrievalTrace` exposes retrieved chunks per strategy with rank+score, `kb-arena retriever-lab` retrieval-only command (~10x cheaper than `benchmark`), `kb-arena label-chunks` BM25+Haiku ground-truth generator, `--top-k` flag on `benchmark`, `/retriever-lab` web page with HIT/MISS drill-down, hierarchical chunk-id matching, 558 tests |\n| 0.4.0 | 2026-04-04 | RAGAS metrics (faithfulness, context precision/recall, answer relevancy), reference-free eval mode, LLM eval memoization, cost cap enforcement, strategy plugin system (`--strategy-module`), CI/CD eval command (`kb-arena eval --ci`), debug/explain endpoint, `/ready` health probe, exponential backoff on retries, embedding API retry+timeout, arena ELO JSONL persistence, side-by-side strategy comparison UI, benchmark dry-run cost estimates, tightened corpus validation |\n| 0.3.1 | 2026-03-26 | Production hardening (session IDs, TTL, CORS config, corpus validation), streaming cost for OpenAI/Ollama, parallel QnA build, custom exceptions, error boundary, graph schema cleanup, 494 tests |\n| 0.3.0 | 2026-03-20 | Multi-LLM providers (Anthropic/OpenAI/Ollama), Strategy Arena, BM25 strategy, parallel benchmarks, tiktoken chunking, cost tracking fixes, run comparison, CI fail-below, CSV/HTML export, bundled frontend |\n| 0.2.1 | 2026-03-03 | PageIndex strategy, verbose mode, retry logic, parallel extraction, eval independence |\n| 0.2.0 | 2026-02-28 | RAPTOR strategy, hybrid improvements, 7 strategies total |\n| 0.1.0 | 2026-02-20 | Initial release: 5 strategies, benchmark engine, chatbot demo |\n\n---\n\n## Quick Start -- I Just Have My Docs\n\nYou have documentation files (markdown, HTML, text, PDFs). You want to know which retrieval strategy works best. Here's everything from zero.\n\n### Prerequisites\n\n1. **Python 3.11+** and **pip**\n2. 
**Docker** (for Neo4j — the knowledge graph strategy needs it)\n3. **API keys** for your LLM provider (Anthropic or OpenAI; Ollama runs locally and needs no key) and for OpenAI (embeddings)\n\nThat's it. No Neo4j expertise needed. No graph database experience required. KB Arena handles the schema, extraction, and querying.\n\n### Step 1: Install\n\n```bash\npip install kb-arena\n\n# Optional: install format-specific parsers (quoted so shells like zsh do not glob the brackets)\npip install \"kb-arena[pdf]\"         # PDF support (PyMuPDF)\npip install \"kb-arena[docx]\"        # Word documents (mammoth)\npip install \"kb-arena[web]\"         # Web scraping (httpx)\npip install \"kb-arena[all-formats]\" # All of the above\n```\n\n### Step 2: Set API keys\n\nCreate a `.env` file or export directly:\n\n```bash\nexport KB_ARENA_ANTHROPIC_API_KEY=sk-ant-...    # Claude for generation + evaluation\nexport KB_ARENA_OPENAI_API_KEY=sk-...           # OpenAI for text-embedding-3-large\n```\n\n### Step 3: Start Neo4j\n\nKB Arena uses Neo4j for the knowledge graph strategy. One command:\n\n```bash\ndocker compose up neo4j -d\n```\n\nThis starts Neo4j on `localhost:7687` with default credentials (`neo4j` / `kbarena1`). No configuration needed — KB Arena creates the schema automatically.\n\nIf you don't have the `docker-compose.yml`, create one:\n\n```yaml\nservices:\n  neo4j:\n    image: neo4j:5-community\n    ports:\n      - \"7474:7474\"\n      - \"7687:7687\"\n    environment:\n      - NEO4J_AUTH=neo4j/kbarena1\n      - NEO4J_PLUGINS=[\"apoc\"]\n    volumes:\n      - neo4j_data:/data\n\nvolumes:\n  neo4j_data:\n```\n\n**Don't want Docker?** KB Arena still works — the vector strategies, RAPTOR, and PageIndex run without Neo4j. 
Only the knowledge graph and hybrid strategies need it.\n\n### Step 4: Run the pipeline\n\n```bash\n# Scaffold a new corpus\nkb-arena init-corpus my-docs\n\n# Drop your docs into the raw/ directory\ncp ~/my-documentation/*.md datasets/my-docs/raw/\n# Supports: .md, .html, .txt, .pdf, .docx, .csv, .tsv — auto-detected\n\n# Parse into the unified Document model (JSONL intermediate files)\nkb-arena ingest datasets/my-docs/raw/ --corpus my-docs\n\n# Or ingest directly from a URL or GitHub repo\nkb-arena ingest https://docs.example.com --corpus my-docs\nkb-arena ingest github:owner/repo --corpus my-docs\n\n# Build the knowledge graph in Neo4j (entities + relationships)\nkb-arena build-graph --corpus my-docs\n\n# Build vector indexes in ChromaDB (local, no server needed)\nkb-arena build-vectors --corpus my-docs\n\n# Auto-generate benchmark questions from your docs (10 per difficulty tier)\nkb-arena generate-questions --corpus my-docs --count 50\n\n# Run the benchmark (each question x 9 strategies, 4-pass evaluation)\nkb-arena benchmark --corpus my-docs\n\n# Launch the web UI to explore results\nkb-arena serve\n```\n\nOpen `http://localhost:8000` for the API, `http://localhost:3000` for the dashboard.\n\n### Step 5: Read the results\n\nThe benchmark produces:\n- **Accuracy by tier** — which strategy handles simple lookups vs multi-hop architecture questions\n- **Latency percentiles** — p50, p95, p99 per strategy\n- **Cost per query** — token usage and API cost\n- **Composite ranking** — 0.5 * accuracy + 0.3 * reliability + 0.2 * latency\n\nResults are saved to `results/` as JSON and displayed in the web dashboard.\n\n---\n\n## Full Stack with Docker Compose\n\nRun everything — Neo4j, the API server, and the frontend — in one command:\n\n```bash\n# Set your API keys\nexport ANTHROPIC_API_KEY=sk-ant-...\nexport OPENAI_API_KEY=sk-...\n\n# Start all services\ndocker compose up -d\n\n# Open the dashboard\nopen http://localhost:3000\n```\n\nThe compose file starts Neo4j (port 
7474/7687), the FastAPI backend (port 8000), and the Next.js frontend (port 3000).\n\n---\n\n## Using the Built-in AWS Example\n\nThe AWS Compute corpus ships ready to use (75 questions across 5 difficulty tiers):\n\n```bash\nkb-arena ingest ./datasets/aws-compute/raw/ --corpus aws-compute\nkb-arena build-graph --corpus aws-compute\nkb-arena build-vectors --corpus aws-compute\nkb-arena benchmark --corpus aws-compute\nkb-arena label-chunks --corpus aws-compute        # v0.5.0: ground truth for IR metrics\nkb-arena retriever-lab --corpus aws-compute       # v0.5.0: classical IR metrics, no LLM cost\nkb-arena serve\n```\n\n---\n\n## Screenshots\n\n**Home** — Overview of the 9 strategies, difficulty tiers, and evaluation methodology.\n\n![Home page](docs/screenshot-home.png)\n\n**Strategy comparison** — Ask the same question to all 9 strategies simultaneously. Compare answers, sources, latency, and cost side-by-side.\n\n![Strategy comparison demo](docs/screenshot-demo.png)\n\n**Benchmark results** — Accuracy table by tier with grouped bar chart.\n\n![Benchmark results](docs/screenshot-benchmark.png)\n\n**Knowledge graph** — Interactive force-directed visualization of entities extracted from your docs.\n\n![Knowledge graph viewer](docs/screenshot-graph.png)\n\n**Live graph build** — Watch entities and relationships stream in as the extractor runs.\n\n![Live graph animation](docs/demo-live-graph.gif)\n\n---\n\n## Documentation Tools\n\nBeyond benchmarking, KB Arena includes three standalone tools that work on any documentation corpus.\n\nAll three tools are available as CLI commands and through the web UI at `/tools`.\n\n### Q\u0026A Generator\n\nGenerate Q\u0026A pairs from your docs — use them for chatbot training, FAQ pages, or search indexes. 
Only needs an Anthropic key (no embeddings, no vector DB).\n\n```bash\nkb-arena generate-qa --corpus my-docs\n# Outputs: datasets/my-docs/qa-pairs/qa_pairs.jsonl\n```\n\n**CLI**\n\n![Q\u0026A Generator CLI](docs/demo-generate-qa.gif)\n\n**Web UI**\n\n![Q\u0026A Generator Web UI](docs/demo-tools-generate.gif)\n\n### Docs Gap Analyzer\n\nFind what's missing in your documentation before your users complain about it. Generates Q\u0026A pairs per section, self-evaluates them, and classifies each section as strong (\u003e=70%), weak (30-70%), or gap (\u003c30%).\n\n```bash\nkb-arena audit --corpus my-docs\n```\n\n**CLI**\n\n![Docs Audit CLI](docs/demo-audit.gif)\n\n**Web UI**\n\n![Docs Audit Web UI](docs/demo-tools-audit.gif)\n\n### Fix My Docs\n\nGet actionable recommendations with draft content to improve your docs. Runs the audit internally, then generates prioritized fixes for weak and gap sections.\n\n```bash\nkb-arena fix --corpus my-docs --max-fixes 5\n```\n\n**CLI**\n\n![Fix My Docs CLI](docs/demo-fix.gif)\n\n**Web UI**\n\n![Fix My Docs Web UI](docs/demo-tools-fix.gif)\n\nPipeline: `generate-qa` → `audit` → `fix` — each command builds on the previous. 
Or run them independently via CLI or the web UI.\n\n---\n\n## Benchmark Results (AWS Compute Corpus)\n\nReal numbers from 75 questions across 5 difficulty tiers, evaluated with a 4-pass system (structural checks + LLM-as-judge):\n\n| Strategy | Overall | T1 Lookup | T2 How-To | T3 Comparison | T4 Integration | T5 Architecture | Avg Latency | Cost |\n|---|---|---|---|---|---|---|---|---|\n| **Q\u0026A Pairs** | **79.2%** | **79%** | **85%** | **83%** | **84%** | 66% | 9.0s | $0.48 |\n| Knowledge Graph | 71.5% | 72% | 69% | 61% | 77% | **79%** | 20.3s | $1.37 |\n| Hybrid | 64.7% | 39% | 81% | 61% | 80% | 62% | 41.5s | $3.02 |\n| RAPTOR | 25.3% | 30% | 16% | 15% | 36% | 30% | 7.2s | $0.69 |\n| Naive Vector | 20.7% | 27% | 15% | 14% | 26% | 22% | 6.4s | $0.33 |\n| Contextual Vector | 16.5% | 25% | 11% | 9% | 26% | 11% | 5.1s | $0.29 |\n| PageIndex | 14.3% | 19% | 12% | 7% | 21% | 12% | 10.9s | $0.29 |\n| BM25 | 14.0% | 24% | 11% | 9% | 16% | 10% | 4.5s | $0.26 |\n\n**Key takeaway:** Q\u0026A pairs dominate overall because pre-generated answers sidestep retrieval failures. Knowledge graph leads on architecture questions (T5: 79%) where structured graph traversal shines. The hybrid strategy adapts per question type but pays a latency/cost premium. BM25 -- the pre-neural keyword baseline -- scores 14.0% overall at the lowest cost ($0.26), confirming that embeddings add meaningful value for this corpus. PageIndex -- the vectorless, reasoning-based approach -- scores comparably to contextual vector at $0.29, demonstrating that LLM tree traversal is a viable alternative to embeddings on well-structured docs. RAPTOR's hierarchical retrieval shows strength at T4/T5 but needs a larger corpus. Pure vector RAG -- what most teams ship -- scores under 21%. Cost ranges from $0.26 (BM25) to $3.02 (hybrid) for the full 75-question benchmark.\n\nThese are results from the built-in AWS Compute corpus. 
Your mileage will vary — that's the whole point of running it on your own docs.\n\n---\n\n## The 9 Strategies\n\n| # | Strategy | How it works | Best at |\n|---|----------|-------------|---------|\n| 1 | **Naive Vector** | Chunk → embed → cosine similarity → generate | Fast lookups, simple factoid questions |\n| 2 | **Contextual Vector** | Chunk + parent context → embed → rank | Disambiguating domain-specific terms |\n| 3 | **Q\u0026A Pairs** | LLM pre-generates Q\u0026A at index time → match | Common questions with known answers |\n| 4 | **Knowledge Graph** | Entities → Neo4j → Cypher templates → generate | Multi-hop dependencies, cross-topic queries |\n| 5 | **Hybrid** | Intent routing → vector + graph fused via RRF | Adapts per question type |\n| 6 | **RAPTOR** | Cluster chunks → LLM topic summaries → recursive tree → query all levels | Cross-document synthesis, broad topic questions |\n| 7 | **PageIndex** | Build tree index from doc structure → LLM beam search traversal → no vectors | Well-structured docs, reasoning over hierarchy |\n| 8 | **BM25** | Classic keyword matching (BM25Okapi) → LLM generation | Pre-neural baseline — \"do I even need embeddings?\" |\n| 9 | **Rerank Vector** | Naive Vector at top-k×4 → cross-encoder rerank (BGE / Cohere / Voyage) → top-k → generate | The 2026 reranker question: \"is the upgrade worth it on my docs?\" |\n\n---\n\n## Question Tiers\n\nQuestions are organized into 5 difficulty tiers:\n\n| Tier | Type | Hops | What it tests |\n|------|------|------|---------------|\n| 1 | Lookup | 1 | Single-fact lookup from one document |\n| 2 | How-To | 1-2 | Multi-step processes, configuration sequences |\n| 3 | Comparison | 2-3 | Comparing alternatives, trade-offs between options |\n| 4 | Integration | 3-4 | Dependencies and connections between concepts |\n| 5 | Architecture | 3-5 | Cross-document synthesis, transitive reasoning |\n\nUse `kb-arena generate-questions` to auto-generate questions from your docs, or write them by hand in 
YAML.\n\n---\n\n## Supported Formats\n\n| Format | Extensions / Input | Optional Dep | Notes |\n|--------|-------------------|--------------|-------|\n| Markdown | `.md`, `.markdown`, `.rst` | — | Heading hierarchy, code blocks, tables |\n| HTML | `.html`, `.htm` | — | Strips nav/footer, extracts structure |\n| Plain text | `.txt`, `.text` | — | ALL CAPS heading detection |\n| PDF | `.pdf` | `kb-arena[pdf]` | Font-size heading detection, table extraction |\n| Word | `.docx` | `kb-arena[docx]` | Converts to HTML, then extracts structure |\n| CSV / TSV | `.csv`, `.tsv` | — | Auto-detects delimiter, groups rows into sections |\n| Web URL | `https://...` | `kb-arena[web]` | Crawls same-domain pages; uses `/llms.txt` if available |\n| GitHub | `github:owner/repo` | — | Shallow clones and ingests all doc files |\n| SEC EDGAR | `--format sec-edgar` | — | 10-K/10-Q filing parser |\n\n---\n\n## Universal Documentation Schema\n\nKB Arena extracts entities and relationships using a universal schema that works for any documentation domain:\n\n**5 node types:** Topic, Component, Process, Config, Constraint\n**7 relationship types:** DEPENDS_ON, CONTAINS, CONNECTS_TO, TRIGGERS, CONFIGURES, ALTERNATIVE_TO, EXTENDS\n\nNo per-domain configuration needed. The LLM maps your documentation concepts to these types automatically.\n\n---\n\n## CLI Reference\n\n| Command | Description |\n|---|---|\n| `demo` | Launch dashboard with pre-computed results (no API keys needed) |\n| `init-corpus \u003cname\u003e` | Scaffold `datasets/{name}/` directories |\n| `ingest \u003cpath\u003e` | Parse docs into JSONL. Accepts files, dirs, URLs, `github:owner/repo`. Options: `--corpus`, `--format`, `--dry-run` |\n| `build-graph` | Extract entities/rels into Neo4j. Options: `--corpus` |\n| `build-vectors` | Build vector indexes + PageIndex tree. Options: `--corpus`, `--strategy` |\n| `generate-questions` | Auto-generate benchmark questions. Options: `--corpus`, `--count` |\n| `benchmark` | Run evaluation. 
Options: `--corpus`, `--strategy`, `--tier`, `--dry-run` |\n| `generate-qa` | Generate Q\u0026A pairs from your docs as JSONL. Options: `--corpus`, `--output` |\n| `audit` | Find documentation gaps — classifies sections as strong/weak/gap. Options: `--corpus`, `--output`, `--max-sections` |\n| `fix` | Generate fix recommendations with draft content. Options: `--corpus`, `--max-fixes`, `--output` |\n| `report` | Generate report. Options: `--corpus`, `--output`, `--format` (rich/json) |\n| `serve` | Launch API + frontend. Options: `--host`, `--port` |\n| `health` | Pipeline status. Options: `--format` (rich/json) |\n\nAll commands are independently re-runnable. Each stage writes to disk so you can re-run any step without repeating earlier ones.\n\n### CLI Features\n\n**Dry run** — Preview what a command will do before committing to expensive LLM calls:\n\n```bash\nkb-arena ingest datasets/my-docs/raw/ --corpus my-docs --dry-run\n# Shows: file count by extension, parser assignment, output path\n\nkb-arena benchmark --corpus my-docs --dry-run\n# Shows: question count, strategy list, total queries, concurrency settings\n```\n\n![Dry Run Preview](docs/demo-dry-run.gif)\n\n**JSON output** — Pipe structured data to `jq`, scripts, or CI pipelines:\n\n```bash\nkb-arena report --corpus my-docs --format json | jq '.corpora'\nkb-arena health --format json | jq '.services'\n```\n\n![JSON Output](docs/demo-format-json.gif)\n\n**Pipeline hints** — After every command, see what to run next:\n\n```\n$ kb-arena ingest datasets/my-docs/raw/ --corpus my-docs\nDone. 
12 documents, 47 sections → datasets/my-docs/processed/documents.jsonl\n\nNext: kb-arena build-graph --corpus my-docs \u0026\u0026 kb-arena build-vectors --corpus my-docs\n```\n\n**Progress bars** — Every long-running command shows real-time progress (extraction sections, Neo4j batch loading, vector index building, question generation tiers).\n\n**Cost tracking** — Benchmark runs display cumulative API cost in the progress bar and print per-strategy cost/accuracy summaries after completion.\n\n**Verbose mode** — Add `--verbose` / `-v` to any command for debug logging:\n\n```bash\nkb-arena benchmark --corpus my-docs --verbose\n```\n\n---\n\n## Reliability and Performance\n\n**LLM retry with backoff** — All Anthropic API calls retry up to 3 times with exponential backoff (1s, 2s, 4s) on rate limits, timeouts, and server errors. Every call has a 60-second hard timeout.\n\n**Parallel graph extraction** — Entity/relationship extraction from document sections runs up to 5 sections concurrently (5-10x faster than sequential for large corpora).\n\n**Parallel hybrid reranking** — The hybrid strategy's passage re-ranking runs all scoring calls concurrently using Haiku instead of sequential Sonnet calls (~50s to ~5s on procedural queries).\n\n**Evaluation independence** — The LLM-as-judge evaluator uses a different model (`JUDGE_MODEL`, defaults to Opus) than the generation model (Sonnet) to avoid same-model scoring bias.\n\n**Cypher safety** — LLM-generated Cypher queries are checked for write operations (`CREATE`, `MERGE`, `DELETE`, etc.) and blocked before execution. Only read queries reach Neo4j.\n\n---\n\n## Environment Variables\n\nAll prefixed with `KB_ARENA_`. 
Loaded from `.env` or environment.\n\n| Variable | Default | Required | Description |\n|----------|---------|----------|-------------|\n| `ANTHROPIC_API_KEY` | — | Yes | Claude for generation, evaluation, extraction |\n| `OPENAI_API_KEY` | — | Yes | OpenAI for text-embedding-3-large |\n| `NEO4J_URI` | `bolt://localhost:7687` | No | Neo4j connection |\n| `NEO4J_USER` | `neo4j` | No | Neo4j username |\n| `NEO4J_PASSWORD` | — | No | Neo4j password (set to match `NEO4J_AUTH` in docker-compose) |\n| `JUDGE_MODEL` | `claude-opus-4-6` | No | Model used for LLM-as-judge evaluation (default differs from generate model to avoid self-evaluation bias) |\n| `CHROMA_PATH` | `./chroma_data` | No | ChromaDB storage path |\n| `EMBEDDING_MODEL` | `text-embedding-3-large` | No | OpenAI embedding model |\n| `EMBEDDING_DIMENSIONS` | `3072` | No | Embedding vector dimensions |\n| `GENERATE_MODEL` | `claude-sonnet-4-6` | No | Generation model |\n| `FAST_MODEL` | `claude-haiku-4-5-20251001` | No | Classification model |\n| `HOST` | `0.0.0.0` | No | Server bind address |\n| `PORT` | `8000` | No | Server port |\n| `DEBUG` | `false` | No | Debug mode |\n| `BENCHMARK_TEMPERATURE` | `0.0` | No | LLM temperature for benchmarks |\n| `BENCHMARK_MAX_CONCURRENT` | `5` | No | Parallel benchmark queries |\n| `BENCHMARK_QUERY_TIMEOUT_S` | `120` | No | Per-query timeout (seconds) |\n| `BENCHMARK_MAX_RETRIES` | `2` | No | Retry count on failures |\n| `PAGEINDEX_BEAM_WIDTH` | `3` | No | Branches to explore per tree level |\n| `PAGEINDEX_MAX_DEPTH` | `4` | No | Maximum tree traversal depth |\n| `DATASETS_PATH` | `./datasets` | No | Datasets directory |\n| `RESULTS_PATH` | `./results` | No | Results output directory |\n\n---\n\n## Development\n\n```bash\n# Install with dev dependencies\npip install -e '.[dev]'\n\n# Run tests\npytest tests/ -v --ignore=tests/live  # 514 tests\n\n# Lint + format\nruff check . 
\u0026\u0026 ruff format --check .\n\n# Frontend\ncd web \u0026\u0026 npm install \u0026\u0026 npx next build\n```\n\n---\n\n## Tech Stack\n\n| Component | Technology |\n|-----------|-----------|\n| LLM | Claude Sonnet 4.6 (generation) + Haiku 4.5 (classification) |\n| Embeddings | text-embedding-3-large (3072-dim) |\n| Vector store | ChromaDB 0.5 (local, no server) |\n| Graph store | Neo4j 5 Community |\n| Backend | FastAPI + SSE streaming |\n| Frontend | Next.js 14 + Tailwind + Recharts |\n| Models | Pydantic v2 |\n| CLI | Typer + Rich |\n| Testing | pytest (514 tests) |\n\n---\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxmpuspus%2Fkb-arena","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxmpuspus%2Fkb-arena","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxmpuspus%2Fkb-arena/lists"}