{"id":50559147,"url":"https://github.com/chatman-media/rag","last_synced_at":"2026-06-04T10:30:34.515Z","repository":{"id":357918835,"uuid":"1239111757","full_name":"chatman-media/rag","owner":"chatman-media","description":"Production-grade RAG engine — hybrid retrieval (pgvector + BM25), sales personas, hallucination guard, pluggable LLM providers","archived":false,"fork":false,"pushed_at":"2026-05-14T22:13:11.000Z","size":202,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-14T22:38:02.949Z","etag":null,"topics":["bun","chatbot","hybrid-search","llm","ollama","openai","pgvector","rag","retrieval-augmented-generation","typescript"],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chatman-media.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-14T19:17:56.000Z","updated_at":"2026-05-14T22:13:16.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/chatman-media/rag","commit_stats":null,"previous_names":["chatman-media/chatbot_rag"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/chatman-media/rag","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chatman-media%2Frag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chatman-media%2Frag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chatman-media%2Frag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chatman-media%2Frag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chatman-media","download_url":"https://codeload.github.com/chatman-media/rag/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chatman-media%2Frag/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33901305,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-04T02:00:06.755Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bun","chatbot","hybrid-search","llm","ollama","openai","pgvector","rag","retrieval-augmented-generation","typescript"],"created_at":"2026-06-04T10:30:33.512Z","updated_at":"2026-06-04T10:30:34.509Z","avatar_url":"https://github.com/chatman-media.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003ca name=\"top\"\u003e\u003c/a\u003e\n\n# @chatman-media/rag\n\n**Production-grade RAG engine for conversational bots**\n\n[![npm version](https://img.shields.io/npm/v/@chatman-media/rag?logo=npm\u0026color=22c55e)](https://www.npmjs.com/package/@chatman-media/rag)\n[![CI](https://github.com/chatman-media/rag/actions/workflows/ci.yml/badge.svg)](https://github.com/chatman-media/rag/actions/workflows/ci.yml)\n[![TypeScript](https://img.shields.io/badge/TypeScript-5.x-3178c6?logo=typescript\u0026logoColor=white)](https://www.typescriptlang.org/)\n[![Bun](https://img.shields.io/badge/Bun-compatible-fbf0df?logo=bun\u0026logoColor=black)](https://bun.sh/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)\n[![used by @chatman-media/sales](https://img.shields.io/badge/used%20by-@chatman--media%2Fsales-6366f1)](https://github.com/chatman-media/sales)\n[![pgvector](https://img.shields.io/badge/pgvector-hybrid%20search-336791?logo=postgresql\u0026logoColor=white)](https://github.com/pgvector/pgvector)\n[![OpenAI Compatible](https://img.shields.io/badge/OpenAI-compatible-412991?logo=openai\u0026logoColor=white)](https://platform.openai.com/docs/api-reference)\n[![Ollama](https://img.shields.io/badge/Ollama-local%20LLM-black?logo=ollama)](https://ollama.com/)\n\nHybrid retrieval · Sales-style personas · Hallucination guard · Zero framework dependencies\n\n---\n\n🌐 **Language / Язык / 语言**\n\n🇬🇧 **English** \u0026nbsp;·\u0026nbsp; [🇷🇺 Русский](README.ru.md) \u0026nbsp;·\u0026nbsp; [🇨🇳 中文](README.zh.md)\n\n\u003c/div\u003e\n\n---\n\n## Why @chatman-media/rag?\n\nMost RAG demos stop at \"embed → search → prompt\". This package ships what **production** looks like:\n\n| Feature | Details |\n|---------|---------|\n| 🔍 **Hybrid retrieval** | pgvector cosine + BM25 full-text, fused via Reciprocal Rank Fusion |\n| 🧠 **Hallucination guard** | Single LLM call checks KB grounding _and_ domain-specific facts |\n| ✏️ **Query rewriting** | Resolves pronouns \u0026 elliptical follow-ups before retrieval |\n| 🎭 **Sales personas** | NEPQ / AIDA / PAS / SPIN frameworks, A/B-ready style configs |\n| 🏷️ **Topic routing** | Deterministic regex classifier, zero latency, zero cost |\n| 🔌 **Pluggable backends** | Any storage via `IKbStore`; any LLM via `ChatClient` |\n| 📄 **Ingest pipeline** | `.md` / `.txt` / `.pdf` with overlap chunking and SHA-256 dedup |\n| 💬 **Memory** | Cross-session user-facts extraction + conversation summarization |\n\n## Install\n\n```bash\nbun add @chatman-media/rag     # Bun\nnpm install @chatman-media/rag # npm / pnpm / yarn\n```\n\n**Peer requirements:** Node 18+ or Bun 1.x. No native modules — pure TypeScript.\n\n## Quick start\n\n```ts\nimport { answerWithRag, OpenAIChatClient, OpenAIEmbeddingClient } from \"@chatman-media/rag\";\n\nconst chat = new OpenAIChatClient({\n  apiKey: process.env.OPENAI_API_KEY!,\n  baseUrl: \"https://api.openai.com/v1\",\n  model: \"gpt-4o-mini\",\n});\n\nconst embedder = new OpenAIEmbeddingClient({\n  apiKey: process.env.OPENAI_API_KEY!,\n  baseUrl: \"https://api.openai.com/v1\",\n  model: \"text-embedding-3-small\",\n  dim: 1536,\n});\n\nconst result = await answerWithRag({\n  question: \"What are the working conditions in Dubai?\",\n  kb: myKbStore,       // your IKbStore implementation — see below\n  chat,\n  embedder,\n  hybridSearch: true,  // vector + BM25 fusion\n  topicRouting: true,  // free topic-scoped retrieval\n  reflect: true,       // hallucination guard\n});\n\nconsole.log(result.text);       // bot reply\nconsole.log(result.telemetry);  // retrieval_ms, generation_ms, path, factCheck, ...\n```\n\n## Architecture\n\n```\nanswerWithRag(question, kb, chat, embedder, options?)\n│\n├─ 🚀 Persona shortcuts (regex, no LLM call)\n│     smalltalk · bot-presence · personal-facts\n│\n├─ ✏️  [optional] rewriteQuery\n│     LLM resolves \"а там?\" / \"это сколько?\" into full question\n│\n├─ 🔢 embedder.embed(question) → float32[]\n│\n├─ 🔍 Retrieval\n│     ├─ vector: kb.search(embedding, k, topic?)\n│     ├─ BM25:   kb.searchBm25(query, k, topic?)      ← hybrid mode\n│     └─ RRF fusion → KbSearchHit[]\n│\n├─ 📝 Prompt composition\n│     composeSystemPrompt(style, stage, kbContext)     ← sales mode\n│     buildSystemPrompt(persona, context)              ← legacy mode\n│\n├─ 🤖 chat.complete(messages) → raw string\n│\n├─ 🧹 sanitizeLlmOutput\n│     strips \u003cthink\u003e · markdown · em-dashes · AI lead-ins\n│\n└─ 🛡️  [optional] checkFacts\n      KB grounding + domain-specific fact verification\n      → grounded=false → return NO_CONTEXT_MARKER\n```\n\n## Implement IKbStore\n\nThe engine is storage-agnostic. Implement `IKbStore` for your backend:\n\n```ts\nimport type { IKbStore, KbSearchHit } from \"@chatman-media/rag\";\n\nclass MyKbStore implements IKbStore {\n  async search(embedding: number[], k: number, topic?: string | null): Promise\u003cKbSearchHit[]\u003e {\n    return db.query(`\n      SELECT chunk_id, text, source, title,\n             (embedding \u003c=\u003e $1::vector) AS distance\n      FROM kb_chunks\n      ORDER BY embedding \u003c=\u003e $1::vector ASC\n      LIMIT $2\n    `, [JSON.stringify(embedding), k]);\n  }\n\n  async hybridSearch(input: {\n    embedding: number[]; query: string; k?: number; topic?: string | null;\n  }): Promise\u003cKbSearchHit[]\u003e {\n    const vec = await this.search(input.embedding, (input.k ?? 5) * 2, input.topic);\n    const bm25 = await this.searchBm25(input.query, (input.k ?? 5) * 2, input.topic);\n    return reciprocalRankFusion(vec, bm25, input.k ?? 5);\n  }\n\n  async prioritySearch(input: {\n    embedding: number[]; query: string; k?: number; vectorOnly?: boolean;\n  }): Promise\u003cKbSearchHit[]\u003e {\n    const books = await this.searchTopic(input.embedding, \"books\", input.k ?? 5);\n    if (books.length \u003e 0) return books;\n    return input.vectorOnly\n      ? this.search(input.embedding, input.k ?? 5)\n      : this.hybridSearch(input);\n  }\n\n  async getDocumentBySource(source: string) { ... }\n  async countChunksForDocument(documentId: number) { ... }\n  async deleteDocument(id: number) { ... }\n  async upsertDocument(input: { source; title; contentHash; topic? }) { ... }\n  async insertChunkWithEmbedding(input: { documentId; chunkIndex; text; tokenCount; embedding }) { ... }\n}\n```\n\n## LLM providers\n\n```ts\nimport {\n  OpenAIChatClient,          // OpenAI, Together, Groq, any OpenAI-compatible\n  OllamaChatClient,          // local models via Ollama\n  OpenRouterChatClient,      // 100+ models behind one API key\n  OpenAIEmbeddingClient,\n  OllamaEmbeddingClient,\n} from \"@chatman-media/rag\";\n\n// Local Ollama (qwen3, llama3, mistral, …)\nconst chat = new OllamaChatClient({\n  host: \"http://localhost:11434\",\n  model: \"qwen3:latest\",\n  disableThinking: true,  // strip \u003cthink\u003e…\u003c/think\u003e blocks\n  timeoutMs: 5 * 60_000,\n});\n\n// OpenRouter — swap models without changing code\nconst chat = new OpenRouterChatClient({\n  apiKey: process.env.OPENROUTER_API_KEY!,\n  model: \"anthropic/claude-haiku-4-5\",\n});\n\n// Custom endpoint (Together, Groq, Azure, local vLLM…)\nconst chat = new OpenAIChatClient({\n  apiKey: process.env.TOGETHER_API_KEY!,\n  baseUrl: \"https://api.together.xyz/v1\",\n  model: \"meta-llama/Llama-3-8b-chat-hf\",\n});\n```\n\n## Ingest documents\n\n```ts\nimport { ingestFile, ingestDirectory, ingestText } from \"@chatman-media/rag\";\n\n// Single file\nawait ingestFile(\"./docs/faq.md\", { kb, embedder });\n\n// Directory — auto-derives topic tag from subdirectory name\nawait ingestDirectory(\"./knowledge-base\", { kb, embedder });\n// knowledge-base/visa/requirements.md    → topic: \"visa\"\n// knowledge-base/payment/salaries.md    → topic: \"payment\"\n// knowledge-base/intro.md               → topic: null (global)\n\n// Raw text (admin paste, API upload, …)\nawait ingestText(\n  { title: \"Product FAQ\", body: markdownString },\n  { kb, embedder, topic: \"faq\" }\n);\n```\n\nContent is SHA-256 deduped — re-ingesting unchanged files is a no-op.\n\n## Sales personas\n\nFor bots that need to convert, not just answer:\n\n```ts\nimport { answerWithRag } from \"@chatman-media/rag\";\nimport type { Style } from \"@chatman-media/rag\";\n\nconst style: Style = {\n  slug: \"alina-nepq\",\n  displayName: \"Alina NEPQ\",\n  persona: {\n    name: \"Alina\",\n    role: \"human\",          // poses as a real person, never admits to being AI\n    company: \"My Agency\",\n    facts: { city: \"Istanbul\", age: \"26\" },\n  },\n  voice: {\n    tone: \"warm, curious, brief — like a real WhatsApp conversation\",\n    language: \"ru\",\n    forbid: [\"certainly!\", \"of course!\", \"as an AI\"],\n  },\n  framework: \"NEPQ\",        // AIDA | PAS | SPIN | NEPQ | straight_line\n  hooks: [\n    { kind: \"social_proof\", text: \"Most of our girls hit their income target within 2 weeks\" },\n    { kind: \"scarcity\",     text: \"Only 3–5 spots left on the next flight\" },\n  ],\n  stages: {\n    qualify: { goal: \"Understand motivation and readiness\", groundingRequired: false },\n    pitch:   { goal: \"Present specific vacancy conditions\",  groundingRequired: true },\n  },\n  fewShot: [\n    { stage: \"qualify\", user: \"how much do they pay?\", assistant: \"Depends on the city — where are you thinking?\" },\n  ],\n  guardrails: {\n    noMinors: true,\n    botDisclosureOnDirectQuestion: true,\n    forbiddenTopics: [],\n  },\n  model: { id: \"qwen3:latest\", temperature: 0.8, maxTokens: 256 },\n};\n\nconst result = await answerWithRag({\n  question, kb, chat, embedder,\n  style,\n  stage: \"qualify\",         // opener | qualify | pitch | objection | close\n  hybridSearch: true,\n  skills: activeSkills,     // persuasion techniques loaded from your DB\n});\n```\n\n## AnswerInput options\n\n| Option | Type | Default | Description |\n|--------|------|---------|-------------|\n| `topK` | `number` | `5` | KB chunks to retrieve |\n| `maxDistance` | `number` | — | Drop vector hits above this cosine distance |\n| `hybridSearch` | `boolean` | `false` | Fuse vector + BM25 via RRF |\n| `topicRouting` | `boolean` | `false` | Route retrieval to a topic slice first |\n| `booksPriority` | `boolean` | `false` | Search \"books\" topic first, global fallback |\n| `rewriteQueryBeforeRetrieval` | `boolean` | `false` | Resolve pronouns/ellipsis with LLM |\n| `reflect` | `boolean` | `false` | Hallucination guard (1 extra LLM call) |\n| `vacanciesBlock` | `string` | — | Pre-rendered vacancies prepended to context |\n| `vacancyGuard` | `boolean` | `true` | Check vacancy accuracy when `vacanciesBlock` is set |\n| `includeFewShot` | `boolean` | `true` | Include style few-shot examples |\n| `numPredict` | `number` | — | Hard cap on output tokens |\n| `userFacts` | `Record\u003cstring,string\u003e` | — | Cross-session user memory injected into prompt |\n| `conversationSummary` | `string` | — | Compressed older turns injected into prompt |\n| `skills` | `SkillForPrompt[]` | — | Persuasion techniques attached to the active style |\n\n## Telemetry\n\nEvery call returns structured telemetry — no setup required:\n\n```ts\nconst { text, telemetry } = await answerWithRag({ ... });\n\n// telemetry shape:\n{\n  path: \"ok\",              // ok | smalltalk | persona_fact | no_context | ungrounded\n  retrieval_ms: 38,\n  generation_ms: 1240,\n  top_distances: [0.18, 0.22, 0.31, 0.35, 0.42],\n  hybrid: true,\n  topic: \"visa\",           // null when classifier was inconclusive\n  original_query: \"а там?\",\n  rewritten_query: \"what are the visa requirements in Dubai?\",\n  factCheck: {\n    grounded: true,\n    vacancyOk: true,\n  }\n}\n```\n\nStore it in your messages table for later analysis: retrieval quality trends, hallucination rate by model, A/B experiment outcomes.\n\n## Roadmap\n\n### ✅ Done\n- [x] Hybrid retrieval — pgvector + BM25 + Reciprocal Rank Fusion\n- [x] Hallucination guard (`reflect`, `vacancyGuard`)\n- [x] Query rewriting before retrieval\n- [x] Sales personas — NEPQ / AIDA / PAS / SPIN\n- [x] Topic routing — zero-latency regex classifier\n- [x] Document ingestion — `.md` / `.txt` / `.pdf` with SHA-256 dedup\n- [x] Cross-session memory — user-facts extraction + conversation summarization\n- [x] Streaming — `answerWithRagStream()`, `ChatClient.stream()`\n- [x] `onTelemetry` callback — zero-setup metrics on every call\n- [x] `InMemoryKbStore` — database-free store for tests and prototypes\n- [x] Retry + exponential backoff — `withRetryChatClient()`, `withRetryEmbeddingClient()`\n- [x] Semantic cache — `SemanticCache` with cosine similarity threshold\n- [x] Section-aware chunking — `chunkBySections()` splits by Markdown headings\n\n### ✅ Also Done\n- [x] **Reranker** — optional cross-encoder stage after RRF (`CohereReranker`, `JinaReranker`)\n- [x] **Evaluation utilities** — `evalRetrieval()` → recall@k, MRR, NDCG\n- [x] **`IConversationStore`** — unified interface for session history + summary persistence\n- [x] **A/B test router** — randomise styles by `userId`, log conversion via `onTelemetry`\n- [x] **SSE server** — `createRagServer()` on Bun.serve() with token streaming\n- [x] **Multi-cycle tool calling** — agentic tool loop with parallel tool execution, bounded by `maxToolCycles` (works in `answerWithRag` and `answerWithRagStream`)\n\n### 🚧 Planned\n- [ ] **`PgVectorKbStore`** — ready-made pgvector `IKbStore` adapter shipped out of the box\n- [ ] **More store adapters** — Qdrant and Pinecone backends\n- [ ] **OpenTelemetry exporter** — bridge `onTelemetry` events to OTel spans and metrics\n- [ ] **Token usage \u0026 cost tracking** — per-call token counts and cost in telemetry\n- [ ] **Contextual retrieval** — prepend chunk-level context before embedding for higher recall\n- [ ] **Embedding cache** — cache embeddings keyed by text hash to cut redundant API calls\n\n## Contributing\n\nPRs and issues welcome. See [CONTRIBUTING.md](CONTRIBUTING.md).\n\n## License\n\n[MIT](LICENSE) — Alexander Kireev / [chatman-media](https://github.com/chatman-media)\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n🇬🇧 **English** \u0026nbsp;·\u0026nbsp; [🇷🇺 Русский](README.ru.md) \u0026nbsp;·\u0026nbsp; [🇨🇳 中文](README.zh.md)\n\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchatman-media%2Frag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchatman-media%2Frag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchatman-media%2Frag/lists"}