{"id":26292071,"url":"https://github.com/krlabsorg/verbatim-rag","last_synced_at":"2026-04-01T23:32:46.193Z","repository":{"id":280323797,"uuid":"925203146","full_name":"KRLabsOrg/verbatim-rag","owner":"KRLabsOrg","description":null,"archived":false,"fork":false,"pushed_at":"2025-12-07T18:38:22.000Z","size":17995,"stargazers_count":156,"open_issues_count":2,"forks_count":16,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-08T23:53:00.988Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KRLabsOrg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-01-31T12:36:44.000Z","updated_at":"2025-12-08T13:53:01.000Z","dependencies_parsed_at":"2025-03-02T19:19:50.383Z","dependency_job_id":"188b316c-fc77-4f1c-94fb-14efcc82e0ff","html_url":"https://github.com/KRLabsOrg/verbatim-rag","commit_stats":null,"previous_names":["krlabsorg/verbatim-rag"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/KRLabsOrg/verbatim-rag","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KRLabsOrg%2Fverbatim-rag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KRLabsOrg%2Fverbatim-rag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KRLabsOrg%2Fverbatim-rag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KRLabsOrg%2
Fverbatim-rag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KRLabsOrg","download_url":"https://codeload.github.com/KRLabsOrg/verbatim-rag/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KRLabsOrg%2Fverbatim-rag/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28578952,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-19T17:42:58.221Z","status":"ssl_error","status_checked_at":"2026-01-19T17:40:54.158Z","response_time":67,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-15T01:19:50.999Z","updated_at":"2026-03-16T13:11:54.161Z","avatar_url":"https://github.com/KRLabsOrg.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Verbatim RAG\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/KRLabsOrg/verbatim-rag/blob/main/assets/chiliground.png?raw=true\" alt=\"ChiliGround Logo\" width=\"400\"/\u003e\n  \u003cbr\u003e\u003cem\u003eChill, I Ground! 
🌶 ️\u003c/em\u003e\n\u003c/p\u003e\n\nA minimalistic approach to Retrieval-Augmented Generation (RAG) that prevents hallucination by ensuring all generated content is explicitly derived from source documents.\n\n[![PyPI](https://img.shields.io/pypi/v/verbatim-rag)](https://pypi.org/project/verbatim-rag/)\n[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1IACXwo3ezgA1yXarxVOC4yXjdUPmOI1H?usp=sharing)\n[![ACL 2025](https://img.shields.io/badge/ACL%20Anthology-2025.bionlp--share.8-blue)](https://aclanthology.org/2025.bionlp-share.8/)\n\n## Concept\n\nTraditional RAG systems retrieve relevant documents and then allow an LLM to freely generate responses based on that context. This can lead to hallucinations where the model invents facts not present in the source material.\n\nVerbatim RAG solves this by extracting verbatim text spans from documents and composing responses entirely from these exact passages, with direct citations linking back to sources.\n\nFor extraction, we can use LLM-based span extractors or fine-tuned encoder-based models like ModernBERT. 
We trained our own ModernBERT model for this purpose on the [RAGBench](https://huggingface.co/datasets/galileo-ai/ragbench) dataset; it is available on [HuggingFace](https://huggingface.co/KRLabsOrg/verbatim-rag-modern-bert-v1).\n\nWith this approach, **the whole RAG pipeline can run without any LLM**, and with SPLADE embeddings it can run entirely on CPU, making it lightweight and efficient.\n\n## Installation\n\n```bash\n# Install the package\npip install verbatim-rag\n```\n\nFor local development:\n\n```bash\npip install -e packages/core/\npip install -e .\n```\n\n## Lightweight Core\n\nIf you only need the reusable verbatim core without the full RAG pipeline (no torch, transformers, or Milvus), install `verbatim-core`:\n\n```bash\npip install verbatim-core\n```\n\n```python\nfrom verbatim_core import VerbatimTransform\n\nvt = VerbatimTransform()\nresponse = vt.transform(\n    question=\"What is the main finding?\",\n    context=[\n        {\"content\": \"The study found that X leads to Y.\", \"title\": \"Paper A\"},\n        {\"content\": \"Results show Z is significant.\", \"title\": \"Paper B\"},\n    ],\n)\nprint(response.answer)\n```\n\nDependencies: only `openai`, `pydantic`, `rapidfuzz`, and `jinja2`.\n\n## Quick Start\n\n```python\nfrom verbatim_rag import VerbatimIndex, VerbatimRAG\nfrom verbatim_rag.ingestion import DocumentProcessor\nfrom verbatim_rag.vector_stores import LocalMilvusStore\nfrom verbatim_rag.embedding_providers import SpladeProvider\n\n# Process documents with intelligent chunking\nprocessor = DocumentProcessor()\n\n# Process PDFs from URLs\ndocument = processor.process_url(\n    url=\"https://aclanthology.org/2025.bionlp-share.8.pdf\",\n    title=\"KR Labs at ArchEHR-QA 2025: A Verbatim Approach for Evidence-Based Question Answering\",\n    metadata={\"authors\": [\"Adam Kovacs\", \"Paul Schmitt\", \"Gabor Recski\"]}\n)\n\n# Create embedding provider and vector store\nsparse_provider = 
SpladeProvider(\n    model_name=\"opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill\",\n    device=\"cpu\"\n)\nvector_store = LocalMilvusStore(\n    db_path=\"./index.db\",\n    collection_name=\"verbatim_rag\",\n    enable_dense=False,\n    enable_sparse=True,\n)\n\n# Create index with providers\nindex = VerbatimIndex(\n    vector_store=vector_store,\n    sparse_provider=sparse_provider\n)\nindex.add_documents([document])\n\n# Then query the index\nrag = VerbatimRAG(index)\n\nresponse = rag.query(\"What is the main contribution of the paper?\")\nprint(response.answer)\n```\n\n\n### Environment Setup\n\nSet your OpenAI API key before using the system:\n\n```bash\nexport OPENAI_API_KEY=your_api_key_here\n```\n\n## How It Works\n\n1. **Document Processing**: Documents are processed using docling for format conversion and chonkie for chunking\n2. **Document Indexing**: Documents are indexed using vector embeddings (both dense and sparse)\n3. **Template Management**: Response templates are created and stored for common question types\n4. 
**Query Processing**:\n   - Relevant documents are retrieved\n   - Key passages are extracted verbatim using either LLM-based or fine-tuned span extractors\n   - Responses are structured using templates\n   - Citations link back to source documents\n\nThis ensures all responses are grounded in the source material, preventing hallucinations.\n\n## Architecture\n\n### Core Components\n\n- **VerbatimRAG** (`verbatim_rag/core.py`): Main orchestrator that coordinates document retrieval, span extraction, and response generation\n- **VerbatimIndex** (`verbatim_rag/index.py`): Vector-based document indexing and retrieval\n- **SpanExtractor** (`verbatim_rag/extractors.py`): Abstract interface for extracting relevant text spans from documents\n  - **LLMSpanExtractor**: Uses OpenAI models to identify relevant spans\n  - **ModelSpanExtractor**: Uses fine-tuned BERT-based models for span classification\n- **DocumentProcessor** (`verbatim_rag/ingestion/`): Docling + Chonkie integration for intelligent document processing\n- **Document** (`verbatim_rag/document.py`): Core document representation with metadata\n\n### Data Flow\n\n1. Documents are processed and chunked using docling and chonkie\n2. Documents are indexed using vector embeddings\n3. User queries retrieve relevant documents\n4. Span extractors identify verbatim passages that answer the question\n5. Response templates structure the final answer with citations\n6. All responses include exact text spans and document references\n\n## Web Interface\n\nThe package includes a full web interface with a React frontend and a FastAPI backend:\n\n```bash\n# Start API server\npython api/app.py\n\n# Start React frontend (in another terminal)\ncd frontend/\nnpm install\nnpm start\n```\n\n## ModernBERT Based Span Extractor\n\nWe've trained our own encoder model based on ModernBERT for sentence classification. 
This model is designed to classify text spans as relevant or not, providing a robust alternative to LLM-based extractors.\n\nYou can find our model on HuggingFace: [KRLabsOrg/verbatim-rag-modern-bert-v1](https://huggingface.co/KRLabsOrg/verbatim-rag-modern-bert-v1).\n\nYou can use it with the defined index as follows:\n\n```python\nfrom verbatim_rag.core import VerbatimRAG\nfrom verbatim_rag.index import VerbatimIndex\nfrom verbatim_rag.extractors import ModelSpanExtractor\nfrom verbatim_rag.vector_stores import LocalMilvusStore\nfrom verbatim_rag.embedding_providers import SpladeProvider\n\n# Load your trained extractor\nextractor = ModelSpanExtractor(\"KRLabsOrg/verbatim-rag-modern-bert-v1\")\n\n# Create embedding provider and vector store\nsparse_provider = SpladeProvider(\n    model_name=\"opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill\",\n    device=\"cpu\"\n)\nvector_store = LocalMilvusStore(\n    db_path=\"./index.db\",\n    collection_name=\"verbatim_rag\",\n    enable_dense=False,\n    enable_sparse=True,\n)\n\n# Create index with providers\n# (Assuming you have already populated the index)\nindex = VerbatimIndex(\n    vector_store=vector_store,\n    sparse_provider=sparse_provider\n)\n\n# Create VerbatimRAG system with custom extractor\nrag_system = VerbatimRAG(\n    index=index,\n    extractor=extractor,\n    k=5\n)\n\n# Query the system\nresponse = rag_system.query(\"Main findings of the paper?\")\nprint(response.answer)\n```\n\n## Citation\n\nIf you use Verbatim RAG in your research, please cite our paper:\n\n```bibtex\n@inproceedings{kovacs-etal-2025-kr,\n    title = \"{KR} Labs at {A}rch{EHR}-{QA} 2025: A Verbatim Approach for Evidence-Based Question Answering\",\n    author = \"Kovacs, Adam  and\n      Schmitt, Paul  and\n      Recski, Gabor\",\n    editor = \"Soni, Sarvesh  and\n      Demner-Fushman, Dina\",\n    booktitle = \"Proceedings of the 24th Workshop on Biomedical Language Processing (Shared Tasks)\",\n    month = 
aug,\n    year = \"2025\",\n    address = \"Vienna, Austria\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2025.bionlp-share.8/\",\n    pages = \"69--74\",\n    ISBN = \"979-8-89176-276-3\",\n    abstract = \"We present a lightweight, domain{-}agnostic verbatim pipeline for evidence{-}grounded question answering. Our pipeline operates in two steps: first, a sentence-level extractor flags relevant note sentences using either zero-shot LLM prompts or supervised ModernBERT classifiers. Next, an LLM drafts a question-specific template, which is filled verbatim with sentences from the extraction step. This prevents hallucinations and ensures traceability. In the ArchEHR{-}QA 2025 shared task, our system scored 42.01{\\%}, ranking top{-}10 in core metrics and outperforming the organiser{'}s 70B{-}parameter Llama{-}3.3 baseline. We publicly release our code and inference scripts under an MIT license.\"\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrlabsorg%2Fverbatim-rag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkrlabsorg%2Fverbatim-rag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrlabsorg%2Fverbatim-rag/lists"}