{"id":24738130,"url":"https://github.com/bard/rag-engine","last_synced_at":"2026-04-13T16:04:19.909Z","repository":{"id":274481026,"uuid":"922257645","full_name":"bard/rag-engine","owner":"bard","description":null,"archived":false,"fork":false,"pushed_at":"2025-01-26T23:17:49.000Z","size":1762,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-03-11T15:50:50.624Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bard.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-25T18:21:52.000Z","updated_at":"2025-01-26T23:53:57.000Z","dependencies_parsed_at":"2025-01-27T16:22:02.804Z","dependency_job_id":"d98ed4f0-4553-496f-8190-ae8e8808f68a","html_url":"https://github.com/bard/rag-engine","commit_stats":null,"previous_names":["bard/rag-engine"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bard/rag-engine","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bard%2Frag-engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bard%2Frag-engine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bard%2Frag-engine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bard%2Frag-engine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bard","download_url":"https://codeload.github.com/bard/rag-engine/tar.gz/refs/heads/master
","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bard%2Frag-engine/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31759579,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-13T15:25:13.801Z","status":"ssl_error","status_checked_at":"2026-04-13T15:25:09.162Z","response_time":93,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-27T22:35:24.427Z","updated_at":"2026-04-13T16:04:19.892Z","avatar_url":"https://github.com/bard.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Description\n\nAn agentic RAG engine with support for heterogeneous source data formats, query routing between local and external knowledge sources, multiple topics.\n\nComponents:\n\n- LangGraph ingestion workflow\n- LangGraph query workflow\n- FastAPI backend\n- Admin CLI\n- NextJS front end (in a [separate repository](https://github.com/bard/rag-frontend))\n\n## Demo\n\nhttps://github.com/user-attachments/assets/c88b5ef3-7ba8-4a83-84d1-d33a4f0e67cd\n\n\u003c!-- markdown-toc start - Don't edit this section. 
Run M-x markdown-toc-refresh-toc --\u003e\n\n**Table of Contents**\n\n- [Description](#description)\n- [Setup](#setup)\n- [Running the API](#running-the-api)\n- [Running the front end](#running-the-front-end)\n- [Running the CLI](#running-the-cli)\n- [Development](#development)\n- [Architecture and development notes](#architecture-and-development-notes)\n  - [The ingestion workflow](#the-ingestion-workflow)\n  - [The query workflow](#the-query-workflow)\n  - [Modelling, configuration, dependencies](#modelling-configuration-dependencies)\n  - [LLMs and testing](#llms-and-testing)\n  - [Limitations and possible improvements](#limitations-and-possible-improvements)\n\n\u003c!-- markdown-toc end --\u003e\n\n## Setup\n\n```sh\ngit clone https://github.com/bard/rag-engine\ncd rag-engine\npoetry install\ncp .env.example .env\n```\n\nEdit `.env` to specify API keys and database connection strings.\n\n## Running the API\n\n```sh\npoetry run task start_api\n```\n\n## Running the front end\n\n```sh\ngit clone https://github.com/bard/rag-frontend\ncd rag-frontend\npnpm install\npnpm dev\n```\n\n## Running the CLI\n\n```sh\n$ poetry shell\n$ python src/cli.py initdb\n\nDatabase initialized successfully\n\n$ python src/cli.py create_topic --name Paris\n\nCreated topic 'Paris' with ID: 059a97ed-3d7d-4fc9-a2b6-9b12df52b414\n\n$ python src/cli.py ingest https://en.wikivoyage.org/wiki/Paris\n\nData ingested successfully\n\n$ python src/cli.py list_topics\n\nAvailable topics:\n  059a97ed-3d7d-4fc9-a2b6-9b12df52b414: Paris\n\n$ poetry run python src/cli.py query --topic_id 059a97ed-3d7d-4fc9-a2b6-9b12df52b414 'what are some nice things to see?'\n\nSome nice things to see in Paris include the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. 
Additionally, the charming neighborhood of Montmartre and the historic district of Le Marais are also worth exploring.\n```\n\n## Development\n\nRun tests in watch mode:\n\n```\npoetry run task test_watch\n```\n\nWhen adding a test for code that relies on LLM calls, run `poetry run task test_with_new_network_calls` (see [LLMs and testing](#llms-and-testing) below.)\n\n## Architecture and development notes\n\n### The ingestion workflow\n\n```mermaid\n%%{init: {'flowchart': {'curve': 'linear'}}}%%\ngraph TD;\n\t__start__([\u003cp\u003e__start__\u003c/p\u003e]):::first\n\tfetch(fetch)\n\textract(extract)\n\tingest(ingest)\n\t__end__([\u003cp\u003e__end__\u003c/p\u003e]):::last\n\t__start__ --\u003e fetch;\n\textract --\u003e ingest;\n\tfetch --\u003e extract;\n\tingest --\u003e __end__;\n\tclassDef default fill:#f2f0ff,line-height:1.2\n\tclassDef first fill-opacity:0\n\tclassDef last fill:#bfb6fc\n```\n\nThere are three data extractors, meant to provide a framework and examples within the framework, not to exhaust the possibilities:\n\n- [specialized local parsing for well-known structured data](https://github.com/bard/rag-engine/blob/ateam/src/data/insurance_average_expenditure.py) (on the [ateam branch](https://github.com/bard/rag-engine/tree/ateam))\n- [generic textual data](src/data/textual.py)\n- [generic LLM-driven parsing of tabular data](src/data/generic_tabular.py)\n\n[extract](src/workflow_ingest/node_extract.py) runs through extractors in sequence until one is successful. 
It's up to the extractor to bail out early if it recognizes it cannot do anything useful with the received data.\n\n### The query workflow\n\n```mermaid\n%%{init: {'flowchart': {'curve': 'linear'}}}%%\ngraph TD;\n\t__start__([\u003cp\u003e__start__\u003c/p\u003e]):::first\n\tclassify_query(classify_query)\n\tretrieve_from_weather_service(retrieve_from_weather_service)\n\tretrieve_from_knowledge_base(retrieve_from_knowledge_base)\n\trerank(rerank)\n\tgenerate(generate)\n\t__end__([\u003cp\u003e__end__\u003c/p\u003e]):::last\n\t__start__ --\u003e classify_query;\n\tgenerate --\u003e __end__;\n\trerank --\u003e generate;\n\tretrieve_from_knowledge_base --\u003e rerank;\n\tretrieve_from_weather_service --\u003e retrieve_from_knowledge_base;\n\tclassify_query -.-\u003e retrieve_from_weather_service;\n\tclassify_query -.-\u003e retrieve_from_knowledge_base;\n\tclassDef default fill:#f2f0ff,line-height:1.2\n\tclassDef first fill-opacity:0\n\tclassDef last fill:#bfb6fc\n```\n\nThe conditional edge and the `retrieve_from_weather_service` node are not necessarily the best design for sourcing external knowledge, and a case could be made for either:\n\n- the `classify_query` node populating an `external_knowledge_sources` array in the agent's state with a list of sources it decided would be useful to query (`classify_query` already does this for the limited case of weather queries), then passing control to the `retrieve` node for retrieval from all knowledge sources, both local and external;\n- defining external knowledge sources as LangChain tools and leaving it to the LLM to decide whether to call those tools.\n\n### Modelling, configuration, dependencies\n\nClass abstractions for the agentic functionality are intentionally avoided since configuration and state are already covered by LangGraph-native concepts (agent state and `RunnableConfig`).\n\nAll runnables (workflow nodes, but also API route handlers and CLI commands) instantiate their own dependencies 
(database connections, third-party API clients, ...) upon invocation, based on the configuration object, instead of expecting them from module scope. Together with the configuration object being strictly serializable, this allows extracting a runnable to a separate process (e.g. lambda) with minimal effort if the need arises.\n\n### LLMs and testing\n\n[vcr.py](https://vcrpy.readthedocs.io/en/latest/) is used to keep tests realistic, cheap, fast, and to protect from the variability of LLM responses. When a test marked with `@pytest.mark.vcr` runs for the first time, requests go to the network and responses are recorded; in subsequent runs, recorded responses are replayed, thus avoiding latency and API billing, and ensuring stable responses.\n\n### Limitations and possible improvements\n\nThe following is missing:\n\n- database migrations\n- post-retrieval reranking (only stubbed)\n- protection against prompt injection\n- monitoring\n- support for vector stores other than ChromaDB (Pinecone is stubbed)\n- multi-user\n- per-task LLM configuration\n\nAny SQL database supported by SQLAlchemy should work, but only SQLite and Postgres are tested.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbard%2Frag-engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbard%2Frag-engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbard%2Frag-engine/lists"}