https://github.com/bard/rag-engine
- Host: GitHub
- URL: https://github.com/bard/rag-engine
- Owner: bard
- Created: 2025-01-25T18:21:52.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2025-01-26T23:17:49.000Z (over 1 year ago)
- Last Synced: 2026-03-11T15:50:50.624Z (4 months ago)
- Language: Python
- Size: 1.68 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Description
An agentic RAG engine with support for heterogeneous source data formats, query routing between local and external knowledge sources, and multiple topics.
Components:
- LangGraph ingestion workflow
- LangGraph query workflow
- FastAPI backend
- Admin CLI
- NextJS front end (in a [separate repository](https://github.com/bard/rag-frontend))
## Demo
https://github.com/user-attachments/assets/c88b5ef3-7ba8-4a83-84d1-d33a4f0e67cd
**Table of Contents**
- [Description](#description)
- [Setup](#setup)
- [Running the API](#running-the-api)
- [Running the front end](#running-the-front-end)
- [Running the CLI](#running-the-cli)
- [Development](#development)
- [Architecture and development notes](#architecture-and-development-notes)
  - [The ingestion workflow](#the-ingestion-workflow)
  - [The query workflow](#the-query-workflow)
  - [Modelling, configuration, dependencies](#modelling-configuration-dependencies)
  - [LLMs and testing](#llms-and-testing)
  - [Limitations and possible improvements](#limitations-and-possible-improvements)
## Setup
```sh
git clone https://github.com/bard/rag-engine
cd rag-engine
poetry install
cp .env.example .env
```
Edit `.env` to specify API keys and database connection strings.
## Running the API
```sh
poetry run task start_api
```
## Running the front end
```sh
git clone https://github.com/bard/rag-frontend
cd rag-frontend
pnpm install
pnpm dev
```
## Running the CLI
```sh
$ poetry shell
$ python src/cli.py initdb
Database initialized successfully
$ python src/cli.py create_topic --name Paris
Created topic 'Paris' with ID: 059a97ed-3d7d-4fc9-a2b6-9b12df52b414
$ python src/cli.py ingest https://en.wikivoyage.org/wiki/Paris
Data ingested successfully
$ python src/cli.py list_topics
Available topics:
059a97ed-3d7d-4fc9-a2b6-9b12df52b414: Paris
$ poetry run python src/cli.py query --topic_id 059a97ed-3d7d-4fc9-a2b6-9b12df52b414 'what are some nice things to see?'
Some nice things to see in Paris include the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. Additionally, the charming neighborhood of Montmartre and the historic district of Le Marais are also worth exploring.
```
## Development
Run tests in watch mode:
```sh
poetry run task test_watch
```
When adding a test for code that relies on LLM calls, run `poetry run task test_with_new_network_calls` (see [LLMs and testing](#llms-and-testing) below).
## Architecture and development notes
### The ingestion workflow
```mermaid
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
__start__([
__start__
]):::first
fetch(fetch)
extract(extract)
ingest(ingest)
__end__([__end__
]):::last
__start__ --> fetch;
extract --> ingest;
fetch --> extract;
ingest --> __end__;
classDef default fill:#f2f0ff,line-height:1.2
classDef first fill-opacity:0
classDef last fill:#bfb6fc
```
There are three data extractors. They are meant to provide a framework and a few examples within it, not to exhaust the possibilities:
- [specialized local parsing for well-known structured data](https://github.com/bard/rag-engine/blob/ateam/src/data/insurance_average_expenditure.py) (on the [ateam branch](https://github.com/bard/rag-engine/tree/ateam))
- [generic textual data](src/data/textual.py)
- [generic LLM-driven parsing of tabular data](src/data/generic_tabular.py)
[extract](src/workflow_ingest/node_extract.py) runs through extractors in sequence until one is successful. It's up to the extractor to bail out early if it recognizes it cannot do anything useful with the received data.
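The "first successful extractor wins" behaviour can be sketched as below. This is a minimal illustration under assumptions, not the code in `node_extract.py`: `Extracted`, `extract_tabular`, `extract_textual`, and `run_extractors` are hypothetical names, and returning `None` is just one possible way for an extractor to signal that it cannot handle the input.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Extracted:
    """Hypothetical container for whatever an extractor produces."""
    kind: str
    content: str


# Assumption: an extractor returns None to signal "I cannot handle this input".
Extractor = Callable[[str], Optional[Extracted]]


def extract_tabular(raw: str) -> Optional[Extracted]:
    # Bail out early: without a comma in the first line, treat the input as non-tabular.
    first_line = raw.splitlines()[0] if raw.strip() else ""
    if "," not in first_line:
        return None
    return Extracted(kind="tabular", content=raw)


def extract_textual(raw: str) -> Optional[Extracted]:
    # Generic fallback: accept any non-empty text.
    if not raw.strip():
        return None
    return Extracted(kind="textual", content=raw.strip())


def run_extractors(raw: str, extractors: list[Extractor]) -> Extracted:
    """Try each extractor in order and return the first successful result."""
    for extractor in extractors:
        result = extractor(raw)
        if result is not None:
            return result
    raise ValueError("no extractor could handle the input")


extractors = [extract_tabular, extract_textual]
print(run_extractors("city,population\nParis,2100000", extractors).kind)  # tabular
print(run_extractors("Paris is the capital of France.", extractors).kind)  # textual
```

Ordering matters in a chain like this: more specific extractors go first, and the generic textual one acts as the fallback.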
### The query workflow
```mermaid
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
__start__([
__start__
]):::first
classify_query(classify_query)
retrieve_from_weather_service(retrieve_from_weather_service)
retrieve_from_knowledge_base(retrieve_from_knowledge_base)
rerank(rerank)
generate(generate)
__end__([__end__
]):::last
__start__ --> classify_query;
generate --> __end__;
rerank --> generate;
retrieve_from_knowledge_base --> rerank;
retrieve_from_weather_service --> retrieve_from_knowledge_base;
classify_query -.-> retrieve_from_weather_service;
classify_query -.-> retrieve_from_knowledge_base;
classDef default fill:#f2f0ff,line-height:1.2
classDef first fill-opacity:0
classDef last fill:#bfb6fc
```
The conditional edge and the `retrieve_from_weather_service` node aren't necessarily the best design for sourcing external knowledge, and a case could be made for either of the following:
- the `classify_query` node populating an `external_knowledge_sources` array in the agent's state with the sources it decided would be useful to query (`classify_query` already does this for the limited case of weather queries), then passing control to the `retrieve` node for retrieval from all knowledge sources, both local and external;
- defining external knowledge sources as LangChain tools and leaving it to the LLM to decide whether to call those tools.
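The first alternative could look roughly like the sketch below. It is written in plain Python rather than LangGraph to stay self-contained, and everything except the `classify_query`, `retrieve`, and `external_knowledge_sources` names is an assumption: the keyword check stands in for an LLM classifier, and `EXTERNAL_SOURCES` and `retrieve_local` are hypothetical placeholders.

```python
from typing import Callable, TypedDict


class AgentState(TypedDict):
    query: str
    external_knowledge_sources: list[str]
    documents: list[str]


def classify_query(state: AgentState) -> AgentState:
    # Stand-in for an LLM classifier: a keyword check, purely for illustration.
    sources = ["weather"] if "weather" in state["query"].lower() else []
    return {**state, "external_knowledge_sources": sources}


# Hypothetical registry mapping a source name to a retrieval function.
EXTERNAL_SOURCES: dict[str, Callable[[str], list[str]]] = {
    "weather": lambda query: ["(weather report placeholder)"],
}


def retrieve_local(query: str) -> list[str]:
    # Placeholder for a vector store lookup.
    return ["(knowledge base passage placeholder)"]


def retrieve(state: AgentState) -> AgentState:
    # Query the local knowledge base, plus every external source that was selected.
    documents = retrieve_local(state["query"])
    for name in state["external_knowledge_sources"]:
        documents += EXTERNAL_SOURCES[name](state["query"])
    return {**state, "documents": documents}


state: AgentState = {
    "query": "What's the weather like in Paris?",
    "external_knowledge_sources": [],
    "documents": [],
}
state = retrieve(classify_query(state))
print(state["external_knowledge_sources"])  # ['weather']
print(len(state["documents"]))  # 2
```

With this shape, adding an external source means adding a registry entry and teaching the classifier about it, with no new nodes or edges in the graph.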
### Modelling, configuration, dependencies
Class abstractions for the agentic functionality are intentionally avoided since configuration and state are already covered by LangGraph-native concepts (agent state and `RunnableConfig`).
All runnables (workflow nodes, but also API route handlers and CLI commands) instantiate their own dependencies (database connections, third-party API clients, ...) upon invocation, based on the configuration object, instead of expecting them from module scope. Together with the configuration object being strictly serializable, this allows extracting a runnable to a separate process (e.g. lambda) with minimal effort if the need arises.
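A minimal sketch of this pattern, using only the standard library. `Config`, `list_topics`, and `handle_remote_invocation` are hypothetical names rather than the repository's own, and SQLite stands in for whichever database the configuration points to.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical configuration: plain, JSON-serializable values only."""
    db_path: str


def list_topics(config: Config) -> list[str]:
    # The connection is created on invocation from the config, not taken from module scope.
    conn = sqlite3.connect(config.db_path)
    try:
        # Created here only so the sketch runs against an empty database.
        conn.execute("CREATE TABLE IF NOT EXISTS topics (name TEXT)")
        rows = conn.execute("SELECT name FROM topics ORDER BY name").fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()


def handle_remote_invocation(payload: str) -> list[str]:
    # Entry point another process could expose: rebuild the config from JSON, then run.
    return list_topics(Config(**json.loads(payload)))


payload = json.dumps(asdict(Config(db_path=":memory:")))
print(handle_remote_invocation(payload))  # []
```

Because the runnable needs nothing beyond the serialized configuration, the same function body works whether it is called in-process or from a separate worker.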
### LLMs and testing
[vcr.py](https://vcrpy.readthedocs.io/en/latest/) is used to keep tests realistic, cheap, and fast, and to shield them from the variability of LLM responses. When a test marked with `@pytest.mark.vcr` runs for the first time, requests go to the network and responses are recorded; in subsequent runs, the recorded responses are replayed, which avoids latency and API billing and keeps responses stable.
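The record-then-replay idea can be illustrated with a toy, standard-library-only sketch. This is not vcr.py's API or implementation (vcr.py intercepts HTTP traffic and stores it in cassette files); `replay_or_record` and `fake_llm` are hypothetical names used only to show why the second run never reaches the network.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def replay_or_record(cassette: Path, request: str, call: Callable[[str], str]) -> str:
    """Return the recorded response for `request`, calling `call` only on a miss."""
    recorded = json.loads(cassette.read_text()) if cassette.exists() else {}
    if request not in recorded:
        # First run: hit the real service and store the response.
        recorded[request] = call(request)
        cassette.write_text(json.dumps(recorded, indent=2))
    return recorded[request]


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)  # Track how often the "network" is hit.
    return f"answer #{len(calls)}"


with tempfile.TemporaryDirectory() as tmp:
    cassette = Path(tmp) / "cassette.json"
    first = replay_or_record(cassette, "hello", fake_llm)
    second = replay_or_record(cassette, "hello", fake_llm)
    print(first == second, len(calls))  # True 1
```

The second call returns the stored answer without invoking `fake_llm`, which is the property that makes recorded tests deterministic and free to rerun.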
### Limitations and possible improvements
The following are missing:
- database migrations
- post-retrieval reranking (only stubbed)
- protection against prompt injection
- monitoring
- support for vector stores other than ChromaDB (Pinecone is stubbed)
- multi-user support
- per-task LLM configuration
Any SQL database supported by SQLAlchemy should work, but only SQLite and Postgres are tested.