https://github.com/timescale/rag-is-more-than-vector-search

Companion repo to "RAG is more than vector search" blog post

Host: GitHub
URL: https://github.com/timescale/rag-is-more-than-vector-search
Owner: timescale
Created: 2024-09-04T12:25:42.000Z (10 months ago)
Default Branch: main
Last Pushed: 2025-03-06T04:06:46.000Z (4 months ago)
Last Synced: 2025-05-07T04:58:42.370Z (about 2 months ago)
Language: Python
Homepage: https://www.timescale.com/blog/rag-is-more-than-just-vector-search/
Size: 26.4 KB
Stars: 22
Watchers: 12
Forks: 5
Open Issues: 2
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome - timescale/rag-is-more-than-vector-search - Companion repo to "RAG is more than vector search" blog post (Python)

# RAG is more than vector search

## Introduction

This repository contains the code for the article "RAG is more than just vector search". Head over to the [Timescale blog](https://www.timescale.com/blog/rag-is-more-than-just-vector-search/) to read it if you haven't already. The code is compatible with Python 3.9 and later.

## Instructions

1. First, install the dependencies listed in `requirements.txt`:

```bash
pip install -r requirements.txt
```

2. Create a `.env` file that defines the same environment variables as the `.env.example` file. You can get your `DB_URL` after creating a Timescale instance by following the [Timescale getting-started instructions](https://docs.timescale.com/getting-started/latest/services/#create-your-timescale-account).

3. Next, ingest some GitHub issues from the `bigcode/the-stack-github-issues` dataset by running `scripts/ingest.py`. The script collects the first 100 issues that belong to the whitelisted repositories listed in the file:

```bash
python3 ./scripts/ingest.py
```

4. Then run `scripts/eval.py` to test the model's function-calling ability, i.e. to verify that it picks the right tool for a given user query:

```bash
python3 ./scripts/eval.py
```

5. To perform embedding search, we give each tool its own `.execute()` method. Once the model selects a tool, we can call `.execute()` on it to immediately return a list of relevant results. To see this in action, run the command below, which uses embedding search to fetch the 10 most relevant summaries related to the `kubernetes/kubernetes` repository from the database:

```bash
python3 ./scripts/embedding_search.py
```

6. Finally, `scripts/agent.py` puts it all together in a one-step agent that can answer questions about specific repositories in the database. Run it with:

```bash
python3 ./scripts/agent.py
```
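
To sanity-check the `.env` file from the setup step, a helper like the one below can list keys that `.env.example` defines but `.env` leaves missing or empty. It is a minimal, standard-library-only sketch and not part of the repository: `parse_env` and `missing_keys` are hypothetical names, and it assumes plain `KEY=value` lines with `#` comments (no multi-line values or `export` prefixes).

```python
from pathlib import Path
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        # Drop surrounding whitespace and optional single/double quotes.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(example_text: str, env_text: str) -> List[str]:
    """Keys defined in .env.example that are absent or empty in .env."""
    required = parse_env(example_text)
    provided = parse_env(env_text)
    return sorted(key for key in required if not provided.get(key))


if __name__ == "__main__":
    example_path, env_path = Path(".env.example"), Path(".env")
    if example_path.exists():
        env_text = env_path.read_text() if env_path.exists() else ""
        missing = missing_keys(example_path.read_text(), env_text)
        print("Missing or empty keys:", ", ".join(missing) or "none")
```

Splitting on the first `=` only keeps connection strings such as `DB_URL` intact even when they contain `=` in their query parameters.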
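
The ingestion step keeps only issues from whitelisted repositories and stops after the first 100. The sketch below isolates that filtering logic under stated assumptions: `take_allowlisted`, the `repo` field name, and the example allowlist are hypothetical, and the actual `scripts/ingest.py` also loads the `bigcode/the-stack-github-issues` dataset and writes the results to the database.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, Set

# Hypothetical allowlist for illustration; the repo defines its own list.
ALLOWED_REPOS: Set[str] = {"kubernetes/kubernetes", "rust-lang/rust"}


def take_allowlisted(
    issues: Iterable[Dict], allowed: Set[str], limit: int = 100
) -> Iterator[Dict]:
    """Lazily yield the first `limit` issues whose "repo" field is allowlisted.

    Assumes each issue is a dict with a "repo" key such as "owner/name".
    """
    matching = (issue for issue in issues if issue.get("repo") in allowed)
    return islice(matching, limit)
```

Because `islice` stops pulling from the generator once it has `limit` matches, the same function would also work on a streamed dataset (for example one loaded with the Hugging Face `datasets` library's `load_dataset(..., streaming=True)`) without reading it in full.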
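
The `scripts/eval.py` step checks whether the model picks the right tool for each query. At its core, that is a loop over labelled (query, expected tool) pairs that compares the selector's choice with the label, as in the sketch below. `tool_selection_accuracy`, the tool names, and the test cases are hypothetical, and `keyword_selector` is a toy stand-in for the LLM's function-calling step so that the example runs offline.

```python
from typing import Callable, List, Tuple

Case = Tuple[str, str]  # (user query, expected tool name)


def tool_selection_accuracy(
    cases: List[Case], select_tool: Callable[[str], str]
) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Return the accuracy plus (query, expected, got) for every wrong choice."""
    failures = []
    for query, expected in cases:
        got = select_tool(query)
        if got != expected:
            failures.append((query, expected, got))
    accuracy = 1 - len(failures) / len(cases) if cases else 0.0
    return accuracy, failures


def keyword_selector(query: str) -> str:
    # Toy stand-in for the model's tool choice.
    return "SearchIssues" if "issue" in query.lower() else "SearchSummaries"


CASES: List[Case] = [
    ("Find issues about pod eviction in kubernetes", "SearchIssues"),
    ("Summarise recent discussions in rust-lang/rust", "SearchSummaries"),
    ("What are people saying about flaky tests?", "SearchIssues"),
]

if __name__ == "__main__":
    acc, wrong = tool_selection_accuracy(CASES, keyword_selector)
    print(f"accuracy={acc:.2f}")
    for query, expected, got in wrong:
        print(f"  {query!r}: expected {expected}, got {got}")
```

Here the toy selector misses the third query, so the reported accuracy is 0.67. Returning the failing queries alongside the score makes it easier to see which tool descriptions need work.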
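
The `.execute()` pattern from the embedding-search step can be sketched as a tool object whose fields are the arguments the model fills in and whose method runs the search. In this sketch, `SearchSummaries`, its fields, and `FAKE_TABLE` are hypothetical: an in-memory list of `(repo, summary, embedding)` rows with brute-force cosine similarity stands in for the database query the repo actually runs.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Row = Tuple[str, str, List[float]]  # (repo, summary, embedding)

# Hypothetical in-memory stand-in for the summaries table.
FAKE_TABLE: List[Row] = [
    ("kubernetes/kubernetes", "Pods stuck in Terminating state", [1.0, 0.0]),
    ("kubernetes/kubernetes", "Typo in kubectl docs", [0.0, 1.0]),
    ("kubernetes/kubernetes", "Node-pressure eviction misfires", [0.9, 0.1]),
    ("rust-lang/rust", "Borrow checker false positive", [1.0, 0.0]),
]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as unrelated


@dataclass
class SearchSummaries:
    """Hypothetical tool: the model fills in the fields, .execute() runs it."""

    query_embedding: List[float]
    repo: Optional[str] = None
    limit: int = 10

    def execute(self, rows: Optional[List[Row]] = None) -> List[str]:
        """Return the `limit` summaries most similar to the query embedding."""
        rows = FAKE_TABLE if rows is None else rows
        candidates = [r for r in rows if self.repo is None or r[0] == self.repo]
        ranked = sorted(
            candidates,
            key=lambda r: cosine_similarity(self.query_embedding, r[2]),
            reverse=True,
        )
        return [summary for _, summary, _ in ranked[: self.limit]]


if __name__ == "__main__":
    tool = SearchSummaries(query_embedding=[1.0, 0.0], repo="kubernetes/kubernetes")
    for summary in tool.execute():
        print(summary)
```

If the summaries live in a Postgres table with a pgvector column, the filter and ranking would usually move into SQL instead, along the lines of `WHERE repo = %s ORDER BY embedding <=> %s LIMIT 10`, where `<=>` is pgvector's cosine-distance operator; the table and column names there are assumptions too.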
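
The `scripts/agent.py` step combines tool selection and execution. As we read "one-step", the agent makes a single tool choice, runs that tool once, and answers from its results without looping back for another tool call. The sketch below shows only that control flow: `one_step_agent` and `ListRepos` are hypothetical, and the two model calls (choosing a tool and writing the answer) are passed in as plain functions so that the example runs without an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


class Tool(Protocol):
    """Anything with an execute() method that returns text snippets."""

    def execute(self) -> List[str]: ...


@dataclass
class ListRepos:
    """Hypothetical tool that lists the repositories in the database."""

    repos: List[str]

    def execute(self) -> List[str]:
        return sorted(self.repos)


def one_step_agent(
    question: str,
    choose_tool: Callable[[str], Tool],
    answer: Callable[[str, List[str]], str],
) -> str:
    """Choose one tool, execute it once, and answer from its output."""
    tool = choose_tool(question)  # the model picks a tool (stubbed below)
    context = tool.execute()  # run that tool exactly once
    return answer(question, context)  # answer from the retrieved context


def stub_chooser(question: str) -> Tool:
    # Stand-in for the LLM's tool choice: always picks ListRepos.
    return ListRepos(repos=["rust-lang/rust", "kubernetes/kubernetes"])


def stub_answer(question: str, context: List[str]) -> str:
    # Stand-in for the LLM's final answer: echoes the retrieved context.
    return f"{question} -> {', '.join(context)}"


if __name__ == "__main__":
    print(one_step_agent("Which repositories are indexed?", stub_chooser, stub_answer))
```

Passing the model calls in as functions keeps the control flow testable on its own; a multi-step agent would instead feed `context` back into `choose_tool` until the model decides it can answer.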
