https://github.com/tlecomte13/example-rag-csv-ollama

# Example Project: Create a RAG (Retrieval-Augmented Generation) Pipeline with LangChain and Ollama

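This project uses LangChain to load CSV documents, split them into chunks, store them in a Chroma database, and query that database with a language model served by Ollama. You can add documents to the database, reset it, and generate context-based responses from the stored documents.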
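To make the first two steps concrete, here is a minimal sketch of turning CSV rows into text documents and cutting them into overlapping chunks. It uses only the standard library and is not the project's code, which relies on LangChain. The function names, the sample CSV, and the `chunk_size` and `overlap` values are assumptions chosen for illustration.

```python
import csv
import io
from typing import List


def rows_to_documents(csv_text: str) -> List[str]:
    """Turn each CSV row into one text document made of "column: value" lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["\n".join(f"{key}: {value}" for key, value in row.items()) for row in reader]


def split_into_chunks(text: str, chunk_size: int = 40, overlap: int = 10) -> List[str]:
    """Cut text into windows of chunk_size characters; neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


# Hypothetical sample data, not taken from the repository.
SAMPLE_CSV = "name,description\nwidget,A small blue widget used for testing\n"

documents = rows_to_documents(SAMPLE_CSV)
chunks = [chunk for doc in documents for chunk in split_into_chunks(doc)]
print(len(documents), "document(s),", len(chunks), "chunk(s)")
```

The overlap keeps a little shared text between neighbouring chunks, so a sentence cut at a chunk boundary still appears whole in at least one of them.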

## Prerequisites

- [Ollama](https://ollama.com/download)
- Python 3.8 or higher
- pip

## Installation

1. Clone the repository:
```bash
git clone https://github.com/tlecomte13/example-rag-csv-ollama
cd example-rag-csv-ollama
```

2. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate
```

3. Install the dependencies:
```bash
pip install -r requirements.txt
```

4. Pull the model with Ollama, replacing `<model-name>` with the model named in `config.yaml`:
```bash
ollama pull <model-name>
```

## Configuration

Before running the scripts, check that `config.yaml` points to your CSV files and names the models and text splitting parameters you want to use.
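A quick sanity check on the parsed configuration can catch mistakes before anything is indexed. The sketch below validates a plain dictionary. Every key name in it is hypothetical, since the actual layout of `config.yaml` is not shown here, so adapt the names to the real file. In the project the dictionary would come from parsing `config.yaml`, for example with `yaml.safe_load` from the pyyaml dependency.

```python
from typing import Any, Dict, List

# Hypothetical key names and types: the real config.yaml may use different ones.
REQUIRED_KEYS = {
    "data_path": str,
    "chroma_path": str,
    "embedding_model": str,
    "llm_model": str,
    "chunk_size": int,
    "chunk_overlap": int,
}


def find_config_problems(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the config looks usable."""
    problems = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif not isinstance(config[key], expected_type):
            problems.append(f"{key} should be of type {expected_type.__name__}")
    size, overlap = config.get("chunk_size"), config.get("chunk_overlap")
    if isinstance(size, int) and isinstance(overlap, int) and overlap >= size:
        problems.append("chunk_overlap must be smaller than chunk_size")
    return problems


# Placeholder values for illustration only.
example_config = {
    "data_path": "data/",
    "chroma_path": "chroma/",
    "embedding_model": "your-embedding-model",
    "llm_model": "your-llm-model",
    "chunk_size": 800,
    "chunk_overlap": 80,
}
print(find_config_problems(example_config))
```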

## Usage

### Add documents to the database

To add documents to the Chroma database, run:

```bash
python add_csv_in_database.py
```

You can reset the database using the `--reset` option:

```bash
python add_csv_in_database.py --reset
```
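A flag like `--reset` is commonly implemented with `argparse` plus a step that deletes the stored data. The sketch below shows that pattern, assuming the database is persisted as a directory on disk. It is not the code of `add_csv_in_database.py`, and the function names and the demo directory are hypothetical.

```python
import argparse
import shutil
import tempfile
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line; --reset is a boolean switch that defaults to False."""
    parser = argparse.ArgumentParser(description="Add CSV documents to the database.")
    parser.add_argument("--reset", action="store_true",
                        help="Delete the existing database before adding documents.")
    return parser.parse_args(argv)


def clear_database(db_dir: Path) -> bool:
    """Remove the database directory if it exists; report whether anything was deleted."""
    if db_dir.exists():
        shutil.rmtree(db_dir)
        return True
    return False


# Demonstration against a throwaway directory rather than a real database.
demo_dir = Path(tempfile.mkdtemp()) / "chroma_demo"
demo_dir.mkdir()

args = parse_args(["--reset"])
if args.reset:
    deleted = clear_database(demo_dir)
    print("database cleared" if deleted else "nothing to clear")
```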

### Query the database

To query the database, run:

```bash
python main.py
```
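Querying follows the usual RAG pattern: find the stored chunks most similar to the question, then hand them to the language model as context. The sketch below illustrates that flow with a toy bag-of-words cosine similarity standing in for the embedding search that Chroma performs. It stops at building the prompt and does not call a model. All names, the sample chunks, and the prompt wording are assumptions, not the contents of `main.py`.

```python
import math
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Count lowercase alphanumeric tokens; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question, best first."""
    query = bag_of_words(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query, bag_of_words(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Assemble the text that would be sent to the language model."""
    context = "\n---\n".join(context_chunks)
    return f"Answer the question using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical chunks, as if they had been stored earlier.
stored_chunks = [
    "name: widget\nprice: 4 euros",
    "name: gadget\nprice: 9 euros",
    "name: gizmo\ncolor: green",
]
question = "What is the price of the widget?"
top_chunks = retrieve(question, stored_chunks, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

Restricting the prompt to the retrieved chunks is what makes the response context-based: the model is asked to answer from the stored documents rather than from its general knowledge.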

## Project Structure
* **add_csv_in_database.py**: Script to load CSV documents, split them into chunks, and add them to the Chroma database.
* **main.py**: Script to query the Chroma database and generate context-based responses.
* **helper/get_embedding_function.py**: Utility function to get the embedding function.
* **config.yaml**: Configuration file for file paths, models, and text splitting parameters.

## Dependencies

* langchain_community
* langchain_chroma
* tqdm
* rich
* pyyaml

## Sources

* https://www.sakunaharinda.xyz/ragatouille-book/intro.html
* https://ollama.com/
* https://www.youtube.com/watch?v=2TJxpyO3ei4
* https://python.langchain.com/v0.2/docs/introduction/