# Local LLM with RAG


*A wizard experimenting - Leonardo AI*

This project is an experimental sandbox for testing out ideas related to running local Large Language Models (LLMs) with [Ollama](https://ollama.ai/) to perform Retrieval-Augmented Generation (RAG) for answering questions based on sample PDFs. In this project, we also use Ollama to create embeddings with the [nomic-embed-text](https://ollama.com/library/nomic-embed-text) model for use with [Chroma](https://docs.trychroma.com/). Please note that the embeddings are reloaded each time the application runs, which is not efficient and is only done here for testing purposes.
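As a rough illustration, the core embed-and-retrieve flow looks something like the sketch below. It assumes the `langchain-community`, `langchain-ollama`, and `langchain-chroma` packages; the actual imports, file paths, and helper names in this repository may differ.

```python
# A minimal sketch of the embed-and-retrieve flow (illustrative, not the
# exact code in app.py): load a PDF, chunk it, embed the chunks with
# Ollama's nomic-embed-text model, and index them in Chroma.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma

docs = PyPDFLoader("Research/sample.pdf").load()  # hypothetical file path
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# The collection is rebuilt in memory on every run, as noted above.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
db = Chroma.from_documents(chunks, embeddings)

# Retrieve the chunks most relevant to a question.
hits = db.similarity_search("What does the paper conclude?", k=4)
```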

[![asciicast](https://asciinema.org/a/fepTvXf1UiDpRUhhNiswL8isu.svg)](https://asciinema.org/a/fepTvXf1UiDpRUhhNiswL8isu)

There is also a web UI created using [Streamlit](https://streamlit.io/) to provide a different way to interact with Ollama.


*Screenshot of the Streamlit web UI*

## Requirements

- [Ollama](https://ollama.ai/) version 0.5.7 or higher.

## Setup

1. Clone this repository to your local machine.
2. Install UV using the instructions from the Astral site: [Installation](https://docs.astral.sh/uv/#installation).
3. Create a virtual environment and install the required Python packages by running `uv sync`.

## Running the Project

**Note:** The first time you run the project, it will download the necessary models from Ollama for the LLM and embeddings. This is a one-time setup process and may take some time depending on your internet connection.

1. Run the main script with `uv run app.py -m <model> -p <path>` to specify a model and the path to the documents. If no model is specified, it defaults to [mistral](https://ollama.com/library/mistral). If no path is specified, it defaults to the `Research` folder in the repository, included for example purposes.
2. Optionally, you can specify the embedding model to use with `-e <embedding_model>`. If not specified, it defaults to [nomic-embed-text](https://ollama.com/library/nomic-embed-text). A sketch of how these flags might be wired up follows below.
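
For a feel of how the flags above might be parsed (assuming `argparse`; the actual handling in `app.py` may differ), with defaults matching the text:

```python
# Hypothetical flag handling for app.py (argparse assumed; defaults per the text).
import argparse

parser = argparse.ArgumentParser(description="Local LLM with RAG")
parser.add_argument("-m", "--model", default="mistral",
                    help="Ollama model used to answer questions")
parser.add_argument("-e", "--embedding-model", default="nomic-embed-text",
                    help="Ollama model used to create embeddings")
parser.add_argument("-p", "--path", default="Research",
                    help="Folder of PDF and Markdown documents to index")
args = parser.parse_args()
```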

This will load the PDFs and Markdown files, generate embeddings, query the collection, and answer the question defined in `app.py`.
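
The answering step itself might look roughly like this (a sketch assuming `langchain-ollama`'s `ChatOllama` and the `db` collection from the earlier snippet; the prompt wording is illustrative):

```python
# Illustrative question-answering step: stuff the retrieved chunks into a
# prompt and ask the local chat model. Assumes `db` from the sketch above.
from langchain_ollama import ChatOllama

llm = ChatOllama(model="mistral")
question = "What does the paper conclude?"

context = "\n\n".join(doc.page_content for doc in db.similarity_search(question, k=4))
answer = llm.invoke(
    f"Answer using only this context:\n{context}\n\nQuestion: {question}"
)
print(answer.content)
```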

## Running the Streamlit UI

Run the Streamlit application by executing `uv run streamlit run ui.py` in your terminal.

This will start a local web server and open a new tab in your default web browser where you can interact with the application. The Streamlit UI lets you select a model and a document folder, providing an easier and more intuitive way to interact with the RAG chatbot system than the command-line interface. The application handles loading the documents, generating embeddings, querying the collection, and displaying the results interactively.
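
For a feel of how such a UI is put together, here is a minimal, self-contained sketch in the spirit of `ui.py` (the widget layout and the `answer_question` stub are hypothetical, not the actual code in this repository):

```python
# Hypothetical minimal Streamlit chat UI; ui.py's real layout may differ.
import streamlit as st

def answer_question(model: str, folder: str, question: str) -> str:
    # Stand-in for the embed -> retrieve -> generate pipeline sketched earlier.
    return f"({model}) would answer from documents in '{folder}' here."

st.title("Local LLM with RAG")
model = st.sidebar.selectbox("Model", ["mistral", "llama3"])
folder = st.sidebar.text_input("Document folder", "Research")

if question := st.chat_input("Ask about the documents"):
    st.chat_message("user").write(question)
    st.chat_message("assistant").write(answer_question(model, folder, question))
```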

## Technologies Used

- [Langchain](https://github.com/langchain-ai/langchain): A Python library for building applications with Large Language Models.
- [Ollama](https://ollama.ai/): A platform for running Large Language Models locally.
- [Chroma](https://docs.trychroma.com/): A vector database for storing and retrieving embeddings.
- [PyPDF](https://pypi.org/project/pypdf/): A Python library for reading and manipulating PDF files.
- [Streamlit](https://streamlit.io/): A web framework for creating interactive applications for machine learning and data science projects.
- [UV](https://astral.sh/uv): A fast and efficient Python package installer and resolver.