# Marqo x Google Gemma 2 for RAG
This is a small demo of a Local RAG Question and Answering System with [Google Gemma 2 2B](https://huggingface.co/collections/google/gemma-2-2b-release-66a20f3796a2ff2a7c76f98f) and [Marqo](https://marqo.ai/).
Article with full walkthrough (this is for Llama 3.1 but the same principles apply): https://marqo.ai/blog/marqo-llama-rag
## Setup and Installation
### Frontend
Install the necessary Node.js packages for the frontend project and then start the development server. It will be available at http://localhost:3000.
```bash
cd frontend
npm i
npm run dev
```
The frontend will look the same as in the video at the top of this README.
### Backend
#### 1. Obtaining Google Gemma 2 2B Models
To run this project locally, you will need to obtain the appropriate models. Download the model from `https://huggingface.co/google/gemma-2-2b-it-GGUF` and place it into a new directory, `backend/models/2B/`.
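If you use the Hugging Face CLI, one way to fetch the model is sketched below. Note that Gemma models are gated, so you may need to log in and accept Google's license on Hugging Face first, and the exact GGUF filenames in the repository may vary:
```bash
# Authenticate if you haven't already (Gemma is a gated model)
huggingface-cli login

# Download the GGUF model files into backend/models/2B/
huggingface-cli download google/gemma-2-2b-it-GGUF --local-dir backend/models/2B
```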
#### 2. Install Dependencies
Next, navigate to the backend directory, create a virtual environment, activate it, and install the required Python packages listed in the [requirements.txt](/backend/requirements.txt) file:
```bash
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
To run this project, you'll also need to download NLTK (Natural Language Toolkit) data, because the [`document_processors.py`](/backend/document_processors.py) script uses NLTK's sentence tokenization functionality. Specifically, the `sentence_chunker` and `sentence_pair_chunker` functions rely on NLTK's `sent_tokenize` method to split text into sentences.
Start a Python interpreter:
```bash
python3
```
Import NLTK and download its data:
```python
import nltk
nltk.download("all")
```
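For reference, here is a minimal sketch of the kind of sentence chunking those functions perform (illustrative only; the real implementations live in [`document_processors.py`](/backend/document_processors.py)):
```python
from nltk.tokenize import sent_tokenize

def sentence_chunker(text: str) -> list[str]:
    # Split raw text into individual sentences with NLTK's pre-trained
    # Punkt tokenizer (this is what requires the NLTK data download above)
    return sent_tokenize(text)

def sentence_pair_chunker(text: str) -> list[str]:
    # Group consecutive sentences into overlapping pairs so each chunk
    # carries a little more context than a single sentence
    sentences = sent_tokenize(text)
    if len(sentences) < 2:
        return sentences
    return [" ".join(sentences[i:i + 2]) for i in range(len(sentences) - 1)]
```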
#### 3. Run Marqo
For the RAG aspect of this project, I will be using [Marqo](https://marqo.ai/), the end-to-end vector search engine.
Marqo requires Docker. To install Docker, go to the [Docker Official website](https://docs.docker.com/get-docker/). Ensure that Docker has at least 8GB of memory and 50GB of storage. In Docker Desktop, you can do this by clicking the settings icon, then Resources, and selecting 8GB of memory.
Use Docker to run Marqo:
```bash
docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -it -p 8882:8882 marqoai/marqo:latest
```
When the project starts, the Marqo index will be empty until you add information in the 'Add Knowledge' section of the frontend.
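If you want to sanity-check the container, or seed it with a test document from Python, a minimal sketch with the `marqo` client follows (the index name here is illustrative; the project's actual index name is defined in the backend):
```python
import marqo

# Connect to the Marqo container started above
mq = marqo.Client(url="http://localhost:8882")

# Create an index and add one test document (index name is illustrative)
mq.create_index("knowledge-index")
mq.index("knowledge-index").add_documents(
    [{"text": "Marqo is an end-to-end vector search engine."}],
    tensor_fields=["text"],
)

# Search it back to confirm everything is wired up
print(mq.index("knowledge-index").search("what is Marqo?"))
```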
Great, now all that's left to do is run the web server!
#### 4. Run the Web Server
Start a Flask development server in debug mode on port 5001 using Python 3:
```bash
python3 -m flask run --debug -p 5001
```
Navigate to http://localhost:3000 and begin inputting your questions to Google Gemma 2 2B!
## Experimenting
When running this project, feel free to experiment with different settings.
You can change the model in [`backend/ai_chat.py`](/backend/ai_chat.py):
```python
from llama_cpp import Llama

# Point model_path at the GGUF file you placed in backend/models/2B/
LLM = Llama(
    model_path="models/2B/your_model",
)
```
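Whichever model you choose, generation with `llama-cpp-python` follows the same basic pattern; a minimal sketch (the prompt and parameters here are illustrative, not the project's actual prompt template):
```python
from llama_cpp import Llama

llm = Llama(model_path="models/2B/your_model")

# Plain completion call; max_tokens caps the length of the answer
output = llm(
    "Question: What is vector search? Answer:",
    max_tokens=128,
    stop=["Question:"],
)
print(output["choices"][0]["text"])
```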
You can also change the score in the function `query_for_content` in [`backend/knowledge_store.py`](/backend/knowledge_store.py):
```python
knowledge = [res[content_var] for res in resp["hits"] if res["_score"] > 0.8]
```
This queries the Marqo knowledge store and retrieves content based on the provided query. It filters the results to include only those with a relevance score above *0.8* and returns the specified content from these results, limited to a maximum number of results as specified by the `limit` parameter. Feel free to change this score depending on your relevance needs.
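Put together, the retrieval step looks roughly like this (a sketch only; the signature and index name are assumptions, and the real implementation is in [`backend/knowledge_store.py`](/backend/knowledge_store.py)):
```python
import marqo

def query_for_content(query: str, content_var: str = "text",
                      limit: int = 5, threshold: float = 0.8) -> list[str]:
    mq = marqo.Client(url="http://localhost:8882")
    # Fetch up to `limit` hits, then keep only results whose relevance
    # score clears the threshold
    resp = mq.index("knowledge-index").search(query, limit=limit)
    return [res[content_var] for res in resp["hits"] if res["_score"] > threshold]
```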
## Specifications
This can run locally on an M1 or M2 Mac, or with a CUDA-capable GPU on Linux or Windows. If you want to run this on an M1 or M2 Mac, be sure to have the ARM64 version of Python installed; this makes `llama.cpp` build for ARM64 and use Metal for inference, rather than building for an x86 CPU and being emulated under Rosetta.
## Further Work
This is a very simple demo. Future work on this project will include several enhancements:
* Enable Chatbot Memory: Store conversation history to make conversing with the chatbot more like a real-life experience
* Provide an Initial Set of Documents: at the moment, when the project starts, the Marqo index is empty. Results will be better if we preload the Marqo knowledge store with a set of initial documents relevant to the domain of interest.
* Improve User Interface
* Optimize Backend Performance
* Extend Support for Different Document Types
## Further Guidance
To accompany this project, I wrote an article covering how you can run this repository and what you can expect to see when doing so. Visit this article for further guidance and information.