https://github.com/3m6d/llama2rag
This project is a RAG system that uses two models, Mistral and Llama 2, with SQLite as the vector database and FAISS for vector similarity search.
- Host: GitHub
- URL: https://github.com/3m6d/llama2rag
- Owner: 3m6d
- Created: 2024-11-14T12:02:52.000Z (11 months ago)
- Default Branch: master
- Last Pushed: 2024-12-06T10:10:24.000Z (10 months ago)
- Last Synced: 2025-01-30T15:19:28.270Z (8 months ago)
- Topics: fintech, llama2, llm, mistral-7b, rag
- Language: Python
- Homepage:
- Size: 4.39 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# **Retrieval Augmented Generation (RAG) with LLaMA and Mistral Model**
## **Overview**
This project demonstrates how to integrate and interact with two powerful AI models: **LLaMA** (Meta's family of large language models) and **Mistral 7B Instruct v0.3**, an instruction-tuned language model developed by Mistral AI. Both models generate AI-driven responses to a given prompt, and the application is designed as a basis for building AI-powered systems such as chatbots, virtual assistants, or automated content generation tools.

### **Key Features:**
- **LLaMA Model**: A family of foundational language models designed to handle complex language tasks with fine-tuned control over responses.
- **Mistral 7B Instruct v0.3**: A high-performance model optimized for instruction-based tasks, delivering efficient and accurate text generation.

## **Technologies Used**
- **Python 3.x**: The main programming language used to implement the system.
- **Flask**: A micro web framework used to create the API for interacting with the models.
- **Hugging Face Transformers**: A library for working with transformer-based models like Mistral and LLaMA.
- **FAISS**: A library for efficient similarity search and clustering, used here to index and retrieve relevant document chunks.
- **Postman**: A tool for testing APIs.
- **LM Studio**: A platform for running the LLaMA and Mistral models locally.

## **Setup & Installation**
### 1. **Clone the Repository**
Clone the repository to your local machine:

```bash
git clone https://github.com/3m6d/llama2rag.git
cd llama2rag
```

### 2. **Create and Activate a Virtual Environment**
It's recommended to use a virtual environment to manage your dependencies.

```bash
python -m venv venv
source venv/bin/activate # On Windows, use venv\Scripts\activate
```

### 3. **Install Dependencies**
Install the required libraries using `pip`:

```bash
pip install -r requirements.txt
```

The `requirements.txt` should contain dependencies like:
```text
Flask>=2.0
requests>=2.0
sentence-transformers>=2.0
faiss-cpu==1.7.2
transformers>=4.0
```

### 4. **Configure the Environment**
- Make sure to set up any environment variables required for your project. For example, you might need an API key for external services or model endpoints. You can set these in `.env` or export them directly.
Example for Flask setup:
```bash
export FLASK_APP=app.py
export FLASK_ENV=development
```

### 5. **Install LM Studio**
- Download and install **LM Studio** from [LM Studio's official website](https://lmstudio.ai/).
- Load the **Mistral 7B Instruct v0.3** and **LLaMA** models into LM Studio.
- Start the LM Studio server and note the API endpoint (e.g., `http://localhost:8000`).

### 6. **Initialize FAISS Index and Embedding Model**
- The project requires a pre-built FAISS index (`faiss_index.idx`) and the `SentenceTransformer` model.
- If you don't already have the FAISS index, follow the process for indexing your document chunks in the `faiss_index.py` file. A sketch of that process is shown below.
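For reference, here is a minimal indexing sketch of the kind `faiss_index.py` might contain. It assumes the document chunks are plain strings and uses `all-MiniLM-L6-v2` as the embedding model; the repo's actual chunking logic and model choice may differ, but the index file name matches the one the README expects.

```python
# Illustrative sketch only -- not the repo's exact code.
import faiss
from sentence_transformers import SentenceTransformer

chunks = [
    "FAISS enables fast vector similarity search.",
    "Mistral 7B Instruct v0.3 handles instruction-based prompts.",
]

# Embed each chunk into a dense vector.
model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
embeddings = model.encode(chunks, convert_to_numpy=True)

# Build a flat (exact) L2 index and add the embeddings.
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Persist the index under the file name the README expects.
faiss.write_index(index, "faiss_index.idx")

# At query time: embed the query and retrieve the top-k nearest chunks.
query_vec = model.encode(["What does FAISS do?"], convert_to_numpy=True)
distances, ids = index.search(query_vec, k=1)
print(chunks[ids[0][0]])
```

`IndexFlatL2` performs exact search, which is fine for small corpora; larger collections may warrant an approximate index type.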
## **Running the Application**
To start the Flask server:
```bash
python app.py
```

The Flask app will start running on `http://localhost:5000` by default.
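For orientation, a pared-down version of what `app.py` might look like is sketched below. The real app adds retrieval over the FAISS index before prompting the model; the endpoint path follows LM Studio's OpenAI-compatible API, and the port matches the example above, both of which are assumptions.

```python
# app.py (simplified sketch) -- the real app.py adds FAISS retrieval.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
LM_STUDIO_URL = "http://localhost:8000/v1/chat/completions"  # assumed endpoint

@app.route("/query", methods=["POST"])
def query():
    user_query = request.get_json()["query"]
    # In the full app, relevant FAISS chunks would be prepended here as context.
    payload = {
        "model": "mistral-7b-instruct-v0.3:2",
        "messages": [{"role": "user", "content": user_query}],
        "max_tokens": 256,
        "temperature": 0.7,
    }
    resp = requests.post(LM_STUDIO_URL, json=payload, timeout=120)
    answer = resp.json()["choices"][0]["message"]["content"]
    return jsonify({"response": answer})

if __name__ == "__main__":
    app.run(port=5000)
```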
## **Testing the Flask Server Using Postman**
1. **Install Postman**
- Download and install Postman from [Postman’s official website](https://www.postman.com/).

2. **Create a New Request**
- Open Postman and create a new request.
- Set the request type to `POST`.
- Enter the API URL: `http://localhost:5000/query`.

3. **Send a Test Query**
- In the request body, select `raw` and choose `JSON` format.
- Enter a test query:
```json
{
"query": "What is the capital of France?"
}
```

4. **Submit the Request**
- Click the `Send` button to send the request.
- View the response in the output panel. For example:
```json
{
"response": "The capital of France is Paris."
}
```
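If you'd rather script the check than use Postman, the same request can be sent from Python (assuming the Flask server from the previous section is running):

```python
# Quick smoke test against the local /query endpoint.
import requests

resp = requests.post(
    "http://localhost:5000/query",
    json={"query": "What is the capital of France?"},
    timeout=60,
)
print(resp.json())  # e.g. {"response": "The capital of France is Paris."}
```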
## **How It Works**
### **Mistral 7B Instruct v0.3**
Mistral 7B Instruct v0.3 is a high-performance language model designed for instruction-based tasks. It excels at providing accurate and relevant responses to prompts, making it ideal for applications like chatbots, virtual assistants, and AI-driven systems.

#### **Usage with LM Studio**
The **Mistral 7B Instruct v0.3** model is accessed through the **LM Studio API**. The system sends a request to this API endpoint with a user query and receives the corresponding response generated by the model.

### **LLaMA Model (Meta)**
LLaMA (Large Language Model Meta AI) is a state-of-the-art foundational model capable of handling a variety of tasks like text generation, summarization, and translation. In this project, the LLaMA model serves as a backbone for generating responses from a system query.

### **Integration via `requests`**
Both **Mistral 7B Instruct v0.3** and **LLaMA** are integrated using the `requests` library to send HTTP POST requests to their respective API endpoints (see the sketch after this list). The payload sent to the APIs includes:
- **Model name** (e.g., `"mistral-7b-instruct-v0.3:2"`)
- **Prompt** (including context and user query)
- **Other parameters** (like `max_tokens`, `temperature`)
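As a concrete illustration, the payload might be assembled like this. This is a sketch only: the model name matches the one above, but the endpoint path and parameter values are assumptions.

```python
# Illustrative POST to the LM Studio completion endpoint.
import requests

LM_STUDIO_URL = "http://localhost:8000/v1/completions"  # assumed endpoint path

payload = {
    "model": "mistral-7b-instruct-v0.3:2",  # model name from the README
    "prompt": "Context: ...retrieved chunks...\n\nQuestion: What is RAG?",
    "max_tokens": 256,   # assumed value
    "temperature": 0.7,  # assumed value
}

resp = requests.post(LM_STUDIO_URL, json=payload, timeout=120)
print(resp.json()["choices"][0]["text"])
```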