Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/guhan-tofu/rag
FREE implementation of a RAG pipeline using NVIDIA's LLaMA 3.1 Nemotron model and Pinecone
- Host: GitHub
- URL: https://github.com/guhan-tofu/rag
- Owner: guhan-tofu
- License: mit
- Created: 2024-10-29T17:58:18.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-11-06T19:01:34.000Z (3 months ago)
- Last Synced: 2024-12-17T10:45:08.423Z (about 2 months ago)
- Topics: ai, llama3, nvidia, pinecone, python, rag, retrieval-augmented-generation
- Language: Python
- Homepage:
- Size: 78.1 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
Awesome Lists containing this project
README
# RAG Application: Retrieval-Augmented Generation with NVIDIA LLaMA 3.1, Pinecone, and MPNet
This project is a Retrieval-Augmented Generation (RAG) pipeline that leverages NVIDIA's **LLaMA 3.1** language model for natural language interaction, **Pinecone** for vector storage and retrieval, and the **all-mpnet-base-v2** model to create text embeddings. The application allows users to upload documents, process them into embeddings, store them in a vector database, and perform contextualized Q&A based on relevant retrieved documents.
![image](https://github.com/user-attachments/assets/a6aacad3-4aef-43bd-bb63-c50dc860b01d)
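
To make the flow above concrete, here is a minimal sketch of the indexing path, assuming the `sentence-transformers` and `pinecone` Python clients; the index name, chunking strategy, and helper names are illustrative rather than taken from this repo's code.

```python
# Sketch of the indexing path: take the text of an uploaded .txt file, split it
# into chunks, embed each chunk with all-mpnet-base-v2, and upsert into Pinecone.
import os

from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")  # 768-dim embeddings
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("rag-docs")  # assumed index: dimension=768, metric="dotproduct"

def chunk_text(text: str, size: int = 500) -> list[str]:
    """Naive fixed-size character chunking; the repo may chunk differently."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def index_document(text: str, doc_id: str) -> None:
    chunks = chunk_text(text)
    vectors = embedder.encode(chunks)  # shape: (num_chunks, 768)
    index.upsert(vectors=[
        {"id": f"{doc_id}-{i}", "values": vec.tolist(), "metadata": {"text": chunk}}
        for i, (vec, chunk) in enumerate(zip(vectors, chunks))
    ])
```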
## Contributors
- Sri Guhanathan
- Ernest Bozjigtov
## Features
- **Upload Document**: Supports .txt file uploads for document embedding and storage in Pinecone.
- **Conversational AI**: Uses NVIDIA's LLaMA 3.1 model to handle user queries with relevant context retrieval.
- **Vector Database Integration**: Efficient vector storage and retrieval using Pinecone.
- **Embedding Model**: Employs the `all-mpnet-base-v2` embedding model for creating high-quality vector representations.

## Tech Stack
- **Frontend**: HTML, CSS, JavaScript (React)
- **Backend**: Python (Flask); a minimal route sketch follows this list
- **Model Integration**: NVIDIA LLaMA 3.1 and Sentence Transformers (all-mpnet-base-v2)
- **Vector Database**: Pinecone
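
Since the backend is Flask, one plausible way to wire the two operations this README describes is sketched below; the route names and the `rag_pipeline` helpers are assumptions for illustration, not this repo's actual API.

```python
# Hypothetical Flask wiring for the two operations described in this README:
# uploading a .txt document and asking a question against the indexed content.
from flask import Flask, jsonify, request

# Assumed helpers: index_document() as in the indexing sketch above and
# answer_query() as in the generation sketch further below.
from rag_pipeline import answer_query, index_document

app = Flask(__name__)

@app.route("/upload", methods=["POST"])
def upload():
    file = request.files["file"]                # .txt upload from the React frontend
    text = file.read().decode("utf-8")
    index_document(text, doc_id=file.filename)  # embed chunks and upsert into Pinecone
    return jsonify({"status": "indexed", "filename": file.filename})

@app.route("/query", methods=["POST"])
def query():
    question = request.json["question"]
    return jsonify({"answer": answer_query(question)})  # retrieve context + call Nemotron

if __name__ == "__main__":
    app.run(debug=True)
```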

## Embedding Model
- **all-mpnet-base-v2**: produces 768-dimensional embeddings; similarity is scored with dot product rather than cosine.
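
A quick way to see both properties, assuming only `sentence-transformers` is installed (the sentences are made up for illustration):

```python
# all-mpnet-base-v2 returns 768-dimensional vectors; ranking uses dot product.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

query_vec = model.encode("How do I reset my password?")
doc_vecs = model.encode([
    "To reset your password, open Settings and choose 'Forgot password'.",
    "The office is closed on public holidays.",
])

print(query_vec.shape)         # (768,)
scores = doc_vecs @ query_vec  # dot-product scores; higher means more relevant
print(scores.argmax())         # expected: 0 (the password document)
```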
## NVIDIA Model
- **nvidia / llama-3.1-nemotron-70b-instruct**: [Llama-3.1-Nemotron-70B-Instruct](https://build.nvidia.com/nvidia/llama-3_1-nemotron-70b-instruct) is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries.
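
As a sketch of the query step, the hosted Nemotron model can be reached through NVIDIA's OpenAI-compatible endpoint and prompted with the context retrieved from Pinecone; the endpoint usage, prompt format, and index name here are assumptions and may differ from the repo's actual code.

```python
# Sketch of the generation step: retrieve the top-matching chunks from Pinecone
# and pass them as context to llama-3.1-nemotron-70b-instruct.
import os

from openai import OpenAI
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index("rag-docs")  # assumed index name
llm = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # NVIDIA's OpenAI-compatible API
    api_key=os.environ["NVIDIA_API_KEY"],
)

def answer_query(question: str, top_k: int = 3) -> str:
    # 1) Embed the question and fetch the most relevant chunks (dot-product scored).
    result = index.query(
        vector=embedder.encode(question).tolist(),
        top_k=top_k,
        include_metadata=True,
    )
    context = "\n\n".join(m.metadata["text"] for m in result.matches)

    # 2) Ask the model to answer using only the retrieved context.
    response = llm.chat.completions.create(
        model="nvidia/llama-3.1-nemotron-70b-instruct",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
        temperature=0.2,
        max_tokens=1024,
    )
    return response.choices[0].message.content
```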
### Model Architecture:
- Architecture Type: Transformer
- Network Architecture: Llama 3.1

### Input:
- Input Type(s): Text
- Input Format: String
- Input Parameters: One Dimensional (1D)
- Other Properties Related to Input: Max of 128k tokens

### Output:
- Output Type(s): Text
- Output Format: String
- Output Parameters: One Dimensional (1D)
- Other Properties Related to Output: Max of 4k tokens

---