https://github.com/mongodb-developer/quickstart-rag-python
This Python project demonstrates semantic search using MongoDB and two different LLM frameworks: LangChain and LlamaIndex. The goal is to load documents from MongoDB, generate embeddings for the text data, and perform semantic searches using both LangChain and LlamaIndex frameworks.
- Host: GitHub
- URL: https://github.com/mongodb-developer/quickstart-rag-python
- Owner: mongodb-developer
- Created: 2023-12-07T12:20:48.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-10T12:36:14.000Z (about 1 year ago)
- Last Synced: 2025-03-22T13:51:25.865Z (3 months ago)
- Topics: langchain-python, llamaindex, mongodb-atlas, vector-database
- Language: Jupyter Notebook
- Homepage:
- Size: 21.5 KB
- Stars: 6
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Semantic Search with MongoDB and LLM Frameworks
[**Article Link**](https://www.mongodb.com/developer/products/atlas/guide-to-rag-application/)
## Introduction
This Python project demonstrates semantic search using MongoDB and two different LLM frameworks: **LangChain** and **LlamaIndex**. The goal is to load documents from MongoDB, generate embeddings for the text data, and perform semantic searches using both **LangChain** and **LlamaIndex** frameworks.
## Environment Variables
To run this project, you need to set the following environment variables in a `.env` file:
```dotenv
OPENAI_API_KEY=YOUR_OPENAI_API_KEY
MONGODB_URI=YOUR_MONGODB_CONNECTION_URI
MONGODB_COLL=YOUR_MONGODB_COLLECTION
MONGODB_VECTOR_INDEX=YOUR_MONGODB_VECTOR_INDEX
MONGODB_VECTOR_COLL_LANGCHAIN=YOUR_MONGODB_VECTOR_COLLECTION_LANGCHAIN
MONGODB_VECTOR_COLL_LLAMAINDEX=YOUR_MONGODB_VECTOR_COLLECTION_LLAMAINDEX
```
Make sure to replace the placeholder values with your actual API keys and connection details.
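A quick way to catch a missing value early is to validate the variables before any connection is attempted. The following is a minimal sketch rather than code from the project: `read_settings` is a hypothetical helper, and it assumes the variables are already present in the process environment (for example after calling `load_dotenv()` from the `python-dotenv` package, which may or may not be what the notebook uses).

```python
import os

# Variable names taken from the .env listing in this README.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "MONGODB_URI",
    "MONGODB_COLL",
    "MONGODB_VECTOR_INDEX",
    "MONGODB_VECTOR_COLL_LANGCHAIN",
    "MONGODB_VECTOR_COLL_LLAMAINDEX",
)


def read_settings(env=None):
    """Return the required settings as a dict; raise if any are missing or empty.

    `read_settings` is a hypothetical helper, not part of the project.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `read_settings()` once at the top of a script or notebook reports every missing name in a single error instead of failing later on the first lookup.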
## Setup
Install dependencies:
```
pip install -r requirements.txt
```
## Project Overview
### 1. Loading Documents
The project loads documents from the specified MongoDB collection (`MONGODB_COLL`). Ensure that your MongoDB collection contains the text data you want to perform a semantic search on.
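The loading step can be sketched with plain PyMongo calls. This is an illustration under stated assumptions, not the notebook's actual loader: the field name `text`, the helper names `open_collection` and `load_texts`, and the expectation that `MONGODB_URI` includes a database name are all hypothetical.

```python
def open_collection(uri, collection_name):
    """Open a collection with PyMongo (hypothetical helper).

    Assumes the connection URI names a default database.
    """
    from pymongo import MongoClient  # imported lazily; requires `pip install pymongo`

    client = MongoClient(uri)
    return client.get_default_database()[collection_name]


def load_texts(collection, text_field="text", limit=100):
    """Return up to `limit` non-blank strings stored under `text_field`.

    `text_field="text"` is an assumed field name; adjust it to your schema.
    """
    cursor = collection.find(
        {text_field: {"$exists": True}},  # skip documents without the field
        {text_field: 1, "_id": 0},        # project only the text
    ).limit(limit)
    return [
        doc[text_field]
        for doc in cursor
        if isinstance(doc.get(text_field), str) and doc[text_field].strip()
    ]
```

With the environment variables set, `load_texts(open_collection(os.environ["MONGODB_URI"], os.environ["MONGODB_COLL"]))` would return the raw strings to embed.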
### 2. Generating Embeddings
The application generates embeddings for the loaded text data using the LangChain and LlamaIndex frameworks. The embeddings are stored in separate MongoDB collections (`MONGODB_VECTOR_COLL_LANGCHAIN` and `MONGODB_VECTOR_COLL_LLAMAINDEX`).
### 3. Semantic Search
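As a point of reference for the queries run against those embedding collections, Atlas Vector Search exposes a `$vectorSearch` aggregation stage that can also be used without either framework. The sketch below only builds such a pipeline. The function name, the field names `embedding` and `text`, and the 20x candidate multiplier are illustrative assumptions, not values taken from the project.

```python
def build_vector_search_pipeline(query_vector, index_name,
                                 path="embedding", limit=5, num_candidates=None):
    """Build an aggregation pipeline for Atlas Vector Search (hypothetical helper).

    `path` is the field holding stored vectors; "embedding" is an assumption.
    """
    if not query_vector:
        raise ValueError("query_vector must be a non-empty sequence of numbers")
    if num_candidates is None:
        # Consider more candidates than results returned; 20x is an illustrative choice.
        num_candidates = limit * 20
    return [
        {
            "$vectorSearch": {
                "index": index_name,          # e.g. the value of MONGODB_VECTOR_INDEX
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Return the text (assumed field name) together with the similarity score.
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
```

The resulting list can be passed to PyMongo's `collection.aggregate(...)`. The query vector should be produced by the same embedding model that generated the stored vectors.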
Semantic search is performed with both the LangChain and LlamaIndex frameworks. Each search queries the corresponding embeddings collection and retrieves the documents that are most semantically similar to the prompt.
## Additional Information
The `OPENAI_API_KEY` is required for embedding generation, which uses external models (e.g., OpenAI's embedding models).
Make sure to configure MongoDB connection details and collections appropriately.
Check the official documentation for LangChain and LlamaIndex for any additional configuration or usage details.
## Reference
- Atlas Vector Search: [Link to MongoDB Atlas Vector Search](https://www.mongodb.com/products/platform/atlas-vector-search)
- LangChain: [Link to LangChain Documentation](https://python.langchain.com/docs/get_started/introduction)
- LlamaIndex: [Link to LlamaIndex Documentation](https://docs.llamaindex.ai/en/stable/)