https://github.com/mongodb-developer/quickstart-rag-python

This Python project demonstrates semantic search using MongoDB and two different LLM frameworks: LangChain and LlamaIndex. The goal is to load documents from MongoDB, generate embeddings for the text data, and perform semantic searches using both LangChain and LlamaIndex frameworks.
langchain-python llamaindex mongodb-atlas vector-database

Host: GitHub
URL: https://github.com/mongodb-developer/quickstart-rag-python
Owner: mongodb-developer
Created: 2023-12-07T12:20:48.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2024-06-10T12:36:14.000Z (over 1 year ago)
Last Synced: 2025-04-07T05:35:37.302Z (10 months ago)
Topics: langchain-python, llamaindex, mongodb-atlas, vector-database
Language: Jupyter Notebook
Homepage:
Size: 21.5 KB
Stars: 7
Watchers: 2
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

README

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/path-to-notebook)

# Semantic Search with MongoDB and LLM Frameworks

[**Article Link**](https://www.mongodb.com/developer/products/atlas/guide-to-rag-application/)

## Introduction

This Python project demonstrates semantic search using MongoDB and two different LLM frameworks: **LangChain** and **LlamaIndex**. The goal is to load documents from MongoDB, generate embeddings for the text data, and perform semantic searches using both **LangChain** and **LlamaIndex** frameworks.

## Environment Variables

To run this project, you need to set the following environment variables in a `.env` file:

```dotenv
OPENAI_API_KEY=YOUR_OPENAI_API_KEY
MONGODB_URI=YOUR_MONGODB_CONNECTION_URI
MONGODB_COLL=YOUR_MONGODB_COLLECTION
MONGODB_VECTOR_INDEX=YOUR_MONGODB_VECTOR_INDEX
MONGODB_VECTOR_COLL_LANGCHAIN=YOUR_MONGODB_VECTOR_COLLECTION_LANGCHAIN
MONGODB_VECTOR_COLL_LLAMAINDEX=YOUR_MONGODB_VECTOR_COLLECTION_LLAMAINDEX
```

Make sure to replace the placeholder values with your actual API keys and connection details.
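As a quick sanity check before running the notebook, the variables listed above can be validated up front. This is a minimal sketch, not code from the repository: the helper name `missing_env_vars` is hypothetical, and it only checks that each variable is present and non-blank.

```python
import os

# Variable names copied from the .env listing above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "MONGODB_URI",
    "MONGODB_COLL",
    "MONGODB_VECTOR_INDEX",
    "MONGODB_VECTOR_COLL_LANGCHAIN",
    "MONGODB_VECTOR_COLL_LLAMAINDEX",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank in `env`.

    `env` is any mapping of names to values; it defaults to os.environ.
    (Hypothetical helper, not part of the project.)
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not str(env.get(name, "")).strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

If the `.env` file is loaded with a package such as `python-dotenv` (an assumption; check `requirements.txt`), call its loader before running this check so the values are present in `os.environ`.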

## Setup

Install dependencies:

```bash
pip install -r requirements.txt
```

## Project Overview

### 1. Loading Documents

The project loads documents from the specified MongoDB collection (`MONGODB_COLL`). Ensure that your MongoDB collection contains the text data you want to perform a semantic search on.
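The loading step might look roughly like the sketch below. It is illustrative only: the function name `load_texts` and the field name `text` are assumptions, since the README does not say which field holds the text. `collection` is expected to behave like a PyMongo collection, i.e. to expose `find(filter, projection)`.

```python
def load_texts(collection, text_field="text"):
    """Return (_id, text) pairs for documents with a non-empty string in `text_field`.

    `collection` is any object with a PyMongo-style `find(filter, projection)`.
    `text_field` is an assumed field name; adjust it to match your data.
    """
    pairs = []
    # The projection asks the server for only `_id` and the text field.
    for doc in collection.find({}, {text_field: 1}):
        value = doc.get(text_field)
        # Skip documents where the field is missing, blank, or not a string.
        if isinstance(value, str) and value.strip():
            pairs.append((doc["_id"], value))
    return pairs


# Usage with PyMongo (requires a reachable deployment; "my_database" is a placeholder):
#
#   import os
#   from pymongo import MongoClient
#
#   client = MongoClient(os.environ["MONGODB_URI"])
#   coll = client["my_database"][os.environ["MONGODB_COLL"]]
#   docs = load_texts(coll)
```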

### 2. Generating Embeddings

The application generates embeddings for the loaded text data using the LangChain and LlamaIndex frameworks. The embeddings are stored in separate MongoDB collections (`MONGODB_VECTOR_COLL_LANGCHAIN` and `MONGODB_VECTOR_COLL_LLAMAINDEX`).
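For the stored embeddings to be searchable, each vector collection needs an Atlas Vector Search index (the one named by `MONGODB_VECTOR_INDEX`). The sketch below builds such an index definition as a Python dictionary. The defaults are assumptions rather than values taken from the repository: `embedding` as the vector field name, and 1536 dimensions, which matches OpenAI's `text-embedding-ada-002` and `text-embedding-3-small` models. Adjust both to whatever your embedding step actually writes.

```python
ALLOWED_SIMILARITIES = {"cosine", "euclidean", "dotProduct"}


def vector_index_definition(path="embedding", dimensions=1536, similarity="cosine"):
    """Build an Atlas Vector Search index definition for one vector field.

    `path` and `dimensions` are assumed defaults; they must match the field
    name and vector length that the embedding step stores.
    """
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"similarity must be one of {sorted(ALLOWED_SIMILARITIES)}")
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dimensions,
                "similarity": similarity,
            }
        ]
    }
```

The resulting dictionary can be pasted as JSON into the Atlas UI when creating the index on each vector collection.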

### 3. Semantic Search

Semantic search is performed with both LangChain and LlamaIndex. Each framework embeds the prompt, queries its embeddings collection, and retrieves the documents whose embeddings are most similar to the prompt's embedding.
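Under the hood, a search of this kind against Atlas is typically expressed as a `$vectorSearch` aggregation stage. The sketch below builds such a pipeline by hand to show what is being asked of the database; it is not the frameworks' internal code. The helper name, the `embedding` and `text` field names, and the default of 20 candidates per requested result are all assumptions.

```python
def vector_search_pipeline(query_vector, index_name, path="embedding",
                           text_field="text", limit=5, num_candidates=None):
    """Build an aggregation pipeline that runs an Atlas `$vectorSearch`.

    `query_vector` is the embedding of the prompt. `path` and `text_field`
    are assumed field names; change them to match the vector collection.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if num_candidates is None:
        # Assumed rule of thumb: consider more candidates than results returned.
        num_candidates = min(limit * 20, 10000)
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            # Return the text and the similarity score for each match.
            "$project": {
                "_id": 0,
                text_field: 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Usage with PyMongo (needs a live Atlas cluster and a query embedding):
#   results = list(collection.aggregate(
#       vector_search_pipeline(query_embedding, os.environ["MONGODB_VECTOR_INDEX"])))
```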

## Additional Information

- `OPENAI_API_KEY` is required because embeddings are generated through the OpenAI API (e.g., with OpenAI's embedding models).
- Configure the MongoDB connection string and collection names to match your deployment.
- See the official LangChain and LlamaIndex documentation for additional configuration and usage details.

## Reference

- Atlas Vector Search: [MongoDB Atlas Vector Search](https://www.mongodb.com/products/platform/atlas-vector-search)
- LangChain: [LangChain Documentation](https://python.langchain.com/docs/get_started/introduction)
- LlamaIndex: [LlamaIndex Documentation](https://docs.llamaindex.ai/en/stable/)