{"id":18398785,"url":"https://github.com/mongodb-developer/atlas-vector-search-rag","last_synced_at":"2025-06-24T16:38:17.204Z","repository":{"id":203413507,"uuid":"709537536","full_name":"mongodb-developer/atlas-vector-search-rag","owner":"mongodb-developer","description":null,"archived":false,"fork":false,"pushed_at":"2024-07-05T15:42:49.000Z","size":11,"stargazers_count":29,"open_issues_count":3,"forks_count":18,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-14T10:55:35.833Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mongodb-developer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-24T22:08:48.000Z","updated_at":"2025-03-29T16:02:04.000Z","dependencies_parsed_at":null,"dependency_job_id":"f9a75b33-fcb4-4682-93e7-9e5b08586b68","html_url":"https://github.com/mongodb-developer/atlas-vector-search-rag","commit_stats":null,"previous_names":["mongodb-developer/atlas-vector-search-rag"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mongodb-developer/atlas-vector-search-rag","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mongodb-developer%2Fatlas-vector-search-rag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mongodb-developer%2Fatlas-vector-search-rag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mongodb-developer%2Fatlas-vector-search-rag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mongodb-developer%2Fatlas-vector-search-rag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mongodb-developer","download_url":"https://codeload.github.com/mongodb-developer/atlas-vector-search-rag/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mongodb-developer%2Fatlas-vector-search-rag/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261715418,"owners_count":23198762,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T02:24:24.457Z","updated_at":"2025-06-24T16:38:16.756Z","avatar_url":"https://github.com/mongodb-developer.png","language":"Python","readme":"# Atlas Vector Search with RAG\n\nThe Python scripts in this repo use Atlas Vector Search in a Retrieval-Augmented Generation (RAG) architecture to build a question-answering application. They combine the LangChain framework, OpenAI models, and Gradio with Atlas Vector Search to create the app.\n\n## Setting up the Environment\n\n1. Install the following packages:\n```\npip3 install langchain pymongo bs4 openai tiktoken gradio requests lxml argparse unstructured\n```\n2. Create an OpenAI API key on the [OpenAI API keys page](https://platform.openai.com/account/api-keys). This requires a paid OpenAI account with enough credits; OpenAI API requests stop working once the credit balance reaches `$0`.\n\n3. Save the OpenAI API key and the MongoDB URI in the `key_param.py` file, like this:\n```\nopenai_api_key = \"ENTER_OPENAI_API_KEY_HERE\"\nMONGO_URI = \"ENTER_MONGODB_URI_HERE\"\n```\n4. Use the following two Python scripts:\n   - **load_data.py**: Loads your documents and ingests their text and vector embeddings into a MongoDB collection.\n   - **extract_information.py**: Generates the user interface and lets you run question-answering against your data, using Atlas Vector Search and OpenAI.\n\n**Note:** In this demo, I've used:\n   - DB Name: `langchain_demo`\n   - Collection Name: `collection_of_text_blobs`\n   - The text files that I am using as my source data are saved in a directory named `sample_files`.\n\n## Main Components\n\n| LangChain | OpenAI | Atlas Vector Search | Gradio |\n|---|---|---|---|\n| [**DirectoryLoader**](https://api.python.langchain.com/en/latest/document_loaders/langchain.document_loaders.unstructured.UnstructuredFileLoader.html): \u003cbr\u003e - All documents from a directory \u003cbr\u003e - Split and load \u003cbr\u003e - Uses the [Unstructured](https://python.langchain.com/docs/integrations/document_loaders/unstructured_file.html) package | **Embedding Model**: \u003cbr\u003e - [text-embedding-ada-002](https://openai.com/blog/new-and-improved-embedding-model) \u003cbr\u003e - Text → Vector embeddings \u003cbr\u003e - 1536 dimensions | [**Vector Store**](https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/) | [UI](https://www.gradio.app/) for LLM app \u003cbr\u003e - Open-source Python library \u003cbr\u003e - Allows you to quickly create user interfaces for ML models |\n| [**RetrievalQA**](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval_qa.base.BaseRetrievalQA.html?highlight=retrievalqa#langchain.chains.retrieval_qa.base.BaseRetrievalQA): \u003cbr\u003e - Retriever \u003cbr\u003e - Question-answering chain | **Language model**: \u003cbr\u003e - [gpt-3.5-turbo](https://platform.openai.com/docs/models/gpt-3-5) \u003cbr\u003e - Understands and generates natural language \u003cbr\u003e - Generates text, answers, translations, etc. | | |\n| [**MongoDBAtlasVectorSearch**](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.mongodb_atlas.MongoDBAtlasVectorSearch.html): \u003cbr\u003e - Wrapper around Atlas Vector Search \u003cbr\u003e - Easily create and store embeddings in MongoDB collections \u003cbr\u003e - Perform KNN Search using Atlas Vector Search | | | |\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmongodb-developer%2Fatlas-vector-search-rag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmongodb-developer%2Fatlas-vector-search-rag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmongodb-developer%2Fatlas-vector-search-rag/lists"}