{"id":24770342,"url":"https://github.com/farhaj499/rag_with_pineconedb","last_synced_at":"2026-04-18T17:03:39.422Z","repository":{"id":270228300,"uuid":"909697153","full_name":"Farhaj499/RAG_with_PineconeDB","owner":"Farhaj499","description":"This project implements a Retrieval Augmented Generation (RAG) system that answers questions based on two local PDF documents stored in Google Drive.","archived":false,"fork":false,"pushed_at":"2025-01-12T17:40:27.000Z","size":74,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-04T02:22:19.998Z","etag":null,"topics":["agentic-ai","embeddings","huggingface-transformers","langchain","langchain-python","local-pdfs","pinecone","python","rag","retrieval-augmented-generation","semantic-search","vector-database"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Farhaj499.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-29T14:15:41.000Z","updated_at":"2025-01-15T06:50:51.000Z","dependencies_parsed_at":"2025-01-29T03:37:41.358Z","dependency_job_id":"04d8be3a-14ef-4d1a-b878-d58545dcdedd","html_url":"https://github.com/Farhaj499/RAG_with_PineconeDB","commit_stats":null,"previous_names":["farhaj499/rag_with_pineconedb"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Farhaj499/RAG_with_PineconeDB","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Farhaj499%2FRAG_with_PineconeDB","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Farhaj499%2FRAG_with_PineconeDB/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Farhaj499%2FRAG_with_PineconeDB/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Farhaj499%2FRAG_with_PineconeDB/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Farhaj499","download_url":"https://codeload.github.com/Farhaj499/RAG_with_PineconeDB/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Farhaj499%2FRAG_with_PineconeDB/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31976806,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T16:27:12.723Z","status":"ssl_error","status_checked_at":"2026-04-18T16:27:11.140Z","response_time":103,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agentic-ai","embeddings","huggingface-transformers","langchain","langchain-python","local-pdfs","pinecone","python","rag","retrieval-augmented-generation","semantic-search","vector-database"],"created_at":"2025-01-29T03:37:33.714Z","updated_at":"2026-04-18T17:03:39.403Z","avatar_url":"https://github.com/Farhaj499.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Agentic AI RAG System with Pinecone and Gemini (Local PDFs)\n\nThis project implements a Retrieval Augmented Generation (RAG) system that answers questions based on two local PDF documents stored in Google Drive. It uses Pinecone as a vector database for efficient retrieval of relevant information and Gemini (or another LLM) to generate natural language responses.\n\n## Overview\n\nThis RAG system operates as follows:\n\n1.  **Data Loading:** Two PDF files are loaded from a designated Google Drive folder using `PyPDFDirectoryLoader`.\n2.  **Text Chunking:** The extracted text from both PDFs is divided into smaller chunks to optimize retrieval and manage context for the LLM.\n3.  **Embedding Generation:** Sentence embeddings are created for each text chunk using the \"sentence-transformers/all-mpnet-base-v2\" Hugging Face model.\n4.  **Vector Database Storage:** The text chunks and their corresponding embeddings are stored in a Pinecone vector database.\n5.  **Retrieval and Question Answering:** When a user asks a question, the system generates an embedding for the query, retrieves the most similar text chunks from Pinecone, and uses Gemini (or an alternative LLM) to generate a natural language answer based on the retrieved context.\n\n## Technologies Used\n\n*   **Pinecone:** Vector database for storing and retrieving embeddings.\n*   **Hugging Face Transformers:** For generating sentence embeddings using \"sentence-transformers/all-mpnet-base-v2\".\n*   **Gemini (or alternative LLM):** Large Language Model for generating natural language responses.\n*   **Python:** Programming language for implementation.\n*   **LangChain (Optional but highly recommended):** For streamlining the RAG pipeline.\n*   **PyPDFDirectoryLoader (LangChain):** For loading PDF documents from a directory.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffarhaj499%2Frag_with_pineconedb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffarhaj499%2Frag_with_pineconedb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffarhaj499%2Frag_with_pineconedb/lists"}