{"id":23493131,"url":"https://github.com/semihbugrasezer/chat-to-multipdf","last_synced_at":"2026-02-17T19:01:11.333Z","repository":{"id":187916978,"uuid":"677807025","full_name":"semihbugrasezer/chat-to-multipdf","owner":"semihbugrasezer","description":"Multi-PDF ChatBot","archived":false,"fork":false,"pushed_at":"2024-12-11T18:24:31.000Z","size":82505,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-15T09:17:20.308Z","etag":null,"topics":["chatbot","langchain","pypdf2","python","streamlit-webapp"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/semihbugrasezer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-08-12T17:41:06.000Z","updated_at":"2024-12-11T18:24:35.000Z","dependencies_parsed_at":"2025-04-16T07:47:38.530Z","dependency_job_id":null,"html_url":"https://github.com/semihbugrasezer/chat-to-multipdf","commit_stats":null,"previous_names":["semihbugrasezer/chat-to-multipdf"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/semihbugrasezer/chat-to-multipdf","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semihbugrasezer%2Fchat-to-multipdf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semihbugrasezer%2Fchat-to-multipdf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semihbugrasezer%2Fchat-to-multipdf/releases","manifests_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/semihbugrasezer%2Fchat-to-multipdf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/semihbugrasezer","download_url":"https://codeload.github.com/semihbugrasezer/chat-to-multipdf/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/semihbugrasezer%2Fchat-to-multipdf/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29554367,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-17T18:16:07.221Z","status":"ssl_error","status_checked_at":"2026-02-17T18:16:04.782Z","response_time":100,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatbot","langchain","pypdf2","python","streamlit-webapp"],"created_at":"2024-12-25T02:18:28.874Z","updated_at":"2026-02-17T19:01:11.239Z","avatar_url":"https://github.com/semihbugrasezer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\n## Multipdf to Chat\n\n### 1. **Streamlit**:\n   - Used to create the web interface. Users upload PDF files, ask questions, and view the answers.\n   - Provides the chatbot interface: Users can type questions and receive answers from the chatbot.\n\n### 2. **PyPDF2**:\n   - Used to extract text from the uploaded PDF files. It reads text from the pages of the PDFs.\n\n### 3. 
**LangChain**:\n   - **Text Splitter**: The **RecursiveCharacterTextSplitter** splits the extracted text into smaller, more manageable chunks.\n   - **GoogleGenerativeAIEmbeddings**: Creates embedding vectors from the text chunks. These vectors are later used for similarity searches.\n   - **FAISS**: A library for fast vector similarity search. It stores the vectors of the uploaded text and performs similarity searches using these vectors.\n   - **ChatGoogleGenerativeAI**: Generates answers to questions using Google's **Gemini Pro** model.\n   - **load_qa_chain**: Loads the question-answer chain, which processes the retrieved text and generates answers to the user's questions.\n\n### 4. **dotenv**:\n   - Loads sensitive settings, such as the **Google API Key**, from a `.env` file.\n\n## Steps of the Application:\n\n### 1. **PDF Upload**:\n   - The user uploads the PDF files through `st.file_uploader`.\n   - The `get_pdf_text` function extracts the full text from the uploaded PDFs.\n\n### 2. **Text Splitting**:\n   - The `get_text_chunks` function splits the extracted text into chunks of 10,000 characters, which keeps large texts manageable in the later steps.\n\n### 3. **Vector Storage**:\n   - Vectors are created from the text chunks using **GoogleGenerativeAIEmbeddings**.\n   - These vectors are stored in **FAISS**, which enables fast similarity searches.\n\n### 4. **Question-Answer Chain**:\n   - When the user asks a question, the application looks for the passages in the PDFs that are most similar to the question.\n   - The matching passages are retrieved using **FAISS**, and answers are generated with **ChatGoogleGenerativeAI**.\n\n### 5. **Streamlit Chat Interface**:\n   - The application provides an interactive chat interface between the user and the chatbot. 
As the user types questions, the chatbot responds with answers drawn from the uploaded PDFs.\n\n### 6. **API Key**:\n   - The Google API key is read from the `.env` file, allowing the use of Google's Gemini model.\n\n## User Flow:\n\n1. The user uploads the PDF files.\n2. The application extracts text from the PDFs, splits it into chunks, and creates vectors.\n3. When the user asks a question, the chatbot performs a similarity search over the stored vectors.\n4. The chatbot uses the Google Gemini model to generate a response and displays the answer to the user.\n\n## Technologies Used:\n- **Streamlit**: For the web interface.\n- **PyPDF2**: For extracting text from PDFs.\n- **LangChain**: For text processing, embedding creation, and vector storage.\n- **Google Generative AI**: For generating answers to questions.\n- **FAISS**: For vector searches.\n- **dotenv**: For environment variables, such as the Google API key.\n\n## Conclusion:\nThis application extracts text from PDF files, splits it into chunks, embeds the chunks as vectors, and uses similarity search with Google's Gemini model to answer user questions based on that content.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsemihbugrasezer%2Fchat-to-multipdf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsemihbugrasezer%2Fchat-to-multipdf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsemihbugrasezer%2Fchat-to-multipdf/lists"}