{"id":29222795,"url":"https://github.com/snigdhasv/pdf_chat","last_synced_at":"2026-04-09T17:53:32.470Z","repository":{"id":301729734,"uuid":"1010109229","full_name":"snigdhasv/PDF_Chat","owner":"snigdhasv","description":"A Streamlit-based AI chat application that allows users to upload PDF documents and ask questions about their content. Uses local models(Ollama) and sentence transformers to create embeddings, enabling intelligent document retrieval and conversation.","archived":false,"fork":false,"pushed_at":"2025-06-28T12:53:33.000Z","size":268,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-28T13:42:39.727Z","etag":null,"topics":["faiss","langchain","ollama","pdf-chatbot","python","rag","streamlit"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/snigdhasv.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-28T11:11:58.000Z","updated_at":"2025-06-28T12:53:36.000Z","dependencies_parsed_at":"2025-06-28T13:42:42.291Z","dependency_job_id":"f71dd5fd-4afd-4fdf-b8a5-ed88ce4ab484","html_url":"https://github.com/snigdhasv/PDF_Chat","commit_stats":null,"previous_names":["snigdhasv/chat_with_pdfs"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/snigdhasv/PDF_Chat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snigdhasv%2FPDF_Chat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snigdhasv%2FPDF_Chat/tags",
"releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snigdhasv%2FPDF_Chat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snigdhasv%2FPDF_Chat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/snigdhasv","download_url":"https://codeload.github.com/snigdhasv/PDF_Chat/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snigdhasv%2FPDF_Chat/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263256549,"owners_count":23438262,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["faiss","langchain","ollama","pdf-chatbot","python","rag","streamlit"],"created_at":"2025-07-03T04:02:19.607Z","updated_at":"2025-12-30T22:18:07.518Z","avatar_url":"https://github.com/snigdhasv.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PDF Chat Application\r\n\r\nA Streamlit-based application that allows you to chat with your PDF documents using AI. 
The application uses local language models and embeddings to provide intelligent responses based on the content of your uploaded PDFs.\r\n\r\n## 📸 Screenshots\r\n\r\n\u003cp align=\"center\"\u003e\r\n  \u003cimg src=\"images/image.png\" alt=\"PDF Chat Application Interface\" width=\"800\"\u003e\r\n  \u003cbr\u003e\r\n  \u003cem\u003eMain application interface showing chat with PDF documents\u003c/em\u003e\r\n\u003c/p\u003e\r\n\r\n## Features\r\n\r\n- 📄 **PDF Upload**: Upload multiple PDF documents\r\n- 🤖 **AI Chat**: Ask questions about your PDF content\r\n- 🧠 **Local AI**: Uses Ollama with local language models (no API costs)\r\n- 🔍 **Semantic Search**: Advanced document retrieval using sentence transformers\r\n- 💬 **Chat Interface**: Beautiful chat UI with user and bot avatars\r\n- 📝 **Memory**: Remembers conversation context\r\n\r\n## Prerequisites\r\n\r\nBefore running this application, make sure you have:\r\n\r\n1. **Python 3.8+** installed\r\n2. **Ollama** installed and running locally\r\n3. **Required Python packages** (see installation section)\r\n\r\n### Installing Ollama\r\n\r\n1. Download Ollama from [https://ollama.com/download](https://ollama.com/download)\r\n2. Install and start Ollama\r\n3. Pull the required model:\r\n   ```bash\r\n   ollama pull deepseek-r1:1.5b\r\n   ```\r\n\r\n## Installation\r\n\r\n1. **Clone or download** this project to your local machine\r\n\r\n2. **Navigate to the project directory**:\r\n\r\n   ```bash\r\n   cd PDF_Chat\r\n   ```\r\n\r\n3. **Create a virtual environment** (recommended):\r\n\r\n   ```bash\r\n   python -m venv venv\r\n   ```\r\n\r\n4. **Activate the virtual environment**:\r\n\r\n   - Windows:\r\n     ```bash\r\n     venv\\Scripts\\activate\r\n     ```\r\n   - macOS/Linux:\r\n     ```bash\r\n     source venv/bin/activate\r\n     ```\r\n\r\n5. 
**API Keys**:\r\n   Obtain your Hugging Face API key and add it to the `.env` file in the project directory\r\n\r\n   ```bash\r\n   HUGGINGFACEHUB_API_KEY=your_api_key\r\n   ```\r\n\r\n6. **Install required packages**:\r\n   ```bash\r\n   pip install -r requirements.txt\r\n   ```\r\n\r\n## Usage\r\n\r\n1. **Start the application**:\r\n\r\n   ```bash\r\n   streamlit run app.py\r\n   ```\r\n\r\n2. **Open your browser** and go to the URL shown in the terminal (usually `http://localhost:8501`)\r\n\r\n3. **Upload PDF documents**:\r\n\r\n   - Use the sidebar to upload one or more PDF files\r\n   - Click the \"Process\" button to index the documents\r\n\r\n4. **Start chatting**:\r\n   - Type your questions in the text input\r\n   - The AI will answer based on the content of your uploaded PDFs\r\n\r\n## How It Works\r\n\r\n\u003cp align=\"center\"\u003e\r\n  \u003cimg src=\"images/workflow-diagram.png\" alt=\"Application Workflow\" width=\"600\"\u003e\r\n  \u003cbr\u003e\r\n  \u003cem\u003eHigh-level overview of how the application processes and responds to queries\u003c/em\u003e\r\n\u003c/p\u003e\r\n\r\n1. **Document Processing**: PDFs are converted to text and split into chunks\r\n2. **Embedding Generation**: Text chunks are converted to vector embeddings using sentence transformers\r\n3. **Vector Storage**: Embeddings are stored in a FAISS vector database for fast retrieval\r\n4. 
**Question Answering**: When you ask a question:\r\n   - The question is converted to an embedding\r\n   - Similar document chunks are retrieved\r\n   - The local language model generates an answer based on the retrieved context\r\n\r\n## Technical Stack\r\n\r\n\u003cp align=\"center\"\u003e\r\n  \u003cimg src=\"https://img.shields.io/badge/Python-3.8+-blue.svg\" alt=\"Python\"\u003e\r\n  \u003cimg src=\"https://img.shields.io/badge/Streamlit-FF4B4B?style=flat\u0026logo=streamlit\u0026logoColor=white\" alt=\"Streamlit\"\u003e\r\n  \u003cimg src=\"https://img.shields.io/badge/LangChain-00FF00?style=flat\u0026logo=langchain\u0026logoColor=black\" alt=\"LangChain\"\u003e\r\n  \u003cimg src=\"https://img.shields.io/badge/Ollama-FF6B35?style=flat\u0026logo=ollama\u0026logoColor=white\" alt=\"Ollama\"\u003e\r\n\u003c/p\u003e\r\n\r\n- **Frontend**: Streamlit\r\n- **PDF Processing**: PyPDF2\r\n- **Text Splitting**: LangChain CharacterTextSplitter\r\n- **Embeddings**: Sentence Transformers (all-MiniLM-L6-v2)\r\n- **Vector Database**: FAISS\r\n- **Language Model**: Ollama with deepseek-r1:1.5b\r\n- **Conversation Management**: LangChain ConversationalRetrievalChain\r\n\r\n## File Structure\r\n\r\n```\r\nPDF_Chat/\r\n├── .env\r\n├── app.py              # Main Streamlit application\r\n├── htmlTemplates.py    # CSS styles and HTML templates\r\n├── README.md          # This file\r\n├── images/            # Screenshots and diagrams\r\n│   ├── app-screenshot.png\r\n│   └── workflow-diagram.png\r\n└── venv/              # Virtual environment (created during setup)\r\n```\r\n\r\n## Customization\r\n\r\n### Changing the Language Model\r\n\r\nTo use a different Ollama model, modify the `get_conversation_chain` function in `app.py`:\r\n\r\n```python\r\ndef get_conversation_chain(vectorstore):\r\n    llm = Ollama(model=\"your-preferred-model\")  # Change this line\r\n    # ... 
rest of the function\r\n```\r\n\r\n### Modifying the Chat Interface\r\n\r\nEdit `htmlTemplates.py` to customize:\r\n\r\n- Chat message styling\r\n- Avatar images\r\n- Colors and layout\r\n\r\n## Troubleshooting\r\n\r\n### Common Issues\r\n\r\n1. **\"Ollama model not found\" error**:\r\n\r\n   - Make sure Ollama is running\r\n   - Pull the required model: `ollama pull deepseek-r1:1.5b`\r\n\r\n2. **Import errors**:\r\n\r\n   - Ensure all packages are installed in your virtual environment\r\n   - Check that you're using the correct Python version\r\n\r\n3. **PDF processing issues**:\r\n   - Ensure PDFs are not password-protected\r\n   - Check that PDFs contain extractable text\r\n\r\n### Performance Tips\r\n\r\n- For large PDFs, processing may take some time\r\n- The first question after processing might be slower as the model loads\r\n- Consider using smaller chunk sizes for faster processing\r\n\r\n## Contributing\r\n\r\nFeel free to submit issues, feature requests, or pull requests to improve this application.\r\n\r\n## License\r\n\r\nThis project is open source and available under the MIT License.\r\n\r\n## Acknowledgments\r\n\r\n- Built with [Streamlit](https://streamlit.io/)\r\n- Powered by [Ollama](https://ollama.ai/)\r\n- Uses [LangChain](https://langchain.com/) for AI workflows\r\n- Embeddings provided by [Sentence Transformers](https://www.sbert.net/)\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnigdhasv%2Fpdf_chat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsnigdhasv%2Fpdf_chat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnigdhasv%2Fpdf_chat/lists"}