{"id":25531702,"url":"https://github.com/aghoshpro/chatdocument","last_synced_at":"2026-04-11T11:37:39.262Z","repository":{"id":276847478,"uuid":"929949562","full_name":"aghoshpro/ChatDocument","owner":"aghoshpro","description":"Chat with any document Text files (.txt), PDF files (.pdf), Word documents (.docx), Word documents (.doc), JSON files (.json), GeoJSON files (.geojson) using RAG","archived":false,"fork":false,"pushed_at":"2025-02-17T00:42:58.000Z","size":3847,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-17T01:27:00.378Z","etag":null,"topics":["chatbot","chroma","embeddings","langchain","llms","ollam","rag","retrieval-augmented-generation","vector-database"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aghoshpro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-09T19:10:41.000Z","updated_at":"2025-02-17T00:43:01.000Z","dependencies_parsed_at":"2025-02-10T19:39:59.762Z","dependency_job_id":"a96e08a6-446e-434e-9473-c5ee54971417","html_url":"https://github.com/aghoshpro/ChatDocument","commit_stats":null,"previous_names":["aghoshpro/chatdocumentrag","aghoshpro/chatdocument"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aghoshpro%2FChatDocument","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aghoshpro%2FChatDocument/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/aghoshpro%2FChatDocument/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aghoshpro%2FChatDocument/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aghoshpro","download_url":"https://codeload.github.com/aghoshpro/ChatDocument/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239758890,"owners_count":19692041,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatbot","chroma","embeddings","langchain","llms","ollam","rag","retrieval-augmented-generation","vector-database"],"created_at":"2025-02-20T01:19:34.725Z","updated_at":"2026-04-11T11:37:39.222Z","avatar_url":"https://github.com/aghoshpro.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ChatDocument\n\n\u003cimg src=\"./assets/chatRAG.gif\" alt=\"Streamlit Web App\" width=\"100%\"\u003e\n\nRetrieval Augmented Generation (RAG) application that allows you to chat with any of your local documents in disparate formats e.g., `.txt`,`.pdf`, `.md`, `.docx`, `.doc`, `.json`,`.geojson` using Ollama LLMs and LangChain. Upload your document in the Streamlit Web UI for Q\u0026A interaction. 
Have fun!\n\n## 📂 Project Structure\n\n```\n├── .streamlit/\n│   └── config.toml       # Streamlit configuration (OPTIONAL)\n├── assets/\n│   └── ui.png            # Streamlit UI image\n├── components/\n│   ├── __init__.py\n│   ├── chat.py           # Chat interface implementation\n│   └── upload.py         # Document upload handling\n├── core/\n│   ├── __init__.py\n│   ├── embeddings.py     # Vector embeddings configuration\n│   └── llm.py            # Language model setup\n├── data/\n│   ├── vector_store/     # Stores vector embeddings in ChromaDB\n│   └── sample_docs/      # Sample documents for testing\n├── utils/\n│   ├── __init__.py\n│   └── helpers.py        # Utility functions\n└── main.py               # Application entry point\n```\n\n## 📚 RAG Architecture\n\n\u003cimg src=\"./assets/RagArch.svg\" alt=\"RAG Architecture\" width=\"100%\"\u003e\n\u003cimg align=\"center\" src=\"./assets/rag.png\" alt=\"Streamlit Web App\" width=\"50%\"\u003e\n\n## ✨ Features\n\n- 📄 Multi-document (`.txt`, `.pdf`, `.md`, `.docx`, `.doc`, `.json`, `.geojson`) processing with intelligent chunking\n- 🧠 Multi-query retrieval for better context understanding\n- 🎯 Advanced RAG implementation using LangChain and Ollama\n- 🔒 Completely local data processing: no data leaves your machine\n- 📓 Jupyter notebook for experimentation\n- 🖥️ Clean Streamlit UI\n\n## 🚀 Getting Started\n\n### 1. 
**Install Ollama**\n\n- Visit [Ollama.ai](https://ollama.com) to download and install Ollama\n\n- Open `cmd` or a terminal and run `ollama`\n\n- Install LLM models (locally):\n\n- Start with `ollama pull llama3.2`, a small (4GB) basic LLM suited to general use cases\n\n- For vector embeddings, pull the following:\n\n  ```bash\n  ollama pull mxbai-embed-large # or `nomic-embed-text`\n  ```\n\n- Chat with the model in the terminal:\n\n  ```bash\n  ollama run llama3.2   # or your preferred model\n  ```\n\n- Go to [Ollama Models](https://ollama.com/search) to search for and pull other popular models, for example:\n\n  ```bash\n  ollama pull dolphin3\n  ollama pull deepseek-r1:8b\n  ollama pull mistral\n  ```\n\n- Check the list of locally available Ollama models:\n\n  ```bash\n  ollama list\n  ```\n\n### 2. **Clone Repository**\n\n- Open `cmd` or a terminal, navigate to your preferred directory, and run the following:\n\n  ```bash\n  git clone https://github.com/aghoshpro/ChatDocument.git\n  ```\n\n- Go to the ChatDocument folder using `cd ChatDocument`\n\n### 3. 
**Set Up Local Environment**\n\n- Create a virtual environment `myvenv` inside the `./ChatDocument` folder and activate it:\n\n  ```bash\n  python -m venv myvenv\n  ```\n\n  ```bash\n  # Windows\n  .\\myvenv\\Scripts\\activate\n  # Linux or macOS: source myvenv/bin/activate\n  ```\n\n- Install dependencies:\n\n  ```bash\n  pip install --upgrade -r requirements.txt\n  ```\n\n- 🧪 Experiment with code in `*.ipynb`\n\n  ```sh\n  jupyter notebook\n  ```\n\n## 🕹️ Run\n\n```bash\nstreamlit run main.py\n```\n\n- Select `llama3.2` as the model and start chatting.\n\n- Content View:\n  \u003cimg src=\"./assets/ui.png\" alt=\"Streamlit Web App\" width=\"100%\"\u003e\n\n- WordCloud View:\n  \u003cimg src=\"./assets/ui2.png\" alt=\"Streamlit Web App\" width=\"100%\"\u003e\n\n## 🛠 Troubleshooting\n\n- Ensure Ollama is running in the background\n- A GPU is preferred for good performance; on CPU, responses will be slower\n- `./data/sample_docs` contains a few sample documents for you to test\n- Use `pip list` or `pip freeze` to check currently installed packages\n\u003c!-- - Delete `./data/vector_store/` that holds embeddings in case delete file option failed to delete docs. 
--\u003e\n\n## ✨ Theme Configuration\n\n- Edit `.streamlit/config.toml` to set your color preferences:\n\n  ```toml\n  [theme]\n  primaryColor = \"#FF4B4B\"\n  backgroundColor = \"#0E1117\"\n  secondaryBackgroundColor = \"#262730\"\n  textColor = \"#FAFAFA\"\n  font = \"sans serif\"\n  ```\n\n## 🤝 Contributing\n\n- Open issues for bugs or suggestions\n- Submit pull requests\n\n## 📑 References\n\n### Docs\n\n- [LangChain](https://python.langchain.com/docs/index.html)\n- [Ollama](https://ollama.com/docs/index.html)\n- [ChromaDB](https://www.trychroma.com/)\n- [Streamlit](https://docs.streamlit.io/)\n- [Folium](https://python-visualization.github.io/folium/)\n- [Unstructured](https://docs.unstructured.io/platform/supported-file-types)\n- [ChromaDB Tutorial Step by Step Guide](https://www.datacamp.com/tutorial/chromadb-tutorial-step-by-step-guide)\n- [ChromaDB Collections](https://docs.trychroma.com/docs/collections/create-get-delete)\n\n### Blogs\n\n- [Finding the Best Open Source Embedding Model for RAG](https://medium.com/timescale/finding-the-best-open-source-embedding-model-for-rag-929d1656d331)\n- [Enhancing Retrieval Augmented Generation with ChromaDB and SQLite](https://medium.com/@dassandipan9080/enhancing-retrieval-augmented-generation-with-chromadb-and-sqlite-c499109f8082)\n- [Implementing RAG in LangChain with Chroma](https://medium.com/@callumjmac/implementing-rag-in-langchain-with-chroma-a-step-by-step-guide-16fc21815339)\n- [Build Your Own RAG and Run Them Locally](https://blog.duy.dev/build-your-own-rag-and-run-them-locally/)\n\n### Stack Overflow\n\n- [Langchain Ollama Module Difference](https://stackoverflow.com/questions/78921530/langchain-ollama-module-difference)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faghoshpro%2Fchatdocument","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faghoshpro%2Fchatdocument","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faghoshpro%2Fchatdocument/lists"}