{"id":34636756,"url":"https://github.com/0xarchit/chatdoc","last_synced_at":"2026-04-11T23:40:02.488Z","repository":{"id":311615362,"uuid":"1042531862","full_name":"0xarchit/ChatDoc","owner":"0xarchit","description":"Upload any document and start intelligent conversations. Get instant answers, summaries, and insights from your files using advanced RAG technology.","archived":false,"fork":false,"pushed_at":"2025-09-21T14:05:35.000Z","size":8028,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-21T16:10:24.812Z","etag":null,"topics":["chatdoc","chatdocument","rag","rag-chatbot"],"latest_commit_sha":null,"homepage":"https://chatdoc.0xarchit.is-a.dev","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/0xarchit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":".github/SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-22T06:58:46.000Z","updated_at":"2025-09-21T14:05:38.000Z","dependencies_parsed_at":"2025-08-25T15:38:33.533Z","dependency_job_id":"69ea3108-72f7-4bc9-bf53-ab6e7f5e5c8c","html_url":"https://github.com/0xarchit/ChatDoc","commit_stats":null,"previous_names":["0xarchit/chatdoc"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/0xarchit/ChatDoc","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xarchit%2FChatDoc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xarchit%2FChatDoc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/0xarchit%2FChatDoc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xarchit%2FChatDoc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/0xarchit","download_url":"https://codeload.github.com/0xarchit/ChatDoc/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xarchit%2FChatDoc/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28005408,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-24T02:00:07.193Z","response_time":83,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatdoc","chatdocument","rag","rag-chatbot"],"created_at":"2025-12-24T17:02:33.029Z","updated_at":"2025-12-24T17:04:00.430Z","avatar_url":"https://github.com/0xarchit.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ChatDoc\n[![GitHub stars](https://img.shields.io/github/stars/0xarchit/chatdoc?style=social)](https://github.com/0xarchit/chatdoc/stargazers)\n![GitHub issues](https://img.shields.io/github/issues/0xarchit/chatdoc)\n[![Repo Size](https://img.shields.io/github/repo-size/0xarchit/ChatDoc?style=flat-square)](https://github.com/0xarchit/ChatDoc)\n\n\u003e A unified retrieval-augmented generation (RAG) document API and web interface, powered by FastAPI, React, Vite, 
Milvus, and MistralAI.\n\n## Table of Contents\n1. [Overview](#overview)\n2. [Architecture](#architecture)\n3. [Features](#features)\n4. [Tech Stack](#tech-stack)\n5. [Getting Started](#getting-started)\n   - [Prerequisites](#prerequisites)\n   - [Backend Setup](#backend-setup)\n   - [Frontend Setup](#frontend-setup)\n   - [Docker (Optional)](#docker-optional)\n6. [API Reference](#api-reference)\n7. [Architecture Diagram](#architecture-diagram)\n8. [Future Goals](#future-goals)\n9. [Contributing](#contributing)\n10. [License](#license)\n\n## Overview\nChatDoc is a web application enabling users to upload documents, extract and chunk text, store embeddings in Milvus, and query with state-of-the-art LLMs. It provides both a REST API and a web-based interface for seamless integration.\n\n## Preview\n`Landing Page`![Landing Page](assets/Landing.png)  \n`Dashboard`![Dashboard](assets/Dashboard.png)\n\n\u003e Complete Working Video: [chatdoc.mkv](assets/chatdoc.mkv)\n\n## Architecture\n```mermaid\nflowchart TB\n  subgraph Frontend\n    UI[React \u0026 Vite] --\u003e|REST API| API(FastAPI)\n  end\n  subgraph Backend\n    API --\u003e Extract[Text Extraction]\n    Extract --\u003e Chunk[Text Chunking]\n    Chunk --\u003e Embed[MistralAI Embedding]\n    Embed --\u003e Store[Milvus Vector Store]\n    API --\u003e Retrieve[Retrieval]\n    Retrieve --\u003e LLM[ChatOpenAI]\n    LLM --\u003e Store\n  end\n  Store -.-\u003e|Query Results| API\n```\n\n## Features\n- Upload PDF, TXT, CSV, XLSX, PPTX, DOCX files via API or web form\n- Automatic text extraction and chunking (500 tokens, 50 overlap)\n- Embedding with MistralAI Embeddings \u0026 storage in Milvus (Zilliz)\n- Retrieval and response generation via OpenAI-compatible LLM\n- Real-time, responsive React UI with upload, history, and settings\n- Per-request overrides for API keys, endpoints, and collections\n- Admin endpoints for deleting uploads or clearing the vector store\n\n## Tech Stack\n- **Backend**: FastAPI, Python, 
PyPDF2, python-pptx, python-docx, Pandas, Milvus\n- **Frontend**: React, Vite, TypeScript, Tailwind CSS\n- **Embeddings**: MistralAI\n- **Vector Database**: Milvus (Zilliz Cloud)\n- **LLM**: OpenAI-compatible ChatOpenAI via LangChain\n\n## Getting Started\n\n### Prerequisites\n- Node.js \u003e= 16 and npm/yarn\n- Python \u003e= 3.9\n- Docker (optional)\n- Milvus or Zilliz Cloud credentials\n- MistralAI \u0026 OpenAI API keys\n\n### Backend Setup\n```powershell\ncd Backend\ncopy .env.example .env\n# Edit .env and set:\n# MISTRAL_API_KEY, ZILLIZ_URI, ZILLIZ_TOKEN, HF_TOKEN (optional), COLLECTION_NAME\npip install -r requirements.txt\nuvicorn main:app --reload\n```\n\n### Frontend Setup\n```powershell\ncd Frontend\nnpm install\nnpm run dev\n```\n\n### Docker (Optional)\n```powershell\n# Build and run backend container\n# (PowerShell continues a line with a trailing backtick, not a backslash)\ndocker build -t chatdocapi-backend .\ndocker run --rm -p 8080:8080 `\n  -e MISTRAL_API_KEY=$env:MISTRAL_API_KEY `\n  -e ZILLIZ_URI=$env:ZILLIZ_URI `\n  -e ZILLIZ_TOKEN=$env:ZILLIZ_TOKEN `\n  -e ZILLIZ_COLLECTION_NAME=$env:ZILLIZ_COLLECTION_NAME `\n  chatdocapi-backend\n```\n\n## API Reference\n\n### 1) POST /upload\n- **Description**: Upload document and store embeddings.\n- **Content-Type**: multipart/form-data\n- **Fields**:\n  - `file` (required)\n  - `mistral_api_key`, `zilliz_uri`, `zilliz_token`, `collection_name` (optional)\n- **Responses**:\n  - `200`: `{ \"upload_id\": \"\u003cuuid\u003e\" }`\n  - `400`: errors (no file, extraction failure)\n  - `413`: file too large\n\n### 2) POST /query\n- **Description**: Retrieve and answer based on stored chunks.\n- **Content-Type**: application/json\n- **Body**:\n  ```json\n  {\n    \"question\": \"string\",\n    \"upload_id\": \"string\",\n    ...overrides\n  }\n  ```\n- **Responses**:\n  - `200`: `{ \"answer\": \"\u003cgenerated answer\u003e\" }`\n  - `400`: invalid body\n  - `500`: generation error\n\n### 3) DELETE /delete/{upload_id}\n- **Description**: Remove all vectors for a given upload.\n- 
**Params**: `upload_id` path, overrides as query params\n- **Response**: `{ \"status\": \"deleted\" }`\n\n### 4) GET /deleteall\n- **Description**: Clear entire vector store.\n- **Query**: `password` (native admin) or per-request overrides\n- **Response**: `{ \"status\": \"all_deleted\" }`\n\n## Future Goals\n- Streaming responses from the model to improve perceived latency and UX.\n- Better OCR and robust file parsing for scanned PDFs and more file formats.\n- Pluggable support for multiple vector stores (Milvus, FAISS, Pinecone, etc.).\n- Increase upload and context limits (larger files, fewer artificial word/chunk restrictions).\n- Personalization with login/signup, per-user profiles, metadata, and tags.\n- Expand supported AI models/providers and allow per-request model selection.\n\n\n\n## Contributing\nContributions and suggestions welcome — if you'd like to see something prioritized, open an issue or a discussion.\n\u003e Contributions are welcome! Please fork the repository, create a feature branch, and submit a pull request.\n1. Fork it\n2. Create your feature branch (`git checkout -b feature/fooBar`)\n3. Commit your changes (`git commit -am 'Add some fooBar'`)\n4. Push to the branch (`git push origin feature/fooBar`)\n5. Open a Pull Request\n\nFor major changes, please open an issue first to discuss what you would like to change.\n\n## License\nThis project is licensed under the MIT License. See [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F0xarchit%2Fchatdoc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F0xarchit%2Fchatdoc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F0xarchit%2Fchatdoc/lists"}