{"id":32873128,"url":"https://github.com/onamfc/rag-chat","last_synced_at":"2026-04-13T00:06:56.670Z","repository":{"id":323069387,"uuid":"1091863055","full_name":"onamfc/rag-chat","owner":"onamfc","description":"A production-ready RAG (Retrieval Augmented Generation) system for chatting with your documents","archived":false,"fork":false,"pushed_at":"2025-11-07T22:35:28.000Z","size":150,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-07T23:28:02.763Z","etag":null,"topics":["ai","chatbot","fastapi","llamaindex","local-first","mcp","rag","retrieval-augmented-generation","streamlit"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/onamfc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-07T16:22:49.000Z","updated_at":"2025-11-07T22:35:31.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/onamfc/rag-chat","commit_stats":null,"previous_names":["onamfc/rag-chat"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/onamfc/rag-chat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/onamfc%2Frag-chat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/onamfc%2Frag-chat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/onamfc%2Frag-chat/rel
eases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/onamfc%2Frag-chat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/onamfc","download_url":"https://codeload.github.com/onamfc/rag-chat/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/onamfc%2Frag-chat/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":283517792,"owners_count":26849048,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-09T02:00:05.828Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","chatbot","fastapi","llamaindex","local-first","mcp","rag","retrieval-augmented-generation","streamlit"],"created_at":"2025-11-09T14:00:38.577Z","updated_at":"2025-11-09T14:02:04.435Z","avatar_url":"https://github.com/onamfc.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Xantus - Private RAG Chat System with MCP Integration\n\n\u003cdiv align=\"center\"\u003e\n\n**A production-ready RAG (Retrieval Augmented Generation) system for chatting with your documents**\n\n*Built with privacy in mind • Extensible via MCP • Multi-provider AI support*\n\n[Features](#features) • [Quick Start](#quick-start) • [Architecture](#architecture) • [MCP Integration](#mcp-integration) • [Configuration](#configuration) • [API 
Reference](#api-reference)\n\n  ![Screenshot Description](docs/images/xantus-rag-chat--screenshot.png)\n\u003c/div\u003e\n\n---\n\n## Table of Contents\n\n- [Overview](#overview)\n- [Features](#features)\n- [Architecture](#architecture)\n- [Quick Start](#quick-start)\n  - [Prerequisites](#prerequisites)\n  - [Installation](#installation)\n  - [First Run](#first-run)\n- [MCP Integration](#mcp-integration)\n- [Configuration](#configuration)\n  - [Environment Variables](#environment-variables)\n  - [Provider Setup](#provider-setup)\n  - [RAG Tuning](#rag-tuning)\n- [Usage](#usage)\n  - [Streamlit UI](#streamlit-ui)\n  - [API Endpoints](#api-endpoints)\n  - [Python Client](#python-client)\n- [Development](#development)\n- [Troubleshooting](#troubleshooting)\n- [FAQ](#faq)\n- [License](#license)\n\n---\n\n## Overview\n\nXantus is a **privacy-first** RAG system that lets you chat with your documents using AI. Unlike cloud-only solutions, Xantus can run **completely locally** or use cloud providers - your choice.\n\n### What Makes Xantus Different?\n\n- **Privacy-First**: All data stays on your system with local AI\n- **Extensible**: MCP (Model Context Protocol) integration for external tools\n- **Multiple UIs**: Streamlit interface + OpenAI-compatible API\n- **Multi-Provider**: Supports Ollama, OpenAI, Anthropic, and more\n- **Modular**: Swap LLMs, embeddings, vector stores easily\n- **Production-Ready**: Dependency injection, proper error handling, logging\n\n---\n\n## Features\n\n### Core Features\n\n- **Document Chat**: Upload PDFs, DOCX, TXT, Markdown and chat with them\n- **Semantic Search**: RAG-powered retrieval with ChromaDB or Qdrant\n- **Multiple Interfaces**:\n  - Clean Streamlit UI for end users\n  - RESTful API for integration\n  - Python SDK for developers\n- **Flexible AI Backends**:\n  - **Local**: Ollama (privacy-first)\n  - **Cloud**: OpenAI, Anthropic\n  - **Hybrid**: Cloud LLM + local embeddings\n\n### Advanced Features\n\n- **MCP Integration**: 
Connect external tools (calculator, file system, databases)\n- **Configurable**: YAML + environment variables\n- **Multiple Vector Stores**: ChromaDB, Qdrant\n- **RAG Tuning**: Adjust chunk size, overlap, top-k retrieval\n- **Secure**: API key management via environment variables\n- **Scalable**: Async API with proper dependency injection\n\n---\n\n## Architecture\n\nXantus is built on a modern, modular architecture:\n\n```\n┌─────────────────────────────────────────────────────────┐\n│                        User                             │\n└────────────┬────────────────────────────┬───────────────┘\n             │                            │\n    ┌────────▼────────┐          ┌────────▼─────────┐\n    │  Streamlit UI   │          │   API Clients    │\n    │  (Port 8501)    │          │   (curl, SDK)    │\n    └────────┬────────┘          └────────┬─────────┘\n             │                            │\n             └────────────┬───────────────┘\n                          │\n                 ┌────────▼─────────┐\n                 │   FastAPI Server │\n                 │   (Port 8000)    │\n                 └────────┬─────────┘\n                          │\n          ┌───────────────┼───────────────┐\n          │               │               │\n   ┌──────▼──────┐ ┌──────▼───────┐ ┌────▼─────┐\n   │ Chat Service│ │Ingest Service│ │   MCP    │\n   └──────┬──────┘ └──────┬───────┘ │ Service  │\n          │               │         └────┬─────┘\n          │               │              │\n   ┌──────▼───────────────▼──────────────▼─────┐\n   │        Dependency Injection Container     │\n   │  (LLM • Embeddings • Vector Store • MCP)  │\n   └────────────────────┬──────────────────────┘\n                        │\n        ┌───────────────┼───────────────┐\n        │               │               │\n   ┌────▼────┐    ┌─────▼─────┐   ┌───▼────┐\n   │   LLM   │    │ Embeddings│   │ Vector │\n   │Provider │    │  Provider │   │ Store  │\n   └─────────┘    └───────────┘  
 └────────┘\n   │Ollama   │    │HuggingFace│   │Chroma  │\n   │OpenAI   │    │  Ollama   │   │Qdrant  │\n   │Anthropic│    │  OpenAI   │   └────────┘\n   └─────────┘    └───────────┘\n                                   ┌──────────┐\n                                   │MCP Server│\n                                   │TypeScript│\n                                   └──────────┘\n                                   │Calculator│\n                                   │FileSystem│\n                                   │TextProc  │\n                                   └──────────┘\n```\n\n### Technology Stack\n\n| Component | Technology | Purpose |\n|-----------|-----------|---------|\n| **Backend** | FastAPI + Python 3.10+ | High-performance async API |\n| **RAG Framework** | LlamaIndex | Document indexing \u0026 retrieval |\n| **UI** | Streamlit | User-friendly chat interface |\n| **Configuration** | Pydantic + YAML | Type-safe settings |\n| **DI** | Injector | Clean dependency injection |\n| **Vector DB** | ChromaDB / Qdrant | Semantic search |\n| **MCP** | Model Context Protocol | External tool integration |\n\n### Project Structure\n\n```\nxantus/\n├── .env.example                  # Environment variable template\n├── .gitignore                    # Git ignore patterns\n├── config.yaml                   # Main configuration file\n├── requirements.txt              # Python dependencies\n├── setup_mcp.sh                 # MCP setup automation\n├── start_api.sh                 # API server startup script\n├── start_ui.sh                  # UI startup script\n│\n├── xantus/                       # Main application package\n│   ├── __init__.py\n│   ├── main.py                  # FastAPI application entry\n│   ├── container.py             # Dependency injection setup\n│   │\n│   ├── api/                     # API endpoints\n│   │   ├── chat_router.py       # /v1/chat/completions\n│   │   ├── ingest_router.py     # /v1/ingest/*\n│   │   └── embeddings_router.py # 
/v1/embeddings\n│   │\n│   ├── services/                # Business logic\n│   │   ├── chat_service.py      # RAG-powered chat\n│   │   ├── ingest_service.py    # Document processing\n│   │   └── mcp_service.py       # MCP tool orchestration\n│   │\n│   ├── components/              # Component factories\n│   │   ├── llm/\n│   │   │   └── llm_factory.py   # LLM provider factory\n│   │   ├── embeddings/\n│   │   │   └── embedding_factory.py\n│   │   └── vector_store/\n│   │       └── vector_store_factory.py\n│   │\n│   ├── models/                  # Data models\n│   │   └── schemas.py           # Pydantic request/response models\n│   │\n│   └── config/                  # Configuration\n│       └── settings.py          # Settings management with Pydantic\n│\n├── ui/                          # User interface\n│   └── streamlit_app.py         # Streamlit chat application\n│\n├── mcp-servers/                 # MCP integration (git submodules)\n│   └── mcp-starter-template-ts/ # TypeScript MCP server\n│       ├── dist/                # Compiled JavaScript\n│       │   └── start.js         # Entry point\n│       └── src/                 # TypeScript source\n│           └── tools/           # Tool implementations\n│\n├── data/                        # Data directory (gitignored)\n│   └── vector_store/            # Persisted vector embeddings\n│\n└── docs/                        # Documentation\n    ├── MCP_INTEGRATION.md       # MCP technical guide\n    ├── README_MCP.md            # MCP quick start\n    └── SETUP_COMPLETE.md        # Setup summary\n```\n\n---\n\n## Quick Start\n\n### Prerequisites\n\n- **Python 3.10+** (Check: `python --version`)\n- **Node.js 18+** (For MCP integration, check: `node --version`)\n- **Git** (For cloning submodules)\n- **(Optional) Ollama** (For local AI)\n\n### Installation\n\n#### Step 1: Clone the Repository\n\n```bash\n# Clone with MCP submodules into a directory named xantus\ngit clone --recurse-submodules https://github.com/onamfc/rag-chat xantus\ncd xantus\n\n# OR if you 
already cloned without submodules:\ngit submodule update --init --recursive\n```\n\n#### Step 2: Create Virtual Environment\n\n```bash\n# Create virtual environment\npython -m venv venv\n\n# Activate it\nsource venv/bin/activate  # Linux/Mac\n# OR\nvenv\\Scripts\\activate     # Windows\n```\n\n#### Step 3: Install Python Dependencies\n\n```bash\npip install -r requirements.txt\n```\n\n#### Step 4: Setup MCP (Optional but Recommended)\n\n```bash\n# This will:\n# - Initialize MCP submodules\n# - Install npm dependencies\n# - Build TypeScript MCP server\n./setup_mcp.sh\n```\n\n#### Step 5: Configure Environment Variables\n\n```bash\n# Copy the example file\ncp .env.example .env\n\n# Edit .env and add your API keys (if using cloud providers)\n# For Anthropic:\nXANTUS_LLM__API_KEY=sk-ant-api03-your-key-here\n\n# For OpenAI:\n# XANTUS_LLM__API_KEY=sk-your-openai-key-here\n```\n\n#### Step 6: Configure Settings\n\nEdit `config.yaml` to choose your providers:\n\n**Option A: Completely Local (Privacy-First)**\n```yaml\nllm:\n  provider: ollama\n  model: llama3.2\n\nembedding:\n  provider: huggingface\n  model: BAAI/bge-small-en-v1.5\n\nmcp:\n  enabled: true  # Enable MCP tools\n```\n\n**Option B: Cloud-Powered (Anthropic)**\n```yaml\nllm:\n  provider: anthropic\n  model: claude-sonnet-4-20250514\n  api_key: null  # Read from .env\n\nembedding:\n  provider: huggingface  # Keep embeddings local\n  model: BAAI/bge-small-en-v1.5\n\nmcp:\n  enabled: true\n```\n\n**Option C: OpenAI**\n```yaml\nllm:\n  provider: openai\n  model: gpt-4\n  api_key: null  # Read from .env\n\nembedding:\n  provider: openai\n  model: text-embedding-3-small\n  api_key: null\n```\n\n### First Run\n\n#### Start the API Server\n\n```bash\n# Option 1: Using the startup script\n./start_api.sh\n\n# Option 2: Manual start\npython -m xantus.main\n\n# The API will be available at http://localhost:8000\n# API docs at http://localhost:8000/docs\n```\n\nYou should see:\n```\nINFO - Starting Xantus 
application...\nINFO - Loaded settings with LLM provider: anthropic\nINFO - Dependency injection container initialized\nINFO - Starting server on 127.0.0.1:8000\n```\n\nWith MCP enabled, you'll also see:\n```\nINFO - Starting MCP server 'mcp-starter-template': node mcp-servers/...\nINFO - Loaded 4 tools from 'mcp-starter-template': ['calculate', 'filesystem', 'text-processing', 'weather']\n```\n\n#### Start the UI (In a New Terminal)\n\n```bash\n# Activate venv again\nsource venv/bin/activate\n\n# Start Streamlit\nstreamlit run ui/streamlit_app.py\n\n# The UI will open in your browser at http://localhost:8501\n```\n\n#### Upload a Document and Chat!\n\n1. Click **\"Upload Document\"** in the sidebar\n2. Select a PDF, TXT, DOCX, or Markdown file\n3. Wait for processing (you'll see the progress)\n4. Ask questions about your document!\n\n**Example Questions:**\n- \"What is the main topic of this document?\"\n- \"Summarize the key points\"\n- \"Calculate the total revenue mentioned in section 3\" (uses MCP calculator)\n- \"Compare this with the file in ../reports/2023.pdf\" (uses MCP filesystem)\n\n---\n\n## MCP Integration\n\nMCP (Model Context Protocol) allows Claude to use external tools while answering questions.\n\n### What Tools Are Available?\n\nYour TypeScript MCP server (in `mcp-servers/mcp-starter-template-ts/`) provides:\n\n| Tool | Function | Example Use |\n|------|----------|-------------|\n| **Calculator** | Mathematical operations | \"Calculate the sum of Q1-Q4 revenues\" |\n| **File System** | Read/write/list files | \"Compare with last year's report in ../reports/\" |\n| **Text Processing** | Word count, sentiment, case conversion | \"Analyze sentiment of customer feedback\" |\n| **Weather** | Weather data (mock) | \"Check weather for event planning\" |\n\n### MCP Architecture\n\n```\nUser Question\n     ↓\nXantus retrieves document context (RAG)\n     ↓\nSends to Claude with available MCP tools\n     ↓\nClaude decides to use a tool (e.g., 
calculator)\n     ↓\nXantus forwards tool call to MCP server (TypeScript)\n     ↓\nMCP server executes tool and returns result\n     ↓\nClaude incorporates result into answer\n     ↓\nUser gets comprehensive response\n```\n\n### Enabling/Disabling MCP\n\nIn `config.yaml`:\n\n```yaml\nmcp:\n  enabled: true  # Set to false to disable MCP\n\n  servers:\n    - name: \"mcp-starter-template\"\n      command: \"node\"\n      args: [\"mcp-servers/mcp-starter-template-ts/dist/start.js\"]\n```\n\n### Adding More MCP Servers\n\nYou can connect multiple MCP servers:\n\n```yaml\nmcp:\n  enabled: true\n  servers:\n    # Your custom tools\n    - name: \"my-tools\"\n      command: \"node\"\n      args: [\"mcp-servers/mcp-starter-template-ts/dist/start.js\"]\n\n    # Database access\n    - name: \"postgres\"\n      command: \"npx\"\n      args: [\"-y\", \"@modelcontextprotocol/server-postgres\", \"postgresql://localhost/mydb\"]\n\n    # Web search\n    - name: \"brave-search\"\n      command: \"npx\"\n      args: [\"-y\", \"@modelcontextprotocol/server-brave-search\"]\n```\n\n### MCP Documentation\n\nFor complete MCP setup and customization:\n- **Quick Start**: [`README_MCP.md`](./README_MCP.md)\n\n---\n\n## Configuration\n\n### Environment Variables\n\nCreate a `.env` file in the project root:\n\n```bash\n# ===== LLM API Keys =====\n# For Anthropic (double underscore for nested config!)\nXANTUS_LLM__API_KEY=sk-ant-api03-your-key-here\n\n# For OpenAI\n# XANTUS_LLM__API_KEY=sk-your-openai-key-here\n\n# ===== Embedding API Keys (optional) =====\n# XANTUS_EMBEDDING__API_KEY=sk-your-key-here\n\n# ===== Override Other Settings =====\n# Format: XANTUS_\u003cSECTION\u003e__\u003cKEY\u003e=value\n# Examples:\n# XANTUS_LLM__TEMPERATURE=0.5\n# XANTUS_RAG__SIMILARITY_TOP_K=10\n# XANTUS_SERVER__PORT=8001\n```\n\n**Important**: Use **double underscore** (`__`) for nested configuration!\n\n### Provider Setup\n\n#### Local Setup with Ollama\n\n1. 
**Install Ollama**: https://ollama.com/download\n\n2. **Start Ollama**:\n   ```bash\n   ollama serve\n   ```\n\n3. **Pull Models**:\n   ```bash\n   ollama pull llama3.2        # For chat\n   ollama pull nomic-embed-text # For embeddings\n   ```\n\n4. **Configure** `config.yaml`:\n   ```yaml\n   llm:\n     provider: ollama\n     model: llama3.2\n     api_base: http://localhost:11434  # Default\n\n   embedding:\n     provider: ollama\n     model: nomic-embed-text\n   ```\n\n#### Anthropic Setup\n\n1. **Get API Key**: https://console.anthropic.com/\n\n2. **Add to `.env`**:\n   ```bash\n   XANTUS_LLM__API_KEY=sk-ant-api03-your-key-here\n   ```\n\n3. **Configure** `config.yaml`:\n   ```yaml\n   llm:\n     provider: anthropic\n     model: claude-sonnet-4-20250514\n     api_key: null  # Read from environment\n     temperature: 0.7\n     max_tokens: 4096\n\n   embedding:\n     provider: huggingface  # Use local for cost savings\n     model: BAAI/bge-small-en-v1.5\n   ```\n\n#### OpenAI Setup\n\n1. **Get API Key**: https://platform.openai.com/api-keys\n\n2. **Add to `.env`**:\n   ```bash\n   XANTUS_LLM__API_KEY=sk-your-openai-key-here\n   ```\n\n3. 
**Configure** `config.yaml`:\n   ```yaml\n   llm:\n     provider: openai\n     model: gpt-4-turbo-preview\n     api_key: null\n\n   embedding:\n     provider: openai\n     model: text-embedding-3-small\n     api_key: null\n   ```\n\n### RAG Tuning\n\nFine-tune retrieval in `config.yaml`:\n\n```yaml\nrag:\n  # Number of relevant chunks to retrieve\n  similarity_top_k: 5\n\n  # Size of text chunks (characters)\n  chunk_size: 1024\n\n  # Overlap between chunks (prevents context loss)\n  chunk_overlap: 200\n\n  # Enable advanced reranking (requires additional setup)\n  enable_reranking: false\n```\n\n**Tuning Guidelines**:\n- **Larger chunks** (1024-2048): Better for long-form content\n- **Smaller chunks** (512-1024): Better for specific facts\n- **Higher top_k** (8-10): More context but slower\n- **Lower top_k** (3-5): Faster but may miss context\n- **Overlap**: 15-20% of chunk_size is recommended\n\n### Vector Store Configuration\n\n```yaml\nvector_store:\n  provider: chroma  # or qdrant\n\n  # Path to persist vector data\n  persist_path: ./data/vector_store\n\n  # Collection name\n  collection_name: xantus_documents\n```\n\n### Server Configuration\n\n```yaml\nserver:\n  host: 127.0.0.1  # Change to 0.0.0.0 for network access\n  port: 8000\n\n  # CORS settings\n  cors_enabled: true\n  cors_origins:\n    - \"*\"  # Be more restrictive in production!\n```\n\n---\n\n## Usage\n\n### Streamlit UI\n\nThe easiest way to use Xantus:\n\n1. **Start the API** (terminal 1):\n   ```bash\n   ./start_api.sh\n   ```\n\n2. **Start the UI** (terminal 2):\n   ```bash\n   ./start_ui.sh\n   # OR\n   streamlit run ui/streamlit_app.py\n   ```\n\n3. **Navigate to** http://localhost:8501\n\n4. **Upload documents** via the sidebar\n\n5. 
**Chat** with your documents!\n\n**Features**:\n- ✅ Document upload with progress\n- ✅ Document management (list/delete)\n- ✅ Chat history\n- ✅ Context toggle (use RAG or not)\n- ✅ Health monitoring\n\n### API Endpoints\n\n#### Health Check\n\n```bash\ncurl http://localhost:8000/health\n```\n\nResponse:\n```json\n{\n  \"status\": \"healthy\",\n  \"version\": \"0.1.0\",\n  \"components\": {\n    \"llm\": \"anthropic\",\n    \"embedding\": \"huggingface\",\n    \"vector_store\": \"chroma\"\n  }\n}\n```\n\n#### Chat Completion (with RAG)\n\n```bash\ncurl -X POST http://localhost:8000/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"messages\": [\n      {\"role\": \"user\", \"content\": \"What are the main findings in the report?\"}\n    ],\n    \"use_context\": true,\n    \"stream\": false\n  }'\n```\n\nResponse:\n```json\n{\n  \"id\": \"chat-123abc\",\n  \"object\": \"chat.completion\",\n  \"created\": 1730000000,\n  \"model\": \"claude-sonnet-4-20250514\",\n  \"choices\": [{\n    \"index\": 0,\n    \"message\": {\n      \"role\": \"assistant\",\n      \"content\": \"Based on the documents, the main findings are...\"\n    },\n    \"finish_reason\": \"stop\"\n  }]\n}\n```\n\n#### Upload Document\n\n```bash\ncurl -X POST http://localhost:8000/v1/ingest/file \\\n  -F \"file=@/path/to/document.pdf\"\n```\n\nResponse:\n```json\n{\n  \"status\": \"success\",\n  \"document_id\": \"doc_abc123\",\n  \"chunks_created\": 42\n}\n```\n\n#### List Documents\n\n```bash\ncurl http://localhost:8000/v1/ingest/documents\n```\n\n#### Delete Document\n\n```bash\ncurl -X DELETE http://localhost:8000/v1/ingest/documents/doc_abc123\n```\n\n#### Generate Embeddings\n\n```bash\ncurl -X POST http://localhost:8000/v1/embeddings \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"input\": \"Text to embed\", \"model\": \"default\"}'\n```\n\n### Python Client\n\n```python\nimport requests\n\n# Start a session\nsession = requests.Session()\napi_url = 
\"http://localhost:8000\"\n\n# Upload a document\nwith open(\"document.pdf\", \"rb\") as f:\n    response = session.post(\n        f\"{api_url}/v1/ingest/file\",\n        files={\"file\": f}\n    )\n    print(f\"Uploaded: {response.json()}\")\n\n# Chat with RAG\nresponse = session.post(\n    f\"{api_url}/v1/chat/completions\",\n    json={\n        \"messages\": [\n            {\"role\": \"user\", \"content\": \"Summarize the key points\"}\n        ],\n        \"use_context\": True,\n        \"stream\": False\n    }\n)\n\nresult = response.json()\nprint(result[\"choices\"][0][\"message\"][\"content\"])\n```\n\n---\n\n## Development\n\n### Project Philosophy\n\n1. **Privacy First**: Default to local, support cloud\n2. **Modularity**: Easy to swap any component\n3. **Simplicity**: Minimal abstractions\n4. **Type Safety**: Pydantic everywhere\n5. **Production Ready**: Proper DI, error handling, logging\n\n### Adding a New LLM Provider\n\n1. **Add to settings** (`xantus/config/settings.py`):\n   ```python\n   provider: Literal[\"ollama\", \"openai\", \"anthropic\", \"your-provider\"]\n   ```\n\n2. **Implement factory** (`xantus/components/llm/llm_factory.py`):\n   ```python\n   def _create_your_provider_llm(config: LLMConfig) -\u003e LLM:\n       return YourProviderLLM(\n           model=config.model,\n           api_key=config.api_key,\n           temperature=config.temperature\n       )\n   ```\n\n3. **Update factory dispatch**:\n   ```python\n   elif config.provider == \"your-provider\":\n       return _create_your_provider_llm(config)\n   ```\n\n### Adding a New Vector Store\n\nSimilar process in `xantus/components/vector_store/vector_store_factory.py`\n\n### Code Style\n\n```bash\n# Format code\nblack xantus/\n\n# Lint\nruff check xantus/\n\n# Type check\nmypy xantus/\n```\n\n### Testing\n\n```bash\n# Install test dependencies\npip install pytest pytest-asyncio\n\n# Run tests\npytest tests/\n```\n\n---\n\n## Troubleshooting\n\n### Common Issues\n\n#### 1. 
\"Cannot connect to Ollama\"\n\n**Solution**: Ensure Ollama is running\n```bash\nollama serve\n```\n\n#### 2. \"ValueError: Anthropic API key is required\"\n\n**Solution**: Check your `.env` file:\n```bash\n# Correct (double underscore!):\nXANTUS_LLM__API_KEY=sk-ant-...\n\n# Wrong (single underscore):\nXANTUS_LLM_API_KEY=sk-ant-...\n```\n\n#### 3. \"Import error: No module named 'xantus'\"\n\n**Solution**: Ensure you're in the right directory\n```bash\ncd xantus\npython -c \"import xantus; print('OK')\"\n```\n\n#### 4. \"MCP server not starting\"\n\n**Solution**: Build the MCP server\n```bash\n./setup_mcp.sh\n# OR manually:\ncd mcp-servers/mcp-starter-template-ts\nnpm install\nnpm run build\n```\n\n#### 5. \"Port 8000 already in use\"\n\n**Solution**: Kill existing processes or change port\n```bash\n# Kill existing\npkill -f \"python.*xantus\"\n\n# OR change port in config.yaml:\nserver:\n  port: 8001\n```\n\n#### 6. \"Vector store errors\"\n\n**Solution**: Clear and recreate\n```bash\nrm -rf data/vector_store\nmkdir -p data/vector_store\n# Restart server, re-upload documents\n```\n\n### Debug Mode\n\nEnable verbose logging:\n\n```python\n# In xantus/main.py\nimport logging\nlogging.basicConfig(level=logging.DEBUG)\n```\n\n---\n\n## FAQ\n\n**Q: Does my data leave my machine?**\nA: Only if you use cloud providers (OpenAI/Anthropic). With Ollama + HuggingFace, everything stays local.\n\n**Q: Which is faster - local or cloud?**\nA: Cloud (OpenAI/Anthropic) is usually faster. Local (Ollama) depends on your hardware.\n\n**Q: Can I use multiple documents?**\nA: Yes! Upload as many as you want. They're all indexed in the vector store.\n\n**Q: What's the maximum document size?**\nA: No hard limit, but larger documents take longer to process.\n\n**Q: Can I delete documents?**\nA: Yes, via the API `/v1/ingest/documents/{doc_id}` or Streamlit UI.\n\n**Q: Is streaming supported?**\nA: Yes! 
Set `\"stream\": true` in chat completion requests.\n\n**Q: What LLM is best?**\nA:\n- **Best quality**: Claude Sonnet 4, GPT-4\n- **Best local**: Llama 3.2, Mistral\n- **Best balance**: Claude Haiku, GPT-3.5-turbo\n\n**Q: How do I add authentication?**\nA: Add FastAPI middleware in `xantus/main.py` for API key or OAuth.\n\n---\n\n## Additional Resources\n\n- **MCP Quick Start**: [`README_MCP.md`](./README_MCP.md)\n- **API Documentation**: http://localhost:8000/docs (when running)\n- **LlamaIndex Docs**: https://docs.llamaindex.ai/\n- **FastAPI Docs**: https://fastapi.tiangolo.com/\n- **Streamlit Docs**: https://docs.streamlit.io/\n\n---\n\n## Contributing\n\nContributions are welcome! This project is designed to be:\n- Easy to understand\n- Simple to extend\n- Well-documented\n\nFeel free to:\n- Add new providers\n- Improve the UI\n- Enhance MCP tools\n- Fix bugs\n- Improve documentation\n\n---\n\n## License\n\nThis project is provided as-is for educational and research purposes.\n\n---\n\nBuilt with:\n- [FastAPI](https://fastapi.tiangolo.com/) - Modern async web framework\n- [LlamaIndex](https://www.llamaindex.ai/) - RAG framework\n- [Streamlit](https://streamlit.io/) - Data apps framework\n- [ChromaDB](https://www.trychroma.com/) - Vector database\n- [Ollama](https://ollama.com/) - Local LLM runtime\n- [Model Context Protocol](https://modelcontextprotocol.io/) - Tool integration\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n**Made with ❤️ for the open source community**\n\n[⬆ Back to Top](#xantus---private-rag-chat-system-with-mcp-integration)\n\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fonamfc%2Frag-chat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fonamfc%2Frag-chat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fonamfc%2Frag-chat/lists"}