{"id":50295466,"url":"https://github.com/ssenthilnathan3/dossier","last_synced_at":"2026-05-28T08:32:29.876Z","repository":{"id":305278496,"uuid":"1022458432","full_name":"ssenthilnathan3/dossier","owner":"ssenthilnathan3","description":"A production-ready, open-source Live RAG (Retrieval-Augmented Generation) system designed specifically for Frappe documents.","archived":false,"fork":false,"pushed_at":"2025-07-19T10:08:22.000Z","size":458,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-19T10:39:09.677Z","etag":null,"topics":["erpnext","fastapi","frappe","liverag","microservices","ollama","python","rag"],"latest_commit_sha":null,"homepage":"https://dossier-docs.vercel.app","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ssenthilnathan3.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-19T05:46:25.000Z","updated_at":"2025-07-19T10:08:25.000Z","dependencies_parsed_at":"2025-07-19T10:39:32.307Z","dependency_job_id":"9d03c7ba-a4a9-48e5-9dc8-9bd911a2ffe1","html_url":"https://github.com/ssenthilnathan3/dossier","commit_stats":null,"previous_names":["ssenthilnathan3/dossier"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/ssenthilnathan3/dossier","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssenthilnathan3%2Fdossier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssenthilnathan3%2Fdossier/tags","releases_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/ssenthilnathan3%2Fdossier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssenthilnathan3%2Fdossier/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ssenthilnathan3","download_url":"https://codeload.github.com/ssenthilnathan3/dossier/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ssenthilnathan3%2Fdossier/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33601380,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-28T02:00:06.440Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["erpnext","fastapi","frappe","liverag","microservices","ollama","python","rag"],"created_at":"2026-05-28T08:32:29.785Z","updated_at":"2026-05-28T08:32:29.867Z","avatar_url":"https://github.com/ssenthilnathan3.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Dossier - Live RAG System for Frappe\n\nA production-ready, open-source Live RAG (Retrieval-Augmented Generation) system designed specifically for Frappe documents. 
Dossier provides real-time document ingestion, intelligent chunking, semantic search, and natural language Q\u0026A capabilities through a modern chat interface.\n\n## 🚀 Features\n\n- **Live Document Synchronization**: Real-time webhook processing for automatic document ingestion\n- **Intelligent Text Chunking**: Semantic-aware document splitting with configurable overlap\n- **Lightweight Vector Embeddings**: High-quality embeddings using the BGE-small model\n- **Contextual Search**: Semantic similarity search with metadata filtering\n- **Natural Language Q\u0026A**: AI-powered responses using local LLM inference\n- **Modern Chat Interface**: Real-time streaming responses with source highlighting\n- **Production-Ready**: Docker-first deployment with comprehensive monitoring\n- **Extensible Architecture**: Doctype-agnostic design that works with any Frappe document type\n\n## 🏗️ Architecture\n\nDossier uses a microservices architecture with a clear separation of concerns:\n\n```\n┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐\n│   Frappe        │────│ Webhook Handler │────│  Message Queue  │\n└─────────────────┘    └─────────────────┘    └─────────────────┘\n                                                       │\n┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐\n│  React Frontend │────│   API Gateway   │    │ Ingestion Svc   │\n└─────────────────┘    └─────────────────┘    └─────────────────┘\n                                │                       │\n┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐\n│   Query Service │────│   LLM Service   │    │ Embedding Svc   │\n└─────────────────┘    └─────────────────┘    └─────────────────┘\n                                                       │\n┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐\n│   PostgreSQL    │    │      Redis      │    │   Qdrant VDB    │\n└─────────────────┘    └─────────────────┘    └─────────────────┘\n```\n\n### Core Services\n\n- **Webhook 
Handler** (Node.js): Receives and validates Frappe webhooks\n- **Ingestion Service** (Python): Processes documents and manages ingestion workflows\n- **Embedding Service** (Python): Generates vector embeddings using the BGE-small model\n- **Query Service** (Python): Handles semantic search and retrieval\n- **LLM Service** (Python): Generates natural language responses using Ollama\n- **API Gateway** (Python): Handles authentication, rate limiting, and request routing\n- **Frontend** (React): Modern chat interface with real-time streaming\n\n### Infrastructure\n\n- **PostgreSQL**: Configuration and metadata storage\n- **Redis**: Message queuing and caching\n- **Qdrant**: Vector database for semantic search\n- **Ollama**: Local LLM inference engine\n\n## 🚦 Quick Start\n\n### Prerequisites\n\n- Docker 20.10+ and Docker Compose 2.0+\n- 8GB RAM minimum (16GB recommended)\n- 50GB free disk space\n\n### 1. Clone and Configure\n\n```bash\ngit clone https://github.com/ssenthilnathan3/dossier.git\ncd dossier\n\n# Copy and edit environment configuration\ncp .env.example .env\n# Edit .env with your Frappe instance URL and API credentials\n```\n\n### 2. Start the System\n\n```bash\n# Start all services\nmake up\n\n# Verify that all services are healthy\nmake health-check\n\n# Pull LLM models (optional; this can take a while)\nmake pull-models\n```\n\n### 3. 
Access the Interface\n\n- **Chat Interface**: http://localhost:3000\n- **API Gateway**: http://localhost:8080\n- **Service Health**: http://localhost:8080/health\n\n## 🔧 Configuration\n\n### Environment Variables\n\nKey configuration options in `.env`:\n\n```env\n# Database Configuration\nDATABASE_URL=postgresql://dossier:your_password@postgres:5432/dossier\nREDIS_URL=redis://redis:6379\n\n# Frappe Integration\nFRAPPE_URL=https://your-frappe-instance.com\nFRAPPE_API_KEY=your_frappe_api_key\nFRAPPE_API_SECRET=your_frappe_api_secret\n\n# Security\nJWT_SECRET=your_jwt_secret_key\nWEBHOOK_SECRET=your_webhook_secret\n\n# LLM Configuration\nDEFAULT_MODEL=llama3.2\nOLLAMA_URL=http://ollama:11434\n\n# Embedding Configuration\nEMBEDDING_MODEL=all-MiniLM-L6-v2\nBATCH_SIZE=32\n```\n\n### Doctype Configuration\n\nConfigure which Frappe doctypes to index. First, open a database shell:\n\n```bash\nmake db-shell\n```\n\nThen, at the database prompt, insert one configuration row per doctype:\n\n```sql\n-- Example: index the customer_name and customer_details fields of Customer records\nINSERT INTO doctype_configs (doctype, enabled, fields, filters, chunk_size, chunk_overlap)\nVALUES ('Customer', true, '[\"customer_name\", \"customer_details\"]', '{\"disabled\": 0}', 1000, 200);\n```\n\n## 📊 Monitoring and Observability\n\n### Health Checks\n\n```bash\n# Check all services\nmake health-check\n\n# Check a specific service (the Ingestion Service listens on port 8001)\ncurl http://localhost:8001/health\n\n# View service logs\nmake logs\n```\n\n### Metrics and Monitoring\n\n- **Prometheus Metrics**: Exposed at the `/metrics` endpoint of each service\n- **Structured Logging**: JSON logs with correlation IDs\n- **Distributed Tracing**: Request flow tracking across services\n\n### Performance Monitoring\n\n```bash\n# Run performance benchmarks\nmake benchmark\n\n# View system metrics\nmake metrics\n```\n\n## 🧪 Testing\n\n### Test Suites\n\n```bash\n# Run all tests\nmake test-all\n\n# Run specific test suites\nmake test-e2e              # End-to-end functionality\nmake test-performance      # Performance benchmarks\nmake test-integration      # System integration\n\n# Run deployment 
validation\npython scripts/deployment-validation.py\n```\n\n### Integration Testing\n\n```bash\n# Test the complete workflow\nmake integration-full\n\n# Test individual components\nmake test-webhook\nmake test-ingestion\nmake test-query\n```\n\n## 🚀 Production Deployment\n\n### 1. Production Setup\n\n```bash\n# Create production environment\nmake prod-setup\n\n# Edit production configuration\nnano .env.prod\n```\n\n### 2. Security Hardening\n\n```bash\n# Generate secure secrets\nopenssl rand -hex 32  # JWT_SECRET\nopenssl rand -hex 32  # WEBHOOK_SECRET\nopenssl rand -hex 32  # POSTGRES_PASSWORD (hex avoids characters that must be URL-encoded in DATABASE_URL)\n```\n\n### 3. Deploy to Production\n\n```bash\n# Build production images\nmake prod-build\n\n# Start production services\nmake prod-up\n\n# Verify deployment\nmake prod-status\nmake health-check-prod\n```\n\n### 4. SSL/TLS Configuration\n\nConfigure a reverse proxy (such as Nginx or Traefik) to terminate TLS. See the [Deployment Guide](docs/deployment-guide.md) for detailed instructions.\n\n## 🛠️ Development\n\n### Development Environment\n\n```bash\n# Set up development environment\nmake setup-dev\n\n# Start development services with hot reload\nmake dev-up\n\n# Run development tools\nmake lint\nmake format\nmake test\n```\n\n### Adding New Features\n\n1. **Service Extensions**: Add new endpoints to existing services\n2. **Custom Processors**: Implement custom chunking or embedding strategies\n3. **UI Components**: Extend the React frontend with new features\n4. 
**Monitoring**: Add custom metrics and dashboards\n\n### API Documentation\n\nThe following services expose interactive OpenAPI documentation:\n\n- **API Gateway**: http://localhost:8080/docs\n- **Ingestion Service**: http://localhost:8001/docs\n- **Query Service**: http://localhost:8003/docs\n- **LLM Service**: http://localhost:8004/docs\n\n## 📚 Documentation\n\n- **[Deployment Guide](docs/deployment-guide.md)**: Comprehensive deployment instructions\n- **[API Reference](docs/api-reference.md)**: Complete API documentation\n- **[Configuration Guide](docs/configuration.md)**: Detailed configuration options\n- **[Development Guide](docs/development.md)**: Development setup and workflows\n\n## 🔧 Troubleshooting\n\n### Common Issues\n\n1. **Services Won't Start**\n   ```bash\n   # Check logs\n   make logs\n\n   # Check resource usage\n   docker stats\n   ```\n\n2. **Database Connection Issues**\n   ```bash\n   # Test database connectivity\n   make db-shell\n\n   # Check database logs\n   docker compose logs postgres\n   ```\n\n3. **High Memory Usage**\n   ```bash\n   # Check memory usage\n   docker stats --no-stream\n\n   # Restart services\n   make restart\n   ```\n\n### Debug Mode\n\n```bash\n# Enable debug logging\nexport DEBUG=true\nexport LOG_LEVEL=DEBUG\n\n# Restart services\nmake restart\n```\n\n## 🤝 Contributing\n\nContributions are welcome!\n\n### Development Process\n\n1. Fork the repository\n2. Create a feature branch\n3. Make your changes\n4. Add tests for new functionality\n5. Run the test suite\n6. 
Submit a pull request\n\n## 📄 License\n\nThis project is licensed under the MIT License.\n\n## 🙏 Acknowledgments\n\n- [Frappe Framework](https://frappeframework.com/) for the excellent base platform\n- [Qdrant](https://qdrant.tech/) for the vector database\n- [Ollama](https://ollama.ai/) for local LLM inference\n- [FastAPI](https://fastapi.tiangolo.com/) for the API framework\n- [React](https://reactjs.org/) for the frontend framework\n\n---\n\n**Built with ❤️ for the Frappe community**\n\nFor detailed deployment instructions, see the [Deployment Guide](docs/deployment-guide.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fssenthilnathan3%2Fdossier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fssenthilnathan3%2Fdossier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fssenthilnathan3%2Fdossier/lists"}