{"id":34085359,"url":"https://github.com/b-a-m-n/flockparser","last_synced_at":"2025-12-14T13:02:44.765Z","repository":{"id":281866884,"uuid":"946680499","full_name":"B-A-M-N/FlockParser","owner":"B-A-M-N","description":"Distributed document RAG system with intelligent GPU/CPU orchestration. Auto-discovers heterogeneous nodes, routes workloads adaptively,    and achieves 60x+ speedups through VRAM-aware load balancing. Privacy-first architecture with 4 interfaces (CLI, API, MCP, Web UI).   Real distributed systems engineering, not just an API wrapper.","archived":false,"fork":false,"pushed_at":"2025-11-12T22:19:07.000Z","size":99914,"stargazers_count":4,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-11-13T00:14:54.773Z","etag":null,"topics":["api","auto-discovery","chromadb","cli","distributed-rag","document-processing","gpu-orchestration","heterogeneous-computing","llm","load-balancing","mcp","ollama","privacy-first","python","rag","semantic-search","vector-database","vram-aware","web-ui","workload-orchestration"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/B-A-M-N.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-03-11T14:11:04.000Z","updated_at":"2025-11-12T22:19:25.000Z","dependencies_parsed_at":"2025-03-11T15:46:35.820Z","dependency_job_id":"45767f5e-2611-48
af-8dcc-d183b44bbf4a","html_url":"https://github.com/B-A-M-N/FlockParser","commit_stats":null,"previous_names":["benevolentjoker-johnl/flockparser","b-a-m-n/flockparser"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/B-A-M-N/FlockParser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/B-A-M-N%2FFlockParser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/B-A-M-N%2FFlockParser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/B-A-M-N%2FFlockParser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/B-A-M-N%2FFlockParser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/B-A-M-N","download_url":"https://codeload.github.com/B-A-M-N/FlockParser/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/B-A-M-N%2FFlockParser/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":27728767,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-14T02:00:11.348Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","auto-discovery","chromadb","cli","distributed-rag","document-processing","gpu-orchestration","heterogeneous-computing","llm","load-balancing","mcp","ollama","privacy-first","python","rag","seman
tic-search","vector-database","vram-aware","web-ui","workload-orchestration"],"created_at":"2025-12-14T13:02:40.928Z","updated_at":"2025-12-14T13:02:44.757Z","avatar_url":"https://github.com/B-A-M-N.png","language":"Python","readme":"# **FlockParse - Document RAG Intelligence with Distributed Processing**\n\n[![PyPI version](https://img.shields.io/pypi/v/flockparser.svg)](https://pypi.org/project/flockparser/)\n[![PyPI downloads](https://img.shields.io/pypi/dm/flockparser.svg)](https://pypi.org/project/flockparser/)\n[![CI Status](https://img.shields.io/github/actions/workflow/status/B-A-M-N/FlockParser/ci.yml?branch=main\u0026label=tests)](https://github.com/B-A-M-N/FlockParser/actions)\n[![codecov](https://codecov.io/gh/B-A-M-N/FlockParser/branch/main/graph/badge.svg)](https://codecov.io/gh/B-A-M-N/FlockParser)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![GitHub Stars](https://img.shields.io/github/stars/B-A-M-N/FlockParser?style=social)](https://github.com/B-A-M-N/FlockParser)\n\n\u003e **Distributed document RAG system with intelligent load balancing across heterogeneous hardware.** Auto-discovers Ollama nodes, routes workloads adaptively, and achieves 2x+ speedups through SOLLOL-powered distributed processing. Privacy-first with local/network/cloud interfaces.\n\n**What makes this different:** Real distributed systems engineering—not just API wrappers. Developed on CPU to ensure universal compatibility, designed for GPU acceleration when available. 
Handles heterogeneous hardware, network failures, and privacy requirements that rule out cloud APIs.\n\n---\n\n## Quick start — demo in ~3 minutes\n\nClone, start a minimal demo, open the UI:\n\n```bash\ngit clone https://github.com/B-A-M-N/FlockParser \u0026\u0026 cd FlockParser\n# option A: docker-compose demo (recommended)\ndocker-compose up --build -d\n# open Web UI: http://localhost:8501\n# open API: http://localhost:8000\n```\n\nIf you prefer local Python (no Docker):\n\n```bash\n# Option B: Use the idempotent install script\n./INSTALL_SOLLOL_IDEMPOTENT.sh --mode python\nsource .venv/bin/activate \u0026\u0026 python flock_webui.py\n# Web UI opens at http://localhost:8501\n\n# Or manually:\npython -m venv .venv \u0026\u0026 source .venv/bin/activate\npip install -r requirements.txt\npython flock_webui.py  # or flockparsecli.py for CLI\n```\n\nFor full setup instructions, see [detailed quickstart below](#-quickstart-3-steps).\n\n---\n\n## ⚠️ Important: Current Maturity\n\n**Status:** Beta (v1.0.0) - **Early adopters welcome, but read this first!**\n\n**What works well:**\n- ✅ Core distributed processing across heterogeneous nodes\n- ✅ GPU detection and VRAM-aware routing\n- ✅ Basic PDF extraction and OCR fallback\n- ✅ Privacy-first local processing (CLI/Web UI modes)\n\n**Known limitations:**\n- ⚠️ **Limited battle testing** - Tested by ~2 developers, not yet proven at scale\n- ⚠️ **Security gaps** - See [SECURITY.md](SECURITY.md) for current limitations\n- ⚠️ **Edge cases** - Some PDF types may fail (encrypted, complex layouts)\n- ⚠️ **Test coverage** - ~40% coverage, integration tests incomplete\n\n**Read before using:** [KNOWN_ISSUES.md](KNOWN_ISSUES.md) documents all limitations, edge cases, and roadmap honestly.\n\n**Recommended for:**\n- 🎓 Learning distributed systems\n- 🔬 Research and experimentation\n- 🏠 Personal projects with non-critical data\n- 🛠️ Contributors who want to help mature the project\n\n**Not yet recommended for:**\n- ❌ Mission-critical 
- ❌ Regulated industries (healthcare, finance) without additional hardening
- ❌ Large-scale deployments (>50 concurrent users)

**Help us improve:** Report issues, contribute fixes, share feedback!

---

## **🏛️ Origins & Legacy**

FlockParser's distributed inference architecture originated from **[FlockParser-legacy](https://github.com/B-A-M-N/FlockParser-legacy)**, which pioneered:
- **Auto-discovery** of Ollama nodes across heterogeneous hardware
- **Adaptive load balancing** with GPU/CPU awareness
- **VRAM-aware routing** and automatic failover mechanisms

This core distributed logic from FlockParser-legacy was later extracted and generalized to become **[SOLLOL](https://github.com/B-A-M-N/SOLLOL)** - a standalone distributed inference platform that now powers both FlockParser and **[SynapticLlamas](https://github.com/B-A-M-N/SynapticLlamas)**.

### **📊 Performance (CPU Cluster Testing)**

**Tested on a 2-node CPU cluster:**

| Version | Workload | Time | Speedup | Notes |
|---------|----------|------|---------|-------|
| **Legacy** | 20 PDFs (~400 pages) | 60.9 min | Baseline | Single-threaded routing |
| **Current (SOLLOL)** | 20 PDFs (~400 pages) | 30.0 min | **2.0×** | Intelligent load balancing |

**Hardware:**
- 2× CPU nodes (consumer hardware)
- SOLLOL auto-discovery and adaptive routing
- Processing rate: 1.9 chunks/sec across the cluster

**GPU acceleration:** Designed for GPU-aware routing (VRAM monitoring, adaptive allocation), not yet benchmarked.

**See benchmarks:** [performance-comparison-sollol.png](performance-comparison-sollol.png)

---

## **🔒 Privacy Model**

| Interface | Privacy Level | External Calls | Best For |
|-----------|---------------|----------------|----------|
| **CLI** (`flockparsecli.py`) | 🟢 **100% Local** | None | Personal use, air-gapped systems |
| **Web UI** (`flock_webui.py`) | 🟢 **100% Local** | None | GUI users, visual monitoring |
| **REST API** (`flock_ai_api.py`) | 🟡 **Local Network** | None | Multi-user, app integration |
| **MCP Server** (`flock_mcp_server.py`) | 🔴 **Cloud** | ⚠️ Claude Desktop (Anthropic) | AI assistant integration |

**⚠️ MCP Privacy Warning:** The MCP server integrates with Claude Desktop, which sends queries and document snippets to Anthropic's cloud API. Use CLI/Web UI for 100% offline processing.

---

## **Table of Contents**

- [Key Features](#-key-features)
- [👥 Who Uses This?](#-who-uses-this) - **Target users & scenarios**
- [📐 How It Works (5-Second Overview)](#-how-it-works-5-second-overview) - **Visual for non-technical evaluators**
- [Architecture](#-architecture) | **[📖 Deep Dive: Architecture & Design Decisions](docs/architecture.md)**
- [Quickstart](#-quickstart-3-steps)
- [Performance & Benchmarks](#-performance)
- [🎓 Showcase: Real-World Example](#-showcase-real-world-example) ⭐ **Try it yourself**
- [Usage Examples](#-usage)
- [Security & Production](#-security--production-notes)
- [🔗 Integration with SynapticLlamas & SOLLOL](#-integration-with-synapticllamas--sollol) - **Complete AI Ecosystem** ⭐
- [Troubleshooting](#-troubleshooting-guide)
- [Contributing](#-contributing)

## **⚡ Key Features**

- **🌐 Intelligent Load Balancing** - Auto-discovers Ollama nodes, detects GPU vs CPU, monitors VRAM, and routes work adaptively (2x speedup on CPU clusters, designed for GPU acceleration)
- **🔌 Multi-Protocol Support** - CLI (100% local), REST API (network), MCP (Claude Desktop), Web UI (Streamlit) - choose your privacy level
- **🎯 Adaptive Routing** - Sequential vs parallel decisions based on cluster characteristics (prevents slow nodes from bottlenecking)
- **📊 Production Observability** - Real-time health scores, performance tracking, VRAM monitoring, automatic failover
- **🔒 Privacy-First Architecture** - No external API calls required (CLI mode), all processing on-premise
- **📄 Complete Pipeline** - PDF extraction → OCR fallback → Multi-format conversion → Vector embeddings → RAG with source citations

---
## **👥 Who Uses This?**

FlockParser is designed for engineers and researchers who need **private, on-premise document intelligence** with **real distributed systems capabilities**.

### **Ideal Users**

| User Type | Use Case | Why FlockParser? |
|-----------|----------|------------------|
| **🔬 ML/AI Engineers** | Process research papers, build knowledge bases, experiment with RAG systems | GPU-aware routing, distributed embeddings, full pipeline control |
| **📊 Data Scientists** | Extract insights from large document corpora (100s-1000s of PDFs) | Distributed processing, semantic search, production observability |
| **🏢 Enterprise Engineers** | On-premise document search for regulated industries (healthcare, legal, finance) | 100% local processing, no cloud APIs, privacy-first architecture |
| **🎓 Researchers** | Build custom RAG systems, experiment with distributed inference patterns | Full source access, extensible architecture, real benchmarks |
| **🛠️ DevOps/Platform Engineers** | Set up document intelligence infrastructure for teams | Multi-node setup, health monitoring, automatic failover |
| **👨‍💻 Students/Learners** | Understand distributed systems, GPU orchestration, RAG architectures | Real working example, comprehensive docs, honest limitations |

### **Real-World Scenarios**

✅ **"I have 500 research papers and a spare GPU machine"** → Process your corpus faster with distributed nodes
✅ **"I can't send medical records to OpenAI"** → 100% local processing (CLI/Web UI modes)
✅ **"I want to experiment with RAG without cloud costs"** → Full pipeline, runs on your hardware
✅ **"I need to search 10,000 internal documents"** → ChromaDB vector search with sub-20ms latency
✅ **"I have mismatched hardware (old laptop + gaming PC)"** → Adaptive routing handles heterogeneous clusters

### **Not Ideal For**

❌ **Production SaaS with 1000+ concurrent users** → Current SQLite backend limits concurrency (~50 users)
❌ **Mission-critical systems requiring 99.9% uptime** → Still in Beta, see [KNOWN_ISSUES.md](KNOWN_ISSUES.md)
❌ **Simple one-time PDF extraction** → Overkill; use `pdfplumber` directly
❌ **Cloud-first deployments** → Designed for on-premise/hybrid; cloud works but misses GPU routing benefits

**Bottom line:** If you're building document intelligence infrastructure on your own hardware and need distributed processing with privacy guarantees, FlockParser is for you.

---

## **📐 How It Works (5-Second Overview)**

**For recruiters and non-technical evaluators:**

```
┌─────────────────────────────────────────────────────────────────┐
│                         INPUT                                    │
│  📄 Your Documents (PDFs, research papers, internal docs)       │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                     FLOCKPARSER                                  │
│                                                                  │
│  1. Extracts text from PDFs (handles scans with OCR)           │
│  2. Splits into chunks, creates vector embeddings              │
│  3. Distributes work across GPU/CPU nodes (auto-discovery)     │
│  4. Stores in searchable vector database (ChromaDB)            │
│                                                                  │
│  🚀 Distributed Processing: SOLLOL routing → 2× speedup        │
│  🔒 Privacy: 100% local (no cloud APIs)                        │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                        OUTPUT                                    │
│  🔍 Semantic Search: "Find all mentions of transformers"        │
│  💬 AI Chat: "Summarize the methodology section"                │
│  📊 Source Citations: Exact page/document references            │
│  🌐 4 Interfaces: CLI, Web UI, REST API, Claude Desktop         │
└─────────────────────────────────────────────────────────────────┘
```

**Key Innovation:** Auto-detects GPU nodes, measures performance, and routes work to the fastest hardware. No manual configuration needed.
---

## **🏗️ Architecture**

```
┌─────────────────────────────────────────────────────────────┐
│             Interfaces (Choose Your Privacy Level)           │
│  CLI (Local) | REST API (Network) | MCP (Claude) | Web UI   │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│                  FlockParse Core Engine                      │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │   PDF       │  │  Semantic    │  │     RAG      │       │
│  │ Processing  │→ │   Search     │→ │   Engine     │       │
│  └─────────────┘  └──────────────┘  └──────────────┘       │
│         │                │                    │              │
│         ▼                ▼                    ▼              │
│  ┌───────────────────────────────────────────────────┐      │
│  │        ChromaDB Vector Store (Persistent)         │      │
│  └───────────────────────────────────────────────────┘      │
└──────────────────────┬──────────────────────────────────────┘
                       │ Intelligent Load Balancer
                       │ • Health scoring (GPU/VRAM detection)
                       │ • Adaptive routing (sequential vs parallel)
                       │ • Automatic failover & caching
                       ▼
    ┌──────────────────────────────────────────────┐
    │       Distributed Ollama Cluster              │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
    │  │ Node 1   │  │ Node 2   │  │ Node 3   │   │
    │  │ GPU A    │  │ GPU B    │  │ CPU      │   │
    │  │16GB VRAM │  │ 8GB VRAM │  │ 16GB RAM │   │
    │  │Health:367│  │Health:210│  │Health:50 │   │
    │  └──────────┘  └──────────┘  └──────────┘   │
    └──────────────────────────────────────────────┘
         ▲ Auto-discovery | Performance tracking
```
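The per-node health scores in the cluster diagram (367 / 210 / 50) combine GPU presence, free VRAM, and observed performance. The sketch below is a minimal illustration of that idea, not FlockParser's actual scoring code: the `NodeInfo` fields, weights, and latency cap are assumptions chosen only to reproduce the ordering shown above (GPU nodes with more VRAM headroom outrank CPU-only nodes).

```python
from dataclasses import dataclass

@dataclass
class NodeInfo:
    """Hypothetical node snapshot; field names are illustrative, not FlockParser's API."""
    has_gpu: bool
    free_vram_gb: float   # 0 for CPU-only nodes
    latency_ms: float     # recent average response time

def health_score(node: NodeInfo) -> float:
    """Higher is better. GPU nodes get a large base bonus scaled by free VRAM;
    every node is penalized for high observed latency (penalty capped)."""
    score = 50.0                                    # CPU-only baseline
    if node.has_gpu:
        score += 100.0 + 15.0 * node.free_vram_gb   # reward VRAM headroom
    score -= min(node.latency_ms / 10.0, 40.0)      # cap the latency penalty
    return score

# Rank a heterogeneous cluster: fastest hardware first.
cluster = [
    NodeInfo(has_gpu=True, free_vram_gb=16, latency_ms=80),
    NodeInfo(has_gpu=True, free_vram_gb=8, latency_ms=120),
    NodeInfo(has_gpu=False, free_vram_gb=0, latency_ms=200),
]
ranked = sorted(cluster, key=health_score, reverse=True)
```

The key property is only the ordering: routing decisions depend on relative scores, so the exact weights matter less than scoring GPU capacity and responsiveness monotonically.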
**Want to understand how this works?** Read the **[📖 Architecture Deep Dive](docs/architecture.md)** for detailed explanations of:
- Why distributed AI inference solves real-world problems
- How adaptive routing decisions are made (sequential vs parallel)
- MCP integration details and privacy implications
- Technical trade-offs and design decisions

## **🚀 Quickstart (3 Steps)**

**Requirements:**
- Python 3.10 or later
- Ollama 0.1.20+ (install from [ollama.com](https://ollama.com))
- 4GB+ RAM (8GB+ recommended for GPU nodes)

```bash
# 1. Install FlockParser
pip install flockparser

# 2. Start Ollama and pull models
ollama serve  # In a separate terminal
ollama pull mxbai-embed-large    # Required for embeddings
ollama pull llama3.1:latest      # Required for chat

# 3. Run your preferred interface
flockparse-webui                     # Web UI - easiest (recommended) ⭐
flockparse                           # CLI - 100% local
flockparse-api                       # REST API - multi-user
flockparse-mcp                       # MCP - Claude Desktop integration
```

**💡 Pro tip:** Start with the Web UI to see distributed processing with real-time VRAM monitoring and node health dashboards.

---
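The "sequential vs parallel" routing decision covered in the deep dive can be illustrated with a simple heuristic. This is a hedged sketch, not SOLLOL's actual algorithm: it assumes per-node throughput estimates are available and parallelizes only when the remaining nodes add meaningful capacity, since a parallel batch finishes only when its slowest shard does.

```python
def choose_routing(throughputs: list[float], threshold: float = 0.25) -> str:
    """Decide between 'sequential' (fastest node only) and 'parallel' (all nodes).

    throughputs: estimated chunks/sec per node (illustrative). Go parallel only
    if the other nodes together contribute at least `threshold` of the fastest
    node's throughput; otherwise the fastest node alone wins.
    """
    if not throughputs:
        raise ValueError("no nodes available")
    fastest = max(throughputs)
    others = sum(throughputs) - fastest
    return "parallel" if others >= threshold * fastest else "sequential"

# A GPU node plus two comparable CPU nodes: worth fanning out.
print(choose_routing([9.0, 1.5, 1.4]))   # parallel
# One fast node and one very slow node: keep it sequential.
print(choose_routing([9.0, 0.5]))        # sequential
```

This captures the claim in the feature list that adaptive routing "prevents slow nodes from bottlenecking": a straggler is simply excluded from the decision unless it pays for itself.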
### **📝 Programmatic Usage Example**

Want to use FlockParser in your own Python code? Here's the minimal example:

```python
# Programmatic example
from flockparser import FlockParser

fp = FlockParser()                      # uses default config/registry
fp.discover_nodes(timeout=3.0)          # waits for any SOLLOL/agents to register
result = fp.process_pdf("example.pdf")  # routes work via SOLLOL; returns result dict
print(result["summary"][:250])
```

**That's it!** FlockParser handles:
- ✅ GPU detection and routing
- ✅ Load balancing across nodes
- ✅ Vector embeddings and storage
- ✅ Automatic failover

**More examples:** See `showcase/process_arxiv_papers.py` for batch processing and `flockparsecli.py` for the full CLI implementation.

---

### Alternative: Install from Source

If you want to contribute or modify the code:

```bash
git clone https://github.com/B-A-M-N/FlockParser.git
cd FlockParser
pip install -e .  # Editable install
```

### **Quick Test (30 seconds)**

```bash
# Start the CLI
python flockparsecli.py

# Process the sample PDF
> open_pdf testpdfs/sample.pdf

# Chat with it
> chat
🙋 You: Summarize this document
```

**First time?** Start with the Web UI (`streamlit run flock_webui.py`) - it's the easiest way to see distributed processing in action with a visual dashboard.

---

## **🐳 Docker Deployment (One Command)**

### **Quick Start with Docker Compose**

```bash
# Clone and deploy everything
git clone https://github.com/B-A-M-N/FlockParser.git
cd FlockParser
docker-compose up -d

# Access services
# Web UI: http://localhost:8501
# REST API: http://localhost:8000
# Ollama: http://localhost:11434
```

### **What Gets Deployed**

| Service | Port | Description |
|---------|------|-------------|
| **Web UI** | 8501 | Streamlit interface with visual monitoring |
| **REST API** | 8000 | FastAPI with authentication |
| **CLI** | - | Interactive terminal (`docker-compose run cli`) |
| **Ollama** | 11434 | Local LLM inference engine |
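With the stack up, the REST API on port 8000 can be exercised from any HTTP client. A minimal stdlib sketch follows; note that the `/search` route, the JSON payload shape, and the `Authorization: Bearer` scheme are assumptions for illustration only — check `flock_ai_api.py` for the actual routes and authentication header it expects.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"
API_KEY = "your-secret-key"  # must match FLOCKPARSE_API_KEY on the server

def build_search_request(query: str, top_k: int = 5) -> urllib.request.Request:
    """Build an authenticated search request. Endpoint name and payload shape
    are hypothetical; adapt them to the routes defined in flock_ai_api.py."""
    payload = json.dumps({"query": query, "top_k": top_k}).encode()
    return urllib.request.Request(
        f"{API_URL}/search",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_search_request("transformer architecture")
# with the containers running, send it:
# body = json.load(urllib.request.urlopen(req))
```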
### **Production Features**

✅ **Multi-stage build** - Optimized image size
✅ **Non-root user** - Security hardened
✅ **Health checks** - Auto-restart on failure
✅ **Volume persistence** - Data survives restarts
✅ **GPU support** - Uncomment deploy section for NVIDIA GPUs

### **Custom Configuration**

```bash
# Set API key
export FLOCKPARSE_API_KEY="your-secret-key"

# Set log level
export LOG_LEVEL="DEBUG"

# Deploy with custom config
docker-compose up -d
```

### **GPU Support (NVIDIA)**

Uncomment the GPU section in `docker-compose.yml`:

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```

Then run: `docker-compose up -d`

### **CI/CD Pipeline**

```mermaid
graph LR
    A[📝 Git Push] --> B[🔍 Lint & Format]
    B --> C[🧪 Test Suite]
    B --> D[🔒 Security Scan]
    C --> E[🐳 Build Multi-Arch]
    D --> E
    E --> F[📦 Push to GHCR]
    F --> G[🚀 Deploy]

    style A fill:#4CAF50
    style B fill:#2196F3
    style C fill:#2196F3
    style D fill:#FF9800
    style E fill:#9C27B0
    style F fill:#9C27B0
    style G fill:#F44336
```

**Automated on every push to `main`:**

| Stage | Tools | Purpose |
|-------|-------|---------|
| **Code Quality** | black, flake8, mypy | Enforce formatting & typing standards |
| **Testing** | pytest (Python 3.10/3.11/3.12) | Run the test suite across versions (coverage reported to Codecov) |
| **Security** | Trivy | Vulnerability scanning & SARIF reports |
| **Build** | Docker Buildx | Multi-architecture (amd64, arm64) |
| **Registry** | GitHub Container Registry | Versioned image storage |
| **Deploy** | On release events | Automated production deployment |

**Pull the latest image:**
```bash
docker pull ghcr.io/benevolentjoker-johnl/flockparser:latest
```

**View pipeline runs:** https://github.com/B-A-M-N/FlockParser/actions

---
## **🌐 Setting Up Distributed Nodes**

**Want distributed processing?** Set up multiple Ollama nodes across your network for automatic load balancing.

### Quick Multi-Node Setup

**On each additional machine:**

```bash
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Configure for network access
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

# 3. Pull models
ollama pull mxbai-embed-large
ollama pull llama3.1:latest

# 4. Allow firewall (if needed)
sudo ufw allow 11434/tcp  # Linux
```

**FlockParser will automatically discover these nodes!**

Check with:
```bash
python flockparsecli.py
> lb_stats  # Shows all discovered nodes and their capabilities
```

**📖 Complete Guide:** See **[DISTRIBUTED_SETUP.md](DISTRIBUTED_SETUP.md)** for:
- Step-by-step multi-machine setup
- Network configuration and firewall rules
- Troubleshooting node discovery
- Example setups (budget home lab to professional clusters)
- GPU router configuration for automatic optimization

---

### **🔒 Privacy Levels by Interface**
- **Web UI (`flock_webui.py`)**: 🟢 100% local, runs in your browser
- **CLI (`flockparsecli.py`)**: 🟢 100% local, zero external calls
- **REST API (`flock_ai_api.py`)**: 🟡 Local network only
- **MCP Server (`flock_mcp_server.py`)**: 🔴 Integrates with Claude Desktop (Anthropic cloud service)

**Choose the interface that matches your privacy requirements!**
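Under the hood, auto-discovery amounts to probing the local network for hosts answering on Ollama's default port (11434). The stdlib sketch below shows the idea in simplified form — FlockParser's real discovery lives behind `discover_nodes`, and using a bare TCP connect (rather than querying Ollama's `/api/tags` endpoint to confirm models) is a simplification for illustration.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def _is_open(host: str, port: int, timeout: float) -> bool:
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def discover_nodes(candidates: list[str], port: int = 11434,
                   timeout: float = 0.5) -> list[str]:
    """Probe candidate hosts in parallel; return URLs of responsive nodes."""
    with ThreadPoolExecutor(max_workers=32) as pool:
        hits = pool.map(lambda h: (h, _is_open(h, port, timeout)), candidates)
    return [f"http://{host}:{port}" for host, ok in hits if ok]

# e.g. scan a /24 subnet for Ollama nodes:
# nodes = discover_nodes([f"192.168.1.{i}" for i in range(1, 255)])
```

A production scanner would additionally verify each hit really is Ollama (and which models it serves) before adding it to the pool, which is why the firewall step above matters: a node that serves only on localhost never shows up.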
## **🏆 Why FlockParse? Comparison to Competitors**

| Feature | **FlockParse** | LangChain | LlamaIndex | Haystack |
|---------|---------------|-----------|------------|----------|
| **100% Local/Offline** | ✅ Yes (CLI/JSON) | ⚠️ Partial | ⚠️ Partial | ⚠️ Partial |
| **Zero External API Calls** | ✅ Yes (CLI/JSON) | ❌ No | ❌ No | ❌ No |
| **Built-in GPU Load Balancing** | ✅ Yes (auto) | ❌ No | ❌ No | ❌ No |
| **VRAM Monitoring** | ✅ Yes (dynamic) | ❌ No | ❌ No | ❌ No |
| **Multi-Node Auto-Discovery** | ✅ Yes | ❌ No | ❌ No | ❌ No |
| **CPU Fallback Detection** | ✅ Yes | ❌ No | ❌ No | ❌ No |
| **Document Format Export** | ✅ 4 formats | ❌ Limited | ❌ Limited | ⚠️ Basic |
| **Setup Complexity** | 🟢 Simple | 🔴 Complex | 🔴 Complex | 🟡 Medium |
| **Dependencies** | 🟢 Minimal | 🔴 Heavy | 🔴 Heavy | 🟡 Medium |
| **Learning Curve** | 🟢 Low | 🔴 Steep | 🔴 Steep | 🟡 Medium |
| **Privacy Control** | 🟢 High (CLI/JSON) | 🔴 Limited | 🔴 Limited | 🟡 Medium |
| **Out-of-Box Functionality** | ✅ Complete | ⚠️ Requires config | ⚠️ Requires config | ⚠️ Requires config |
| **MCP Integration** | ✅ Native | ❌ No | ❌ No | ❌ No |
| **Embedding Cache** | ✅ MD5-based | ⚠️ Basic | ⚠️ Basic | ⚠️ Basic |
| **Batch Processing** | ✅ Parallel | ⚠️ Sequential | ⚠️ Sequential | ⚠️ Basic |
| **Performance** | 🚀 2x faster with distributed CPU routing | ⚠️ Varies by config | ⚠️ Varies by config | ⚠️ Varies by config |
| **Cost** | 💰 Free | 💰💰 Free + Paid | 💰💰 Free + Paid | 💰💰 Free + Paid |

### **Key Differentiators:**

1. **Privacy by Design**: CLI and JSON interfaces are 100% local with zero external calls (the MCP interface uses Claude Desktop for chat)
2. **Intelligent GPU Management**: Automatically finds, tests, and prioritizes GPU nodes
3. **Production-Ready**: Works immediately with sensible defaults
4. **Resource-Aware**: Detects VRAM exhaustion and prevents performance degradation
5. **Complete Solution**: CLI, REST API, MCP, and batch interfaces - choose your privacy level

## **📊 Performance**

### **Real-World Benchmark Results (CPU Cluster)**

| Processing Mode | Workload | Time | Speedup | What It Shows |
|----------------|----------|------|---------|---------------|
| Legacy (single-threaded) | 20 PDFs | 60.9 min | 1x baseline | Basic routing |
| Current (SOLLOL routing) | 20 PDFs | 30.0 min | **2.0x faster** | Intelligent load balancing across 2 CPU nodes |

**Why the Speedup?**
- SOLLOL intelligently distributes the workload across available nodes
- Adaptive parallelism prevents slow nodes from bottlenecking
- Per-node queues with cross-node stealing optimize throughput
- No network overhead (local cluster, no cloud APIs)

**GPU acceleration:** Designed for GPU-aware routing with VRAM monitoring, not yet benchmarked.

**Key Insight:** The system **automatically** detects performance differences and makes routing decisions - no manual GPU configuration needed.

**Hardware (Benchmark Cluster):**
- **Node 1 (10.9.66.154):** Consumer CPU (Intel/AMD)
- **Node 2 (10.9.66.250):** Consumer CPU (Intel/AMD)
- **Software:** Python 3.10, Ollama, SOLLOL 0.9.60+

**Reproducibility:**
- Full source code available in this repo
- Test with your own hardware - results will vary based on cluster size and hardware

### **🔬 Run Your Own Benchmarks**

Compare FlockParser against LangChain and LlamaIndex on your hardware:

```bash
# Clone the repo if you haven't already
git clone https://github.com/B-A-M-N/FlockParser.git
cd FlockParser

# Install dependencies
pip install -r requirements.txt

# Run comparison benchmark
python benchmark_comparison.py
```

**What it tests:**
- ✅ Processing time for 3 research papers (~50 pages total)
- ✅ GPU utilization and load balancing
- ✅ Memory efficiency
- ✅ Caching effectiveness

**Expected results:**
- FlockParser: ~15-30s (with GPU cluster)
- LangChain: ~45-60s (single node, no load balancing)
- LlamaIndex: ~40-55s (single node, no GPU optimization)

**Why FlockParser is faster:**
- GPU-aware routing (automatic)
- Multi-node parallelization
- MD5-based embedding cache
- Model weight persistence

Results are saved to `benchmark_results.json` for your records.

### Reproduce the benchmarks

To reproduce the benchmark numbers used in this README:

```bash
python benchmark_comparison.py --runs 10 --concurrency 2
```

---

The project offers four main interfaces:
1. **flock_webui.py** - 🎨 Beautiful Streamlit web interface (NEW!)
2. **flockparsecli.py** - Command-line interface for personal document processing
3. **flock_ai_api.py** - REST API server for multi-user or application integration
4. **flock_mcp_server.py** - Model Context Protocol server for AI assistants like Claude Desktop

---

## **🎓 Showcase: Real-World Example**

**Processing influential AI research papers from arXiv.org**

Want to see FlockParser in action on real documents? Run the included showcase:

```bash
pip install flockparser
python showcase/process_arxiv_papers.py
```

### **What It Does**

Downloads and processes 5 seminal AI research papers:
- **Attention Is All You Need** (Transformers) - arXiv:1706.03762
- **BERT** - Pre-training Deep Bidirectional Transformers - arXiv:1810.04805
- **RAG** - Retrieval-Augmented Generation for NLP - arXiv:2005.11401
- **GPT-3** - Language Models are Few-Shot Learners - arXiv:2005.14165
- **Llama 2** - Open Foundation Language Models - arXiv:2307.09288

**Total: ~350 pages, ~25 MB of PDFs**

### **Expected Results**

| Configuration | Processing Time | Notes |
|---------------|----------------|-------|
| **Single CPU node** | Baseline | Sequential processing |
| **Multi-node CPU cluster** | **~2x faster** | SOLLOL distributed routing |

**Note:** GPU acceleration is designed but not yet benchmarked. Actual performance will vary based on your hardware.
### **What You Get**

After processing, the script demonstrates:

1. **Semantic Search** across all papers:
   ```python
   # Example queries that work immediately:
   "What is the transformer architecture?"
   "How does retrieval-augmented generation work?"
   "What are the benefits of attention mechanisms?"
   ```

2. **Performance Metrics** (`showcase/results.json`):
   ```json
   {
     "total_time": "Varies by hardware",
     "papers": [
       {
         "title": "Attention Is All You Need",
         "processing_time": 4.2,
         "status": "success"
       }
     ],
     "node_info": [...]
   }
   ```

3. **Human-Readable Summary** (`showcase/RESULTS.md`) with:
   - Per-paper processing times
   - Hardware configuration used
   - Fastest/slowest/average performance
   - Replication instructions

### **Why This Matters**

This isn't a toy demo - it's processing actual research papers that engineers read daily. It demonstrates:

✅ **Real document processing** - Complex PDFs with equations, figures, multi-column layouts
✅ **Production-grade pipeline** - PDF extraction → embeddings → vector storage → semantic search
✅ **Actual performance gains** - Measurable speedups on heterogeneous hardware
✅ **Reproducible results** - Run it yourself with `pip install`, compare your hardware

**Perfect for portfolio demonstrations:** Show this to hiring managers as proof of real distributed systems work.

---

## **🔧 Installation**

### **1. Clone the Repository**
```bash
git clone https://github.com/B-A-M-N/FlockParser.git
cd FlockParser
```
Install System Dependencies (Required for OCR)**\n\n**⚠️ IMPORTANT: Install these BEFORE pip install, as pytesseract and pdf2image require system packages**\n\n#### For Better PDF Text Extraction:\n- **Linux**:\n  ```bash\n  sudo apt-get update\n  sudo apt-get install poppler-utils\n  ```\n- **macOS**:\n  ```bash\n  brew install poppler\n  ```\n- **Windows**: Download from [Poppler for Windows](http://blog.alivate.com.au/poppler-windows/)\n\n#### For OCR Support (Scanned Documents):\nFlockParse automatically detects scanned PDFs and uses OCR!\n\n- **Linux (Ubuntu/Debian)**:\n  ```bash\n  sudo apt-get update\n  sudo apt-get install tesseract-ocr tesseract-ocr-eng poppler-utils\n  ```\n- **Linux (Fedora/RHEL)**:\n  ```bash\n  sudo dnf install tesseract poppler-utils\n  ```\n- **macOS**:\n  ```bash\n  brew install tesseract poppler\n  ```\n- **Windows**:\n  1. Install [Tesseract OCR](https://github.com/UB-Mannheim/tesseract/wiki) - Download the installer\n  2. Install [Poppler for Windows](http://blog.alivate.com.au/poppler-windows/)\n  3. Add both to your system PATH\n\n**Verify installation:**\n```bash\ntesseract --version\npdftotext -v\n```\n\n### **3. Install Python Dependencies**\n```bash\npip install -r requirements.txt\n```\n\n**Key Python dependencies** (installed automatically):\n- fastapi, uvicorn - Web server\n- pdfplumber, PyPDF2, pypdf - PDF processing\n- **pytesseract** - Python wrapper for Tesseract OCR (requires system Tesseract)\n- **pdf2image** - PDF to image conversion (requires system Poppler)\n- Pillow - Image processing for OCR\n- chromadb - Vector database\n- python-docx - DOCX generation\n- ollama - AI model integration\n- numpy - Numerical operations\n- markdown - Markdown generation\n\n**How OCR fallback works:**\n1. Tries PyPDF2 text extraction\n2. Falls back to pdftotext if no text\n3. **Falls back to OCR** if still no text (\u003c100 chars) - **Requires Tesseract + Poppler**\n4. 
Automatically processes scanned documents without manual intervention\n\n### **4. Install and Configure Ollama**  \n\n1. Install Ollama from [ollama.com](https://ollama.com)\n2. Start the Ollama service:\n   ```bash\n   ollama serve\n   ```\n3. Pull the required models:\n   ```bash\n   ollama pull mxbai-embed-large\n   ollama pull llama3.1:latest\n   ```\n\n## **📜 Usage**\n\n### **🎨 Web UI (flock_webui.py) - Easiest Way to Get Started!**\n\nLaunch the beautiful Streamlit web interface:\n```bash\nstreamlit run flock_webui.py\n```\n\nThe web UI will open in your browser at `http://localhost:8501`\n\n**Features:**\n- 📤 **Upload \u0026 Process**: Drag-and-drop PDF files for processing\n- 💬 **Chat Interface**: Interactive chat with your documents\n- 📊 **Load Balancer Dashboard**: Real-time monitoring of GPU nodes\n- 🔍 **Semantic Search**: Search across all documents\n- 🌐 **Node Management**: Add/remove Ollama nodes, auto-discovery\n- 🎯 **Routing Control**: Switch between routing strategies\n\n**Perfect for:**\n- Users who prefer graphical interfaces\n- Quick document processing and exploration\n- Monitoring distributed processing\n- Managing multiple Ollama nodes visually\n\n---\n\n### **CLI Interface (flockparsecli.py)**\n\nRun the script:\n```bash\npython flockparsecli.py\n```\n\nAvailable commands:\n```\n📖 open_pdf \u003cfile\u003e   → Process a single PDF file\n📂 open_dir \u003cdir\u003e    → Process all PDFs in a directory\n💬 chat              → Chat with processed PDFs\n📊 list_docs         → List all processed documents\n🔍 check_deps        → Check for required dependencies\n🌐 discover_nodes    → Auto-discover Ollama nodes on local network\n➕ add_node \u003curl\u003e    → Manually add an Ollama node\n➖ remove_node \u003curl\u003e → Remove an Ollama node from the pool\n📋 list_nodes        → List all configured Ollama nodes\n⚖️  lb_stats          → Show load balancer statistics\n❌ exit              → Quit the program\n```\n\n### **Web Server API 
(flock_ai_api.py)**\n\nStart the API server:\n```bash\n# Set your API key (or use default for testing)\nexport FLOCKPARSE_API_KEY=\"your-secret-key-here\"\n\n# Start server\npython flock_ai_api.py\n```\n\nThe server will run on `http://0.0.0.0:8000` by default.\n\n#### **🔒 Authentication (NEW!)**\n\nAll endpoints except `/` require an API key in the `X-API-Key` header:\n\n```bash\n# Default API key (change in production!)\nX-API-Key: your-secret-api-key-change-this\n\n# Or set via environment variable\nexport FLOCKPARSE_API_KEY=\"my-super-secret-key\"\n```\n\n#### **Available Endpoints:**\n\n| Endpoint | Method | Auth Required | Description |\n|----------|--------|---------------|-------------|\n| `/` | GET | ❌ No | API status and version info |\n| `/upload/` | POST | ✅ Yes | Upload and process a PDF file |\n| `/summarize/{file_name}` | GET | ✅ Yes | Get an AI-generated summary |\n| `/search/?query=...` | GET | ✅ Yes | Search for relevant documents |\n\n#### **Example API Usage:**\n\n**Check API status (no auth required):**\n```bash\ncurl http://localhost:8000/\n```\n\n**Upload a document (with authentication):**\n```bash\ncurl -X POST \\\n  -H \"X-API-Key: your-secret-api-key-change-this\" \\\n  -F \"file=@your_document.pdf\" \\\n  http://localhost:8000/upload/\n```\n\n**Get a document summary:**\n```bash\ncurl -H \"X-API-Key: your-secret-api-key-change-this\" \\\n  http://localhost:8000/summarize/your_document.pdf\n```\n\n**Search across documents:**\n```bash\ncurl -H \"X-API-Key: your-secret-api-key-change-this\" \\\n  \"http://localhost:8000/search/?query=your%20search%20query\"\n```\n\n**⚠️ Production Security:**\n- Always change the default API key\n- Use environment variables, never hardcode keys\n- Use HTTPS in production (nginx/apache reverse proxy)\n- Consider rate limiting for public deployments\n\n### **MCP Server (flock_mcp_server.py)**\n\nThe MCP server allows FlockParse to be used as a tool by AI assistants like Claude Desktop.\n\n#### **Setting up 
with Claude Desktop**\n\n1. **Start the MCP server:**\n   ```bash\n   python flock_mcp_server.py\n   ```\n\n2. **Configure Claude Desktop:**\n   Add to your Claude Desktop config file (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS, or `%APPDATA%\\Claude\\claude_desktop_config.json` on Windows):\n\n   ```json\n   {\n     \"mcpServers\": {\n       \"flockparse\": {\n         \"command\": \"python\",\n         \"args\": [\"/absolute/path/to/FlockParser/flock_mcp_server.py\"]\n       }\n     }\n   }\n   ```\n\n3. **Restart Claude Desktop** and you'll see FlockParse tools available!\n\n#### **Available MCP Tools:**\n\n- `process_pdf` - Process and add PDFs to the knowledge base\n- `query_documents` - Search documents using semantic search\n- `chat_with_documents` - Ask questions about your documents\n- `list_documents` - List all processed documents\n- `get_load_balancer_stats` - View node performance metrics\n- `discover_ollama_nodes` - Auto-discover Ollama nodes\n- `add_ollama_node` - Add an Ollama node manually\n- `remove_ollama_node` - Remove an Ollama node\n\n#### **Example MCP Usage:**\n\nIn Claude Desktop, you can now ask:\n- \"Process the PDF at /path/to/document.pdf\"\n- \"What documents do I have in my knowledge base?\"\n- \"Search my documents for information about quantum computing\"\n- \"What does my research say about black holes?\"\n\n## **💡 Practical Use Cases**\n\n### **Knowledge Management**\n- Create searchable archives of research papers, legal documents, and technical manuals\n- Generate summaries of lengthy documents for quick review\n- Chat with your document collection to find specific information without manual searching\n\n### **Legal \u0026 Compliance**\n- Process contract repositories for semantic search capabilities\n- Extract key terms and clauses from legal documents\n- Analyze regulatory documents for compliance requirements\n\n### **Research \u0026 Academia**\n- Process and convert academic papers for easier 
reference\n- Create a personal research assistant that can reference your document library\n- Generate summaries of complex research for presentations or reviews\n\n### **Business Intelligence**\n- Convert business reports into searchable formats\n- Extract insights from PDF-based market research\n- Make proprietary documents more accessible throughout an organization\n\n## **🌐 Distributed Processing with Load Balancer**\n\nFlockParse includes a sophisticated load balancer that can distribute embedding generation across multiple Ollama instances on your local network.\n\n### **Setting Up Distributed Processing**\n\n#### **Option 1: Auto-Discovery (Easiest)**\n```bash\n# Start FlockParse\npython flockparsecli.py\n\n# Auto-discover Ollama nodes on your network\n⚡ Enter command: discover_nodes\n```\n\nThe system will automatically scan your local network (/24 subnet) and detect any running Ollama instances.\n\n#### **Option 2: Manual Node Management**\n```bash\n# Add a specific node\n⚡ Enter command: add_node http://192.168.1.100:11434\n\n# List all configured nodes\n⚡ Enter command: list_nodes\n\n# Remove a node\n⚡ Enter command: remove_node http://192.168.1.100:11434\n\n# View load balancer statistics\n⚡ Enter command: lb_stats\n```\n\n### **Benefits of Distributed Processing**\n\n- **Speed**: Process documents 2-10x faster with multiple nodes\n- **GPU Awareness**: Automatically detects and prioritizes GPU nodes over CPU nodes\n- **VRAM Monitoring**: Detects when GPU nodes fall back to CPU due to insufficient VRAM\n- **Fault Tolerance**: Automatic failover if a node becomes unavailable\n- **Load Distribution**: Smart routing based on node performance, GPU availability, and VRAM capacity\n- **Easy Scaling**: Just add more machines with Ollama installed\n\n### **Setting Up Additional Ollama Nodes**\n\nOn each additional machine:\n```bash\n# Install Ollama\ncurl -fsSL https://ollama.com/install.sh | sh\n\n# Pull the embedding model\nollama pull mxbai-embed-large\n\n# 
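Optional sanity check (not in the original setup steps): confirm the embedding model downloaded before exposing the node\nollama list\n\n# 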
Start Ollama (accessible from network)\nOLLAMA_HOST=0.0.0.0:11434 ollama serve\n```\n\nThen use `discover_nodes` or `add_node` to add them to FlockParse.\n\n### **GPU and VRAM Optimization**\n\nFlockParse automatically detects GPU availability and VRAM usage using Ollama's `/api/ps` endpoint:\n\n- **🚀 GPU nodes** with models loaded in VRAM get +200 health score bonus\n- **⚠️ VRAM-limited nodes** that fall back to CPU get only +50 bonus\n- **🐢 CPU-only nodes** get -50 penalty\n\n**To ensure your GPU is being used:**\n\n1. **Check GPU detection**: Run `lb_stats` command to see node status\n2. **Preload model into GPU**: Run a small inference to load model into VRAM\n   ```bash\n   ollama run mxbai-embed-large \"test\"\n   ```\n3. **Verify VRAM usage**: Check that `size_vram \u003e 0` in `/api/ps`:\n   ```bash\n   curl http://localhost:11434/api/ps\n   ```\n4. **Increase VRAM allocation**: If model won't load into VRAM, free up GPU memory or use a smaller model\n\n**Dynamic VRAM monitoring**: FlockParse continuously monitors embedding performance and automatically detects when a GPU node falls back to CPU due to VRAM exhaustion during heavy load.\n\n## **🔄 Example Workflows**\n\n### **CLI Workflow: Research Paper Processing**\n\n1. **Check Dependencies**:\n   ```\n   ⚡ Enter command: check_deps\n   ```\n\n2. **Process a Directory of Research Papers**:\n   ```\n   ⚡ Enter command: open_dir ~/research_papers\n   ```\n\n3. **Chat with Your Research Collection**:\n   ```\n   ⚡ Enter command: chat\n   🙋 You: What are the key methods used in the Smith 2023 paper?\n   ```\n\n### **API Workflow: Document Processing Service**\n\n1. **Start the API Server**:\n   ```bash\n   export FLOCKPARSE_API_KEY=\"your-secret-key-here\"\n   python flock_ai_api.py\n   ```\n\n2. **Upload Documents via API** (authenticated, like all non-root endpoints):\n   ```bash\n   curl -X POST -H \"X-API-Key: $FLOCKPARSE_API_KEY\" -F \"file=@quarterly_report.pdf\" http://localhost:8000/upload/\n   ```\n\n3. **Generate a Summary**:\n   ```bash\n   curl -H \"X-API-Key: $FLOCKPARSE_API_KEY\" http://localhost:8000/summarize/quarterly_report.pdf\n   ```\n\n4. 
**Search Across Documents**:\n   ```bash\n   curl -H \"X-API-Key: $FLOCKPARSE_API_KEY\" \"http://localhost:8000/search/?query=revenue%20growth%20Q3\"\n   ```\n\n## **🔧 Troubleshooting Guide**\n\n### **Ollama Connection Issues**\n\n**Problem**: Error messages about Ollama not being available or connection failures.\n\n**Solution**:\n1. Verify Ollama is running: `ps aux | grep ollama`\n2. Restart the Ollama service:\n   ```bash\n   killall ollama\n   ollama serve\n   ```\n3. Check that you've pulled the required models:\n   ```bash\n   ollama list\n   ```\n4. If models are missing:\n   ```bash\n   ollama pull mxbai-embed-large\n   ollama pull llama3.1:latest\n   ```\n\n### **PDF Text Extraction Failures**\n\n**Problem**: No text extracted from certain PDFs.\n\n**Solution**:\n1. Check if the PDF is scanned/image-based:\n   - Install OCR tools: `sudo apt-get install tesseract-ocr` (Linux)\n   - For better scanned PDF handling: `pip install ocrmypdf`\n   - Process with OCR: `ocrmypdf input.pdf output.pdf`\n\n2. If the PDF has unusual fonts or formatting:\n   - Install poppler-utils for better extraction\n   - Try using the `-layout` option with pdftotext manually:\n     ```bash\n     pdftotext -layout problem_document.pdf output.txt\n     ```\n\n### **Memory Issues with Large Documents**\n\n**Problem**: Application crashes with large PDFs or many documents.\n\n**Solution**:\n1. Process one document at a time for very large PDFs\n2. Reduce the chunk size in the code (default is 512 characters)\n3. Increase your system's available memory or use a swap file\n4. For server deployments, consider using a machine with more RAM\n\n### **API Server Not Starting**\n\n**Problem**: Error when trying to start the API server.\n\n**Solution**:\n1. Check for port conflicts: `lsof -i :8000`\n2. If another process is using port 8000, kill it or change the port\n3. Verify FastAPI is installed: `pip install fastapi uvicorn`\n4. 
Check for Python version compatibility (requires Python 3.7+)\n\n---\n\n## **🔐 Security \u0026 Production Notes**\n\n### **REST API Security**\n\n**⚠️ The default API key is NOT secure - change it immediately!**\n\n```bash\n# Set a strong API key via environment variable\nexport FLOCKPARSE_API_KEY=\"your-super-secret-key-change-this-now\"\n\n# Or generate a random one\nexport FLOCKPARSE_API_KEY=$(openssl rand -hex 32)\n\n# Start the API server\npython flock_ai_api.py\n```\n\n**Production Checklist:**\n- ✅ **Change default API key** - Never use `your-secret-api-key-change-this`\n- ✅ **Use environment variables** - Never hardcode secrets in code\n- ✅ **Enable HTTPS** - Use nginx or Apache as reverse proxy with SSL/TLS\n- ✅ **Add rate limiting** - Use nginx `limit_req` or FastAPI middleware\n- ✅ **Network isolation** - Don't expose API to public internet unless necessary\n- ✅ **Monitor logs** - Watch for authentication failures and abuse\n\n**Example nginx config with TLS:**\n```nginx\nserver {\n    listen 443 ssl;\n    server_name your-domain.com;\n\n    ssl_certificate /path/to/cert.pem;\n    ssl_certificate_key /path/to/key.pem;\n\n    location / {\n        proxy_pass http://127.0.0.1:8000;\n        proxy_set_header Host $host;\n        proxy_set_header X-Real-IP $remote_addr;\n    }\n}\n```\n\n### **MCP Privacy \u0026 Security**\n\n**What data leaves your machine:**\n- 🔴 **Document queries** - Sent to Claude Desktop → Anthropic API\n- 🔴 **Document snippets** - Retrieved context chunks sent as part of prompts\n- 🔴 **Chat messages** - All RAG conversations processed by Claude\n- 🟢 **Document files** - Never uploaded (processed locally, only embeddings stored)\n\n**To disable MCP and stay 100% local:**\n1. Remove FlockParse from Claude Desktop config\n2. Use CLI (`flockparsecli.py`) or Web UI (`flock_webui.py`) instead\n3. 
Both provide full RAG functionality without external API calls\n\n**MCP is safe for:**\n- ✅ Public documents (research papers, manuals, non-sensitive data)\n- ✅ Testing and development\n- ✅ Personal use where you trust Anthropic's privacy policy\n\n**MCP is NOT recommended for:**\n- ❌ Confidential business documents\n- ❌ Personal identifiable information (PII)\n- ❌ Regulated data (HIPAA, GDPR sensitive content)\n- ❌ Air-gapped or classified environments\n\n### **Database Security**\n\n**SQLite limitations (ChromaDB backend):**\n- ⚠️ No concurrent writes from multiple processes\n- ⚠️ File permissions determine access (not true auth)\n- ⚠️ No encryption at rest by default\n\n**For production with multiple users:**\n```bash\n# Option 1: Separate databases per interface\nCLI:     chroma_db_cli/\nAPI:     chroma_db_api/\nMCP:     chroma_db_mcp/\n\n# Option 2: Use PostgreSQL backend (ChromaDB supports it)\n# See ChromaDB docs: https://docs.trychroma.com/\n```\n\n### **VRAM Detection Method**\n\nFlockParse detects GPU usage via Ollama's `/api/ps` endpoint:\n\n```bash\n# Check what Ollama reports\ncurl http://localhost:11434/api/ps\n\n# Response shows VRAM usage:\n{\n  \"models\": [{\n    \"name\": \"mxbai-embed-large:latest\",\n    \"size\": 705530880,\n    \"size_vram\": 705530880,  # \u003c-- If \u003e0, model is in GPU\n    ...\n  }]\n}\n```\n\n**Health score calculation:**\n- `size_vram \u003e 0` → +200 points (GPU in use)\n- `size_vram == 0` but GPU present → +50 points (GPU available, not used)\n- CPU-only → -50 points\n\nThis is **presence-based detection**, not utilization monitoring. 
It detects *if* the model loaded into VRAM, not *how efficiently* it's being used.\n\n---\n\n## **💡 Features**\n\n| Feature | Description |\n|---------|-------------|\n| **Multi-method PDF Extraction** | Uses both PyPDF2 and pdftotext for best results |\n| **Format Conversion** | Converts PDFs to TXT, Markdown, DOCX, and JSON |\n| **Semantic Search** | Uses vector embeddings to find relevant information |\n| **Interactive Chat** | Discuss your documents with AI assistance |\n| **Privacy Options** | Web UI/CLI: 100% offline; REST API: local network; MCP: Claude Desktop (cloud) |\n| **Distributed Processing** | Load balancer with auto-discovery for multiple Ollama nodes |\n| **Accurate VRAM Monitoring** | Real GPU memory tracking with nvidia-smi/rocm-smi + Ollama API (NEW!) |\n| **GPU \u0026 VRAM Awareness** | Automatically detects GPU nodes and prevents CPU fallback |\n| **Intelligent Routing** | 4 strategies (adaptive, round_robin, least_loaded, lowest_latency) with GPU priority |\n| **Flexible Model Matching** | Supports model name variants (llama3.1, llama3.1:latest, llama3.1:8b, etc.) |\n| **ChromaDB Vector Store** | Production-ready persistent vector database with cosine similarity |\n| **Embedding Cache** | MD5-based caching prevents reprocessing same content |\n| **Model Weight Caching** | Keep models in VRAM for faster repeated inference |\n| **Parallel Batch Processing** | Process multiple embeddings simultaneously |\n| **Database Management** | Clear cache and clear DB commands for easy maintenance (NEW!) 
|\n| **Filename Preservation** | Maintains original document names in converted files |\n| **REST API** | Web server for multi-user/application integration |\n| **Document Summarization** | AI-generated summaries of uploaded documents |\n| **OCR Processing** | Extract text from scanned documents using image recognition |\n\n## **Comparing FlockParse Interfaces**\n\n| Feature | **flock_webui.py** | flockparsecli.py | flock_ai_api.py | flock_mcp_server.py |\n|---------|-------------------|----------------|-----------|---------------------|\n| **Interface** | 🎨 Web Browser (Streamlit) | Command line | REST API over HTTP | Model Context Protocol |\n| **Ease of Use** | ⭐⭐⭐⭐⭐ Easiest | ⭐⭐⭐⭐ Easy | ⭐⭐⭐ Moderate | ⭐⭐⭐ Moderate |\n| **Use case** | Interactive GUI usage | Personal CLI processing | Service integration | AI Assistant integration |\n| **Document formats** | Creates TXT, MD, DOCX, JSON | Creates TXT, MD, DOCX, JSON | Stores extracted text only | Creates TXT, MD, DOCX, JSON |\n| **Interaction** | Point-and-click + chat | Interactive chat mode | Query/response via API | Tool calls from AI assistants |\n| **Multi-user** | Single user (local) | Single user | Multiple users/applications | Single user (via AI assistant) |\n| **Storage** | Local file-based | Local file-based | ChromaDB vector database | Local file-based |\n| **Load Balancing** | ✅ Yes (visual dashboard) | ✅ Yes | ❌ No | ✅ Yes |\n| **Node Discovery** | ✅ Yes (one-click) | ✅ Yes | ❌ No | ✅ Yes |\n| **GPU Monitoring** | ✅ Yes (real-time charts) | ✅ Yes | ❌ No | ✅ Yes |\n| **Batch Operations** | ⚠️ Multiple upload | ❌ No | ❌ No | ❌ No |\n| **Privacy Level** | 🟢 100% Local | 🟢 100% Local | 🟡 Local Network | 🔴 Cloud (Claude) |\n| **Best for** | **🌟 General users, GUI lovers** | Direct CLI usage | Integration with apps | Claude Desktop, AI workflows |\n\n## **📁 Project Structure**\n\n- `/converted_files` - Stores the converted document formats (flockparsecli.py)\n- `/knowledge_base` - Legacy JSON storage 
(backwards compatibility only)\n- `/chroma_db_cli` - **ChromaDB vector database for CLI** (flockparsecli.py) - **Production storage**\n- `/uploads` - Temporary storage for uploaded documents (flock_ai_api.py)\n- `/chroma_db` - ChromaDB vector database (flock_ai_api.py)\n\n## **🚀 Recent Additions**\n- ✅ **GPU Auto-Optimization** - Background process ensures models use GPU automatically (NEW!)\n- ✅ **Programmatic GPU Control** - Force models to GPU/CPU across distributed nodes (NEW!)\n- ✅ **Accurate VRAM Monitoring** - Real GPU memory tracking across distributed nodes\n- ✅ **ChromaDB Production Integration** - Professional vector database for 100x faster search\n- ✅ **Clear Cache \u0026 Clear DB Commands** - Manage embeddings and database efficiently\n- ✅ **Model Weight Caching** - Keep models in VRAM for 5-10x faster inference\n- ✅ **Web UI** - Beautiful Streamlit interface for easy document management\n- ✅ **Advanced OCR Support** - Automatic fallback to OCR for scanned documents\n- ✅ **API Authentication** - Secure API key authentication for REST API endpoints\n- ⬜ **Document versioning** - Track changes over time (Coming soon)\n\n## **📚 Complete Documentation**\n\n### Core Documentation\n- **[📖 Architecture Deep Dive](docs/architecture.md)** - System design, routing algorithms, technical decisions\n- **[🌐 Distributed Setup Guide](DISTRIBUTED_SETUP.md)** - ⭐ **Set up your own multi-node cluster**\n- **[📊 Performance Benchmarks](BENCHMARKS.md)** - Real-world performance data and scaling tests\n- **[⚠️ Known Issues \u0026 Limitations](KNOWN_ISSUES.md)** - 🔴 **READ THIS** - Honest assessment of current state\n- **[🔒 Security Policy](SECURITY.md)** - Security best practices and vulnerability reporting\n- **[🐛 Error Handling Guide](ERROR_HANDLING.md)** - Troubleshooting common issues\n- **[🤝 Contributing Guide](CONTRIBUTING.md)** - How to contribute to the project\n- **[📋 Code of Conduct](CODE_OF_CONDUCT.md)** - Community guidelines\n- **[📝 Changelog](CHANGELOG.md)** - 
Version history\n\n### Technical Guides\n- **[⚡ Performance Optimization](PERFORMANCE_OPTIMIZATION.md)** - Tuning for maximum speed\n- **[🔧 GPU Router Setup](GPU_ROUTER_SETUP.md)** - Distributed cluster configuration\n- **[🤖 GPU Auto-Optimization](GPU_AUTO_OPTIMIZATION.md)** - Automatic GPU management\n- **[📊 VRAM Monitoring](VRAM_MONITORING.md)** - GPU memory tracking\n- **[🎯 Adaptive Parallelism](ADAPTIVE_PARALLELISM.md)** - Smart workload distribution\n- **[🗄️ ChromaDB Production](CHROMADB_PRODUCTION.md)** - Vector database scaling\n- **[💾 Model Caching](MODEL_CACHING.md)** - Performance through caching\n- **[🖥️ Node Management](NODE_MANAGEMENT.md)** - Managing distributed nodes\n- **[⚡ Quick Setup](QUICK_SETUP.md)** - Fast track to getting started\n\n### Additional Resources\n- **[🏛️ FlockParser-legacy](https://github.com/B-A-M-N/FlockParser-legacy)** - Original distributed inference implementation\n- **[📦 Docker Setup](docker-compose.yml)** - Containerized deployment\n- **[⚙️ Environment Config](.env.example)** - Configuration template\n- **[🧪 Tests](tests/)** - Test suite and CI/CD\n\n## **🔗 Integration with SynapticLlamas \u0026 SOLLOL**\n\nFlockParser is designed to work seamlessly with **[SynapticLlamas](https://github.com/B-A-M-N/SynapticLlamas)** (multi-agent orchestration) and **[SOLLOL](https://github.com/B-A-M-N/SOLLOL)** (distributed inference platform) as a unified AI ecosystem.\n\n### **The Complete Stack**\n\n```\n┌─────────────────────────────────────────────────────────────┐\n│              SynapticLlamas (v0.1.0+)                       │\n│          Multi-Agent System \u0026 Orchestration                 │\n│  • Research agents  • Editor agents  • Storyteller agents  │\n└───────────┬────────────────────────────────────┬───────────┘\n            │                                    │\n            │ RAG Queries                        │ Distributed\n            │ (with pre-computed embeddings)     │ Inference\n            │                         
           │\n     ┌──────▼──────────┐              ┌─────────▼────────────┐\n     │  FlockParser    │              │      SOLLOL          │\n     │  API (v1.0.4+)  │              │  Load Balancer       │\n     │  Port: 8000     │              │  (v0.9.31+)          │\n     └─────────────────┘              └──────────────────────┘\n            │                                    │\n            │ ChromaDB                          │ Intelligent\n            │ Vector Store                      │ GPU/CPU Routing\n            │                                    │\n     ┌──────▼──────────┐              ┌─────────▼────────────┐\n     │  Knowledge Base │              │  Ollama Nodes        │\n     │  41 Documents   │              │  (Distributed)       │\n     │  6,141 Chunks   │              │  GPU + CPU           │\n     └─────────────────┘              └──────────────────────┘\n```\n\n### **Why This Integration Matters**\n\n**FlockParser** provides document RAG capabilities, **SynapticLlamas** orchestrates multi-agent workflows, and **SOLLOL** handles distributed inference with intelligent load balancing.\n\n| Component | Role | Key Feature |\n|-----------|------|-------------|\n| **FlockParser** | Document RAG \u0026 Knowledge Base | ChromaDB vector store with 6,141+ chunks |\n| **SynapticLlamas** | Agent Orchestration | Multi-agent workflows with RAG integration |\n| **SOLLOL** | Distributed Inference | Load balanced embedding \u0026 model inference |\n\n### **Quick Start: Complete Ecosystem**\n\n```bash\n# Install all three packages (auto-installs dependencies)\npip install synaptic-llamas  # Pulls in flockparser\u003e=1.0.4 and sollol\u003e=0.9.31\n\n# Start FlockParser API (auto-starts with CLI)\nflockparse\n\n# Configure SynapticLlamas for integration\nsynaptic-llamas --interactive --distributed\n```\n\n### **Integration Example: Load Balanced RAG**\n\n```python\nfrom flockparser_adapter import FlockParserAdapter\nfrom sollol_load_balancer import 
SOLLOLLoadBalancer\n\n# Initialize SOLLOL for distributed inference\nsollol = SOLLOLLoadBalancer(\n    rpc_backends=[\"http://gpu-node-1:50052\", \"http://gpu-node-2:50052\"]\n)\n\n# Initialize FlockParser adapter\nflockparser = FlockParserAdapter(\"http://localhost:8000\", remote_mode=True)\n\n# Step 1: Generate embedding using SOLLOL (load balanced!)\nembedding = sollol.generate_embedding(\n    model=\"mxbai-embed-large\",\n    prompt=\"quantum entanglement\"\n)\n# SOLLOL routes to fastest GPU automatically\n\n# Step 2: Query FlockParser with pre-computed embedding\nresults = flockparser.query_remote(\n    query=\"quantum entanglement\",\n    embedding=embedding,  # Skip FlockParser's embedding generation\n    n_results=5\n)\n# FlockParser returns relevant chunks from 41 documents\n\n# Performance gain: 2-5x faster when SOLLOL has faster nodes!\n```\n\n### **New API Endpoints (v1.0.4+)**\n\nFlockParser v1.0.4 adds **SynapticLlamas-compatible** public endpoints:\n\n- **`GET /health`** - Check API availability and document count\n- **`GET /stats`** - Get knowledge base statistics (41 docs, 6,141 chunks)\n- **`POST /query`** - Query with pre-computed embeddings (critical for load balanced RAG)\n\n**These endpoints allow SynapticLlamas to bypass FlockParser's embedding generation and use SOLLOL's load balancer instead!**\n\n### **Learn More**\n\n- **[📖 Complete Integration Guide](INTEGRATION_WITH_SYNAPTICLLAMAS.md)** - Full architecture, examples, and setup\n- **[SynapticLlamas Repository](https://github.com/B-A-M-N/SynapticLlamas)** - Multi-agent orchestration\n- **[SOLLOL Repository](https://github.com/B-A-M-N/SOLLOL)** - Distributed inference platform\n\n---\n\n## **📝 Development Process**\n\nThis project was developed iteratively using Claude and Claude Code as coding assistants. All design decisions, architecture choices, and integration strategy were directed and reviewed by me.\n\n## **🤝 Contributing**\nContributions are welcome! 
Please feel free to submit a Pull Request.\n\n## **📄 License**\nThis project is licensed under the MIT License - see the LICENSE file for details.\n