https://github.com/rgeissen/uderia

Uderia - Autonomous Enterprise Platform

actionable agentic ai autonomous collaborative data-sovereignty efficient enterprise fingov hybrid on-premise optimizer platform sovereignty transparent trust

Last synced: 3 months ago

Host: GitHub
URL: https://github.com/rgeissen/uderia
Owner: rgeissen
License: agpl-3.0
Created: 2025-09-03T12:03:18.000Z (11 months ago)
Default Branch: main
Last Pushed: 2026-03-06T23:20:16.000Z (5 months ago)
Last Synced: 2026-03-06T23:28:55.650Z (5 months ago)
Topics: actionable, agentic, ai, autonomous, collaborative, data-sovereignty, efficient, enterprise, fingov, hybrid, on-premise, optimizer, platform, sovereignty, transparent, trust
Language: Python
Homepage: https://uderia.com
Size: 86.1 MB
Stars: 2
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Uderia Platform - Your Trusted Data Agent
### Cloud-Level Reasoning. Zero-Trust Privacy.

The **Uderia Platform** delivers enterprise-grade AI orchestration with unmatched flexibility. Whether you leverage hyperscaler intelligence for maximum capability, run private local models for absolute sovereignty, or blend both approaches, you get cloud-level reasoning with complete control over your data and costs.

Experience a fundamental transformation in how you work with enterprise data:

- **From Intent to Autonomy** - Your AI organization that senses, reasons, and delivers. Stop orchestrating. Start delegating. Specialized agents coordinate autonomously to gather data, reason across domains, and synthesize actionable intelligence.
- **From Ideation to Operationalization** - Revolutionary IFOC Methodology adapts to your needs. Four execution modes (Ideate, Focus, Optimize, Coordinate) in one conversation with zero friction. Switch between creative ideation, document-verified answers, sovereign efficiency, and cross-team orchestration with a simple @TAG.
- **From Days to Seconds** - Discover insights via conversation. Operationalize them via API. Your conversational discovery is your production-ready automation.
- **From Hallucination to Ground Truth** - Knowledge Graph maps your databases and RAG retrieves verified documents. Every answer grounded in proven sources with full citations. Zero fabrication, complete traceability.
- **From Guesswork to Clarity** - Full transparency eliminates the AI black box. See every strategic plan, tool execution, and self-correction in real-time through the Live Status Window.
- **From Uncertainty to Accountability** - Every action recorded. Every decision traceable. Enterprise-grade audit logging captures every interaction with full forensic context for compliance (GDPR, SOC2) and accountability at scale.
- **From Prompt Hijacking to Prompt Integrity** - Two-layer encryption with license-derived keys makes prompt extraction cryptographically impossible. Your AI logic remains proprietary, auditable, and tamper-proof.
- **From Data Exposure to Data Sovereignty** - Your data, your rules, your environment. Execute with cloud intelligence while maintaining local privacy through decoupled planning and execution with Champion Cases.
- **From Isolated Expertise to Collective Intelligence** - [Intelligence Marketplace](#-intelligence-marketplace-collaborative-knowledge-sharing) transforms individual expertise into collective knowledge. Share and discover repositories, agent packs, skills, extensions, and knowledge graphs with one click.
- **From Context Contamination to Context Optimization** - Nine intelligent context modules with budget-aware orchestration. Dynamic adjustments, surplus redistribution, and intelligent condensation optimize every token.
- **From $$$ to ¢¢¢** - Revolutionary Fusion Optimizer with strategic planning, proactive optimization, and autonomous self-correction for cost-effective execution.
- **From Hidden Costs to Total Visibility** - Complete financial governance with real-time tracking, comprehensive analytics, and fine-grained cost control. Track every token, understand every cost.

Whether on-premises or in the cloud, you get **enterprise results** with **optimized speed** and **minimal token cost**, built on the six core principles detailed below.

![Demo](./images/AppOverview.gif)

---

### Table Of Contents

1. [Core Principles: A Superior Approach](#core-principles-a-superior-approach)
2. [Key Features](#-key-features)
3. [Core Components](#-core-components)
   - [Profile Classes: The IFOC Workflow](#-profile-classes-the-ifoc-workflow)
   - [The Fusion Optimizer](#-the-heart-of-the-application---the-engine--its-fusion-optimizer)
   - [Retrieval-Augmented Generation (RAG)](#-retrieval-augmented-generation-rag-for-self-improving-ai)
   - [Vector Store Abstraction Layer](#-vector-store-abstraction-layer)
   - [Intelligence Marketplace: Collaborative Knowledge Sharing](#-intelligence-marketplace-collaborative-knowledge-sharing)
   - [Skills: Pre-Processing Context Injection](#-skills-pre-processing-context-injection)
   - [Extensions: Post-Processing Transformations](#-extensions-post-processing-transformations)
   - [Interactive Visual Components: Generative UI](#-interactive-visual-components-generative-ui)
4. [Security Architecture](#-security-architecture)
   - [License-Based Prompt Encryption](#-license-based-prompt-encryption)
   - [Execution Provenance Chain (EPC)](#-execution-provenance-chain-epc)
5. [System Architecture & Deployment](#%EF%B8%8F-system-architecture--deployment)
6. [Installation and Setup Guide](#-installation-and-setup-guide)
   - [Model Selection: Recommended vs All Models](#model-selection-recommended-vs-all-models)
   - [Command Line Options](#command-line-options)
7. [User Guide](#-user-guide)
   - [Getting Started](#getting-started)
   - [Using the Interface](#using-the-interface)
   - [Advanced Context Management](#advanced-context-management)
   - [REST API Integration](#rest-api-integration)
   - [Real-Time Monitoring](#real-time-monitoring)
   - [Operationalization](#operationalization)
   - [Troubleshooting](#troubleshooting)
8. [Docker Deployment](#docker-deployment)
9. [License](#license)
10. [Author & Contributions](#author-contributions)
11. [Appendix: Feature Update List](#appendix-feature-update-list)

---

## Core Principles: A Superior Approach

The Uderia Platform transcends typical data chat applications by delivering a seamless and powerful experience based on six core principles:

### 🚀 Actionable
Go from conversational discovery to a production-ready, automated workflow in seconds. The agent's unique two-in-one approach means your interactive queries can be immediately operationalized via a REST API, eliminating the friction and redundancy of traditional data operations. What once took data experts weeks is now at your fingertips.

### 🔍 Transparent
Eliminate the "black box" of AI. The Uderia Platform is built on a foundation of absolute trust, with a Live Status Window that shows you every step of the agent's thought process. From the initial high-level plan to every tool execution and self-correction, you have a clear, real-time view, leaving no room for guesswork.

### ⚡ Efficient
Powered by the intelligent Fusion Optimizer, the agent features a revolutionary multi-layered architecture for resilient and cost-effective task execution. Through strategic and tactical planning, proactive optimization, and autonomous self-correction, the agent ensures enterprise-grade performance and reliability.

### 🛡️ Sovereignty
Your data, your rules, your environment. The agent gives you the ultimate freedom to choose your data exposure strategy. Leverage the power of hyperscaler LLMs, or run fully private models on your own infrastructure with Ollama, keeping your data governed entirely by your rules. The agent connects to the models you trust.

### 💰 Financial Governance
Complete cost transparency and control over your LLM spending. The agent provides real-time cost tracking, comprehensive analytics, and detailed visibility into every token consumed. With accurate per-model pricing, cost attribution by provider, and powerful administrative tools, you maintain full financial oversight of your AI operations.

### 🤝 Collaborative
Transform isolated expertise into collective intelligence. The [Intelligence Marketplace](#-intelligence-marketplace-collaborative-knowledge-sharing) enables community-driven sharing of execution patterns, domain knowledge, agent teams, skills, extensions, and knowledge graphs—reducing costs through proven strategies and creating a collaborative ecosystem where collective intelligence amplifies individual capabilities.

[⬆️ Back to Table of Contents](#table-of-contents)

---

## 🌟 Key Features

The Uderia Platform's features are organized around the six core principles that define its value proposition. Each principle is realized through a comprehensive set of capabilities designed to deliver enterprise-grade AI orchestration.

---

### 🚀 Actionable: From Discovery to Production in Seconds

Eliminate the friction between conversational exploration and production automation. The agent's unique architecture enables seamless operationalization of interactive queries.

* **Comprehensive REST API**: Full programmatic control with asynchronous task-based architecture for reliable, scalable automation:
- Session management (create, delete, list with conversation history)
- Query execution with async submit + poll pattern
- Task management (status polling, cancellation, result retrieval)
- Configuration management (profiles, LLM providers, MCP servers)
- RAG collection CRUD operations
- Analytics endpoints (session costs, token usage, efficiency metrics)
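
The async submit + poll pattern listed above can be sketched generically. This is an illustration only, not Uderia's client: `submit_query`, `get_status`, and the status strings are hypothetical stand-ins for whatever the REST API actually exposes, so check the API documentation for the real endpoints and payloads.

```python
import time
from typing import Callable, Dict


def run_query(
    submit_query: Callable[[str], str],
    get_status: Callable[[str], Dict],
    prompt: str,
    poll_interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Submit a prompt, then poll until the task reaches a terminal state.

    submit_query and get_status are hypothetical callables that wrap the
    HTTP calls; the status values below are assumptions, not the real API.
    """
    task_id = submit_query(prompt)
    deadline = time.monotonic() + timeout
    while True:
        task = get_status(task_id)
        if task["status"] in ("completed", "failed", "cancelled"):
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still running after {timeout}s")
        sleep(poll_interval)
```

Passing the HTTP calls in as callables keeps the polling loop testable without a running server; in practice they would wrap authenticated requests that carry the bearer token.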

* **Long-Lived Access Tokens**: Secure automation without session management:
- Configurable expiration (90 days default, or never)
- SHA256 hashed storage with audit trail
- Usage tracking (last used timestamp, use count, IP address)
- Soft-delete preservation for compliance
- One-time display at creation for enhanced security
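
To make the token properties above concrete, here is a minimal sketch of hash-only storage with one-time display. The record layout and function names are assumptions for illustration; only the SHA256 hashing, the 90-day default, usage tracking, and soft delete come from the list above.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def issue_token(expires_in_days: Optional[int] = 90) -> Tuple[str, dict]:
    """Create a token; return the plaintext (shown once) and the stored record."""
    plaintext = secrets.token_urlsafe(32)
    now = datetime.now(timezone.utc)
    record = {
        # Only the hash is persisted, so a database dump does not reveal tokens.
        "token_hash": hashlib.sha256(plaintext.encode()).hexdigest(),
        "expires_at": None if expires_in_days is None else now + timedelta(days=expires_in_days),
        "use_count": 0,
        "last_used_at": None,
        "revoked": False,  # soft delete: the row is kept for the audit trail
    }
    return plaintext, record


def verify_token(presented: str, record: dict, now: Optional[datetime] = None) -> bool:
    """Check a presented token against the stored hash and update usage stats."""
    now = now or datetime.now(timezone.utc)
    if record["revoked"]:
        return False
    if record["expires_at"] is not None and now >= record["expires_at"]:
        return False
    presented_hash = hashlib.sha256(presented.encode()).hexdigest()
    if not hmac.compare_digest(presented_hash, record["token_hash"]):
        return False
    record["use_count"] += 1
    record["last_used_at"] = now
    return True
```

Passing `expires_in_days=None` models the "never" expiration option.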

* **Apache Airflow Integration**: Production-ready DAG examples for batch query automation:
- Session reuse via `tda_session_id` variable
- Profile override via `tda_profile_id` for specialized workloads
- Bearer token authentication for secure API access
- Async polling pattern for reliable long-running executions
- Complete example DAG (`tda_00_execute_questions.py`) included

* **n8n Workflow Automation**: Visual node-based workflow builder for enterprise automation:
- Three production-ready workflow templates (Simple Query, Scheduled Reports, Slack Integration)
- Profile override support via REST API `profile_id` parameter
- Event-driven triggers (webhooks, cron schedules, manual execution)
- Simple linear workflow pattern for reliability
- Business process routing (email, Slack, CRM, databases)
- Docker deployment with reverse proxy support
- Comprehensive documentation with troubleshooting guides ([see docs/n8n](docs/n8n/README.md))

* **Flowise Integration**: Low-code workflow automation and chatbot development:
- Pre-built agent flow for TDA Conversation handling
- Asynchronous submit & poll pattern implementation
- Session management with multi-turn conversation support
- Bearer token authentication for secure API access
- Profile override capability for specialized workflows
- TTS payload extraction for voice-enabled chatbots
- Visual workflow designer for complex orchestration
- Import-ready JSON template included ([see docs/Flowise](docs/Flowise/Flowise.md))

* **IFOC Workflow - From Ideation to Operationalization**: Revolutionary methodology that adapts to your needs—four execution modes in one conversation with zero friction:
- **🟢 IDEATE (Conversation)**: Brainstorm, explore, and draft solutions without touching live systems—creative ideation without constraints
- **🔵 FOCUS (Knowledge)**: Document-verified answers with a zero-hallucination guarantee, every answer grounded in your documents
- **🟠 OPTIMIZE (Efficiency)**: The powerhouse—Fusion Optimizer with full MCP Tools + Prompts support, strategic planning, and self-correction for sovereign efficiency
- **🟣 COORDINATE (Multi-Profile)**: Cross-team orchestration across multiple levels, where autonomous coordinators manage specialist teams
- Switch between modes instantly with a simple `@TAG` (e.g., `@CHAT`, `@POLICY`, `@OPTIMIZER`, `@EXECUTIVE`)
- Temporary overrides via `@TAG` syntax for single queries without changing defaults
- Nested coordination support: Build 3-level AI hierarchies (Master → Coordinators → Specialists)
- Complete safeguards: Circular dependency detection, depth limits, cost visibility at every level
- Stop force-fitting every problem into one AI—match your intent to the right intelligence phase
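
A single-query `@TAG` override can be pictured as a small parsing step in front of the profile router. The sketch below is an assumption about how such parsing might work, not the platform's implementation; the tag names come from the examples above, while their mapping to modes and the `resolve_profile` helper are hypothetical.

```python
import re
from typing import Dict, Tuple

# Hypothetical tag-to-mode mapping, assumed from the order of the examples above.
PROFILES: Dict[str, str] = {
    "CHAT": "ideate",
    "POLICY": "focus",
    "OPTIMIZER": "optimize",
    "EXECUTIVE": "coordinate",
}

# A leading @TAG followed by whitespace and the actual query.
TAG_PATTERN = re.compile(r"^\s*@([A-Za-z0-9_]+)\s+(.*)$", re.DOTALL)


def resolve_profile(message: str, default_profile: str) -> Tuple[str, str]:
    """Return (profile, query); a known leading @TAG overrides the default once."""
    match = TAG_PATTERN.match(message)
    if match and match.group(1).upper() in PROFILES:
        return PROFILES[match.group(1).upper()], match.group(2).strip()
    # Unknown or missing tag: keep the session default and the message as typed.
    return default_profile, message.strip()
```

Because the function returns the profile rather than storing it, the session default is untouched, which matches the "temporary override" behavior described above.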

* **Session Primer & Automatic Context**: Transform generic LLMs into pre-educated specialists from the first message:
- Auto-initialize new sessions with domain-specific knowledge
- Inject business context, schemas, and common patterns at session creation
- Profiles can define default primer content for consistent onboarding
- Eliminates repetitive context-setting for every new conversation
- Specialists understand your environment without manual training

* **Autonomous AI Organization (Genie Mode)**: From intent to autonomy—your AI organization that senses, reasons, and delivers:
- Multi-profile coordination where specialized agents work as a unified team
- Master coordinator intelligently routes queries to domain experts
- Automatic discovery and orchestration of specialist capabilities
- Cross-domain synthesis: agents gather data independently, then coordinate findings
- Real-time topology visualization showing agent activation and collaboration
- Stop orchestrating manually—start delegating to an AI organization that never sleeps
- Executive-level queries like "Improve Product Margin for Q4" automatically cascade to CFO, CMO, and Legal specialists

* **Intelligent MCP Server Import**: Seamless integration of community MCP servers with dual format support:
- Import from official MCP Registry format (io.example/server-name specifications)
- Import from Claude Desktop configuration files (direct migration)
- Automatic format detection with validation
- Bulk import multiple servers at once
- Three transport types: 🟠 STDIO (local), 🔵 HTTP (network), 🟢 SSE (streaming)
- STDIO servers: automatic subprocess lifecycle management (npx, uvx, python)
- Server-side ID generation ensures uniqueness
- Duplicate detection prevents configuration conflicts
- One-click access to [MCP community servers](https://github.com/modelcontextprotocol/servers)

* **Docker Deployment Support**: Production-ready containerization:
- Multi-user support in single shared container
- Environment variable overrides
- Volume mounts for sessions, logs, and keys
- Load balancer ready for horizontal scaling

---

### 🔍 Transparent: Eliminate the AI Black Box

Build trust through complete visibility into every decision, action, and data point the agent processes.

* **Live Status Panel**: Real-time window into the agent's reasoning process:
- Strategic plan visualization with phase-by-phase breakdown
- Tactical decision display showing tool selection rationale
- Raw data inspection for every tool response
- Self-correction events with recovery strategy visibility
- Streaming updates via Server-Sent Events (SSE)
- Dual-model cost breakdown for Fusion Optimizer showing strategic vs tactical costs with color-coded visualization (12-Feb-2026)

* **Dynamic Capability Discovery**: Instant overview of agent potential:
- Automatic loading of all MCP Tools from connected servers
- Prompt library display with categorization
- Resource enumeration for data source visibility
- Real-time capability updates on configuration changes
- Visual organization in tabbed Capabilities Panel

* **Rich Data Rendering**: Intelligently formats and displays various data types:
- Query results in interactive tables with sorting/filtering
- SQL DDL in syntax-highlighted code blocks
- Key metrics in summary cards
- Integrated charting engine for data visualization
- Real-time rendering as data streams in

* **Comprehensive Token Tracking**: Per-turn visibility into LLM consumption:
- Input token counts for every request
- Output token counts for every response
- Token-to-cost mapping with provider-specific pricing
- Historical token trends across sessions
- Optimization insights for cost-conscious users
- Theme-aware KPI displays adapt seamlessly to dark and light themes (11-Feb-2026)

* **Anti-Hallucination by Architecture**: Ground every answer in verified sources—zero fabrication:
- Strict retrieval-then-synthesize pattern where the LLM answers only from retrieved documents
- Knowledge Graph maps your databases (tables, relationships, business concepts) before query generation
- RAG system retrieves and scores documents from knowledge bases with full citations
- Source traceability with citations back to specific document chunks
- Transparent failure when no relevant sources exist (no guessing)
- Dual knowledge layers work in unison for comprehensive grounding

* **Execution Monitoring Dashboard**: Cross-source workload tracking:
- Real-time task list (running, completed, failed)
- Detailed execution logs with reasoning steps
- Tool invocation history with arguments and responses
- Error messages and stack traces for debugging
- Task control (cancel, retry) for operational flexibility

* **Enterprise Audit Logging - From Uncertainty to Accountability**: Every action recorded, every decision traceable:
- Complete forensic trail with user, IP, timestamp, and outcome for every interaction
- User authentication and authorization events (login attempts, OAuth flows, token generation)
- Configuration changes with before/after snapshots (LLM provider switches, profile updates)
- Prompt executions with full turn-level attribution and cost tracking
- API usage patterns and access history for security monitoring
- Admin actions on user accounts and system settings
- Progressive security lockouts and suspicious activity detection
- 20+ specialized logging functions for comprehensive coverage
- Configurable retention policies for GDPR and data sovereignty compliance
- REST API access for integration with compliance tools (SOC2, audit reports)
- From audit trail to compliance report in one click

* **Intelligent Context Window Management**: Budget-aware orchestration of every token sent to the LLM:
- Modular architecture with 9 pluggable context modules (system prompt, tools, history, RAG, knowledge, documents, and more)
- Five-pass assembly pipeline: resolve → dynamic adjustments → allocate & assemble → surplus reallocation → condense
- Per-module budget allocation with min/max constraints and priority-based condensation
- Dynamic adjustment rules that adapt context composition at runtime (first turn, long conversations, high-confidence RAG)
- Real-time observability via context window snapshot events with per-module utilization metrics
- 4 predefined context window types (Balanced, Knowledge-Heavy, Conversation-First, Token-Efficient) plus custom types
- Admin UI with live budget visualization, condensation order editor, and dynamic rule builder
- Per-session utilization analytics dashboard with trend charts and module breakdown
- tiktoken-based BPE token estimation for accurate budget planning
- [Full architecture documentation →](docs/Architecture/CONTEXT_WINDOW_ARCHITECTURE.md)
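
The "allocate" and "surplus reallocation" passes can be illustrated with a small budget allocator. This is a simplified sketch under stated assumptions: the `Module` fields, the list-order priority, and the `allocate` function are hypothetical, and the resolve, dynamic-adjustment, and condensation passes are left out. See the linked architecture document for the real design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Module:
    name: str
    share: float      # target fraction of the total budget
    min_tokens: int
    max_tokens: int
    needed: int       # tokens the module's content would use uncondensed


def allocate(modules: List[Module], total_budget: int) -> Dict[str, int]:
    """Grant per-module budgets, then redistribute what is left over.

    Assumes shares sum to at most 1.0 and min_tokens <= max_tokens.
    """
    # Pass 1: give each module its share, clamped to [min, max] and to its need.
    grants: Dict[str, int] = {}
    for m in modules:
        target = int(total_budget * m.share)
        cap = max(m.min_tokens, min(target, m.max_tokens))
        grants[m.name] = min(cap, m.needed)
    # Pass 2: hand unused budget to modules that still need more, in list order.
    surplus = total_budget - sum(grants.values())
    for m in modules:
        if surplus <= 0:
            break
        extra = min(surplus, m.needed - grants[m.name], m.max_tokens - grants[m.name])
        if extra > 0:
            grants[m.name] += extra
            surplus -= extra
    return grants
```

For example, a short system prompt and a small RAG result leave budget unused in the first pass, and a long conversation history can then absorb it up to its own maximum.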

* **Interactive Visual Components**: Modular, plugin-based UI component library:
- **Canvas Component:** Interactive code editor powered by CodeMirror 6 with syntax highlighting (SQL, Python, JavaScript), live database connectors, in-place query execution, split-panel and fullscreen modes, and result rendering directly in chat
- **Chart Component:** Data visualization via G2Plot with 16 chart types (bar, line, pie, scatter, heatmap, gauge, radar, treemap, and more), 5-stage mapping resolution pipeline with cardinality-aware column selection, deterministic fast-path execution, and LLM-assisted fallback
- **Knowledge Graph:** Entity-relationship visualization for context enrichment and document structure exploration
- Self-contained component architecture with manifest-driven discovery and hot-reload
- Profile-level intensity control and admin governance
- 3 render targets: inline (chat bubble), sub_window (persistent canvas panel), status_panel (Live Status area)
- Third-party extensibility: add custom components without modifying core files

* **System Customization**: Take control of agent behavior:
- System Prompt Editor for per-model instruction customization
- Save and reset capabilities for experimentation
- Direct Model Chat for baseline testing without tools
- Dynamic Capability Management (enable/disable tools/prompts)
- Phased rollouts without server restart

---

### ⚡ Efficient: Intelligent Optimization Engine

The Fusion Optimizer delivers enterprise-grade performance, cost efficiency, and reliability.

**Real-World Cost Savings:**

- **Typical enterprise query**: "Show me all products with low inventory and notify suppliers"
  - Traditional LLM wrapper: 15,000 tokens (full schema + full history) = $0.45/query
  - Fusion Optimizer: 6,000 tokens (plan hydration + tactical fast path) = $0.18/query
  - **60% cost reduction** on repeated similar queries

- **Monthly workload** (500 queries/day):
  - Traditional: 500 × $0.45 × 30 = $6,750/month
  - Fusion Optimizer: 500 × $0.18 × 30 = $2,700/month
  - **Savings: $4,050/month ($48,600/year)**

- **Self-correction efficiency**: When errors occur, targeted replanning costs roughly 2K tokens instead of the 15K tokens a full restart would consume
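
The figures above follow from a flat rate of $0.03 per 1K tokens, which is implied by $0.45 for 15,000 tokens. The short script below reproduces the arithmetic so you can substitute your own token counts, query volume, and price. The flat rate is a simplification, since real providers price input and output tokens separately.

```python
# Implied by the example above: $0.45 / 15K tokens = $0.03 per 1K tokens.
PRICE_PER_1K_TOKENS = 0.03
QUERIES_PER_DAY = 500
DAYS_PER_MONTH = 30


def query_cost(tokens: int) -> float:
    """Cost of one query at a flat per-token rate (a simplification)."""
    return tokens / 1000 * PRICE_PER_1K_TOKENS


traditional = query_cost(15_000)
optimized = query_cost(6_000)
reduction = 1 - optimized / traditional
monthly_savings = (traditional - optimized) * QUERIES_PER_DAY * DAYS_PER_MONTH
yearly_savings = monthly_savings * 12

print(f"per query: ${traditional:.2f} vs ${optimized:.2f} ({reduction:.0%} reduction)")
print(f"savings: ${monthly_savings:,.0f}/month, ${yearly_savings:,.0f}/year")
```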

See the dedicated section below (**[The Heart of the Application - The Engine & its Fusion Optimizer](#-the-heart-of-the-application---the-engine--its-fusion-optimizer)**) for comprehensive architectural details on:

* Multi-layered strategic and tactical planning
* Proactive optimization (Plan Hydration, Tactical Fast Path, Specialized Orchestrators)
* Autonomous self-correction and healing
* Context-aware learning from execution history
* Deterministic plan validation and hallucination prevention

**Key efficiency highlights:**

* **Self-Improving Learning System**: Closed-loop learning from past successes:
- Automatic capture and archiving of all successful interactions
- Token-based efficiency analysis to identify "champion" strategies
- Few-shot learning through injection of best-in-class examples
- Asynchronous processing to eliminate user-facing latency
- Per-user cost savings attribution and tracking

* **Planner Repository Constructors**: Modular plugin system for domain-specific optimization:
- Self-contained templates with validation schemas
- SQL query templates with extensibility for document Q&A, API workflows
- LLM-assisted auto-generation from database schemas
- Dynamic runtime registration from `rag_templates/` directory
- Programmatic population via REST API for CI/CD integration

* **Knowledge Repositories**: Domain context injection for better planning:
- PDF, TXT, DOCX, MD document support
- Configurable chunking strategies (fixed-size, paragraph, sentence, semantic)
- Automatic retrieval during strategic planning for context-aware decisions
- Semantic search for relevant background information
- [Intelligence Marketplace](#-intelligence-marketplace-collaborative-knowledge-sharing) integration for community knowledge sharing
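
Two of the chunking strategies named above, fixed-size and paragraph, are simple enough to sketch. These functions are illustrative assumptions rather than the platform's code: they count characters instead of tokens, and the `overlap` parameter is an addition not mentioned in the list.

```python
from typing import List


def chunk_fixed(text: str, size: int, overlap: int = 0) -> List[str]:
    """Split text into windows of `size` characters, optionally overlapping."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def chunk_paragraphs(text: str) -> List[str]:
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]
```

Fixed-size chunks give predictable embedding costs, while paragraph chunks keep each retrieved passage self-contained; sentence and semantic chunking need a tokenizer or an embedding model and are not shown here.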

---

### 🛡️ Sovereignty: Your Data, Your Rules, Your Environment

Maintain complete control over your data exposure strategy with flexible deployment and provider options.

* **Multi-Provider LLM Support**: Freedom to choose your AI infrastructure:
- **Cloud Hyperscalers**: Google (Gemini), Anthropic (Claude), OpenAI (GPT-4o), Azure OpenAI
- **AWS Bedrock**: Foundation models and inference profiles for custom/provisioned models
- **OpenRouter**: Unified gateway to 100+ open and proprietary models via a single API key
- **Friendli.AI**: High-performance serverless and dedicated endpoint support
- **Ollama**: Fully local, offline LLM execution on your own infrastructure
- Dynamic provider switching without configuration restart
- Live model refresh to fetch latest available models

* **Comparative LLM Testing**: Validate model behavior across providers:
- Identical MCP tools and prompts across different LLMs
- Side-by-side performance comparison
- Model capability robustness validation
- Direct model chat for baseline reasoning assessment
- Profile-based A/B testing with `@TAG` overrides

* **Encrypted Credential Storage**: Enterprise-grade security:
- Fernet symmetric encryption for all API keys
- Per-user credential isolation in SQLite database
- Credentials never logged or exposed in UI/API responses
- Secure passthrough to LLM/MCP providers
- Admin oversight without credential access

* **System Prompt Encryption**: Defense against prompt extraction and hijacking:
- All system prompts encrypted at rest in database (never stored as plain text)
- Two-layer encryption: distribution protection + license-tier keys
- Runtime-only decryption minimizes attack surface
- Database dumps and prompt injection attacks cannot extract system instructions
- Segregation of duty: all tiers decrypt for runtime execution, but only licensed Prompt Engineer/Enterprise tiers can view or edit system prompts in the UI — preventing unauthorized prompt tampering

* **Multi-User Isolation**: Complete session and data segregation:
- JWT-based authentication with 24-hour expiry
- User-specific sessions in separate directories
- Database-level user UUID isolation
- Role-based access control (User, Developer, Admin)
- Simultaneous multi-user support with no cross-contamination

* **Flexible Deployment Options**: Adapt to your infrastructure:
- Single-user development (local Python process)
- Multi-user production (load-balanced containers or shared instance)
- HTTPS support via reverse proxy configuration
- Docker volume mounts for persistent data

* **Voice Conversation Privacy**: Optional Google Cloud TTS with user-provided credentials:
- User-controlled API key management
- No server-side credential storage for voice features
- Browser-based Speech Recognition (local processing)
- Hands-free operation with configurable voice modes
- Key observations handling (autoplay-off, autoplay-on, off)

* **Document Upload & Multimodal Analysis**: Attach documents and images directly in chat conversations:
- Native multimodal delivery for capable providers (Google Gemini, Anthropic Claude, OpenAI GPT-4o, Azure, AWS Bedrock Claude)
- Automatic text extraction fallback for all other providers (OpenRouter, Friendli, Ollama, Bedrock Nova)
- Supports PDF, DOCX, TXT, MD, and image formats (JPG, PNG, GIF, WebP)
- Drag-and-drop or click-to-attach with image thumbnail previews and Visual/Text processing badges
- Provider-aware routing: images sent natively to vision models, documents via base64 or text extraction as appropriate
- Up to 5 files per message, 50 MB per file
- Full REST API support for programmatic upload workflows

* **Decoupled Planning with Champion Cases - The Sovereignty Breakthrough**: Uderia separates strategic intelligence from execution, enabling local models to perform like hyperscalers.

**How It Works:**
1. **Cloud Planning Phase**: Hyperscaler LLM creates strategic plan using full reasoning capability
2. **Champion Case Injection**: System retrieves proven execution patterns from organizational history
3. **Local Execution Phase**: Private on-prem model (Ollama) executes plan with champion guidance

**Result**: Your data never leaves your infrastructure, yet you get cloud-level strategic thinking.

**Example Workflow:**
- Query: "Analyze Q4 customer churn by segment"
- Cloud Planner: Creates 3-phase strategy (retrieve data, segment analysis, visualize)
- Champion Cases: Injects 5 proven churn analysis patterns from past successes
- Local Executor: Runs analysis on your private database using proven patterns
- Zero cloud exposure, maximum intelligence

**Business Impact:**
- Regulatory compliance: PHI, PII, financial data stays local
- Cost optimization: Expensive planning calls (8K tokens) happen once; cheap execution (2K tokens) reuses patterns
- Best of both worlds: Hyperscaler reasoning + on-prem sovereignty

* **Enterprise OAuth Authentication**: Federated identity with five providers:
- **Supported Providers:** Google (OIDC), GitHub (OAuth2), Microsoft/Azure AD (OIDC), Discord (OAuth2), Okta (OIDC)
- CSRF protection via cryptographic state parameter validation
- Email verification with configurable enforcement
- Account merging and deduplication (link multiple providers to one account)
- Rate limiting with abuse detection and progressive lockout
- Throwaway email blocking for registration integrity
- Brute force detection on login attempts
- Comprehensive audit logging for all authentication events
- Provider popularity analytics and usage tracking
- Account linking/unlinking for existing users
- Full REST API: initiate, callback, link, disconnect, verification endpoints

* **Three-Tier Role-Based Access Control**: Hierarchical permission system with granular feature governance:
- **User Tier** (19 features): Execute prompts, use MCP tools, manage own sessions and credentials, basic configuration
- **Developer Tier** (+25 features, 44 total): RAG collection management, template creation/testing, MCP diagnostics, import/export, advanced configuration
- **Admin Tier** (+24 features, 68 total): User management, credential oversight, system configuration, security settings, database administration, compliance reporting
- 68 distinct feature tags mapped to tiers with `@require_feature` decorators
- Hierarchical permission inheritance (Admin inherits all Developer features, Developer inherits all User features)
- 5 predefined feature groups for bulk permission checks (session_management, rag_management, template_management, user_management, system_admin)
- Tier-based UI adaptation: features appear/disappear based on user's tier
- REST API endpoint: `GET /api/v1/auth/me/features` returns user's available features
- Admin self-protection: administrators cannot modify their own tier
- Backward compatible with legacy `is_admin` field
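
Hierarchical inheritance combined with a `@require_feature` decorator might look roughly like the sketch below. Only the decorator name and the three tiers come from the list above; the feature tags, the `user` dictionary shape, and the implementation are hypothetical.

```python
from functools import wraps
from typing import Callable, Dict, List, Set

# Illustrative feature tags only; the platform defines 68 of them.
TIER_FEATURES: Dict[str, Set[str]] = {
    "user": {"execute_prompts", "manage_own_sessions"},
    "developer": {"manage_rag_collections", "create_templates"},
    "admin": {"manage_users", "system_configuration"},
}
TIER_ORDER: List[str] = ["user", "developer", "admin"]


def features_for(tier: str) -> Set[str]:
    """Union of the tier's own features and those of every lower tier."""
    level = TIER_ORDER.index(tier)
    granted: Set[str] = set()
    for name in TIER_ORDER[: level + 1]:
        granted |= TIER_FEATURES[name]
    return granted


def require_feature(feature: str) -> Callable:
    """Reject the call unless the user's tier grants `feature`."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(user: dict, *args, **kwargs):
            if feature not in features_for(user["tier"]):
                raise PermissionError(f"{feature} requires a higher tier")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@require_feature("manage_rag_collections")
def create_collection(user: dict, name: str) -> str:
    return f"{user['name']} created {name}"
```

A developer or admin can call `create_collection`, while a user-tier caller gets a `PermissionError`; in a web framework the wrapper would return an HTTP 403 instead.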

---

### 💰 Financial Governance: Track Every Penny, Control Every Cost

Transparent, real-time cost tracking with fine-grained control over spending at every level of abstraction.

* **Real-Time Cost Tracking**: Per-interaction visibility:
- Automatic cost calculation using up-to-date provider pricing
- Per-turn breakdown (input tokens, output tokens, total cost)
- Session-level cumulative cost tracking
- User-level cost aggregation across all sessions
- Historical cost trends and analytics

* **Provider-Specific Pricing Models**: Accurate cost attribution:
- Google Gemini (1.5 Pro, 1.5 Flash, etc.) with context length tiers
- Anthropic Claude (Opus, Sonnet, Haiku) with standard/batch pricing
- OpenAI GPT-4o and GPT-4o-mini with tiered pricing
- Azure OpenAI (GPT-4, GPT-3.5-Turbo) with regional pricing
- AWS Bedrock (foundation models, inference profiles)
- OpenRouter (100+ models with live pricing via model catalog)
- Friendli.AI serverless and dedicated endpoints
- Ollama (local models, zero external cost)

* **Database-Backed Cost Persistence**: Complete financial audit trail:
- `llm_model_costs` table with versioned pricing
- `efficiency_metrics` table tracking token usage and learning system savings
- `user_sessions` table with per-session cost summaries
- `long_lived_access_tokens` with usage tracking
- Exportable cost reports for budgeting and forecasting

* **Profile-Based Spending Controls**: Optimize costs by workload:
- Tag profiles by cost characteristics (e.g., "COST" for Gemini Flash)
- Quick switching between expensive (Claude Opus) and economical (Gemini Flash) models
- Profile override via `@TAG` syntax for cost-conscious queries
- REST API profile selection for automated cost optimization
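
As a rough illustration of the `@TAG` override, the sketch below splits a leading tag from the rest of the message. The function name, the regular expression, and the fallback to a default profile are assumptions made for this example; the platform's real parsing rules may differ:

```python
import re

# Assumed tag grammar: "@" followed by letters, digits, or underscores at the start of the message.
TAG_PATTERN = re.compile(r"^\s*@([A-Za-z][A-Za-z0-9_]*)\b[,:]?\s*")


def split_profile_override(message: str, default_tag: str) -> tuple:
    """Return (profile_tag, query). Falls back to default_tag when no @TAG prefix is present."""
    match = TAG_PATTERN.match(message)
    if match is None:
        return default_tag, message.strip()
    return match.group(1), message[match.end():].strip()


print(split_profile_override("@COST summarize open tickets", default_tag="OPTIMIZER"))
print(split_profile_override("summarize open tickets", default_tag="OPTIMIZER"))
```

Routing a cost-conscious query then only requires looking up the returned tag in the user's profile list.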

* **Efficiency Attribution**: Quantify learning system savings:
- Before/after token comparison for champion case-guided planning
- Estimated cost savings from few-shot learning
- Per-user attribution of efficiency gains
- Efficiency leaderboard for gamification
- Continuous improvement ROI visibility

* **Cost Optimization Recommendations**: Actionable insights:
- Model selection guidance based on task complexity
- Context pruning opportunities for token reduction
- Champion case population priorities for maximum savings
- Profile configuration suggestions for workload patterns

* **Consumption Profile Enforcement**: Granular usage controls and quotas:
- Four predefined tiers: Free, Pro, Enterprise, Unlimited
- Per-user prompt rate limits (hourly and daily)
- Monthly token quotas (input and output tokens separately)
- Configuration change rate limits per hour
- Profile activation/deactivation for testing
- Global override mode for emergency rate limiting
- Admin bypass for unrestricted system access
- Real-time enforcement with clear error messages
- Database-backed consumption tracking and audit trail
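
The enforcement rules above can be pictured as a check that runs before each prompt. This is a minimal in-memory sketch under stated assumptions: the class names, the sliding-window counting, the limit values, and the error wording are all hypothetical, and the real platform tracks consumption in its database:

```python
from dataclasses import dataclass, field

HOUR = 3600
DAY = 24 * HOUR


@dataclass(frozen=True)
class ConsumptionProfile:
    """Illustrative limits; not the platform's actual tier values."""
    name: str
    prompts_per_hour: int
    prompts_per_day: int
    monthly_input_tokens: int
    monthly_output_tokens: int


@dataclass
class Usage:
    prompt_times: list = field(default_factory=list)  # epoch seconds of earlier prompts
    input_tokens_this_month: int = 0
    output_tokens_this_month: int = 0


def check_prompt(profile: ConsumptionProfile, usage: Usage, now: float, is_admin: bool = False):
    """Return (allowed, message) for a prompt submitted at time `now`."""
    if is_admin:
        return True, "admin bypass"
    in_last_hour = sum(1 for t in usage.prompt_times if now - t < HOUR)
    in_last_day = sum(1 for t in usage.prompt_times if now - t < DAY)
    if in_last_hour >= profile.prompts_per_hour:
        return False, f"Hourly prompt limit of {profile.prompts_per_hour} reached"
    if in_last_day >= profile.prompts_per_day:
        return False, f"Daily prompt limit of {profile.prompts_per_day} reached"
    if usage.input_tokens_this_month >= profile.monthly_input_tokens:
        return False, "Monthly input token quota exhausted"
    if usage.output_tokens_this_month >= profile.monthly_output_tokens:
        return False, "Monthly output token quota exhausted"
    return True, "ok"


free = ConsumptionProfile("Free", prompts_per_hour=2, prompts_per_day=10,
                          monthly_input_tokens=100_000, monthly_output_tokens=50_000)
usage = Usage(prompt_times=[100, 200])
print(check_prompt(free, usage, now=300))
```

Input and output quotas are checked separately, mirroring the separate monthly token quotas listed above.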

---

### 🤝 Collaborative: Intelligence Marketplace

The [Intelligence Marketplace](#-intelligence-marketplace-collaborative-knowledge-sharing) transforms isolated expertise into collective intelligence. Share and discover execution patterns, domain knowledge, agent teams, skills, extensions, and knowledge graphs with one click. Leverage community-validated assets to reduce costs, accelerate onboarding, and benefit from battle-tested strategies.

**See the full [Intelligence Marketplace](#-intelligence-marketplace-collaborative-knowledge-sharing) section in Core Components for:**
- Product catalog (6 asset types)
- Smart discovery and search
- Reference-based subscriptions and forking
- Community ratings and quality assurance
- Publishing workflows and API integration
- Agent Packs for portable AI teams

---

#### Bundled MCP Server: Google Search (Gemini Grounded Search)

Uderia ships with a ready-to-use MCP server for public internet search, located at `mcp_servers/google_search.py`. It uses Google's Gemini Grounded Search API to find current public information and return factual summaries with source citations.

**How to activate:**

1. **Import MCP Server** — Navigate to Setup → MCP Servers → Import and paste the following Claude Desktop configuration:
```json
{
  "mcpServers": {
    "Google Search": {
      "command": "python",
      "args": ["/app/mcp_servers/google_search.py"],
      "env": {"GEMINI_API_KEY": "your-gemini-api-key"}
    }
  }
}
```
Replace `/app/mcp_servers/...` with the actual path if running outside Docker.

2. **Link to a Profile** — Create or edit a profile (e.g., `tool_enabled` type) and select "Google Search" as its MCP Server.

3. **Use** — The `external_search` tool is now available. Queries routed to this profile will search the public internet via Gemini and return results with citations.

Each user provides their own Gemini API key through the `env` field, enabling per-user authentication without shared credentials.
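
When running outside Docker, it can be convenient to generate the import snippet rather than edit it by hand. A small sketch, assuming only the configuration shape shown above; the helper name and the example path are hypothetical:

```python
import json


def build_google_search_config(script_path: str, gemini_api_key: str, command: str = "python") -> dict:
    """Build the import configuration for the bundled Google Search MCP server."""
    return {
        "mcpServers": {
            "Google Search": {
                "command": command,
                "args": [script_path],
                "env": {"GEMINI_API_KEY": gemini_api_key},
            }
        }
    }


config = build_google_search_config(
    script_path="/opt/uderia/mcp_servers/google_search.py",  # hypothetical non-Docker install path
    gemini_api_key="your-gemini-api-key",  # placeholder; each user supplies their own key
)
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the import dialog described in step 1.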

---

## 🧩 Core Components

The platform's capabilities are built on five core components that work together across all profile types — from execution methodology and optimization engine to knowledge retrieval, pre-processing skills, and post-processing extensions.

---

### 🎭 Profile Classes: The IFOC Workflow

The Uderia Platform introduces the **IFOC Workflow**—four distinct execution modes that mirror how experts actually solve problems. From creative exploration to coordinated execution, these modes transform how organizations leverage AI.

#### The IFOC Philosophy: Ideate → Focus → Optimize → Coordinate

```
┌──────────────────────────────────────────────────────────────────────────────┐
│                              THE IFOC WORKFLOW                               │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   🟢 IDEATE          🔵 FOCUS           🟠 OPTIMIZE        🟣 COORDINATE     │
│   ─────────          ────────           ───────────        ─────────────     │
│   Brainstorm         Research           Execute            Orchestrate       │
│   Explore            Verify             Deliver            Scale             │
│   Draft              Ground             Operate            Synthesize        │
│                                                                              │
│   "What if...?"      "What does         "Do it."           "Handle           │
│                       policy say?"                          everything."     │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
```

Every profile belongs to one of four classes, each designed for a specific phase of intelligent work. Together, they create a composable AI architecture that adapts to any challenge.

---

##### 1. 🟢 IDEATE - Conversation Focused (LLM)

**Philosophy: Creative exploration without constraints**

```
┌─────────────────────────────────────────┐
│  User Question                          │
│  "How do I optimize this query?"        │
└──────────────┬──────────────────────────┘
               │
               ▼
         ┌──────────┐
         │   LLM    │  ← No tools, no data access
         │ Analysis │  ← Pure reasoning & guidance
         └─────┬────┘
               │
               ▼
┌──────────────────────────────────────────┐
│  Expert Advice + Code Examples           │
│  Ready for review before execution       │
└──────────────────────────────────────────┘
```

**The Value:**
Transform your LLM into a trusted thought partner. Explore possibilities, brainstorm solutions, and draft approaches—all without touching live systems. The **Ideate** phase is where creativity flows freely.

**When to Use IDEATE:**
- **Exploring new ideas**: "What approaches could solve this problem?"
- **Learning concepts**: "Explain CTEs in SQL with examples"
- **Drafting solutions**: "Write a query to calculate customer lifetime value"
- **Planning ahead**: "What should I consider before migrating this database?"

**Breakthrough Potential:**
- **Zero-Cost Exploration**: Learn complex concepts without expensive tool invocations
- **Rapid Prototyping**: Draft SQL, APIs, and workflows before committing resources
- **Risk-Free Testing**: Validate approaches before touching production systems
- **Training Ground**: Onboard new team members without data exposure

**Example Profiles:**
- `@CHAT` - Your AI thought partner for any question
- `@ARCHITECT` - System design and architecture guidance
- `@MENTOR` - Code review and technical mentoring

---

##### 2. 🔵 FOCUS - Knowledge Focused (RAG)

**Philosophy: Grounded answers from verified sources**

```
┌─────────────────────────────────────────┐
│  User Question                          │
│  "What's our remote work policy?"       │
└───────────────┬─────────────────────────┘
                │
                ▼
        ┌───────────────────────┐
        │  Semantic Search      │
        │  Your Document Store  │  ← Policies, SOPs, Manuals
        └───────┬───────────────┘
                │
                ▼
          ┌──────────┐
          │   LLM    │
          │ Synthesis│  ← ONLY uses retrieved docs
          └─────┬────┘    NO general knowledge allowed
                │
                ▼
┌───────────────────────────────────────────┐
│  Answer + Source Citations                │
│  "Per HR Policy 3.2, page 7..."           │
│  [View Source Document]                   │
└───────────────────────────────────────────┘
```

**The Value:**
Eliminate hallucinations entirely. The **Focus** phase grounds every answer in your verified documents, policies, and institutional knowledge. When accuracy matters more than creativity, Focus delivers verified intelligence.

**When to Use FOCUS:**
- **Compliance questions**: "What does policy say about data retention?"
- **Reference lookups**: "What's the approved vendor list?"
- **Verification**: "Is this approach compliant with our security standards?"
- **Institutional knowledge**: "How did we handle this situation before?"

**Breakthrough Potential:**
- **Zero Hallucination Guarantee**: Answers only from your verified documents
- **Institutional Memory**: Never lose domain expertise when people leave
- **Compliance Confidence**: All responses traceable to source documents
- **Instant Expertise**: New hires access decades of knowledge immediately

**Example Profiles:**
- `@POLICY` - Corporate policies and procedures
- `@LEGAL` - Contracts, compliance, and regulations
- `@TECHNICAL` - Engineering documentation and runbooks

---

##### 3. 🟠 OPTIMIZE - Efficiency Focused (Tool)

**Philosophy: Strategic execution that learns and heals**

```
┌──────────────────────────────────────────┐
│  User Request                            │
│  "Show Q4 revenue by region"             │
└───────────────┬──────────────────────────┘
                │
                ▼
       ┌─────────────────────┐
       │  FUSION OPTIMIZER   │
       │  Strategic Planning │  ← Multi-phase meta-plan
       └────────┬────────────┘
                │
                ▼
       ┌─────────────────────┐
       │  Tactical Execution │  ← Per-phase tool selection
       │  + Self-Correction  │  ← Autonomous error recovery
       └────────┬────────────┘
                │
                ▼
        ┌───────────────────────┐
        │  Execute Operations   │  ← Database queries, APIs, tools
        │  via MCP Server       │  ← MCP Tools + Prompts access
        └───────┬───────────────┘
                │
                ▼
┌───────────────────────────────────────────┐
│  Results + Visualizations                 │
│  Strategic plans + Self-healing execution │
│  Full transparency + audit trail          │
└───────────────────────────────────────────┘
```

**The Value:**
This is **where ideas become reality**. The **Optimize** phase is powered by the revolutionary **Fusion Optimizer**—a multi-layered AI architecture that doesn't just execute tasks, it *thinks strategically*, *learns from experience*, and *heals itself*.

**When to Use OPTIMIZE:**
- **Live data operations**: "Show me Q4 results by region"
- **Complex workflows**: "Calculate inventory turnover and flag anomalies"
- **Automated tasks**: "Export customer segments to CSV"
- **Real-time monitoring**: "Alert if error rate exceeds threshold"

**Breakthrough Potential:**
- **Strategic Intelligence**: Creates multi-phase plans, not just single-shot responses
- **Autonomous Self-Correction**: Detects and fixes errors without human intervention
- **Full MCP Integration**: The only profile class that supports both **MCP Tools AND MCP Prompts**—execute pre-built workflows and complex multi-step operations directly
- **Cost Optimization**: 40% token reduction through plan hydration and tactical fast-path
- **Proactive Optimization**: Learns from context to skip redundant operations
- **Democratize Expertise**: Non-technical users execute complex operations through conversation
- **Complete Transparency**: See every decision, every tool call, every self-correction in real-time

**Real-World Transformation:**
- **Before**: Write SQL → Debug errors → Retry → Export → Format → Email (30 minutes)
- **After**: "Analyze Q4 sales trends and email the exec team" (2 minutes, auto-corrects, learns)

**Example Profiles:**
- `@OPTIMIZER` - Full Fusion Optimizer with all features enabled
- `@PROD` - Production database operations with enterprise LLM
- `@ANALYTICS` - Business intelligence and self-service reporting
- `@DEVOPS` - Infrastructure monitoring and intelligent automation

---

##### 4. 🟣 COORDINATE - Genie (Multi-Profile)

**Philosophy: Autonomous orchestration at scale**

**The Value:**
**This is the breakthrough.** The **Coordinate** phase creates **autonomous AI organizations** where specialized agents collaborate intelligently. One question triggers a cascade of expert consultations, data retrievals, and synthesis—all happening automatically.

**When to Use COORDINATE:**
- **Multi-domain questions**: "Analyze Q4, check compliance, and recommend strategy"
- **Complex investigations**: "Research this issue across all our systems"
- **Executive summaries**: "Prepare a board presentation on performance"
- **Cross-functional work**: "Coordinate finance, legal, and engineering review"

**Breakthrough Potential:**
- **Multi-Level Intelligence**: Coordinators can orchestrate other Coordinators, creating hierarchical AI organizations
- **Compound Expertise**: Combine database operations + knowledge retrieval + analysis in a single workflow
- **Adaptive Problem Solving**: The system decides which experts to consult based on the question
- **Conversational State**: Each expert maintains context across the entire conversation
- **Scalable Architecture**: Build AI "departments" with master coordinators managing specialized teams

**The Game-Changer: Nested Coordination**
Unlike simple AI assistants, Coordinate profiles can orchestrate *other Coordinate profiles*, enabling unprecedented organizational depth:

```
                     ┌─────────────────────────────────────┐
                     │  User: "@CEO, analyze Q4 and        │
                     │         recommend strategy"         │
                     └────────────────┬────────────────────┘
                                      │
                                      ▼
                          ╔═══════════════════════╗
                          ║    @CEO (Level 0)     ║  ← Master Coordinator
                          ║    Strategic Genie    ║
                          ╚═══════════╤═══════════╝
                                      │
            ┌─────────────────────────┼─────────────────────────┐
            │                         │                         │
            ▼                         ▼                         ▼
   ┌──────────────────┐      ┌──────────────────┐      ┌──────────────────┐
   │ @CFO (Level 1)   │      │ @CTO (Level 1)   │      │ @LEGAL (Level 1) │
   │ Financial Genie  │      │ Technical Genie  │      │ Policy Knowledge │
   └────────┬─────────┘      └────────┬─────────┘      └──────────────────┘
            │                         │
     ┌──────┴──────┐           ┌──────┴──────┐
     │             │           │             │
     ▼             ▼           ▼             ▼
┌─────────┐   ┌─────────┐ ┌─────────┐   ┌─────────┐
│ @ACCT   │   │ @AUDIT  │ │ @DB_ADM │   │ @SECURE │
│ DB Ops  │   │ Checks  │ │ Schema  │   │ Analysis│
│(Level 2)│   │(Level 2)│ │(Level 2)│   │(Level 2)│
└─────────┘   └─────────┘ └─────────┘   └─────────┘
```

**How It Works:**
1. `@CEO` receives question, delegates to financial, technical, and legal experts
2. `@CFO` (itself a Genie) autonomously coordinates `@ACCT` and `@AUDIT`
3. `@CTO` (itself a Genie) autonomously coordinates `@DB_ADM` and `@SECURE`
4. Each specialist executes its task (queries, document retrieval, analysis)
5. Results cascade back up: specialists → coordinators → master
6. `@CEO` synthesizes comprehensive strategic recommendation

**All of this happens automatically from a single user question.**

**Real-World Transformation:**

*Before Genie Profiles:*
- User manually runs 5 separate queries
- Copies results between tools
- Synthesizes insights manually
- 30 minutes of repetitive work

*With Genie Profiles:*
- User: "@CEO, analyze Q4 performance, check compliance, and recommend next quarter strategy"
- Genie autonomously: Queries financial database → Retrieves policy docs → Analyzes trends → Cross-checks regulations → Synthesizes strategic recommendations
- Result delivered in 2 minutes, fully documented with audit trail

**Example Profiles:**
- `@EXECUTIVE` - C-level strategic intelligence coordinator
- `@ANALYST` - Coordinates data retrieval, policy checks, and reporting
- `@AUDITOR` - Multi-source compliance verification
- `@RESEARCHER` - Deep-dive investigations across systems and knowledge bases

**Safeguards & Control:**
- **Circular Dependency Prevention**: Automatic detection prevents infinite loops
- **Depth Limits**: Configurable maximum nesting (default: 3 levels)
- **Cost Visibility**: Real-time token tracking across all coordination levels
- **Transparent Execution**: See exactly which experts are consulted and why
- **Context Preservation**: Each expert maintains conversation history across turns
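
Circular-dependency detection and depth limits can both be enforced with one depth-first walk over the coordinator graph. The sketch below is an assumption about how such a check might look, not the platform's implementation; `count_levels`, the `children` mapping, and counting the root as level 1 (so the `@CEO` tree above has three levels) are choices made for this example:

```python
def count_levels(children: dict, root: str, max_levels: int = 3) -> int:
    """Return the number of nesting levels below and including `root`.

    Raises ValueError if a profile reappears on its own delegation path
    (circular dependency) or if nesting exceeds `max_levels`.
    """

    def visit(tag: str, level: int, path: tuple) -> int:
        if tag in path:
            raise ValueError("Circular dependency: " + " -> ".join(path + (tag,)))
        if level > max_levels:
            raise ValueError(f"{tag} is at level {level}, beyond max_levels={max_levels}")
        deepest = level
        for child in children.get(tag, ()):
            deepest = max(deepest, visit(child, level + 1, path + (tag,)))
        return deepest

    return visit(root, 1, ())


# The organization from the diagram above: coordinators map to the profiles they delegate to.
org = {
    "CEO": ["CFO", "CTO", "LEGAL"],
    "CFO": ["ACCT", "AUDIT"],
    "CTO": ["DB_ADM", "SECURE"],
}
print(count_levels(org, "CEO"))
```

Running the same check whenever a Genie profile is saved would reject a configuration such as `@ACCT` delegating back to `@CEO` before it can loop at runtime.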

**Coordinator Pass-Through Optimization:**

When consulting a single expert is sufficient, the coordinator skips its own synthesis LLM call and passes the expert's answer through directly. Because the coordinator then makes only its routing call, this roughly halves coordinator token cost and latency for focused single-expert questions.

*Pass-through fires when all of the following are true:*
- Exactly **one expert** was consulted (routing LLM called a single tool)
- **No prior conversation history** is present — i.e. the first turn in Full Context mode, or any turn when Turn Summaries mode is active
- The expert returned a non-empty answer

*Pass-through is always skipped when:*
- **Multiple experts** were consulted — their answers must be combined
- **Conversation history is present** (Full Context mode, turn 2 and beyond) — the coordinator weaves prior turns into its answer

All profile types qualify for pass-through: Optimize (`tool_enabled`) experts run their own internal synthesis pipeline, so the coordinator receives a finished natural-language answer, not raw data.

The Live Status panel shows a **"Synthesis Skipped"** card (instead of "LLM Synthesis Started / Results") when pass-through fires, making the optimization visible.
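
The pass-through conditions above can be read as a single predicate. A minimal sketch, assuming the context modes are identified by the hypothetical strings `"full_context"` and `"turn_summaries"` and that turns are numbered from 1; the function names are illustrative:

```python
def has_prior_history(context_mode: str, turn_number: int) -> bool:
    """Only Full Context mode carries earlier turns into the coordinator's prompt."""
    return context_mode == "full_context" and turn_number > 1


def should_pass_through(expert_answers: list, context_mode: str, turn_number: int) -> bool:
    """True when the coordinator can return the single expert's answer unchanged."""
    if len(expert_answers) != 1:
        return False  # zero or multiple experts: answers must be synthesized
    if has_prior_history(context_mode, turn_number):
        return False  # prior turns must be woven into the answer
    return bool(expert_answers[0].strip())  # the expert returned a non-empty answer


print(should_pass_through(["Q4 revenue was $1.31M."], "full_context", turn_number=1))
print(should_pass_through(["Q4 revenue was $1.31M."], "full_context", turn_number=2))
```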

**Configuration Example:**
```json
{
  "tag": "CEO",
  "profile_type": "genie",
  "genieConfig": {
    "slaveProfiles": ["CFO_GENIE", "CTO_GENIE", "LEGAL_POLICY"],
    "maxConcurrentSlaves": 3
  }
}
```

Where `CFO_GENIE` and `CTO_GENIE` are themselves Genie profiles that coordinate their own specialist teams—creating true organizational intelligence.

---

#### IFOC Selection Guide: Choose the Right Phase

```
┌────────────────────────────────────────────────────────────────────────────────────────┐
│                                 IFOC SELECTION MATRIX                                  │
├──────────────┬────────────────┬───────────────┬──────────────┬─────────────────────────┤
│              │🟢 IDEATE       │🔵 FOCUS       │🟠 OPTIMIZE   │🟣 COORDINATE            │
│              │ (Conversation) │ (Knowledge)   │ (Efficiency) │ (Multi-Profile)         │
│              │                │               │              │                         │
├──────────────┼────────────────┼───────────────┼──────────────┼─────────────────────────┤
│ PHILOSOPHY   │ Explore        │ Verify        │ Execute      │ Orchestrate             │
│              │ Brainstorm     │ Ground        │ Deliver      │ Scale                   │
│              │ Draft          │ Reference     │ Operate      │ Synthesize              │
├──────────────┼────────────────┼───────────────┼──────────────┼─────────────────────────┤
│ DATA ACCESS  │ Optional       │ Documents     │ Full (MCP    │ All Sources (Adaptive)  │
│              │ (MCP Tools/RAG)│ Only          │ Tools+Prompts│                         │
├──────────────┼────────────────┼───────────────┼──────────────┼─────────────────────────┤
│ SAFETY       │ Exploratory    │ Zero          │ Governed     │ Composite (Inherits)    │
│              │ (Interactive)  │ Hallucinate   │ Audit Trail  │                         │
├──────────────┼────────────────┼───────────────┼──────────────┼─────────────────────────┤
│ COST         │ Low per turn   │ Low-Moderate  │ Lowest for   │ Variable (Scales with   │
│ (complex)    │ (many turns)   │ (~5K tokens)  │ complex tasks│ complexity & depth)     │
├──────────────┼────────────────┼───────────────┼──────────────┼─────────────────────────┤
│ SPEED        │ Fastest        │ Fast          │ Fast + Smart │ Comprehensive           │
│              │                │               │ (Self-heals) │ (Auto-parallel)         │
├──────────────┼────────────────┼───────────────┼──────────────┼─────────────────────────┤
│ USE WHEN     │ "How do I      │ "What does    │ "Show me Q4  │ "Analyze Q4, check      │
│              │ optimize?"     │ policy say?"  │ results"     │ compliance, recommend"  │
└──────────────┴────────────────┴───────────────┴──────────────┴─────────────────────────┘
```

#### IFOC Workflow Patterns

**Pattern 1: Ideate → Focus → Optimize**
```
1. 🟢 IDEATE "Draft a query to find inactive customers"
→ Get SQL without execution (safe, cheap)

2. 🔵 FOCUS "What's our customer data retention policy?"
→ Verify compliance from documents

3. 🟠 OPTIMIZE "Execute the query I just drafted"
→ Run against live database (controlled)
```

**Pattern 2: Coordinate for Strategic Work**
```
🟣 COORDINATE "Prepare board presentation on Q4 performance"

Automatically triggers:
→ 🟠 @CFO (OPTIMIZE: Financial analysis + database queries)
→ 🔵 @LEGAL (FOCUS: Compliance checks from policies)
→ 🟠 @ANALYST (OPTIMIZE: Trend analysis + visualizations)
→ Synthesis (Coordinated strategic narrative)

Result: Complete board deck in minutes, not days
```

**Pattern 3: Learn → Apply → Deploy**
```
1. 🟢 @MENTOR "Explain CTEs in SQL" ← IDEATE: Learning
2. 🟢 @CHAT "Draft a CTE for X" ← IDEATE: Practice
3. 🟠 @DEV "Test this CTE" ← OPTIMIZE: Safe execution
4. 🟠 @PROD "Deploy to production" ← OPTIMIZE: Controlled rollout
```

#### Why IFOC Matters

**Traditional AI Assistants:**
- One-size-fits-all approach
- High token costs on every query
- No separation of concerns
- Limited to single LLM's capabilities

**Uderia's IFOC Architecture:**
- **Right phase for the task**: Match your intent to the appropriate mode
- **Composable intelligence**: Combine phases for compound expertise
- **Governed execution**: Clear boundaries for safety and compliance
- **Organizational scale**: Coordinate specialists like a real team

**The Bottom Line:**
Stop treating AI as a single assistant. The IFOC workflow mirrors how experts actually work: **Ideate** possibilities, **Focus** on verified knowledge, **Optimize** execution, and **Coordinate** complex multi-domain work. Build an AI organization where specialized experts collaborate intelligently.

##### 5. **Strategic Planner Intelligence**

The strategic planner understands profile class context and adapts behavior:

**Recent Enhancement (Jan 2026):** The planner now correctly disambiguates SQL queries when switching between profile classes. It prioritizes:
1. SQL mentioned in the most recent `llm_only` conversation
2. SQL from the most recent tool execution
3. Historical queries with explicit turn metadata

This prevents the planner from executing the wrong query when users switch from `@CHAT` to `@GOGET`.
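
One reading of this priority order, sketched over per-turn metadata. The `profile_type` and `sql_mentioned_in_conversation` fields are the ones documented under Session Metadata Tracking; the `executed_sql` field, the function name, and the exact tie-breaking are assumptions made for illustration:

```python
def pick_sql_for_execution(turns: list):
    """Choose which SQL statement a follow-up like "execute this query" refers to.

    `turns` is a list of turn-metadata dicts ordered oldest to newest.
    """
    # 1. SQL mentioned in the most recent llm_only turn that contains any.
    for turn in reversed(turns):
        if turn.get("profile_type") == "llm_only" and turn.get("sql_mentioned_in_conversation"):
            return turn["sql_mentioned_in_conversation"][-1]
    # 2. SQL from the most recent tool execution (hypothetical `executed_sql` field).
    for turn in reversed(turns):
        if turn.get("profile_type") == "tool_enabled" and turn.get("executed_sql"):
            return turn["executed_sql"][-1]
    # 3. Any remaining historical turn whose metadata records SQL.
    for turn in reversed(turns):
        if turn.get("sql_mentioned_in_conversation"):
            return turn["sql_mentioned_in_conversation"][-1]
    return None


history = [
    {"profile_type": "tool_enabled", "executed_sql": ["SELECT 1"]},
    {"profile_type": "llm_only",
     "sql_mentioned_in_conversation": ["SELECT UserName FROM DBC.SessionsV WHERE SessionID <> 0"]},
]
print(pick_sql_for_execution(history))
```

With this ordering, a query drafted in `@CHAT` wins over an older statement that a tool-enabled profile already ran.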

#### Profile Class Specifications

##### Session Metadata Tracking

Every turn in a session records:

```json
{
  "turn": 3,
  "profile_id": "profile-uuid",
  "profile_tag": "CHAT",
  "profile_type": "llm_only",
  "turn_metadata": {
    "turn_number": 3,
    "profile_tag": "CHAT",
    "profile_type": "llm_only",
    "is_most_recent": true,
    "sql_mentioned_in_conversation": [
      "SELECT UserName FROM DBC.SessionsV WHERE SessionID <> 0"
    ]
  }
}
```

**Key Fields:**
- `profile_type` - `"llm_only"`, `"tool_enabled"`, `"rag_focused"`, or `"genie"`
- `profile_tag` - Short identifier for quick switching
- `sql_mentioned_in_conversation` - Extracted SQL from `llm_only` responses
- `execution_trace` - Structured tool calls (only in `tool_enabled`)
- `knowledge_retrieval_event` - Document retrieval details (only in `rag_focused`)
- `genie_metadata` - Coordination details and child sessions (only in `genie`)

##### Profile Classification Modes

Profiles can be classified as:

**Light Classification:**
- Simple filter-based tool/prompt selection
- Fast, deterministic, no LLM call required
- Suitable for well-defined tool sets

**Full Classification (LLM-Assisted):**
- Dynamic categorization using LLM intelligence
- Adapts to ambiguous or complex tool selection
- Higher cost but more flexible

**Note:** Classification only applies to MCP-enabled profiles with multiple tools/prompts available.

##### Session Primer - Automatic Context Initialization

**The Value:**
Session Primer allows each profile to automatically execute an initialization question when a new session starts, pre-populating the context window with domain-specific knowledge. This transforms generic AI agents into **instantly educated specialists**.

**Why This Matters:**
Instead of manually explaining your database schema, business rules, or domain terminology at the start of every conversation, the Session Primer does it automatically. The agent starts every session already understanding your context.

**Configuration:**
In profile settings, enable "Session Primer" and provide an initialization question:
- `"Describe the database schema and explain the business meaning of each table"`
- `"What KPIs are tracked in this system and how are they calculated?"`
- `"Educate yourself on the API endpoints and their authentication requirements"`

**The Game-Changer: Specialized Expert Teams**

Session Primer becomes transformational with Genie profiles. Build teams of pre-educated specialists:

```
┌────────────────────────────────────────────────────────────────────────────────┐
│                       BUILDING AN AI EXPERT ORGANIZATION                       │
├────────────────────────────────────────────────────────────────────────────────┤
│                                                                                │
│  @ANALYST (Genie Coordinator)                                                  │
│  └─ Primer: "You coordinate business analysis. Understand the team below."   │
│                                                                                │
│  ┌───────────────────┐     ┌───────────────────┐     ┌───────────────────┐     │
│  │ @KPI_EXPERT       │     │ @SCHEMA_EXPERT    │     │ @SQL_EXECUTOR     │     │
│  │                   │     │                   │     │                   │     │
│  │ Primer:           │     │ Primer:           │     │ Primer:           │     │
│  │ "Learn all KPI    │     │ "Learn the DB     │     │ "Learn the        │     │
│  │  definitions,     │     │  schema and the   │     │  available SQL    │     │
│  │  formulas, and    │     │  business context │     │  tools and        │     │
│  │  business         │     │  of each table    │     │  execution        │     │
│  │  thresholds"      │     │  and column"      │     │  patterns"        │     │
│  │                   │     │                   │     │                   │     │
│  │ → Knows: Revenue  │     │ → Knows: Orders   │     │ → Knows: How to   │     │
│  │   targets, churn  │     │   = transactions, │     │   write safe,     │     │
│  │   definitions,    │     │   Customers =     │     │   optimized       │     │
│  │   seasonality     │     │   B2B accounts    │     │   queries         │     │
│  └───────────────────┘     └───────────────────┘     └───────────────────┘     │
│                                                                                │
└────────────────────────────────────────────────────────────────────────────────┘
```

**Single Question, Compound Intelligence:**

```
User: "@ANALYST, why did Q4 revenue drop?"

Execution Flow:
1. @KPI_EXPERT (already knows KPI definitions from primer)
   → "Revenue = sum(order_total) where status='completed'. Q4 target was $2M."

2. @SCHEMA_EXPERT (already knows table relationships from primer)
   → "Revenue data lives in orders table. Check order_status and created_at."

3. @SQL_EXECUTOR (already knows query patterns from primer)
   → Executes: SELECT month, SUM(order_total) FROM orders WHERE...
   → Returns: October $580K, November $420K, December $310K

4. @ANALYST synthesizes: "Q4 revenue was $1.31M vs $2M target (-34.5%).
   Revenue fell every month, with December the weakest. Recommend investigating..."
```

**Without Session Primer:** Each expert starts blank. User must explain schemas, KPIs, and context repeatedly.

**With Session Primer:** Each expert is pre-educated. They collaborate immediately with full domain understanding.

**Best Practices:**
- **Efficiency Focused profiles**: Prime with schema descriptions, API documentation
- **Knowledge Focused profiles**: Prime with "summarize the key topics in the knowledge base"
- **Genie profiles**: Prime with team structure and delegation guidelines
- **Conversation profiles**: Prime with domain terminology and business rules

#### Real-World Usage Patterns

##### Pattern 1: Learn, Then Execute

```
@CHAT: "How do I calculate the average sale price by region?"
→ Agent provides SQL template and explanation (Conversation Focused)

@GOGET: "execute this query for the sales_data table"
→ Agent runs the query against live database (Efficiency Focused)
```

##### Pattern 2: Review Before Production

```
@CHAT: "Write a query to delete inactive customers"
→ Agent drafts DELETE query for review (Conversation Focused)

[User reviews, approves]

@PROD: "execute this query"
→ Agent executes against production database with audit trail (Efficiency Focused)
```

##### Pattern 3: Document-Driven Decisions

```
@RAG: "What are our approved customer retention strategies?"
→ Agent retrieves from strategy documents, synthesizes answer (Knowledge Focused)

@CHAT: "Help me design a retention campaign based on those strategies"
→ Agent provides implementation guidance (Conversation Focused)

@GOGET: "Execute a query to identify at-risk customers for the campaign"
→ Agent runs the query against live database (Efficiency Focused)
```

##### Pattern 4: Compliance and Policy Verification

```
@RAG: "What does our security policy say about API key rotation?"
→ Agent retrieves exact policy language with citations (Knowledge Focused)

@GOGET: "Check which API keys in our system are older than 90 days"
→ Agent queries credential store via MCP tools (Efficiency Focused)
```

#### Implementation Details

**Profile Switching:**
- Type `@` in chat input to see all available profiles
- Select with Tab/Enter or click
- Profile badge shows active override
- Session header displays both default (★) and override (⚡)

**Execution Context:**
- Conversation Focused (LLM): System prompt + conversation history
- MCP-Enabled Profiles: System prompt + conversation + tools + prompts + resources
- Knowledge Focused (RAG): RAG synthesis prompt + conversation + retrieved documents

**Cost Implications:**
- Conversation Focused: ~2,000 input tokens per turn
- Efficiency Focused: ~8,000+ input tokens per turn (includes planner context + full tool context)
- Conversation Focused (with MCP tools): ~3,000-4,000 input tokens per turn (LangChain agent with tool context)
- Knowledge Focused: ~3,000-5,000 input tokens per turn (depends on documents retrieved)
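
To turn these per-turn estimates into a rough budget figure, multiply by the number of turns and the input price. The sketch below uses the approximate token counts listed above (midpoints for the ranges, and the stated floor for Efficiency Focused); the dictionary keys, function name, and the price of $1 per million input tokens are hypothetical:

```python
# Approximate input tokens per turn, taken from the estimates above.
INPUT_TOKENS_PER_TURN = {
    "conversation": 2_000,
    "conversation_with_mcp_tools": 3_500,  # midpoint of ~3,000-4,000
    "knowledge": 4_000,                    # midpoint of ~3,000-5,000
    "efficiency": 8_000,                   # stated as a lower bound (8,000+)
}


def estimate_input_cost(profile_kind: str, turns: int, usd_per_million_input_tokens: float) -> float:
    """Estimated input-token cost in USD for `turns` turns of one profile kind."""
    tokens = INPUT_TOKENS_PER_TURN[profile_kind] * turns
    return tokens / 1_000_000 * usd_per_million_input_tokens


for kind in INPUT_TOKENS_PER_TURN:
    cost = estimate_input_cost(kind, turns=100, usd_per_million_input_tokens=1.0)
    print(f"{kind:<28} ${cost:.2f}")
```

Output tokens are not included, so the result is a floor on spend rather than a full cost estimate.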

**Historical Tracking:**
- `profile_tags_used[]` - All profiles used in session
- `models_used[]` - All LLM models used in session
- `knowledge_retrieval_event` - Document sources and relevance scores (RAG profiles)
- Complete audit trail for cost attribution

#### Best Practices

1. **Start Conversational:** Use Conversation Focused profiles to explore, learn, and draft queries
2. **Verify with Documents:** Use Knowledge Focused profiles for policy, compliance, and reference lookups
3. **Execute When Needed:** Switch to Efficiency Focused profiles only when live data operations are required
4. **Review Before Execution:** Draft destructive queries in `@CHAT`, review, then execute in `@GOGET`
5. **Cost Attribution:** Use profile tags to track which workloads drive costs
6. **Security:** Restrict MCP-enabled profiles to authorized users via role-based access
7. **Knowledge Quality:** Ensure Knowledge Focused profiles have well-curated knowledge collections

For Genie coordinator architecture and nested multi-level coordination, see:
[**Nested Genie Upgrade Guide (docs/Architecture/NESTED_GENIE_UPGRADE_GUIDE.md)**](docs/Architecture/NESTED_GENIE_UPGRADE_GUIDE.md)

[⬆️ Back to Table of Contents](#table-of-contents)

### 🎯 The Heart of the Application - The Engine & its Fusion Optimizer

The Uderia Platform is engineered to be far more than a simple LLM wrapper. Its revolutionary core is the **Fusion Optimizer**, a multi-layered engine designed for resilient, intelligent, and efficient task execution in complex enterprise environments. It transforms the agent from a mere tool into a reliable analytical partner.

#### 🧠 The Multi-Layered Planning Process

The Optimizer deconstructs every user request into a sophisticated, hierarchical plan.

1. **Strategic Planner**: For any non-trivial request, the agent first generates a high-level **meta-plan**. This strategic blueprint outlines the major phases required to fulfill the user's goal, such as "Phase 1: Gather table metadata" followed by "Phase 2: Analyze column statistics."

2. **Tactical Execution**: Within each phase, the agent operates tactically, determining the single best next action (a tool or prompt call) to advance the plan.

3. **Recursive Delegation**: The Planner is fully recursive. A single phase in a high-level plan can delegate its execution to a new, subordinate instance of the Planner. This allows the agent to solve complex problems by breaking them down into smaller, self-contained sub-tasks, executing them, and then returning the results to the parent process.
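
The three layers can be pictured as a plan of phases in which any phase may itself hold a sub-plan. This is a schematic sketch only: the `Phase` class, `execute_plan`, the `max_depth` guard, and the stand-in `run_action` callback are hypothetical, and the sub-phase names are invented for the example:

```python
from dataclasses import dataclass, field


@dataclass
class Phase:
    goal: str
    sub_phases: list = field(default_factory=list)  # non-empty means "delegate to a sub-planner"


def execute_plan(phases: list, run_action, depth: int = 0, max_depth: int = 3) -> list:
    """Execute phases in order; a phase with sub-phases is handled by a recursive call."""
    if depth > max_depth:
        raise RecursionError(f"plan nesting exceeds max_depth={max_depth}")
    results = []
    for phase in phases:
        if phase.sub_phases:
            # Recursive delegation: a subordinate plan runs and returns its results to the parent.
            outcome = execute_plan(phase.sub_phases, run_action, depth + 1, max_depth)
        else:
            # Tactical execution: pick and run the single next action for this phase.
            outcome = run_action(phase.goal)
        results.append({"goal": phase.goal, "outcome": outcome})
    return results


plan = [
    Phase("Gather table metadata"),
    Phase("Analyze column statistics", sub_phases=[
        Phase("Profile numeric columns"),
        Phase("Profile text columns"),
    ]),
]
report = execute_plan(plan, run_action=lambda goal: f"done: {goal}")
print(report)
```

In the real engine, `run_action` would correspond to the tactical LLM choosing a tool or prompt call for the phase.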

##### 💎 Dual-Model Architecture for Cost Optimization

The Fusion Optimizer supports **heterogeneous model assignment** across planning layers, enabling sophisticated cost-performance trade-offs:

* **Strategic Model**: More capable model for high-level reasoning
- Handles complex meta-planning and multi-phase orchestration
- Examples: GPT-4o, Claude Opus 4.6, Gemini 2.0 Flash Thinking
- Runs once per query (low call frequency)
- Investment justified by quality of strategic decisions

* **Tactical Model**: Faster, cost-efficient model for execution
- Handles tool selection and argument generation
- Examples: GPT-4o-mini, Claude Haiku, Llama 3.3 70B
- Runs multiple times per phase (high call frequency)
- 80-90% cost reduction on tactical calls vs. using the premium model throughout

* **Real-Time Cost Visibility**: Live Status panel displays color-coded cost breakdown
- **Strategic cost** (blue): Planning and orchestration overhead
- **Tactical cost** (green): Per-phase execution costs
- Enables data-driven model selection and optimization

**Example Configuration:**
```
Strategic: Claude Opus 4.6 ($15/$75 per 1M tokens)
Tactical: Claude Haiku 4.5 ($1/$5 per 1M tokens)
Result: 70% cost reduction with negligible quality impact
```

This architecture is particularly effective for:
- High-volume production workloads where tactical calls dominate
- Iterative refinement queries with multiple tactical cycles
- Multi-turn sessions with shared strategic context
- Budget-conscious deployments requiring predictable costs
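
How much a strategic/tactical split saves depends on the call mix. The worked example below uses the prices from the example configuration above and an assumed workload (one strategic call plus five tactical calls per query, with the token counts shown); the workload numbers and function names are hypothetical, so the resulting percentage is illustrative rather than a measured figure:

```python
def call_cost(input_tokens: int, output_tokens: int, price: tuple) -> float:
    """Cost in USD of one LLM call; `price` is (input, output) in USD per 1M tokens."""
    input_price, output_price = price
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


STRATEGIC_PRICE = (15.0, 75.0)  # from the example configuration above
TACTICAL_PRICE = (1.0, 5.0)     # from the example configuration above

# Assumed workload per query: (input_tokens, output_tokens) for each call.
STRATEGIC_CALL = (6_000, 1_000)
TACTICAL_CALLS = [(4_000, 300)] * 5


def query_cost(strategic_price: tuple, tactical_price: tuple) -> float:
    total = call_cost(*STRATEGIC_CALL, strategic_price)
    total += sum(call_cost(i, o, tactical_price) for i, o in TACTICAL_CALLS)
    return total


single_model = query_cost(STRATEGIC_PRICE, STRATEGIC_PRICE)  # premium model for every call
dual_model = query_cost(STRATEGIC_PRICE, TACTICAL_PRICE)     # premium plans, cheap model executes
savings = 1 - dual_model / single_model
print(f"single: ${single_model:.4f}  dual: ${dual_model:.4f}  savings: {savings:.0%}")
```

Under these assumptions the split saves about two thirds of the per-query cost; workloads with more tactical calls per strategic call would save more.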

#### 🔧 Proactive Optimization Engine

Before and during execution, the Optimizer actively seeks to enhance performance and efficiency.

* **Plan Hydration**: The agent intelligently inspects a new plan to see if its initial steps require data that was already generated in the *immediately preceding turn*. If so, it "hydrates" the new plan by injecting the previous results, skipping redundant tool calls and delivering answers faster. This is particularly effective for follow-up clarifications and iterative refinements.

* **Tactical Fast Path**: For simple, single-tool phases where all required arguments are known, the Optimizer bypasses the tactical LLM call entirely and executes the tool directly, dramatically reducing latency. This eliminates unnecessary LLM calls for trivial interactions while maintaining conversational fluidity.

* **Specialized Orchestrators**: The agent is equipped with programmatic orchestrators to handle common complex patterns. For example, it can recognize a date range query (e.g., "last week") and automatically execute a single-day tool iteratively for each day in the range. The **Comparative Llama Invocation Orchestrator** executes deterministic prompt sequences across multiple LLMs, collects responses, and generates analytical comparisons for model behavior analysis.

* **Context Distillation**: To prevent context window overflow with large datasets, the agent automatically distills large tool outputs into concise metadata summaries before passing them to the LLM for planning, ensuring robust performance even with enterprise-scale data.
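
The date-range orchestrator pattern described above can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: `last_week`, `run_for_range`, and `fake_daily_usage` are hypothetical names, and resolving "last week" to the previous Monday-to-Sunday span is an assumption.

```python
from datetime import date, timedelta
from typing import Callable, Iterator


def days_in_range(start: date, end: date) -> Iterator[date]:
    """Yield every date from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def last_week(today: date) -> tuple:
    """Resolve "last week" as the previous Monday through Sunday (an assumption)."""
    this_monday = today - timedelta(days=today.weekday())
    return this_monday - timedelta(days=7), this_monday - timedelta(days=1)


def run_for_range(single_day_tool: Callable[[date], dict], start: date, end: date) -> list:
    """Call a single-day tool once per day and collect the results in date order."""
    return [single_day_tool(day) for day in days_in_range(start, end)]


def fake_daily_usage(day: date) -> dict:
    # Stand-in for a real single-day tool invocation.
    return {"date": day.isoformat(), "rows": 0}


start, end = last_week(date(2025, 11, 12))
results = run_for_range(fake_daily_usage, start, end)
print(len(results), results[0]["date"], results[-1]["date"])
```

Because the loop is driven by code rather than by the LLM, the number of tool calls is fixed by the date range and does not depend on the model planning the iteration correctly.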

#### 📚 Continuous Improvement through Champion Case Learning

The agent learns from every successful interaction, building an ever-growing repository of "champion" strategies that guide future planning. This closed-loop learning system transforms individual successes into organizational knowledge.

* **Automatic Case Capture**: Every completed session is analyzed and archived:
  - Full conversation history with query-response pairs
  - Complete tool invocation sequences with arguments
  - Strategic plan and tactical execution details
  - Token usage and cost metrics
  - Success indicators (no errors, user satisfaction signals)

* **Efficiency Analysis and Scoring**: Each case is evaluated for optimization potential:
  - Token reduction opportunities (e.g., plan hydration candidates)
  - Fast-path opportunities (e.g., queries that didn't need tools)
  - Tool selection improvements (e.g., more direct paths to answers)
  - Context management efficiency (e.g., Turn Summaries vs. Full Context)
  - Before/after cost comparison for savings attribution

* **Champion Strategy Selection**: The learning system identifies best-in-class examples:
  - Lowest token count for similar query patterns
  - Fastest execution time for interactive workloads
  - Highest success rate for complex multi-step tasks
  - Most elegant tool orchestration sequences
  - User-endorsed solutions (via explicit feedback)

* **Few-Shot Learning Injection**: Planning-time retrieval enhances strategic decisions:
  - `_retrieve_similar_plans()` searches the Planner Repository for analogous cases
  - Top-K similar cases injected into strategic planner context
  - LLM leverages past successes to guide current planning
  - Continuous improvement without model retraining
  - Per-user savings attribution for efficiency tracking

* **Asynchronous Processing**: Zero user-facing latency:
  - Case archiving happens in background threads
  - Champion case retrieval during planning overlaps with user response rendering
  - No blocking operations on critical path
  - Graceful degradation if learning system unavailable
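
The top-K retrieval step can be illustrated with a small, dependency-free sketch. The internals of `_retrieve_similar_plans()` are not shown in this README, so the code below is only an assumption about the general technique: `cosine_similarity` and `top_k_similar` are hypothetical helpers, and the three-dimensional vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query_vec: list, cases: list, k: int = 2) -> list:
    """Return the k archived cases whose embeddings are closest to the query."""
    ranked = sorted(
        cases,
        key=lambda case: cosine_similarity(query_vec, case["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Toy archive: embeddings are illustrative, not produced by a real model.
archive = [
    {"id": "sales-by-region", "embedding": [0.9, 0.1, 0.0]},
    {"id": "table-row-count", "embedding": [0.0, 1.0, 0.0]},
    {"id": "sales-by-month", "embedding": [0.8, 0.0, 0.2]},
]

similar = top_k_similar([1.0, 0.0, 0.0], archive, k=2)
print([case["id"] for case in similar])
```

In a deployment the similarity search would be delegated to the vector store rather than computed in a Python loop; the ranking logic stays the same.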

#### 📊 Performance Metrics and Resource Limits

The engine provides comprehensive observability and built-in safeguards against runaway execution.

**Real-Time Performance Tracking:**

* **Token Consumption Monitoring**: Per-turn and cumulative tracking:
  - Input tokens (prompt + context + few-shot examples)
  - Output tokens (strategic plan + tactical steps + tool arguments + final response)
  - Token-to-cost mapping with provider-specific pricing
  - Historical trends and anomaly detection

* **Execution Time Profiling**: Detailed timing breakdown:
  - Strategic planning latency
  - Tactical loop execution time per iteration
  - Tool invocation duration (network + processing)
  - Response generation time
  - End-to-end query latency with percentile metrics

* **Resource Utilization**: System-level metrics:
  - Active session count and concurrency
  - MCP server connection pool status
  - ChromaDB vector store query performance
  - SQLite database read/write latency
  - Memory footprint per session

**Built-in Safeguards:**

* **Tactical Loop Iteration Limit**: Maximum 15 cycles per query to prevent infinite loops
* **Maximum Tool Invocations**: Cap on tool calls per tactical iteration to contain runaway execution
* **Context Window Management**: Budget-aware five-pass assembly with automatic condensation when approaching model limits ([architecture details](docs/Architecture/CONTEXT_WINDOW_ARCHITECTURE.md))
* **Timeout Enforcement**: Configurable query timeout with graceful degradation
* **Error Accumulation Threshold**: Abort after N consecutive tool failures to prevent thrashing
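
Two of these safeguards, the iteration limit and the consecutive-failure threshold, can be sketched as a bounded loop. The limit of 15 comes from the list above; the default threshold of 3, the `SafeguardTripped` exception, and the `"done"`/`"continue"`/`"error"` step protocol are hypothetical choices made for this illustration.

```python
from typing import Callable

MAX_TACTICAL_ITERATIONS = 15  # limit stated in the safeguards list


class SafeguardTripped(RuntimeError):
    """Raised when a built-in limit stops the tactical loop."""


def run_tactical_loop(step: Callable[[int], str], max_consecutive_failures: int = 3) -> int:
    """Run `step` until it returns "done" and report how many iterations were used.

    `step` receives the 1-based iteration number and returns
    "done", "continue", or "error".
    """
    consecutive_failures = 0
    for iteration in range(1, MAX_TACTICAL_ITERATIONS + 1):
        outcome = step(iteration)
        if outcome == "done":
            return iteration
        if outcome == "error":
            consecutive_failures += 1
            if consecutive_failures >= max_consecutive_failures:
                raise SafeguardTripped(f"{consecutive_failures} consecutive tool failures")
        else:
            consecutive_failures = 0  # any success resets the failure streak
    raise SafeguardTripped(f"no result after {MAX_TACTICAL_ITERATIONS} iterations")


print(run_tactical_loop(lambda i: "done" if i == 4 else "continue"))
```

Raising a dedicated exception keeps the abort path explicit, so the caller can turn it into a user-facing explanation instead of retrying.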

#### 🔄 Autonomous Self-Correction & Healing

When errors occur, the Optimizer initiates a sophisticated, multi-tiered recovery process.

1. **Pattern-Based Correction**: The agent first checks for known, recoverable errors (e.g., "table not found," "column not found").

2. **Targeted Recovery Prompts**: For these specific errors, it uses highly targeted, specialized prompts that provide the LLM with the exact context of the failure and guide it toward a precise correction (e.g., "You tried to query table 'X', which does not exist. Here is a list of similar tables...").

3. **Generic Recovery & Replanning**: If the error is novel, the agent falls back to a generic error-handling mechanism or, in the case of persistent failure, can escalate to generating an entirely new strategic plan to achieve the user's goal via an alternative route.

4. **Strategic Correction with Learning System**: The integrated **champion case learning system** provides the highest level of self-healing. By retrieving proven strategies from past successes, the agent can discard a flawed or inefficient plan entirely and adopt a proven, optimal approach, learning from its own history to correct its course.
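
The first three tiers can be sketched as a classifier followed by prompt selection. This is an assumption-laden illustration: the regular expressions, the `classify_error` and `recovery_prompt` helpers, and the use of `difflib` to suggest similar names are not taken from the Uderia codebase.

```python
import difflib
import re
from typing import Optional

# Tier 1: known, recoverable error patterns (illustrative regexes).
RECOVERABLE_PATTERNS = {
    "table": re.compile(r"table '?(?P<name>\w+)'? (?:not found|does not exist)", re.I),
    "column": re.compile(r"column '?(?P<name>\w+)'? (?:not found|does not exist)", re.I),
}


def classify_error(message: str) -> tuple:
    """Return (kind, offending_name); kind is "unknown" for novel errors."""
    for kind, pattern in RECOVERABLE_PATTERNS.items():
        match = pattern.search(message)
        if match:
            return kind, match.group("name")
    return "unknown", None


def recovery_prompt(message: str, known_names: list) -> str:
    """Build a targeted prompt for known errors, or a generic one otherwise."""
    kind, name = classify_error(message)
    if kind == "unknown":
        # Tier 3: generic fallback for errors without a known pattern.
        return f"The previous step failed with: {message!r}. Propose a corrected step."
    # Tier 2: targeted prompt with the exact failure context.
    similar = difflib.get_close_matches(name, known_names, n=3, cutoff=0.6)
    suggestions = ", ".join(similar) or "none found"
    return (
        f"You tried to query {kind} '{name}', which does not exist. "
        f"Similar {kind}s: {suggestions}."
    )


print(recovery_prompt("Table 'custmers' not found", ["customers", "orders"]))
```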

#### 🛡️ Robust Safeguards

The Optimizer is built with enterprise-grade reliability in mind.

* **Deterministic Plan Validation**: Before execution begins, the agent deterministically validates the LLM-generated meta-plan for common structural errors (e.g., misclassifying a prompt as a tool) and corrects them, preventing entire classes of failures proactively.

* **Hallucination Prevention**: Specialized orchestrators detect and correct "hallucinated loops," where the LLM incorrectly plans to iterate over a list of strings instead of a valid data source. The agent semantically understands the intent and executes a deterministic, correct loop instead.

* **Definitive Error Handling**: The agent recognizes unrecoverable errors (e.g., database permission denied) and halts execution immediately, providing a clear explanation to the user instead of wasting resources on futile retry attempts.

For comprehensive details on the budget-aware context window orchestrator — including the five-pass assembly pipeline, 9 pluggable modules, dynamic adjustment rules, surplus reallocation, condensation strategies, and per-turn observability snapshots — see:
[**Context Window Architecture (docs/Architecture/CONTEXT_WINDOW_ARCHITECTURE.md)**](docs/Architecture/CONTEXT_WINDOW_ARCHITECTURE.md)

[⬆️ Back to Table of Contents](#table-of-contents)

---

### 🧬 Retrieval-Augmented Generation (RAG) for Self-Improving AI

The Uderia Platform integrates a powerful **Retrieval-Augmented Generation (RAG)** system designed to create a self-improving agent. This closed-loop feedback mechanism allows the agent's Planner to learn from its own past successes, continuously enhancing its decision-making capabilities over time.

The core value of this RAG implementation is its ability to automatically identify and leverage the most efficient strategies for given tasks. It works by:

1. **Capturing and Archiving:** Every successful agent interaction is captured and stored as a "case study."
2. **Analyzing Efficiency:** The system analyzes each case based on token cost to determine its efficiency.
3. **Identifying Champions:** It identifies the single "best-in-class" or "champion" strategy for any given user query.
4. **Augmenting Future Prompts:** When a similar query is received in the future, the system retrieves the champion case and injects it into the Planner's prompt as a "few-shot" example.

This process guides the Planner to generate higher-quality, more efficient plans based on proven, successful strategies, reducing token consumption and improving response quality without manual intervention. The entire process runs asynchronously in the background to ensure no impact on user-facing performance.
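
The four steps can be sketched end to end on toy data. The code below is a simplified assumption of the approach: it groups cases by a pre-assigned `pattern` label instead of by embedding similarity, and `select_champions` and `augment_planner_prompt` are hypothetical names.

```python
from typing import Optional


def select_champions(cases: list) -> dict:
    """Keep the successful case with the lowest token cost for each query pattern."""
    champions = {}
    for case in cases:
        if not case["success"]:
            continue  # only successful interactions are eligible
        current = champions.get(case["pattern"])
        if current is None or case["total_tokens"] < current["total_tokens"]:
            champions[case["pattern"]] = case
    return champions


def augment_planner_prompt(base_prompt: str, champion: Optional[dict]) -> str:
    """Append the champion case to the planner prompt as a few-shot example."""
    if champion is None:
        return base_prompt
    return (
        f"{base_prompt}\n\n"
        "Example of a previously successful plan for a similar query:\n"
        f"Query: {champion['query']}\n"
        f"Plan: {champion['plan']}"
    )


cases = [
    {"pattern": "row_count", "query": "How many rows are in orders?",
     "plan": "count_rows(orders)", "total_tokens": 900, "success": True},
    {"pattern": "row_count", "query": "Count the rows in orders",
     "plan": "list_tables -> count_rows(orders)", "total_tokens": 1500, "success": True},
    {"pattern": "row_count", "query": "Rows in orders?",
     "plan": "count_rows(order)", "total_tokens": 400, "success": False},
]

champions = select_champions(cases)
prompt = augment_planner_prompt("You are the strategic planner.", champions.get("row_count"))
print(prompt)
```

Note that the cheapest case overall (400 tokens) is excluded because it failed; efficiency is only compared among successful cases.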

#### Two-Tier Repository Architecture

The application supports two distinct types of repositories, each serving a different purpose in the AI agent ecosystem:

##### Planner Repositories
**Purpose:** Store execution strategies and planning patterns
- Capture successful agent interactions as few-shot learning examples
- Contain SQL query patterns, API workflows, and proven execution traces
- Retrieved by the RAG system to guide future planning decisions
- Built via **Planner Repository Constructors** - modular templates for domain-specific pattern generation
- Automatically populated from agent execution history or manually via REST API
- Enable the agent to learn from past successes and improve over time
- Available in [Intelligence Marketplace](#-intelligence-marketplace-collaborative-knowledge-sharing) for community sharing

##### Knowledge Repositories
**Purpose:** Provide reference documentation and domain knowledge
- Store general documents, technical manuals, and business context
- Support for PDF, TXT, DOCX, MD, and other document formats
- Configurable chunking strategies (fixed-size, paragraph, sentence, semantic)
- Seamlessly integrated with strategic planning for intelligent context injection
- Retrieved during planning to inject domain context into strategic decision-making
- Enable the agent to query relevant background information when making decisions
- Available in [Intelligence Marketplace](#-intelligence-marketplace-collaborative-knowledge-sharing) for community sharing
- **Feature Status:** ✅ Fully integrated (Phase 1 complete - Nov 2025)

Both repository types are available in the [Intelligence Marketplace](#-intelligence-marketplace-collaborative-knowledge-sharing) for community sharing, discovery, and collaboration. The marketplace enables reference-based subscriptions, forking for customization, ratings and reviews, and flexible publishing options. This separation ensures that execution patterns (how to accomplish tasks) remain distinct from domain knowledge (what the agent needs to know), while both can be leveraged through the unified RAG system.

#### Planner Repository Constructors: Modular Plugin System (New - Nov 2025)

The RAG system now features a **modular template architecture** that enables domain-specific customization and extensibility:

* **Plugin-Based Design**: Templates are self-contained plugins with their own schemas, validation logic, and population strategies
* **Template Types**: Support for SQL query templates, with extensibility for document Q&A, API workflows, and custom domains
* **Manifest System**: Each template declares its capabilities, required fields, and validation rules via a standardized manifest
* **Dynamic Registration**: Templates are automatically discovered and registered at runtime from the `rag_templates/` directory
* **Programmatic & LLM-Assisted Population**: Templates can be populated via REST API with structured examples or through LLM-assisted generation in the UI
* **Auto-Generation**: Built-in LLM workflows to automatically generate domain-specific examples from database schema or documentation

This modular approach allows organizations to extend the RAG system with custom templates tailored to their specific data patterns, query types, and business domains without modifying core agent code.
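
Manifest-driven discovery of this kind can be sketched as follows. The `rag_templates/` directory is named above, but everything else here is an assumption for illustration: the `manifest.json` filename, the manifest keys, and the `discover_templates` and `missing_fields` helpers are hypothetical. See `rag_templates/README.md` for the actual plugin contract.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical manifest schema; the real manifest fields may differ.
REQUIRED_KEYS = {"template_id", "template_type", "required_fields"}


def discover_templates(root: Path) -> dict:
    """Register every template directory under `root` that has a valid manifest."""
    registry = {}
    for manifest_path in sorted(root.glob("*/manifest.json")):
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        if REQUIRED_KEYS - set(manifest):
            continue  # skip manifests that do not declare the required keys
        registry[manifest["template_id"]] = manifest
    return registry


def missing_fields(manifest: dict, example: dict) -> list:
    """Return the required fields that a candidate example does not provide."""
    return [field for field in manifest["required_fields"] if field not in example]


# Build a throwaway directory standing in for rag_templates/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    sample_manifests = {
        "sql_query": {"template_id": "sql_query_v1", "template_type": "sql_query",
                      "required_fields": ["user_query", "sql_statement"]},
        "broken": {"template_id": "broken_v1"},  # incomplete on purpose
    }
    for name, manifest in sample_manifests.items():
        (root / name).mkdir()
        (root / name / "manifest.json").write_text(json.dumps(manifest), encoding="utf-8")
    registry = discover_templates(root)

gaps = missing_fields(registry["sql_query_v1"], {"user_query": "Top 10 customers"})
print(list(registry), gaps)
```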

#### Knowledge Retrieval: Grounded Intelligence for the Focus Profile Class

While Planner Repositories power the self-improving Optimizer, **Knowledge Repositories** serve an entirely different purpose: they deliver **grounded, hallucination-free answers** from verified documents. This is the engine behind the **Focus** profile class (🔵 `rag_focused`).

##### The Value Proposition

Traditional LLMs generate answers from training data — a black box of uncertain provenance. Knowledge Retrieval inverts this model:

* **Zero Hallucination by Design**: The LLM synthesizes answers **exclusively** from retrieved documents. No general knowledge is injected. If the knowledge base doesn't contain the answer, the system says so transparently rather than fabricating one.
* **Institutional Memory at Scale**: Corporate policies, engineering runbooks, product documentation, compliance frameworks — all searchable via natural language. When experts leave, their knowledge stays.
* **Source Traceability**: Every answer includes citations back to specific documents, chunks, and metadata. Auditors and compliance teams can verify any claim.
* **Freshness-Aware Ranking**: Documents are scored using a hybrid of semantic relevance and temporal freshness, ensuring recent updates rank appropriately against older but relevant content.

##### How Knowledge Retrieval Works

```
┌──────────────────────────────────────────────────────────────────────────┐
│ KNOWLEDGE RETRIEVAL PIPELINE │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ User Query │
│ "What is our data retention policy for EU customers?" │
│ │ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ Configuration Resolution │ ← Three-tier: Global → Profile → Lock │
│ │ maxDocs, freshnessWeight, │ │
│ │ minRelevance, maxTokens │ │
│ └──────────────┬──────────────┘ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ Semantic Search (ChromaDB) │ ← Embedding: all-MiniLM-L6-v2 │
│ │ Query each knowledge │ │
│ │ collection assigned to │ │
│ │ the profile │ │
│ └──────────────┬──────────────┘ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ Hybrid Scoring │ │
│ │ adjusted = (1-fw) × sim │ fw = freshnessWeight │
│ │ + fw × freshness │ sim = 1 - cosine_distance │
│ │ │ freshness = e^(-decay × days_old) │
│ └──────────────┬──────────────┘ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ Per-Document Deduplication │ ← maxChunksPerDocument limit │
│ │ + Minimum Relevance Filter │ ← minRelevanceScore threshold │
│ └──────────────┬──────────────┘ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ LLM Synthesis │ ← Custom synthesis prompt override │
│ │ System prompt + retrieved │ available per profile │
│ │ documents + user query │ │
│ └──────────────┬──────────────┘ │
│ ▼ │
│ Answer with Source Citations │
│ "Per the EU Data Governance Policy (Section 4.2)..." │
│ │
└──────────────────────────────────────────────────────────────────────────┘
```
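
The hybrid scoring and filtering stages of the pipeline can be sketched directly from the formula in the diagram. The scoring function follows the diagram term by term. The rest is assumption: `rank_chunks` is a hypothetical helper, the relevance threshold is applied to the adjusted score, and the chunk dictionaries use made-up field names.

```python
import math
from collections import Counter


def adjusted_score(cosine_distance: float, days_old: float,
                   freshness_weight: float = 0.0, decay_rate: float = 0.005) -> float:
    """adjusted = (1 - fw) * sim + fw * freshness, as in the pipeline diagram."""
    similarity = 1.0 - cosine_distance
    freshness = math.exp(-decay_rate * days_old)
    return (1.0 - freshness_weight) * similarity + freshness_weight * freshness


def rank_chunks(chunks: list, freshness_weight: float = 0.0, decay_rate: float = 0.005,
                min_relevance: float = 0.30, max_chunks_per_document: int = 0) -> list:
    """Score, sort, then apply the relevance filter and the per-document cap."""
    scored = [
        (adjusted_score(c["distance"], c["days_old"], freshness_weight, decay_rate), c)
        for c in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    kept, per_document = [], Counter()
    for score, chunk in scored:
        if score < min_relevance:
            continue  # below the minimum relevance threshold
        # A cap of 0 means unlimited chunks per document.
        if max_chunks_per_document and per_document[chunk["doc"]] >= max_chunks_per_document:
            continue
        per_document[chunk["doc"]] += 1
        kept.append({**chunk, "score": round(score, 4)})
    return kept


chunks = [
    {"id": "a-1", "doc": "policy-2023", "distance": 0.20, "days_old": 400},
    {"id": "a-2", "doc": "policy-2023", "distance": 0.25, "days_old": 400},
    {"id": "b-1", "doc": "policy-2025", "distance": 0.30, "days_old": 10},
    {"id": "c-1", "doc": "old-memo", "distance": 0.95, "days_old": 1000},
]

pure_relevance = rank_chunks(chunks, freshness_weight=0.0, max_chunks_per_document=1)
with_freshness = rank_chunks(chunks, freshness_weight=0.3, max_chunks_per_document=1)
print([c["id"] for c in pure_relevance], [c["id"] for c in with_freshness])
```

With `freshness_weight=0.0` the older but closer chunk ranks first; raising the weight to 0.3 lets the 10-day-old chunk overtake it, which is the trade-off the blend ratio controls.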

##### Key Differentiators from Planner Repositories

| Aspect | Planner Repositories | Knowledge Repositories |
|--------|---------------------|----------------------|
| **Purpose** | Self-improving execution strategies | Grounded document retrieval |
| **Profile Class** | 🟠 Optimize (tool_enabled) | 🔵 Focus (rag_focused) |
| **Data Source** | Auto-captured execution traces | Uploaded documents (PDF, DOCX, TXT, MD) |
| **Scoring** | Similarity with efficiency penalties | Hybrid similarity + freshness |
| **Tool Execution** | Yes — full MCP tool calling | None — pure retrieval + synthesis |
| **Hallucination Risk** | Mitigated via proven patterns | Eliminated by design |

##### Configuration

Knowledge retrieval behavior is controlled through a **three-tier configuration resolution**:

1. **Admin-Locked** (highest priority): Global settings locked by admin override all profile values
2. **Profile Override**: Per-profile settings in the profile's `knowledgeConfig`
3. **Global Default** (lowest priority): Platform-wide defaults

| Parameter | Description | Default |
|-----------|-------------|---------|
| `maxDocs` | Maximum documents returned | 3 |
| `minRelevanceScore` | Minimum cosine similarity threshold | 0.30 |
| `maxTokens` | Token budget for knowledge context | 2,000 |
| `maxChunksPerDocument` | Limit chunks from same source | 0 (unlimited) |
| `freshnessWeight` | Blend ratio: 0.0 = pure relevance, 1.0 = pure freshness | 0.0 |
| `freshnessDecayRate` | Exponential decay rate for age penalty | 0.005 |
| `synthesisPromptOverride` | Custom system prompt for LLM synthesis | (none) |
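
The resolution order can be sketched as a per-parameter lookup. The defaults mirror the table above. How admin locks are stored is not described here, so representing them as a set of locked parameter names is an assumption, and `resolve_knowledge_config` is a hypothetical helper.

```python
# Platform-wide defaults, copied from the parameter table.
GLOBAL_DEFAULTS = {
    "maxDocs": 3,
    "minRelevanceScore": 0.30,
    "maxTokens": 2000,
    "maxChunksPerDocument": 0,
    "freshnessWeight": 0.0,
    "freshnessDecayRate": 0.005,
    "synthesisPromptOverride": None,
}


def resolve_knowledge_config(global_settings: dict, profile_config: dict,
                             locked_keys: set) -> dict:
    """Resolve each parameter: admin-locked global > profile override > global default."""
    resolved = {}
    for key, default in GLOBAL_DEFAULTS.items():
        global_value = global_settings.get(key, default)
        if key in locked_keys:
            resolved[key] = global_value  # a lock ignores any profile value
        else:
            resolved[key] = profile_config.get(key, global_value)
    return resolved


config = resolve_knowledge_config(
    global_settings={"maxDocs": 5},
    profile_config={"maxDocs": 10, "freshnessWeight": 0.3},
    locked_keys={"maxDocs"},
)
print(config["maxDocs"], config["freshnessWeight"], config["maxTokens"])
```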

##### Document Ingestion & Chunking

Knowledge Repositories support multiple document formats (PDF, DOCX, TXT, Markdown) with configurable chunking strategies:

* **Paragraph-based** (default): Respects natural document structure, combines small paragraphs, splits oversized ones
* **Sentence-based**: Fine-grained chunking for dense technical content
* **Fixed-size**: Character-count chunking with configurable overlap
* **Semantic**: Boundary-aware splitting that preserves meaning

Each chunk is embedded using `all-MiniLM-L6-v2` and stored in ChromaDB with metadata (title, author, creation date, source filename, category, tags) enabling rich filtering and freshness scoring.
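
The paragraph-based strategy (combine small paragraphs, split oversized ones) can be sketched with a greedy buffer. This is a simplified illustration under stated assumptions: `chunk_paragraphs` is a hypothetical function, paragraphs are assumed to be separated by blank lines, and oversized paragraphs are cut at a hard character boundary rather than at sentence breaks.

```python
def chunk_paragraphs(text: str, max_chars: int = 200) -> list:
    """Greedily merge paragraphs up to max_chars; hard-split any that exceed it."""
    chunks = []
    buffer = ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        # Split an oversized paragraph into max_chars-sized pieces.
        pieces = [paragraph[i:i + max_chars] for i in range(0, len(paragraph), max_chars)]
        for piece in pieces:
            candidate = f"{buffer}\n\n{piece}" if buffer else piece
            if len(candidate) <= max_chars:
                buffer = candidate  # still fits: keep combining
            else:
                chunks.append(buffer)  # flush the full buffer, start a new chunk
                buffer = piece
    if buffer:
        chunks.append(buffer)
    return chunks


document = "A" * 50 + "\n\n" + "B" * 50 + "\n\n" + "C" * 250
chunks = chunk_paragraphs(document, max_chars=120)
print([len(chunk) for chunk in chunks])
```

The two short paragraphs end up in one chunk, while the 250-character paragraph is spread over three. No chunk exceeds `max_chars`.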

For the comprehensive architecture deep-dive including scoring algorithms, execution flow, and advanced features, see:
[**Knowledge Retrieval Architecture (docs/Architecture/KNOWLEDGE_RETRIEVAL_ARCHITECTURE.md)**](docs/Architecture/KNOWLEDGE_RETRIEVAL_ARCHITECTURE.md)

For a comprehensive overview of the RAG architecture, template development, and maintenance utilities, please see the detailed documentation:
[**RAG System Documentation (docs/RAG/RAG.md)**](docs/RAG/RAG.md)
[**RAG Template Plugin Development (rag_templates/README.md)**](rag_templates/README.md)

[⬆️ Back to Table of Contents](#table-of-contents)

---

### 🗄️ Vector Store Abstraction Layer

The platform's RAG and Knowledge capabilities are powered by a **pluggable vector store layer** that decouples all embedding, storage, and retrieval operations from any single database vendor. This means you can start with the built-in local store and scale to enterprise infrastructure — without changing a single profile, collection, or workflow.

**Three production backends, one unified interface:**

| | **ChromaDB** (Default) | **Qdrant Cloud** | **Teradata Enterprise Vector Store** |
|---|---|---|---|
| **Best for** | Local / single-user / getting started | Cloud-native / managed vector search | Enterprise / shared infrastructure / governed data |
| **Deployment** | Embedded, zero-config | Managed cloud (Qdrant Cloud) | Server-side, connects to existing Teradata environment |
| **Embedding** | Client-side (SentenceTransformer) | Client-side (SentenceTransformer) | Server-side (Amazon Bedrock or Azure AI) — no local GPU needed |
| **Chunking** | Client-side (platform-managed) | Client-side (platform-managed) | Client-side *or* server-side — pass raw files and let the database handle it |
| **Search Modes** | Semantic | Semantic, Hybrid (semantic + BM25 keyword) | Semantic, Hybrid (native server-side BM25 — server-side chunked collections) |
| **Scaling** | Single-node | Cloud-managed, horizontal scaling | Massively parallel, leverages Teradata's query engine |

**Why this matters:**

* **No vendor lock-in.** Collections created on ChromaDB work identically on Qdrant or Teradata. Switch backends by changing a configuration — existing profiles, RAG templates, and marketplace packs continue to work unchanged.
* **From laptop to cloud to enterprise.** Start local with ChromaDB, move to Qdrant Cloud for managed scalability, or deploy on Teradata for governed enterprise infrastructure — all without touching your workflows. The Teradata backend adds server-side embedding (no local GPU costs), server-side chunking (upload raw PDFs and let the database handle splitting), and connection resilience with automatic stale-connection detection and serialized reconnect to survive transient network issues.
* **Hybrid Search.** Qdrant Cloud and Teradata EVS both support hybrid search, which combines dense vector similarity with BM25 keyword matching. Qdrant Cloud uses client-side Reciprocal Rank Fusion (RRF) with a configurable keyword weight (0.0–1.0). Teradata EVS uses **native server-side BM25**: the lake server performs the rank fusion internally, so no embedding or scoring happens on the platform side. Teradata hybrid search requires a server-side chunked collection (FILE_CONTENT_BASED); client-side chunked collections use semantic search only.
* **Teradata server-side chunking control.** When using the Teradata EVS backend, you can choose between *Optimized* (structure-aware dynamic chunking that follows document layout) and *Fixed Size* (character-based splitting with configurable chunk size). Header and footer trimming is available for PDF documents to exclude page headers/footers before chunking. All parameters are persisted at the collection level and editable per-upload.
* **Asynchronous counters.** For Teradata server-side ingestion, document and chunk counts update asynchronously. After a document upload completes, the platform polls the EVS backend at 30s, 60s, and 90s intervals to retrieve the final chunk count. During this window, the Knowledge Repository card may temporarily show stale counts — a notification is emitted once the actual count is available.
* **Capability-based negotiation.** Each backend declares what it supports (e.g., `SERVER_SIDE_CHUNKING`, `GET_ALL`, `METADATA_FILTERING`). The platform adapts its behavior automatically — no feature flags or conditional code in your workflows.
* **Safe concurrent access.** The factory uses config-fingerprinted singleton caching with per-key async locks, ensuring multiple sessions never race to initialize the same backend.
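
The concurrency pattern in the last bullet, a singleton cache keyed by a configuration fingerprint and guarded by per-key async locks, can be sketched with the standard library. The class and method names are hypothetical, and the SHA-256 hash over canonical JSON is an assumed way to compute the fingerprint.

```python
import asyncio
import hashlib
import json


class VectorStoreFactory:
    """Caches one backend instance per distinct configuration."""

    def __init__(self) -> None:
        self._instances = {}
        self._locks = {}
        self.init_count = 0  # counts real initializations, for demonstration

    @staticmethod
    def fingerprint(config: dict) -> str:
        # Canonical JSON makes equal configs hash equally regardless of key order.
        canonical = json.dumps(config, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    async def get(self, config: dict) -> dict:
        key = self.fingerprint(config)
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            # Only the first caller for this key initializes the backend.
            if key not in self._instances:
                self._instances[key] = await self._create(config)
        return self._instances[key]

    async def _create(self, config: dict) -> dict:
        self.init_count += 1
        await asyncio.sleep(0.01)  # simulate slow connection setup
        return {"backend": config["backend"], "connected": True}


async def demo() -> tuple:
    factory = VectorStoreFactory()
    config = {"backend": "chromadb", "path": "./data"}
    stores = await asyncio.gather(*(factory.get(dict(config)) for _ in range(5)))
    return factory, stores


factory, stores = asyncio.run(demo())
print(factory.init_count, all(store is stores[0] for store in stores))
```

Locking per key rather than globally means sessions that use different backends never wait on each other, while sessions that share a configuration share one instance.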

For the full architectural deep-dive — including ingestion paths, connection resilience patterns, and the EVS object ownership model — see:
[**Vector Store Abstraction Architecture (docs/Architecture/VECTOR_STORE_ABSTRACTION_ARCHITECTURE.md)**](docs/Architecture/VECTOR_STORE_ABSTRACTION_ARCHITECTURE.md)

[⬆️ Back to Table of Contents](#table-of-contents)

---

### 🤝 Intelligence Marketplace: Collaborative Knowledge Sharing

The Intelligence Marketplace transforms individual expertise into collective intelligence. Share proven execution patterns, domain knowledge, complete agent teams, behavioral skills, processing extensions, and entity-relationship models with the community—turning isolated insights into a powerful collaborative ecosystem that reduces costs, accelerates onboarding, and amplifies capabilities.

#### What is the Intelligence Marketplace?

The marketplace is Uderia's collaborative ecosystem for sharing, discovering, and deploying enterprise AI assets. It transforms the platform from
