{"id":30087891,"url":"https://github.com/cloudymoma/rustmq","last_synced_at":"2025-09-19T05:02:52.526Z","repository":{"id":306690608,"uuid":"1026939745","full_name":"cloudymoma/rustmq","owner":"cloudymoma","description":"Cloud-native distributed message queue with storage-compute separation architecture. Licensed under BDL 1.0 (academic use permitted, commercial use requires license).","archived":false,"fork":false,"pushed_at":"2025-08-13T06:19:11.000Z","size":5206,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-13T07:21:52.623Z","etag":null,"topics":["bigquery","cloud-native","distributed-systems","google-cloud","message-queue","object-storage","quic","rust","streaming","webassembly"],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cloudymoma.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"docs/security/ACL_MANAGEMENT.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-27T00:31:08.000Z","updated_at":"2025-08-13T06:19:14.000Z","dependencies_parsed_at":"2025-07-27T04:29:36.042Z","dependency_job_id":"fc29cc60-40aa-45f2-ac1b-b1fd26e7d4a9","html_url":"https://github.com/cloudymoma/rustmq","commit_stats":null,"previous_names":["cloudymoma/rustmq"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cloudymoma/rustmq","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudymoma%2Frustmq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudymoma%2Frust
mq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudymoma%2Frustmq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudymoma%2Frustmq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cloudymoma","download_url":"https://codeload.github.com/cloudymoma/rustmq/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudymoma%2Frustmq/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275883244,"owners_count":25545490,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-19T02:00:09.700Z","response_time":108,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigquery","cloud-native","distributed-systems","google-cloud","message-queue","object-storage","quic","rust","streaming","webassembly"],"created_at":"2025-08-09T04:01:57.473Z","updated_at":"2025-09-19T05:02:52.216Z","avatar_url":"https://github.com/cloudymoma.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RustMQ: Cloud-Native Distributed Message Queue System\n\n[![Build Status](https://github.com/cloudymoma/rustmq/workflows/Rust/badge.svg)](https://github.com/cloudymoma/rustmq/actions)\n[![License: BDL 1.0](https://img.shields.io/badge/License-BDL%201.0-red.svg)](LICENSE)\n[![Rust 
Version](https://img.shields.io/badge/rust-1.88+-blue.svg)](https://www.rust-lang.org)\n[![Version](https://img.shields.io/badge/version-1.0.0-green.svg)](https://github.com/rustmq/rustmq)\n\nRustMQ is a next-generation, cloud-native distributed message queue system that combines the high-performance characteristics of Apache Kafka with the cost-effectiveness and operational simplicity of modern cloud architectures. Built from the ground up in Rust, RustMQ leverages a shared-storage architecture that decouples compute from storage, enabling unprecedented elasticity, cost savings, and operational efficiency.\n\n**Optimized for Google Cloud Platform**: RustMQ is designed with Google Cloud services as the default target, leveraging Google Cloud Storage for cost-effective object storage and Google Kubernetes Engine for orchestration, with all configurations defaulting to the `us-central1` region for optimal performance and cost efficiency.\n\n## 🚀 Quick Start\n\n### Development Cluster (Single Zone)\n```bash\n# Build images locally\ncd docker/ \u0026\u0026 ./quick-deploy.sh dev-build\n\n# Deploy development cluster  \ncd ../gke/ \u0026\u0026 ./deploy-rustmq-gke.sh deploy --environment development\n```\n\n### Production Cluster (Single Zone)\n```bash\n# Build and push production images\ncd docker/ \u0026\u0026 PROJECT_ID=your-project ./quick-deploy.sh production-images\n\n# Deploy production cluster\ncd ../gke/ \u0026\u0026 PROJECT_ID=your-project ./deploy-rustmq-gke.sh deploy --environment production\n```\n\nFor detailed setup, see [GKE Deployment Guide](docs/gke-deployment-guide.md).\n\n## 🚀 Key Features\n\n- **10x Cost Reduction**: 90% storage cost savings through single-copy storage in Google Cloud Storage\n- **100x Elasticity**: Instant scaling with stateless brokers and metadata-only operations  \n- **Single-Digit Millisecond Latency**: Optimized write path with local NVMe WAL and [zero-copy data movement](docs/zero-copy-optimization.md)\n- **Sub-Microsecond 
Security**: Enterprise-grade security with 547ns authorization decisions and 2M+ ops/sec\n- **QUIC/HTTP3 Protocol**: Reduced connection overhead and head-of-line blocking elimination\n- **WebAssembly ETL**: Real-time data processing with secure sandboxing and smart filtering\n- **Auto-Balancing**: Continuous load distribution optimization\n- **Google Cloud Native**: Default configurations optimized for GCP services \n\n\n## 🏗️ Architecture Overview\n\nRustMQ implements a **storage-compute separation architecture** with stateless brokers and shared cloud storage for unprecedented elasticity and cost efficiency.\n\n![RustMQ Architecture](docs/rustmq-architecture.svg)\n\n*Click to view the interactive architecture diagram showing RustMQ's innovative storage-compute separation design*\n\n### Key Architectural Principles\n\n1. **Storage-Compute Separation**: Brokers are stateless; all persistent data in shared object storage\n2. **Intelligent Tiered Storage**: Hot data in WAL/cache, cold data in object storage\n3. **Replication Without Data Movement**: Shared storage enables instant failover\n4. **QUIC/HTTP3 Protocol**: Modern transport for reduced latency and head-of-line blocking elimination\n5. **Raft Consensus**: Distributed coordination for metadata and cluster management\n6. 
**Auto-scaling \u0026 Operations**: Cloud-native operational capabilities with Kubernetes integration\n\n### Architecture Layers\n\nThe diagram above illustrates RustMQ's enhanced layered architecture with enterprise security:\n\n- **🔵 Client Layer** - Production-ready SDKs (Rust, Go) with mTLS support and comprehensive admin CLI with complete security management suite\n- **🟡 Enterprise Security Layer** - Zero Trust architecture with mTLS authentication, multi-level ACL cache (547ns/1310ns/754ns), certificate management, and 2M+ ops/sec authorization capacity\n- **🟢 Broker Cluster** - Stateless compute nodes with MessageBrokerCore, enhanced QUIC/gRPC servers featuring circuit breaker patterns, connection pooling, and real-time health monitoring\n- **🟠 Tiered Storage** - Intelligent WAL with upload triggers, workload-isolated caching (hot/cold), and optimized object storage with bandwidth limiting\n- **🟣 Controller Cluster** - Raft consensus with distributed ACL storage, metadata management, cluster coordination, and comprehensive admin REST API with advanced rate limiting\n- **🔴 Operational Layer** - Production-ready operations with zero-downtime rolling upgrades, automated scaling with partition rebalancing, and Kubernetes integration with volume recovery\n- **🟦 Integration Layer** - WebAssembly ETL processing with sandboxing, BigQuery streaming with schema mapping, and comprehensive monitoring infrastructure\n\n### Data Flow Patterns\n\n#### Core Message Flows\n- **Write Path**: `Client → QUIC → Broker → WAL → Cache → Object Storage`\n- **Read Path**: `Client ← QUIC ← Broker ← Cache ← Object Storage (if cache miss)`\n- **Replication**: `Leader → Followers (Metadata Only)` - no data movement due to shared storage\n\n#### Security \u0026 Authentication Flows\n- **Client Authentication**: `Client → mTLS/QUIC → Broker → Certificate Validation → Principal Extraction`\n- **Authorization**: `Broker → Multi-Level Cache (L1/L2/L3) → Controller ACL → Permission 
Decision`\n- **Inter-Service**: `Broker ←mTLS/gRPC→ Controller ←mTLS/Raft→ Controller (Cluster)`\n\n#### Administrative \u0026 Operational Flows\n- **Admin Operations**: `Admin CLI/API → Controller → Cluster Coordination → Broker Updates`\n- **Health Monitoring**: `Background Tasks → Broker Health → Admin API → Real-time Status`\n- **Scaling Operations**: `Controller → Partition Rebalancing → Broker Addition/Removal → Traffic Migration`\n\n## 📋 Table of Contents\n\n- [Quick Start](#-quick-start)\n- [Security Management CLI](#-security-management-cli)\n- [Enterprise Security](#-enterprise-security)\n- [Admin REST API](#-admin-rest-api)\n- [BigQuery Subscriber](#-bigquery-subscriber)\n- [WebAssembly ETL Processing](#-webassembly-etl-processing)\n- [Zero-Copy Optimization](docs/zero-copy-optimization.md)\n- [Google Cloud Platform Setup](#-google-cloud-platform-setup)\n- [Configuration](#-configuration) - **Configuration Guide Available: [docs/configuration-guide.md](docs/configuration-guide.md)**\n- [Message Broker Core API](#-message-broker-core-api)\n- [Client SDKs](#-client-sdks)\n- [Usage Examples](#-usage-examples)\n- [Development \u0026 Troubleshooting](#-development--troubleshooting)\n  - [Environment Setup](#-environment-setup) - **New automated development/production setup**\n- [Contributing](#-contributing)\n- [Docker \u0026 Kubernetes Setup](docker/README.md)\n\n\n## 🏃 Quick Start\n\n### Prerequisites\n\n- **Rust 1.73+ and Cargo** - Core development environment\n- **Docker and Docker Compose** - Container orchestration (see [docker/README.md](docker/README.md))\n- **Google Cloud SDK** - For BigQuery integration and GCP services\n- **kubectl** - For Kubernetes deployment (see [docker/README.md](docker/README.md))\n\n### Production Setup\n\n#### Option 1: Docker Environment (Recommended)\n```bash\n# Clone the repository\ngit clone https://github.com/cloudymoma/rustmq.git\ncd rustmq\n\n# Start complete RustMQ cluster with all services\ncd docker \u0026\u0026 
docker-compose up -d\n\n# Verify cluster health\ncurl http://localhost:9642/health\n\n# Access services:\n# - Broker QUIC: localhost:9092 (mTLS enabled)\n# - Admin REST API: localhost:9642 (rate limiting enabled)\n# - Controller: localhost:9094 (Raft consensus)\n```\n\n#### Option 2: Local Build \u0026 Run\n```bash\n# Build all production binaries\ncargo build --release\n\n# Build with io_uring for optimal I/O performance (Linux only)\ncargo build --release --features io-uring\n\n# Verify all tests pass (300+ tests)\ncargo test --release\n\n# Start controller cluster (Raft consensus + ACL storage)\n./target/release/rustmq-controller --config config/controller.toml \u0026\n\n# Start broker with security enabled\n./target/release/rustmq-broker --config config/broker.toml \u0026\n\n# Initialize security infrastructure\n./target/release/rustmq-admin ca init --cn \"RustMQ Root CA\" --org \"MyOrg\"\n./target/release/rustmq-admin certs issue --principal \"broker-01\" --role broker\n\n# Start Admin REST API with rate limiting\n./target/release/rustmq-admin serve-api 8080\n```\n\n### Basic Operations\n\n#### Topic Management\n```bash\n# Create topic with replication\n./target/release/rustmq-admin create-topic user-events 12 3\n\n# List all topics with details\n./target/release/rustmq-admin list-topics\n\n# Get comprehensive topic information\n./target/release/rustmq-admin describe-topic user-events\n\n# Check cluster health\n./target/release/rustmq-admin cluster-health\n```\n\n#### Security Operations\n```bash\n# Certificate management\n./target/release/rustmq-admin certs list --role broker\n./target/release/rustmq-admin certs validate --cert-file /path/to/cert.pem\n\n# ACL management\n./target/release/rustmq-admin acl create \\\n  --principal \"app@company.com\" \\\n  --resource \"topic.events.*\" \\\n  --permissions read,write \\\n  --effect allow\n\n# Security monitoring\n./target/release/rustmq-admin security status\n./target/release/rustmq-admin audit logs --since 
\"2024-01-01T00:00:00Z\"\n```\n\n### Client SDK Usage\n\n#### Rust SDK (Production-Ready)\n```bash\ncd sdk/rust\n\n# Secure producer with mTLS\ncargo run --example secure_producer\n\n# Consumer with ACL authorization\ncargo run --example secure_consumer\n\n# JWT token authentication\ncargo run --example token_authentication\n```\n\n#### Go SDK (Production-Ready)\n```bash\ncd sdk/go\n\n# Basic producer with TLS\ngo run examples/tls_producer.go\n\n# Consumer with health monitoring\ngo run examples/health_monitoring_consumer.go\n\n# Advanced connection management\ngo run examples/connection_pooling.go\n```\n\n### Performance Validation\n\n```bash\n# Run benchmark tests\ncargo bench\n\n# Validate security performance (sub-microsecond authorization)\ncargo test --release security::performance\n\n# Test scaling operations\n./target/release/rustmq-admin scaling add-brokers 3\n\n# Verify zero-downtime upgrades\n./target/release/rustmq-admin operations rolling-upgrade --version latest\n```\n\n\n## 🔐 Security Management CLI\n\nRustMQ now includes a comprehensive security command suite that extends the admin CLI with complete certificate and ACL management capabilities. 
The security CLI provides enterprise-grade security operations through an intuitive command-line interface.\n\n### Key Security Features\n\n- **Certificate Authority Management**: Create and manage root CAs with simplified architecture\n- **Certificate Lifecycle**: Issue, renew, rotate, revoke, and validate certificates  \n- **Access Control Lists (ACL)**: Create, manage, and test authorization rules\n- **Security Auditing**: View audit logs, real-time events, and operation history\n- **Security Operations**: System status, metrics, health checks, and maintenance\n- **Multiple Output Formats**: Table, JSON, YAML, and CSV support\n\n### Security Command Structure\n\n```bash\nrustmq-admin [OPTIONS] \u003cCOMMAND\u003e\n\nSecurity Commands:\n  ca        Certificate Authority management commands\n  certs     Certificate lifecycle management commands  \n  acl       ACL management commands\n  audit     Security audit commands\n  security  General security commands\n\nGlobal Options:\n  --api-url \u003cURL\u003e     Admin API base URL (default: http://127.0.0.1:8080)\n  --format \u003cFORMAT\u003e   Output format (table, json, yaml, csv)\n  --no-color          Disable colored output\n  --verbose           Enable verbose output\n```\n\n### Certificate Authority Operations\n\n```bash\n# Initialize root CA\nrustmq-admin ca init \\\n  --cn \"RustMQ Root CA\" \\\n  --org \"RustMQ Corp\" \\\n  --country US \\\n  --validity-years 10\n\n# List CAs with filtering\nrustmq-admin ca list --status active --format table\n\n# View CA information\nrustmq-admin ca info root_ca_1\n\n# Export CA certificate for client distribution\nrustmq-admin ca export --ca-id root_ca_1 --output ca-cert.pem\n```\n\n### Certificate Management\n\n```bash\n# Issue certificates for different roles\nrustmq-admin certs issue \\\n  --principal \"broker-01.rustmq.com\" \\\n  --role broker \\\n  --san \"broker-01\" \\\n  --san \"192.168.1.100\" \\\n  --validity-days 365\n\n# Certificate lifecycle 
operations\nrustmq-admin certs list --filter active --role broker\nrustmq-admin certs renew cert_12345\nrustmq-admin certs rotate cert_12345  # Generate new key pair\nrustmq-admin certs revoke cert_12345 --reason \"key-compromise\"\n\n# Certificate validation and status\nrustmq-admin certs expiring --days 30\nrustmq-admin certs validate --cert-file /path/to/cert.pem\nrustmq-admin certs status cert_12345\n```\n\n### ACL Management\n\n```bash\n# Create ACL rules with conditions\nrustmq-admin acl create \\\n  --principal \"user@domain.com\" \\\n  --resource \"topic.users.*\" \\\n  --permissions \"read,write\" \\\n  --effect allow \\\n  --conditions \"source_ip=192.168.1.0/24\"\n\n# ACL evaluation and testing\nrustmq-admin acl test \\\n  --principal \"user@domain.com\" \\\n  --resource \"topic.users.data\" \\\n  --operation read\n\nrustmq-admin acl permissions \"user@domain.com\"\nrustmq-admin acl bulk-test --input-file test_cases.json\n\n# ACL operations\nrustmq-admin acl sync --force\nrustmq-admin acl cache warm --principals \"user1,user2\"\n```\n\n### Security Auditing\n\n```bash\n# View audit logs with filtering\nrustmq-admin audit logs \\\n  --since \"2024-01-01T00:00:00Z\" \\\n  --type certificate_issued \\\n  --limit 50\n\n# Real-time event monitoring\nrustmq-admin audit events --follow --filter authentication\n\n# Operation-specific audits\nrustmq-admin audit certificates --operation revoke\nrustmq-admin audit acl --principal \"admin@domain.com\"\n```\n\n### Security Operations\n\n```bash\n# System status and health\nrustmq-admin security status\nrustmq-admin security metrics\nrustmq-admin security health\n\n# Maintenance operations\nrustmq-admin security cleanup --expired-certs --dry-run\nrustmq-admin security backup --output backup.json --include-certs\nrustmq-admin security restore --input backup.json --force\n```\n\n### Output Format Examples\n\n**Table Format (Human-readable)**:\n```\nCERTIFICATE_ID    COMMON_NAME              STATUS    
EXPIRES_IN\ncert_12345       broker-01.rustmq.com     active    335 days\ncert_67890       client-01.rustmq.com     active    280 days\n```\n\n**JSON Format (Machine-readable)**:\n```bash\nrustmq-admin certs list --format json | jq '.[] | {id: .certificate_id, cn: .common_name}'\n```\n\n**CSV Format (Spreadsheet-compatible)**:\n```bash\nrustmq-admin acl list --format csv \u003e acl_rules.csv\n```\n\n### User Experience Features\n\n- **Progress Indicators**: Visual feedback for long-running operations\n- **Confirmation Prompts**: Safety checks for destructive operations (use `--force` to skip)\n- **Color Output**: Rich formatting with `--no-color` option for scripts\n- **Comprehensive Error Handling**: Clear error messages with troubleshooting hints\n\n### Documentation and Examples\n\n- **Complete Documentation**: [`docs/admin_cli_security.md`](docs/admin_cli_security.md)\n- **Interactive Demo**: [`examples/admin_cli_security_demo.sh`](examples/admin_cli_security_demo.sh)\n- **Unit Tests**: Run `cargo test --bin rustmq-admin` for comprehensive test coverage\n\n### Integration\n\nThe security CLI integrates seamlessly with:\n- **Admin REST API**: All commands use standardized REST endpoints\n- **Rate Limiting**: Handles API rate limits gracefully with retry logic\n- **Authentication**: Configurable API authentication and authorization\n- **Existing Commands**: Backward compatible with all existing topic management commands\n\n## 🔐 Enterprise Security\n\nRustMQ provides enterprise-grade security with Zero Trust architecture, delivering **sub-microsecond authorization performance** while maintaining the highest security standards.\n\n### Key Security Features\n\n- **mTLS Authentication**: Mutual TLS for all client-broker communications with **validated certificate chains** (fixed August 2025)\n- **Ultra-Fast Authorization**: Multi-level ACL caching (L1/L2/L3) with **sub-microsecond latency**\n- **Complete Certificate Management**: Full CA operations, automated 
renewal, and revocation capabilities with **proper X.509 certificate signing**\n- **Distributed ACL System**: Raft consensus for consistent authorization policies across the cluster\n- **Zero Trust Architecture**: Every request authenticated and authorized with comprehensive audit trails\n- **Performance-Oriented Design**: String interning, batch fetching, and intelligent caching for production workloads\n- **✅ Production-Ready X.509**: Proper certificate signing chains ensuring enterprise-grade security\n- **✅ Advanced Certificate Caching**: WebPKI-based cache keys with intelligent invalidation and batch operations\n\n### ⚡ Measured Performance Characteristics\n\n**Benchmark Results (Verified in Production)**:\n- **L1 Cache**: **547ns** (54% better than 1200ns target) - 1.8M ops/sec capacity\n- **L2 Cache**: **1,310ns** (74% better than 5μs target) - 763K ops/sec capacity  \n- **Bloom Filter**: **754ns** (25% better than 1μs target) - 1.3M ops/sec capacity\n- **System Throughput**: **2.08M operations/second** (108% better than 1M target)\n- **Memory Efficiency**: **60-80% reduction** through string interning and optimized data structures\n- **Authentication**: **\u003c1ms** certificate validation with caching and principal extraction\n- **Zero False Negatives**: **0%** false negative rate (mathematically guaranteed)\n\n**Production Readiness**: ✅ All SLA targets exceeded by significant margins\n\n### Security Architecture\n\n```\n┌─────────────────────────────────────────────────────────────┐\n│                    Zero Trust Security                      │\n├─────────────────────────────────────────────────────────────┤\n│  Client Certificate ──mTLS──\u003e Authentication Manager        │\n│       │                              │                     │\n│       └──\u003e Principal Extraction ──\u003e Authorization Manager  │\n│                                       │                     │\n│               ┌─────────────────────────┴───────────────────┐\n│           
    │        Multi-Level ACL Cache               │\n│               │ L1 (547ns) → L2 (1310ns) → L3 Bloom (754ns)│\n│               │              │                             │\n│               │              ▼                             │\n│               │         Controller ACL                      │\n│               │      (Raft Consensus)                      │\n│               └─────────────────────────────────────────────┘\n│                                                             │\n└─────────────────────────────────────────────────────────────┘\n```\n\n### Quick Security Setup\n\n```bash\n# Initialize root CA for your organization\nrustmq-admin ca init --cn \"RustMQ Root CA\" --org \"MyOrg\" --validity-years 10\n\n# Issue broker certificate with proper role\nrustmq-admin certs issue \\\n  --principal \"broker-01.internal.company.com\" \\\n  --role broker \\\n  --san \"broker-01\" \\\n  --san \"192.168.1.100\" \\\n  --validity-days 365\n\n# Issue client certificate for application\nrustmq-admin certs issue \\\n  --principal \"app@company.com\" \\\n  --role client \\\n  --validity-days 90\n\n# Create comprehensive ACL rule with conditions\nrustmq-admin acl create \\\n  --principal \"app@company.com\" \\\n  --resource \"topic.events.*\" \\\n  --permissions read,write \\\n  --effect allow \\\n  --conditions \"source_ip=10.0.0.0/8,time_range=09:00-17:00\"\n\n# Test authorization before deployment\nrustmq-admin acl test \\\n  --principal \"app@company.com\" \\\n  --resource \"topic.events.user-login\" \\\n  --operation read\n\n# Check comprehensive security status\nrustmq-admin security status\n```\n\n### Security Components\n\n#### Certificate Authority Management\n- **Root CA Operations**: Initialize, manage, and rotate certificate authorities\n- **Intermediate CAs**: Create delegated CAs for different environments or teams\n- **Certificate Lifecycle**: Issue, renew, rotate, and revoke certificates with audit trails\n- **Automated Validation**: 
Real-time certificate status checking and validation\n\n#### mTLS Authentication\n- **Mutual Authentication**: Both client and server certificate validation\n- **Principal Extraction**: Automatic extraction of identity from certificate subjects\n- **Certificate Caching**: High-performance certificate validation with intelligent caching\n- **Revocation Checking**: Support for CRL and OCSP certificate revocation\n\n#### Multi-Level Authorization\n- **L1 Cache (Connection-Local)**: 547ns measured lookup for frequently accessed permissions\n- **L2 Cache (Broker-Wide)**: 1,310ns measured lookup with LRU eviction and sharding\n- **L3 Bloom Filter**: 754ns measured negative-lookup rejection to reduce controller load\n- **Batch Fetching**: Intelligent batching reduces controller RPC load by 10-100x\n\n#### Access Control Lists (ACL)\n- **Resource Patterns**: Flexible pattern matching for topics, consumer groups, and operations\n- **Conditional Rules**: IP-based, time-based, and custom condition support\n- **Raft Consensus**: Distributed ACL storage with strong consistency guarantees\n- **Policy Management**: Comprehensive policy creation, testing, and management tools\n\n### Security Documentation\n\nComprehensive security documentation is available:\n\n#### **Performance \u0026 Architecture**\n- **[Security Performance](docs/security/SECURITY_PERFORMANCE.md)** - Comprehensive performance metrics, benchmarks, and SLA compliance  \n- **[Cache Architecture](docs/security/CACHE_ARCHITECTURE.md)** - Detailed multi-tier cache design and implementation\n- **[Security Benchmarks](docs/security/SECURITY_BENCHMARKS.md)** - Complete benchmark results and production readiness validation\n- **[Performance Tuning Guide](docs/security/SECURITY_TUNING_GUIDE.md)** - Configuration optimization and troubleshooting\n- **[Certificate Signing Implementation](docs/security/CERTIFICATE_SIGNING_IMPLEMENTATION.md)** - Technical details of the certificate signing fix and X.509 implementation\n\n#### **Operations \u0026 
Configuration**\n- **[Security Architecture](docs/security/SECURITY_ARCHITECTURE.md)** - Complete architectural overview and design principles\n- **[Configuration Guide](docs/security/SECURITY_CONFIGURATION.md)** - Security configuration parameters and examples\n- **[Certificate Management](docs/security/CERTIFICATE_MANAGEMENT.md)** - Complete certificate lifecycle operations\n- **[ACL Management](docs/security/ACL_MANAGEMENT.md)** - Access control policy creation and management\n- **[Admin API Security](docs/security/ADMIN_API_SECURITY.md)** - Security API reference and usage examples\n- **[CLI Security Guide](docs/security/CLI_SECURITY_GUIDE.md)** - Command-line security operations reference\n- **[Kubernetes Security](docs/security/KUBERNETES_SECURITY.md)** - Kubernetes deployment with mTLS\n- **[Production Security](docs/security/PRODUCTION_SECURITY.md)** - Production deployment security checklist\n- **[Security Best Practices](docs/security/SECURITY_BEST_PRACTICES.md)** - Security best practices and recommendations\n- **[Troubleshooting](docs/security/TROUBLESHOOTING.md)** - Common security issues and solutions\n\n### Security Examples\n\nReady-to-use security examples and configurations:\n\n```bash\n# Basic mTLS setup\nexamples/security/basic_mtls_setup/\n\n# ACL policy examples\nexamples/security/acl_policies/\n\n# Certificate rotation workflow\nexamples/security/certificate_rotation/\n\n# Security monitoring setup\nexamples/security/monitoring/\n\n# Kubernetes security manifests\nexamples/security/kubernetes/\n```\n\n### Production Security Features\n\n#### Enterprise-Grade Authentication\n- **Zero Trust Model**: Every request requires valid certificate and authorization\n- **Multi-Factor Security**: Certificate-based identity plus ACL-based authorization\n- **Audit Trails**: Comprehensive logging of all security events and decisions\n- **Performance Monitoring**: Real-time security metrics and performance tracking\n\n#### Advanced Authorization\n- 
**Sub-Microsecond Performance**: Production-optimized authorization with multi-level caching (547ns measured on the L1 cache path)\n- **Memory Efficient**: String interning reduces memory usage by 60-80%\n- **Highly Scalable**: Linear performance scaling up to 100K ACL rules\n- **Intelligent Caching**: Bloom filters and LRU caches minimize controller load\n\n#### Certificate Management\n- **Automated Lifecycle**: Automated certificate renewal and rotation capabilities\n- **Simplified CA Architecture**: Root CA only for improved performance and reduced complexity\n- **Revocation Management**: Real-time certificate revocation and status checking\n- **Role-Based Certificates**: Different certificate types for brokers, clients, and admins\n\n### Integration with RustMQ Components\n\n- **QUIC Server**: Enhanced with mTLS support for secure client connections\n- **Admin REST API**: Complete security API with 30+ endpoints for certificate and ACL management\n- **Controller Cluster**: Raft consensus integration for distributed ACL storage\n- **Broker Network**: Secure broker-to-broker communication with certificate validation\n- **Monitoring**: Security metrics integration with performance and health monitoring\n\n\n## 🛠️ Admin REST API\n\nRustMQ provides a comprehensive REST API for cluster management, monitoring, and operations. 
The Admin API includes real-time health tracking, topic management, broker monitoring, and operational metrics.\n\n### 🚀 Key Features\n\n- **Real-time Health Monitoring**: Live broker health tracking with automatic timeout detection\n- **Cluster Status**: Comprehensive cluster health assessment with leadership tracking\n- **Topic Management**: CRUD operations for topics with partition and replication management\n- **Broker Operations**: Broker listing with health status and rack awareness\n- **Operational Metrics**: Uptime tracking and performance monitoring\n- **Advanced Rate Limiting**: Token bucket algorithm with configurable global, per-IP, and endpoint-specific limits\n- **Production Ready**: Comprehensive error handling and JSON API responses\n\n### 🏃 Quick Start\n\nStart the Admin API server:\n\n```bash\n# Start with default settings (port 8080)\n./target/release/rustmq-admin serve-api\n\n# Start on custom port\n./target/release/rustmq-admin serve-api 9642\n\n# Docker environment (included in docker-compose)\ndocker-compose up -d\n# Admin API available at http://localhost:9642\n```\n\n### 📊 Health Monitoring\n\nThe Admin API provides comprehensive health monitoring with real-time broker tracking:\n\n#### Health Endpoint\n```bash\n# Check service health and uptime\ncurl http://localhost:8080/health\n```\n\n**Response:**\n```json\n{\n  \"status\": \"ok\",\n  \"version\": \"0.1.0\",\n  \"uptime_seconds\": 3600,\n  \"is_leader\": true,\n  \"raft_term\": 1\n}\n```\n\n#### Cluster Status\n```bash\n# Get comprehensive cluster status\ncurl http://localhost:8080/api/v1/cluster\n```\n\n**Response:**\n```json\n{\n  \"success\": true,\n  \"data\": {\n    \"brokers\": [\n      {\n        \"id\": \"broker-1\",\n        \"host\": \"localhost\",\n        \"port_quic\": 9092,\n        \"port_rpc\": 9093,\n        \"rack_id\": \"rack-1\",\n        \"online\": true\n      },\n      {\n        \"id\": \"broker-2\",\n        \"host\": \"localhost\",\n        \"port_quic\": 
9192,\n        \"port_rpc\": 9193,\n        \"rack_id\": \"rack-2\", \n        \"online\": false\n      }\n    ],\n    \"topics\": [],\n    \"leader\": \"controller-1\",\n    \"term\": 1,\n    \"healthy\": true\n  },\n  \"error\": null,\n  \"leader_hint\": null\n}\n```\n\n### 📋 Topic Management\n\n#### List Topics\n```bash\ncurl http://localhost:8080/api/v1/topics\n```\n\n#### Create Topic\n```bash\ncurl -X POST http://localhost:8080/api/v1/topics \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"name\": \"user-events\",\n    \"partitions\": 12,\n    \"replication_factor\": 3,\n    \"retention_ms\": 604800000,\n    \"segment_bytes\": 1073741824,\n    \"compression_type\": \"lz4\"\n  }'\n```\n\n**Response:**\n```json\n{\n  \"success\": true,\n  \"data\": \"Topic 'user-events' created\",\n  \"error\": null,\n  \"leader_hint\": \"controller-1\"\n}\n```\n\n#### Describe Topic\n```bash\ncurl http://localhost:8080/api/v1/topics/user-events\n```\n\n**Response:**\n```json\n{\n  \"success\": true,\n  \"data\": {\n    \"name\": \"user-events\",\n    \"partitions\": 12,\n    \"replication_factor\": 3,\n    \"config\": {\n      \"retention_ms\": 604800000,\n      \"segment_bytes\": 1073741824,\n      \"compression_type\": \"lz4\"\n    },\n    \"created_at\": \"2024-01-15T10:30:00Z\",\n    \"partition_assignments\": [\n      {\n        \"partition\": 0,\n        \"leader\": \"broker-1\",\n        \"replicas\": [\"broker-1\", \"broker-2\", \"broker-3\"],\n        \"in_sync_replicas\": [\"broker-1\", \"broker-2\"],\n        \"leader_epoch\": 1\n      }\n    ]\n  },\n  \"error\": null,\n  \"leader_hint\": \"controller-1\"\n}\n```\n\n#### Delete Topic\n```bash\ncurl -X DELETE http://localhost:8080/api/v1/topics/user-events\n```\n\n### 🖥️ Broker Management\n\n#### List Brokers\n```bash\ncurl http://localhost:8080/api/v1/brokers\n```\n\n**Response:**\n```json\n{\n  \"success\": true,\n  \"data\": [\n    {\n      \"id\": \"broker-1\",\n      \"host\": \"localhost\",\n     
 \"port_quic\": 9092,\n      \"port_rpc\": 9093,\n      \"rack_id\": \"us-central1-a\",\n      \"online\": true\n    },\n    {\n      \"id\": \"broker-2\", \n      \"host\": \"localhost\",\n      \"port_quic\": 9192,\n      \"port_rpc\": 9193,\n      \"rack_id\": \"us-central1-b\",\n      \"online\": true\n    }\n  ],\n  \"error\": null,\n  \"leader_hint\": \"controller-1\"\n}\n```\n\n### 🔧 Health Tracking System\n\nThe Admin API includes a sophisticated health tracking system with comprehensive broker health monitoring:\n\n#### Features\n- **Background Health Monitoring**: Automatic health checks every 15 seconds\n- **Timeout-based Health Assessment**: Configurable 30-second health timeout\n- **Intelligent Cluster Health**: Smart health calculation for small clusters\n- **Real-time Updates**: Live health status in all broker-related endpoints\n- **Stale Entry Cleanup**: Automatic cleanup of old health data\n- **🆕 Broker Health Check API**: Comprehensive broker health assessment with component-level monitoring\n\n#### Health Check Logic\n- **Healthy**: Last successful health check within 30 seconds\n- **Unhealthy**: No successful health check or timeout exceeded\n- **Cluster Health**: For ≤2 brokers: healthy if ≥1 broker healthy + leader exists\n- **Large Clusters**: Healthy if majority of brokers healthy + leader exists\n\n#### Broker Health Check\nThe newly implemented broker health check provides detailed component-level monitoring:\n- **WAL Health**: Write-ahead log performance and status monitoring\n- **Cache Health**: Memory cache hit rates and efficiency metrics\n- **Object Storage Health**: Cloud storage connectivity and upload performance\n- **Network Health**: Connection status and throughput monitoring\n- **Replication Health**: Follower sync status and replication lag tracking\n- **Resource Usage**: CPU, memory, disk, and network utilization statistics\n\nFor detailed configuration and usage, see [Broker Health 
Monitoring](docs/broker-health-monitoring.md).\n\n### 🚨 Error Handling\n\nThe Admin API provides comprehensive error handling with detailed responses:\n\n#### Error Response Format\n```json\n{\n  \"success\": false,\n  \"data\": null,\n  \"error\": \"Detailed error message\",\n  \"leader_hint\": \"controller-2\"\n}\n```\n\n#### Common Error Scenarios\n- **Topic Not Found** (404): Topic doesn't exist\n- **Insufficient Brokers**: Not enough brokers for replication factor\n- **Leader Not Available**: Controller leader election in progress\n- **Invalid Configuration**: Malformed request parameters\n\n### 🛡️ Rate Limiting\n\nThe Admin API includes sophisticated rate limiting to protect against abuse and ensure fair resource usage. Rate limiting is implemented using the Token Bucket algorithm with configurable limits for different scenarios.\n\n#### 🚀 Key Features\n\n- **Token Bucket Algorithm**: Industry-standard rate limiting with burst capacity\n- **Multi-level Rate Limiting**: Global, per-IP, and endpoint-specific limits\n- **Automatic Cleanup**: Background cleanup of expired rate limiters to prevent memory leaks\n- **Comprehensive Monitoring**: Real-time metrics and statistics tracking\n- **Production Ready**: Thread-safe implementation with minimal performance overhead\n\n#### 📊 Rate Limiting Categories\n\nThe Admin API applies different rate limits based on endpoint sensitivity and resource requirements:\n\n##### High-Frequency Endpoints (100 requests/minute)\n- `GET /health` - Service health checks\n- `GET /api/v1/cluster` - Cluster status monitoring\n\n##### Medium-Frequency Endpoints (30 requests/minute)  \n- `GET /api/v1/topics` - Topic listing\n- `GET /api/v1/brokers` - Broker listing\n- `GET /api/v1/topics/{name}` - Topic details\n\n##### Low-Frequency Endpoints (10 requests/minute)\n- `POST /api/v1/topics` - Topic creation\n- `DELETE /api/v1/topics/{name}` - Topic deletion\n\n#### ⚙️ Configuration\n\nRate limiting can be configured through TOML 
configuration or environment variables:\n\n##### TOML Configuration\n\n```toml\n[admin.rate_limiting]\nenabled = true                      # Enable/disable rate limiting (default: true)\nglobal_burst_size = 1000           # Global burst capacity (default: 1000)\nglobal_refill_rate = 60            # Global refill rate per minute (default: 60)\nper_ip_burst_size = 100            # Per-IP burst capacity (default: 100)\nper_ip_refill_rate = 30            # Per-IP refill rate per minute (default: 30)\ncleanup_interval_seconds = 3600    # Cleanup interval in seconds (default: 3600)\n\n# Endpoint-specific configuration\n[admin.rate_limiting.endpoints]\n\"/health\" = { burst_size = 50, refill_rate = 100 }\n\"/api/v1/cluster\" = { burst_size = 50, refill_rate = 100 }\n\"/api/v1/topics\" = { burst_size = 20, refill_rate = 30 }\n\"/api/v1/brokers\" = { burst_size = 20, refill_rate = 30 }\n\"POST:/api/v1/topics\" = { burst_size = 5, refill_rate = 10 }\n\"DELETE:/api/v1/topics\" = { burst_size = 5, refill_rate = 10 }\n```\n\n##### Environment Variables\n\n```bash\n# Global rate limiting settings\nRUSTMQ_ADMIN_RATE_LIMITING_ENABLED=true\nRUSTMQ_ADMIN_GLOBAL_BURST_SIZE=1000\nRUSTMQ_ADMIN_GLOBAL_REFILL_RATE=60\nRUSTMQ_ADMIN_PER_IP_BURST_SIZE=100\nRUSTMQ_ADMIN_PER_IP_REFILL_RATE=30\nRUSTMQ_ADMIN_CLEANUP_INTERVAL_SECONDS=3600\n```\n\n#### 🔍 Rate Limit Headers\n\nAll API responses include rate limiting information in the headers:\n\n```bash\n# Example response headers\nHTTP/1.1 200 OK\nX-RateLimit-Limit: 30              # Requests per minute allowed\nX-RateLimit-Remaining: 25          # Remaining requests in current window\nX-RateLimit-Reset: 1640995260      # Unix timestamp when limit resets\nX-RateLimit-Type: endpoint         # Type of rate limit applied (global/ip/endpoint)\n```\n\n#### 🚫 Rate Limit Exceeded Response\n\nWhen rate limits are exceeded, the API returns a 429 status code:\n\n```bash\n# Request\ncurl -H \"X-Forwarded-For: 192.168.1.100\" 
http://localhost:8080/api/v1/topics\n\n# Response when rate limited\nHTTP/1.1 429 Too Many Requests\nX-RateLimit-Limit: 30\nX-RateLimit-Remaining: 0\nX-RateLimit-Reset: 1640995320\nX-RateLimit-Type: ip\nRetry-After: 60\n\n{\n  \"success\": false,\n  \"data\": null,\n  \"error\": \"Rate limit exceeded for IP 192.168.1.100. Limit: 30 requests per minute\",\n  \"leader_hint\": null\n}\n```\n\n#### 🎯 Rate Limiting Strategy\n\nThe Admin API employs a hierarchical rate limiting strategy:\n\n1. **Global Rate Limit**: Applied to all requests to prevent system overload\n2. **Per-IP Rate Limit**: Applied per client IP address to prevent individual abuse\n3. **Endpoint-Specific Rate Limit**: Applied per endpoint based on resource intensity\n\nRate limits are checked in order, and the most restrictive limit applies. For example:\n- Global limit: 60 requests/minute \n- Per-IP limit: 30 requests/minute\n- Endpoint limit: 10 requests/minute\n- **Result**: Client is limited to 10 requests/minute for that endpoint\n\n#### 🔧 Operational Benefits\n\n##### Security\n- **DDoS Protection**: Prevents overwhelming the API with excessive requests\n- **Resource Protection**: Ensures critical operations aren't starved by high-frequency requests\n- **Fair Usage**: Prevents individual clients from monopolizing resources\n\n##### Performance\n- **Memory Efficient**: Automatic cleanup prevents unbounded memory growth\n- **Low Latency**: Token bucket algorithm adds minimal overhead (\u003c1μs per request)\n- **Thread Safe**: Concurrent request handling without performance degradation\n\n#### 📈 Monitoring Rate Limiting\n\nRate limiting statistics are available through the health endpoint:\n\n```bash\n# Check rate limiting statistics\ncurl http://localhost:8080/health\n\n# Response includes rate limiting metrics\n{\n  \"status\": \"ok\",\n  \"version\": \"0.1.0\", \n  \"uptime_seconds\": 3600,\n  \"is_leader\": true,\n  \"raft_term\": 1,\n  \"rate_limiting\": {\n    \"enabled\": true,\n    
\"active_limiters\": 15,\n    \"total_requests\": 1250,\n    \"blocked_requests\": 25,\n    \"last_cleanup\": \"2024-01-15T10:30:00Z\"\n  }\n}\n```\n\n#### 🛠️ Development and Testing\n\nFor development environments, rate limiting can be disabled or configured with higher limits:\n\n```toml\n# Development configuration\n[admin.rate_limiting]\nenabled = false                     # Disable for local development\n\n# Or use high limits for testing\nenabled = true\nglobal_refill_rate = 10000         # Very high global limit\nper_ip_refill_rate = 1000          # High per-IP limit\n```\n\n#### 🚀 Production Recommendations\n\nFor production deployments:\n\n1. **Monitor Rate Limiting Metrics**: Track blocked requests and adjust limits as needed\n2. **Configure Endpoint-Specific Limits**: Set appropriate limits based on operational patterns\n3. **Use Load Balancers**: Distribute traffic across multiple Admin API instances\n4. **Alert on High Block Rates**: Set up alerts if \u003e 5% of requests are being blocked\n5. 
**Regular Review**: Periodically review and adjust rate limits based on usage patterns\n\n### 📈 Production Deployment\n\nFor production deployment with Kubernetes, see the comprehensive guide in [docker/README.md](docker/README.md), which includes:\n\n- Complete Kubernetes manifests\n- Service configuration\n- Health check setup\n- Production resource limits\n- Security configurations\n\n### 🧪 Testing\n\nThe Admin API includes comprehensive test coverage:\n\n- **11 Unit Tests**: All API endpoints and health tracking functionality\n- **Integration Testing**: End-to-end API workflows with mock backends\n- **Error Scenario Testing**: Comprehensive error condition validation\n- **Performance Testing**: Health tracking timeout and expiration behavior\n\n#### Running Tests\n```bash\n# Run admin API tests\ncargo test admin::api\n\n# Run specific health tracking tests (multiple name filters go after `--`)\ncargo test -- test_broker_health_tracking test_cluster_health_calculation\n\n# All admin tests pass\n# test result: ok. 
11 passed; 0 failed; 0 ignored; 0 measured\n```\n\n### 🔧 Configuration\n\n#### Environment Variables\n```bash\n# Admin API configuration\nADMIN_API_PORT=8080\nHEALTH_CHECK_INTERVAL=15    # seconds\nHEALTH_TIMEOUT=30          # seconds\n```\n\n#### TOML Configuration\n```toml\n[admin]\nport = 8080\nhealth_check_interval_ms = 15000\nhealth_timeout_ms = 30000\nenable_cors = true\nlog_requests = true\n```\n\n### 🔍 Monitoring Integration\n\nThe Admin API provides monitoring endpoints for observability:\n\n#### Metrics Endpoint (Future)\n```bash\n# Prometheus metrics (planned)\ncurl http://localhost:8080/metrics\n```\n\n#### Log Analysis\n```bash\n# View API request logs\ndocker-compose logs rustmq-admin\n\n# Filter for health check logs\ndocker-compose logs rustmq-admin | grep \"Health check\"\n```\n\n## 📊 BigQuery Subscriber\n\nRustMQ includes a configurable Google Cloud BigQuery subscriber that can stream messages from RustMQ topics directly to BigQuery tables with high throughput and reliability.\n\n### Key Features\n\n- **Streaming Inserts**: Direct streaming to BigQuery using the insertAll API\n- **Storage Write API**: Future support for BigQuery Storage Write API (higher throughput)\n- **Configurable Batching**: Optimize for latency vs throughput with flexible batching\n- **Schema Mapping**: Direct, custom, or nested JSON field mapping\n- **Error Handling**: Comprehensive retry logic with dead letter handling\n- **Monitoring**: Built-in health checks and metrics endpoints\n- **Authentication**: Support for service account, metadata server, and application default credentials\n\n### Quick Start with BigQuery\n\n```bash\n# Set required environment variables\nexport GCP_PROJECT_ID=\"your-gcp-project\"\nexport BIGQUERY_DATASET=\"analytics\"\nexport BIGQUERY_TABLE=\"events\"\nexport RUSTMQ_TOPIC=\"user-events\"\n\n# Optional: Set authentication method\nexport AUTH_METHOD=\"application_default\"  # or \"service_account\"\nexport 
GOOGLE_APPLICATION_CREDENTIALS=\"/path/to/service-account.json\"\n\n# Start the cluster with BigQuery subscriber (from docker/ directory)\ncd docker \u0026\u0026 docker-compose --profile bigquery up -d\n\n# Check BigQuery subscriber health\ncurl http://localhost:8080/health\n\n# View metrics\ncurl http://localhost:8080/metrics\n```\n\n### Configuration Options\n\nThe BigQuery subscriber supports extensive configuration through environment variables:\n\n#### Required Configuration\n- `GCP_PROJECT_ID` - Google Cloud Project ID\n- `BIGQUERY_DATASET` - BigQuery dataset name\n- `BIGQUERY_TABLE` - BigQuery table name\n- `RUSTMQ_TOPIC` - RustMQ topic to subscribe to\n\n#### Authentication\n- `AUTH_METHOD` - Authentication method (`application_default`, `service_account`, `metadata_server`)\n- `GOOGLE_APPLICATION_CREDENTIALS` - Path to service account key file\n\n#### Batching \u0026 Performance\n- `MAX_ROWS_PER_BATCH` - Maximum rows per batch (default: 1000)\n- `MAX_BATCH_SIZE_BYTES` - Maximum batch size in bytes (default: 10MB)\n- `MAX_BATCH_LATENCY_MS` - Maximum time to wait before sending partial batch (default: 1000ms)\n- `MAX_CONCURRENT_BATCHES` - Maximum concurrent batches (default: 10)\n\n#### Schema Mapping\n- `SCHEMA_MAPPING` - Mapping strategy (`direct`, `custom`, `nested`)\n- `AUTO_CREATE_TABLE` - Whether to auto-create table if not exists (default: false)\n\n#### Error Handling\n- `MAX_RETRIES` - Maximum retry attempts (default: 3)\n- `DEAD_LETTER_ACTION` - Action for failed messages (`log`, `drop`, `dead_letter_queue`, `file`)\n- `RETRY_BASE_MS` - Base retry delay in milliseconds (default: 1000)\n- `RETRY_MAX_MS` - Maximum retry delay in milliseconds (default: 30000)\n\n### Usage Examples\n\n#### Basic Streaming Configuration\n\n```bash\n# Start with basic streaming inserts\ndocker run --rm \\\n  -e GCP_PROJECT_ID=\"my-project\" \\\n  -e BIGQUERY_DATASET=\"analytics\" \\\n  -e BIGQUERY_TABLE=\"events\" \\\n  -e RUSTMQ_TOPIC=\"user-events\" \\\n  -e 
RUSTMQ_BROKERS=\"rustmq-broker:9092\" \\\n  rustmq/bigquery-subscriber\n```\n\n#### High-Throughput Configuration\n\n```bash\n# Optimized for high throughput\ndocker run --rm \\\n  -e GCP_PROJECT_ID=\"my-project\" \\\n  -e BIGQUERY_DATASET=\"telemetry\" \\\n  -e BIGQUERY_TABLE=\"metrics\" \\\n  -e RUSTMQ_TOPIC=\"telemetry-data\" \\\n  -e MAX_ROWS_PER_BATCH=\"5000\" \\\n  -e MAX_BATCH_SIZE_BYTES=\"52428800\" \\\n  -e MAX_BATCH_LATENCY_MS=\"500\" \\\n  -e MAX_CONCURRENT_BATCHES=\"50\" \\\n  rustmq/bigquery-subscriber\n```\n\n#### Custom Schema Mapping\n\nCreate a custom configuration file:\n\n```toml\n# bigquery-config.toml\nproject_id = \"my-project\"\ndataset = \"transformed_data\"\ntable = \"processed_events\"\n\n[write_method.streaming_inserts]\nskip_invalid_rows = true\nignore_unknown_values = true\n\n[subscription]\ntopic = \"raw-events\"\nbroker_endpoints = [\"rustmq-broker:9092\"]\n\n[schema]\nmapping = \"custom\"\n\n[schema.column_mappings]\n\"event_id\" = \"id\"\n\"event_timestamp\" = \"timestamp\" \n\"user_data.user_id\" = \"user_id\"\n\"event_data.action\" = \"action\"\n\n[schema.default_values]\n\"processed_at\" = \"CURRENT_TIMESTAMP()\"\n\"version\" = \"1.0\"\n```\n\n```bash\n# Use custom configuration\ndocker run --rm \\\n  -v $(pwd)/bigquery-config.toml:/etc/rustmq/custom-config.toml \\\n  -e CONFIG_FILE=\"/etc/rustmq/custom-config.toml\" \\\n  rustmq/bigquery-subscriber\n```\n\n### Monitoring and Observability\n\nThe BigQuery subscriber exposes health and metrics endpoints:\n\n```bash\n# Health check endpoint\ncurl http://localhost:8080/health\n# Response: {\"status\":\"healthy\",\"last_successful_insert\":\"2023-...\", ...}\n\n# Metrics endpoint  \ncurl http://localhost:8080/metrics\n# Response: {\"messages_received\":1500,\"messages_processed\":1487, ...}\n```\n\n### Error Handling and Reliability\n\nThe subscriber includes comprehensive error handling:\n\n- **Automatic Retries**: Configurable exponential backoff for transient errors\n- **Dead 
Letter Handling**: Failed messages can be logged, dropped, written to a file, or sent to a dead letter queue\n- **Health Monitoring**: Continuous health checks with degraded/unhealthy states\n- **Graceful Shutdown**: Ensures all pending batches are processed during shutdown\n\n### Production Deployment\n\nFor production deployments:\n\n1. **Use Service Account Authentication**:\n   ```bash\n   export AUTH_METHOD=\"service_account\"\n   export GOOGLE_APPLICATION_CREDENTIALS=\"/etc/gcp/service-account.json\"\n   ```\n\n2. **Optimize Batching for Your Workload**:\n   - High volume: Increase batch size and lower the batch latency threshold (`MAX_BATCH_LATENCY_MS`)\n   - Low latency: Reduce both batch size and the batch latency threshold\n   - Mixed workload: Use default settings\n\n3. **Monitor Key Metrics**:\n   - Messages processed per second\n   - Error rate and retry counts\n   - BigQuery insertion latency\n   - Backlog size\n\n4. **Set Up Alerting**:\n   - Health endpoint failures\n   - High error rates (\u003e5%)\n   - Growing backlog size\n   - BigQuery quota issues\n\n## 🧠 WebAssembly ETL Processing\n\nRustMQ features a WebAssembly (WASM) ETL system that transforms messages in real time as they flow through the system. This allows you to process, filter, and enrich data without additional infrastructure.\n\n### What is WASM ETL?\n\nWASM ETL lets you write custom code in Rust (or other languages) that runs safely inside RustMQ to process messages. 
It's like having a tiny, secure computer program that can:\n\n- **Transform data**: Convert temperature units, normalize emails, add timestamps\n- **Filter messages**: Remove spam, block oversized messages, require certain headers  \n- **Enrich content**: Add geolocation, detect language, analyze content type\n- **Split or combine**: Turn one message into many, or combine multiple messages\n\n### Key Benefits\n\n- **Safe and Secure**: Code runs in a sandbox - can't access files, network, or harm the system\n- **High Performance**: Near-native speed with efficient binary processing\n- **Hot Deployment**: Update processing logic without restarting RustMQ\n- **Priority-Based**: Process messages in stages with different priorities\n- **Smart Filtering**: Only process messages that match specific topics or conditions\n\n### Simple Example\n\nHere's what a basic message processor looks like:\n\n```rust\n// Transform JSON messages to add processing info\nif message.headers.get(\"content-type\") == Some(\"application/json\") {\n    let mut json: Value = serde_json::from_slice(\u0026message.value)?;\n    json[\"processed_at\"] = chrono::Utc::now().timestamp().into();\n    json[\"processor\"] = \"my-etl-v1\".into();\n    message.value = serde_json::to_vec(\u0026json)?;\n}\n```\n\n### Multi-Stage Processing\n\nConfigure complex pipelines that process messages in stages:\n\n1. **Priority 0** (First): Validate message format and required fields\n2. **Priority 1** (Second): Transform data and add enrichments (can run in parallel)\n3. 
**Priority 2** (Last): Final formatting and cleanup\n\n### Topic Filtering\n\nOnly process messages from specific topics using patterns:\n\n- **Exact**: `\"events.user.login\"` - matches exactly\n- **Wildcard**: `\"logs.*.error\"` - matches any middle part\n- **Regex**: `\"^sensor-\\\\d+\\\\.\"` - complex pattern matching\n- **Prefix/Suffix**: `\"iot.devices.\"` or `\".critical\"`\n\n### Getting Started\n\nThe complete WASM ETL system includes priority-based pipelines, instance pooling, smart filtering, and comprehensive monitoring. For detailed setup instructions, examples, and advanced configuration, see:\n\n**📖 [Complete WASM ETL Deployment Guide](docs/wasm-etl-deployment-guide.md)**\n\nThis guide covers:\n- Step-by-step setup and configuration\n- Writing production-ready ETL modules in Rust\n- Building and deploying WASM modules\n- Configuring multi-stage processing pipelines\n- Advanced filtering and conditional processing\n- Performance optimization and monitoring\n- Troubleshooting and best practices\n\n## ☁️ Google Cloud Platform Setup\n\n### Step 1: GCP Project Setup\n\n```bash\n# Set your project ID\nexport PROJECT_ID=\"your-rustmq-project\"\nexport REGION=\"us-central1\"\nexport ZONE=\"us-central1-a\"\n\n# Authenticate first, then create and configure the project\ngcloud auth login\ngcloud projects create $PROJECT_ID\ngcloud config set project $PROJECT_ID\n\n# Enable required APIs\ngcloud services enable container.googleapis.com\ngcloud services enable storage-api.googleapis.com\ngcloud services enable compute.googleapis.com\ngcloud services enable cloudresourcemanager.googleapis.com\n```\n\n### Step 2: GKE Cluster Setup\n\n```bash\n# Create GKE cluster with optimized node pools\ngcloud container clusters create rustmq-cluster \\\n    --zone=$ZONE \\\n    --machine-type=n2-standard-4 \\\n    --num-nodes=3 \\\n    --enable-autorepair \\\n    --enable-autoupgrade \\\n    --enable-network-policy \\\n    --enable-ip-alias \\\n    --disk-type=pd-ssd \\\n    --disk-size=50GB \\\n    
--max-nodes=10 \\\n    --min-nodes=3 \\\n    --enable-autoscaling\n\n# Get credentials\ngcloud container clusters get-credentials rustmq-cluster --zone=$ZONE\n\n# Create storage class for fast SSD\nkubectl apply -f - \u003c\u003cEOF\napiVersion: storage.k8s.io/v1\nkind: StorageClass\nmetadata:\n  name: fast-ssd\nprovisioner: kubernetes.io/gce-pd\nparameters:\n  type: pd-ssd\n  replication-type: regional-pd\n  zones: us-central1-a,us-central1-b\nallowVolumeExpansion: true\nreclaimPolicy: Delete\nvolumeBindingMode: WaitForFirstConsumer\nEOF\n```\n\n### Step 3: Cloud Storage Setup\n\n```bash\n# Create bucket for object storage\ngsutil mb -c STANDARD -l $REGION gs://$PROJECT_ID-rustmq-data\n\n# Enable versioning and lifecycle management\ngsutil versioning set on gs://$PROJECT_ID-rustmq-data\n\n# Create lifecycle policy for cost optimization\ncat \u003e lifecycle.json \u003c\u003cEOF\n{\n  \"lifecycle\": {\n    \"rule\": [\n      {\n        \"action\": {\"type\": \"SetStorageClass\", \"storageClass\": \"NEARLINE\"},\n        \"condition\": {\"age\": 30}\n      },\n      {\n        \"action\": {\"type\": \"SetStorageClass\", \"storageClass\": \"COLDLINE\"},\n        \"condition\": {\"age\": 90}\n      },\n      {\n        \"action\": {\"type\": \"Delete\"},\n        \"condition\": {\"age\": 365}\n      }\n    ]\n  }\n}\nEOF\n\ngsutil lifecycle set lifecycle.json gs://$PROJECT_ID-rustmq-data\n```\n\n### Step 4: Service Account Setup\n\n```bash\n# Create service account for RustMQ\ngcloud iam service-accounts create rustmq-sa \\\n    --display-name=\"RustMQ Service Account\" \\\n    --description=\"Service account for RustMQ cluster operations\"\n\n# Grant necessary permissions\ngcloud projects add-iam-policy-binding $PROJECT_ID \\\n    --member=\"serviceAccount:rustmq-sa@$PROJECT_ID.iam.gserviceaccount.com\" \\\n    --role=\"roles/storage.objectAdmin\"\n\ngcloud projects add-iam-policy-binding $PROJECT_ID \\\n    
--member=\"serviceAccount:rustmq-sa@$PROJECT_ID.iam.gserviceaccount.com\" \\\n    --role=\"roles/monitoring.writer\"\n\n# Create and download key\ngcloud iam service-accounts keys create rustmq-key.json \\\n    --iam-account=rustmq-sa@$PROJECT_ID.iam.gserviceaccount.com\n\n# Create Kubernetes secret\nkubectl create secret generic rustmq-gcp-credentials \\\n    --from-file=key.json=rustmq-key.json\n```\n\n### Step 5: Networking Setup\n\n```bash\n# Create firewall rules for RustMQ\ngcloud compute firewall-rules create rustmq-quic \\\n    --allow tcp:9092,udp:9092 \\\n    --source-ranges 0.0.0.0/0 \\\n    --description \"RustMQ QUIC traffic\"\n\ngcloud compute firewall-rules create rustmq-rpc \\\n    --allow tcp:9093 \\\n    --source-ranges 10.0.0.0/8 \\\n    --description \"RustMQ internal RPC traffic\"\n\ngcloud compute firewall-rules create rustmq-admin \\\n    --allow tcp:9642 \\\n    --source-ranges 0.0.0.0/0 \\\n    --description \"RustMQ admin API\"\n```\n\n\n## ⚙️ Configuration\n\nRustMQ provides a comprehensive configuration system with optimized settings for development, testing, and production environments. **For detailed configuration guide, see [Configuration Guide](docs/configuration-guide.md)**. 
For comprehensive testing infrastructure details, see **[Testing Infrastructure Guide](docs/testing-infrastructure.md)**.\n\n### Configuration Files Overview\n\nRustMQ includes well-structured configuration files for different environments:\n\n#### 🧪 **Testing Configurations** (Ready to Use)\n- `config/test-broker.toml` - Optimized for unit/integration tests with `/tmp` storage, disabled fsync, and fast timeouts\n- `config/test-controller.toml` - Test controller with local addresses and temporary directories\n\n#### 🛠️ **Development Configurations** (Ready to Use)  \n- `config/broker-dev.toml` - Development broker with local paths and debug logging\n- `config/controller-dev.toml` - Development controller with self-signed certificates\n- `config/example-development.toml` - Comprehensive development template with detailed comments\n\n#### 🏭 **Production Configurations** (Ready to Use)\n- `config/broker.toml` - Production broker configuration\n- `config/controller.toml` - Production controller configuration\n- `config/example-production.toml` - Production template with enterprise settings\n\n### Quick Start\n\n```bash\n# Testing (use existing optimized configs)\ncargo test --lib  # Uses test configs automatically\nRUSTMQ_BROKER_CONFIG=config/test-broker.toml cargo test --test integration\n\n# Development (ready-to-use configs)  \ncargo run --bin rustmq-broker -- --config config/broker-dev.toml\ncargo run --bin rustmq-controller -- --config config/controller-dev.toml\n\n# Production (customize from templates)\ncp config/example-production.toml config/my-production.toml\n# Edit my-production.toml for your environment\ncargo run --bin rustmq-broker -- --config config/my-production.toml\n```\n\n### Key Configuration Features\n\n- **📁 Environment-Specific**: Separate configs for test/dev/prod with optimal defaults\n- **🔧 No New Files Needed**: Existing configurations cover all use cases\n- **⚡ Performance Optimized**: Test configs use `/tmp` storage and disabled fsync for 
speed\n- **🔒 Security Ready**: Development configs include mTLS with simplified certificate chains\n- **☁️ Cloud Native**: Production configs optimized for GCP with proper storage backends\n\n### Broker Configuration (`broker.toml`)\n\n```toml\n[broker]\nid = \"broker-001\"                    # Unique broker identifier\nrack_id = \"us-central1-a\"            # Availability zone for rack awareness\n\n[network]\nquic_listen = \"0.0.0.0:9092\"        # QUIC/HTTP3 client endpoint\nrpc_listen = \"0.0.0.0:9093\"         # Internal gRPC endpoint\nmax_connections = 10000              # Maximum concurrent connections\nconnection_timeout_ms = 30000        # Connection timeout\n\n# QUIC-specific configuration\n[network.quic_config]\nmax_concurrent_uni_streams = 1000\nmax_concurrent_bidi_streams = 1000\nmax_idle_timeout_ms = 30000\nmax_stream_data = 1024000\nmax_connection_data = 10240000\n\n[wal]\npath = \"/var/lib/rustmq/wal\"        # WAL storage path\ncapacity_bytes = 10737418240        # 10GB WAL capacity\nfsync_on_write = true               # Force sync on write (durability)\nsegment_size_bytes = 1073741824     # 1GB segment size\nbuffer_size = 65536                 # 64KB buffer size\nupload_interval_ms = 600000         # 10 minutes upload interval\nflush_interval_ms = 1000            # 1 second flush interval\n\n[cache]\nwrite_cache_size_bytes = 1073741824  # 1GB hot data cache\nread_cache_size_bytes = 2147483648   # 2GB cold data cache\neviction_policy = \"Moka\"             # Cache eviction policy (Moka/Lru/Lfu/Random)\n\n[object_storage]\nstorage_type = \"S3\"                 # Storage backend (S3/Gcs/Azure/Local)\nbucket = \"rustmq-data\"              # Storage bucket name\nregion = \"us-central1\"              # Storage region\nendpoint = \"http://minio:9000\"      # Storage endpoint\naccess_key = \"rustmq-access-key\"    # Optional: Access key\nsecret_key = \"rustmq-secret-key\"    # Optional: Secret key\nmultipart_threshold = 104857600     # 100MB multipart 
upload threshold\nmax_concurrent_uploads = 10         # Concurrent upload limit\n\n[controller]\nendpoints = [\"controller-1:9094\", \"controller-2:9094\", \"controller-3:9094\"]\nelection_timeout_ms = 5000          # Leader election timeout\nheartbeat_interval_ms = 1000        # Heartbeat frequency\n\n[replication]\nmin_in_sync_replicas = 2            # Minimum replicas for acknowledgment\nack_timeout_ms = 5000               # Replication acknowledgment timeout\nmax_replication_lag = 1000          # Maximum acceptable lag\nheartbeat_timeout_ms = 30000        # Follower heartbeat timeout (30 seconds)\n\n[etl]\nenabled = true                      # Enable WebAssembly ETL processing\nmemory_limit_bytes = 67108864       # 64MB memory limit per module\nexecution_timeout_ms = 5000         # Execution timeout\nmax_concurrent_executions = 100     # Concurrent execution limit\n\n[scaling]\nmax_concurrent_additions = 3        # Max brokers added simultaneously\nmax_concurrent_decommissions = 1    # Max brokers decommissioned simultaneously\nrebalance_timeout_ms = 300000       # Partition rebalancing timeout\ntraffic_migration_rate = 0.1        # Traffic migration rate per minute\nhealth_check_timeout_ms = 30000     # Health check timeout\n\n[operations]\nallow_runtime_config_updates = true # Enable runtime config updates\nupgrade_velocity = 3                # Brokers upgraded per minute\ngraceful_shutdown_timeout_ms = 30000 # Graceful shutdown timeout\n\n[operations.kubernetes]\nuse_stateful_sets = true            # Use StatefulSets for deployment\npvc_storage_class = \"fast-ssd\"      # Storage class for persistent volumes\nwal_volume_size = \"50Gi\"            # WAL volume size\nenable_pod_affinity = true          # Enable pod affinity for volume attachment\n```\n\n### Controller Configuration (`controller.toml`)\n\n```toml\n[controller]\nnode_id = \"controller-001\"               # Unique controller identifier\nraft_listen = \"0.0.0.0:9095\"           # Raft consensus 
endpoint\nrpc_listen = \"0.0.0.0:9094\"            # Internal gRPC endpoint\nhttp_listen = \"0.0.0.0:9642\"           # Admin REST API endpoint\n\n[raft]\npeers = [\n  \"controller-1@controller-1:9095\",\n  \"controller-2@controller-2:9095\", \n  \"controller-3@controller-3:9095\"\n]\nelection_timeout_ms = 5000              # Leader election timeout\nheartbeat_interval_ms = 1000            # Heartbeat frequency\n\n[admin]\nport = 9642                             # Admin REST API port\nhealth_check_interval_ms = 15000        # Health check interval\nhealth_timeout_ms = 30000               # Health timeout\nenable_cors = true                      # Enable CORS headers\nlog_requests = true                     # Log API requests\n\n# Rate limiting configuration for Admin REST API\n[admin.rate_limiting]\nenabled = true                          # Enable rate limiting (default: true)\nglobal_burst_size = 1000               # Global burst capacity\nglobal_refill_rate = 60                # Global requests per minute\nper_ip_burst_size = 100                # Per-IP burst capacity  \nper_ip_refill_rate = 30                # Per-IP requests per minute\ncleanup_interval_seconds = 3600        # Cleanup expired limiters (1 hour)\n\n# Endpoint-specific rate limits\n[admin.rate_limiting.endpoints]\n\"/health\" = { burst_size = 50, refill_rate = 100 }\n\"/api/v1/cluster\" = { burst_size = 50, refill_rate = 100 }\n\"/api/v1/topics\" = { burst_size = 20, refill_rate = 30 }\n\"/api/v1/brokers\" = { burst_size = 20, refill_rate = 30 }\n\"POST:/api/v1/topics\" = { burst_size = 5, refill_rate = 10 }\n\"DELETE:/api/v1/topics\" = { burst_size = 5, refill_rate = 10 }\n\n[autobalancer]\nenabled = true                          # Enable auto-balancing\ncpu_threshold = 0.80                   # CPU threshold for rebalancing\nmemory_threshold = 0.75                # Memory threshold for rebalancing\ncooldown_seconds = 300                 # Cooldown between rebalancing operations\n```\n\n### 
Environment Variables\n\n```bash\n# Core settings\nRUSTMQ_BROKER_ID=broker-001\nRUSTMQ_RACK_ID=us-central1-a\nRUSTMQ_LOG_LEVEL=info\n\n# Storage settings\nRUSTMQ_WAL_PATH=/var/lib/rustmq/wal\nRUSTMQ_STORAGE_BUCKET=rustmq-data\nRUSTMQ_STORAGE_REGION=us-central1\n\n# GCP settings\nGOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json\nGCP_PROJECT_ID=your-project-id\n\n# Performance tuning\nRUSTMQ_CACHE_SIZE=2147483648\nRUSTMQ_MAX_CONNECTIONS=10000\nRUSTMQ_BATCH_SIZE=1000\n\n# Admin API rate limiting settings\nRUSTMQ_ADMIN_RATE_LIMITING_ENABLED=true\nRUSTMQ_ADMIN_GLOBAL_BURST_SIZE=1000\nRUSTMQ_ADMIN_GLOBAL_REFILL_RATE=60\nRUSTMQ_ADMIN_PER_IP_BURST_SIZE=100\nRUSTMQ_ADMIN_PER_IP_REFILL_RATE=30\nRUSTMQ_ADMIN_CLEANUP_INTERVAL_SECONDS=3600\n```\n\n## 🔧 Message Broker Core API\n\nRustMQ now includes a fully implemented high-level Message Broker Core that provides intuitive producer and consumer APIs with comprehensive error handling, automatic partition management, and flexible acknowledgment levels.\n\n### Architecture Overview\n\nThe Message Broker Core is built on a modular architecture that integrates seamlessly with RustMQ's distributed storage and replication systems:\n\n```rust\nuse rustmq::broker::core::*;\n\n// Create a broker core instance with your storage backends\nlet core = MessageBrokerCore::new(\n    wal,               // Write-Ahead Log implementation\n    object_storage,    // Object storage backend (S3/GCS/Azure)\n    cache,             // Distributed cache layer\n    replication_manager, // Replication coordinator\n    network_handler,   // Network communication handler\n    broker_id,         // Unique broker identifier\n);\n```\n\n### Producer API\n\nThe Producer trait provides a simple, high-performance interface for message production:\n\n```rust\n#[async_trait]\npub trait Producer {\n    /// Send a single record to a topic-partition\n    async fn send(\u0026self, record: ProduceRecord) -\u003e Result\u003cProduceResult\u003e;\n    \n    /// 
Send a batch of records for optimized throughput\n    async fn send_batch(\u0026self, records: Vec\u003cProduceRecord\u003e) -\u003e Result\u003cVec\u003cProduceResult\u003e\u003e;\n    \n    /// Flush any pending records to ensure durability\n    async fn flush(\u0026self) -\u003e Result\u003c()\u003e;\n}\n```\n\n#### Single Message Production\n\n```rust\nlet producer = core.create_producer();\n\nlet record = ProduceRecord {\n    topic: \"user-events\".to_string(),\n    partition: Some(0),                    // Optional: let RustMQ choose partition\n    key: Some(b\"user123\".to_vec()),\n    value: b\"login_event\".to_vec(),\n    headers: vec![Header {\n        key: \"content-type\".to_string(),\n        value: b\"application/json\".to_vec(),\n    }],\n    acks: AcknowledgmentLevel::All,        // Wait for all replicas\n    timeout_ms: 5000,\n};\n\nlet result = producer.send(record).await?;\nprintln!(\"Message produced at offset: {}\", result.offset);\n```\n\n#### Batch Production for High Throughput\n\n```rust\nlet mut batch = Vec::new();\nfor i in 0..1000 {\n    batch.push(ProduceRecord {\n        topic: \"metrics\".to_string(),\n        partition: None,  // Auto-partition based on key hash\n        key: Some(format!(\"sensor_{}\", i % 10).into_bytes()),\n        value: format!(\"{{\\\"value\\\": {}, \\\"timestamp\\\": {}}}\", i, timestamp).into_bytes(),\n        headers: vec![],\n        acks: AcknowledgmentLevel::Leader,  // Faster acknowledgment\n        timeout_ms: 1000,\n    });\n}\n\nlet results = producer.send_batch(batch).await?;\nprintln!(\"Produced {} messages\", results.len());\n```\n\n### Consumer API\n\nThe Consumer trait provides flexible message consumption with automatic offset management:\n\n```rust\n#[async_trait]\npub trait Consumer {\n    /// Subscribe to one or more topics\n    async fn subscribe(\u0026mut self, topics: Vec\u003cTopicName\u003e) -\u003e Result\u003c()\u003e;\n    \n    /// Poll for new records with configurable timeout\n    
async fn poll(\u0026mut self, timeout_ms: u32) -\u003e Result\u003cVec\u003cConsumeRecord\u003e\u003e;\n    \n    /// Commit specific offsets for durability\n    async fn commit_offsets(\u0026mut self, offsets: HashMap\u003cTopicPartition, Offset\u003e) -\u003e Result\u003c()\u003e;\n    \n    /// Seek to a specific offset for replay scenarios\n    async fn seek(\u0026mut self, topic_partition: TopicPartition, offset: Offset) -\u003e Result\u003c()\u003e;\n}\n```\n\n#### Basic Consumer Usage\n\n```rust\nlet mut consumer = core.create_consumer(\"analytics-group\".to_string());\n\n// Subscribe to topics\nconsumer.subscribe(vec![\"user-events\".to_string(), \"orders\".to_string()]).await?;\n\n// Consume messages\nloop {\n    let records = consumer.poll(1000).await?;\n    \n    for record in records {\n        println!(\"Received: topic={}, partition={}, offset={}\", \n                 record.topic_partition.topic,\n                 record.topic_partition.partition,\n                 record.offset);\n        \n        // Process your message\n        process_message(\u0026record.value).await?;\n        \n        // Optional: Manual offset commit after processing (at-least-once delivery)\n        let mut offsets = HashMap::new();\n        offsets.insert(record.topic_partition.clone(), record.offset + 1);\n        consumer.commit_offsets(offsets).await?;\n    }\n}\n```\n\n#### Consumer Seek for Message Replay\n\n```rust\n// Replay messages from a specific offset\nlet topic_partition = TopicPartition {\n    topic: \"user-events\".to_string(),\n    partition: 0,\n};\n\n// Seek to offset 1000 to replay messages\nconsumer.seek(topic_partition, 1000).await?;\n\n// Continue normal polling - will start from offset 1000\nlet records = consumer.poll(5000).await?;\n```\n\n### Acknowledgment Levels\n\nRustMQ supports flexible acknowledgment levels for different durability and performance requirements:\n\n```rust\nuse rustmq::types::AcknowledgmentLevel;\n\n// Maximum performance - fire and 
forget\nacks: AcknowledgmentLevel::None,\n\n// Fast acknowledgment - leader only\nacks: AcknowledgmentLevel::Leader,\n\n// High availability - majority of replicas\nacks: AcknowledgmentLevel::Majority,\n\n// Maximum durability - all replicas\nacks: AcknowledgmentLevel::All,\n\n// Custom requirement - specific number of replicas\nacks: AcknowledgmentLevel::Custom(3),\n```\n\n### Error Handling\n\nThe Broker Core provides comprehensive error handling with detailed error types:\n\n```rust\nuse rustmq::error::RustMqError;\n\nmatch producer.send(record).await {\n    Ok(result) =\u003e println!(\"Success: offset {}\", result.offset),\n    Err(RustMqError::NotLeader(partition)) =\u003e {\n        println!(\"Not leader for partition: {}\", partition);\n        // Retry with updated metadata\n    },\n    Err(RustMqError::OffsetOutOfRange(msg)) =\u003e {\n        println!(\"Offset out of range: {}\", msg);\n        // Seek to valid offset\n    },\n    Err(RustMqError::Timeout) =\u003e {\n        println!(\"Request timed out\");\n        // Implement retry logic\n    },\n    Err(e) =\u003e println!(\"Other error: {}\", e),\n}\n```\n\n### Integration with Storage Layers\n\nThe Broker Core seamlessly integrates with RustMQ's tiered storage architecture:\n\n- **Local WAL**: Recent messages are served from high-speed local NVMe storage\n- **Cache Layer**: Frequently accessed messages are cached for optimal performance  \n- **Object Storage**: Historical messages are automatically migrated to cost-effective cloud storage\n- **Intelligent Routing**: The core automatically routes read requests to the optimal storage tier\n\n### Testing and Validation\n\nThe Message Broker Core includes comprehensive test coverage:\n\n- **Unit Tests**: Core functionality with 88 passing tests\n- **Integration Tests**: End-to-end workflows with 9 comprehensive test scenarios\n- **Mock Implementations**: Complete test doubles for all dependencies\n- **Error Scenarios**: Comprehensive error condition 
testing\n\n### Performance Characteristics\n\n#### I/O Performance Optimizations\n\nRustMQ features advanced I/O optimizations with automatic backend selection for maximum performance:\n\n- **🔥 io_uring Backend** (Linux): True asynchronous I/O with 2-10x lower latency (0.5-2μs vs 5-20μs)\n  - **Throughput**: 3-5x higher IOPS for small random I/O operations\n  - **CPU Efficiency**: 50-80% reduction in CPU usage for I/O-heavy workloads\n  - **Memory Efficiency**: No thread pool overhead, direct kernel communication\n  - **Feature Flag**: Enable with `--features io-uring` (automatic detection on Linux 5.6+)\n\n- **🛡️ Fallback Backend**: High-performance tokio::fs implementation for cross-platform compatibility\n  - **Automatic Selection**: Runtime detection with transparent fallback\n  - **Platform Support**: Windows, macOS, Linux (when io_uring unavailable)\n  - **Consistent API**: Same performance characteristics across all platforms\n\n#### Overall Performance\n\n- **Low Latency**: Sub-millisecond produce latency for local WAL writes (optimized with io_uring)\n- **High Throughput**: Batch production for maximum throughput scenarios\n- **Automatic Partitioning**: Intelligent partition selection based on message keys\n- **Zero-Copy Operations**: Efficient memory usage throughout the message path with buffer reuse ([detailed optimization guide](docs/zero-copy-optimization.md))\n- **Async Throughout**: Non-blocking I/O for maximum concurrency\n- **Platform Adaptive**: Automatically selects optimal I/O backend based on system capabilities\n\n## 📦 Client SDKs\n\nRustMQ provides official client SDKs for multiple programming languages with production-ready features and comprehensive documentation.\n\n### 🦀 Rust SDK\n- **Location**: [`sdk/rust/`](sdk/rust/)\n- **Status**: ✅ **Fully Implemented** - Production-ready client library with comprehensive producer API\n- **Features**: \n  - **Advanced Producer API**: Builder pattern with intelligent batching, flush mechanisms, and 
configurable acknowledgment levels\n  - **Async/Await**: Built on Tokio with zero-copy operations and streaming support\n  - **QUIC Transport**: Modern HTTP/3 protocol for low-latency communication\n  - **Comprehensive Error Handling**: Detailed error types with retry logic and timeout management\n  - **Performance Monitoring**: Built-in metrics for messages sent/failed, batch sizes, and timing\n- **Build**: `cargo build --release`\n- **Install**: Add to `Cargo.toml`: `rustmq-client = { path = \"sdk/rust\" }`\n\n#### Producer API\n\nThe Rust SDK provides a comprehensive Producer API with intelligent batching, flush mechanisms, and production-ready features. Based on the actual implementation:\n\n##### Basic Producer Usage\n\n```rust\nuse rustmq_client::*;\nuse std::time::Duration;\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    // Create client connection\n    let config = ClientConfig {\n        brokers: vec![\"localhost:9092\".to_string()],\n        client_id: Some(\"my-app-producer\".to_string()),\n        connect_timeout: Duration::from_secs(10),\n        request_timeout: Duration::from_secs(30),\n        ..Default::default()\n    };\n    \n    let client = RustMqClient::new(config).await?;\n    \n    // Create producer with custom configuration\n    let producer_config = ProducerConfig {\n        batch_size: 100,                           // Batch up to 100 messages\n        batch_timeout: Duration::from_millis(10),  // Or send after 10ms\n        ack_level: AckLevel::All,                   // Wait for all replicas\n        producer_id: Some(\"my-app-producer\".to_string()),\n        ..Default::default()\n    };\n    \n    let producer = ProducerBuilder::new()\n        .topic(\"user-events\")\n        .config(producer_config)\n        .client(client)\n        .build()\n        .await?;\n    \n    // Send a single message and wait for acknowledgment\n    let message = Message::builder()\n        
.topic(\"user-events\")\n        .payload(\"user logged in\")\n        .header(\"user-id\", \"12345\")\n        .header(\"event-type\", \"login\")\n        .build()?;\n    \n    let result = producer.send(message).await?;\n    println!(\"Message sent to partition {} at offset {}\", \n             result.partition, result.offset);\n    \n    Ok(())\n}\n```\n\n##### Fire-and-Forget Messages\n\n```rust\n// High-throughput fire-and-forget sending\nfor i in 0..1000 {\n    let message = Message::builder()\n        .topic(\"metrics\")\n        .payload(format!(\"{{\\\"value\\\": {}, \\\"timestamp\\\": {}}}\", i, timestamp))\n        .header(\"sensor-id\", \u0026format!(\"sensor-{}\", i % 10))\n        .build()?;\n    \n    // Returns immediately after queuing - no waiting for broker\n    producer.send_async(message).await?;\n}\n\n// Flush to ensure all messages are sent\nproducer.flush().await?;\n```\n\n##### Batch Operations\n\n```rust\n// Prepare a batch of messages\nlet messages: Vec\u003c_\u003e = (0..50).map(|i| {\n    Message::builder()\n        .topic(\"batch-topic\")\n        .payload(format!(\"message-{}\", i))\n        .header(\"batch-id\", \"batch-123\")\n        .build().unwrap()\n}).collect();\n\n// Send batch and wait for all acknowledgments\nlet results = producer.send_batch(messages).await?;\n\nfor result in results {\n    println!(\"Message {} sent to offset {}\", \n             result.message_id, result.offset);\n}\n```\n\n##### Producer Configuration Options\n\n```rust\nlet producer_config = ProducerConfig {\n    // Batching configuration\n    batch_size: 100,                           // Messages per batch\n    batch_timeout: Duration::from_millis(10),  // Maximum wait time\n    \n    // Reliability configuration  \n    ack_level: AckLevel::All,                   // All, Leader, or None\n    max_message_size: 1024 * 1024,             // 1MB max message size\n    idempotent: true,                           // Enable idempotent producer\n    \n    // 
Producer identification\n    producer_id: Some(\"my-producer\".to_string()),\n    \n    // Advanced configuration\n    compression: CompressionConfig {\n        enabled: true,\n        algorithm: CompressionAlgorithm::Lz4,\n        level: 6,\n        min_size: 1024,\n    },\n    \n    default_properties: HashMap::from([\n        (\"app\".to_string(), \"my-application\".to_string()),\n        (\"version\".to_string(), \"1.0.0\".to_string()),\n    ]),\n};\n```\n\n##### Error Handling\n\n```rust\nuse rustmq_client::error::ClientError;\n\nmatch producer.send(message).await {\n    Ok(result) =\u003e {\n        println!(\"Success: {} at offset {}\", result.message_id, result.offset);\n    }\n    Err(ClientError::Timeout { timeout_ms }) =\u003e {\n        println!(\"Request timed out after {}ms\", timeout_ms);\n        // Implement retry logic\n    }\n    Err(ClientError::Broker(msg)) =\u003e {\n        println!(\"Broker error: {}\", msg);\n        // Handle broker-side errors\n    }\n    Err(ClientError::MessageTooLarge { size, max_size }) =\u003e {\n        println!(\"Message too large: {} bytes (max: {})\", size, max_size);\n        // Reduce message size\n    }\n    Err(e) =\u003e {\n        println!(\"Other error: {}\", e);\n    }\n}\n```\n\n##### Monitoring and Metrics\n\n```rust\n// Get producer performance metrics\nlet metrics = producer.metrics().await;\n\nprintln!(\"Messages sent: {}\", \n         metrics.messages_sent.load(std::sync::atomic::Ordering::Relaxed));\nprintln!(\"Messages failed: {}\", \n         metrics.messages_failed.load(std::sync::atomic::Ordering::Relaxed));\nprintln!(\"Batches sent: {}\", \n         metrics.batches_sent.load(std::sync::atomic::Ordering::Relaxed));\nprintln!(\"Average batch size: {:.2}\", \n         *metrics.average_batch_size.read().await);\n\nif let Some(last_send) = *metrics.last_send_time.read().await {\n    println!(\"Last send: {:?} ago\", last_send.elapsed());\n}\n```\n\n##### Graceful Shutdown\n\n```rust\n// Proper 
producer shutdown\nasync fn shutdown_producer(producer: Producer) -\u003e Result\u003c(), ClientError\u003e {\n    // Flush all pending messages\n    producer.flush().await?;\n    \n    // Close producer and cleanup resources\n    producer.close().await?;\n    \n    println!(\"Producer shut down gracefully\");\n    Ok(())\n}\n```\n\n### 🐹 Go SDK  \n- **Location**: [`sdk/go/`](sdk/go/)\n- **Status**: ✅ **Fully Implemented** - Production-ready client library with sophisticated connection layer\n- **Features**: \n  - **Advanced Connection Management**: QUIC transport with intelligent connection pooling, round-robin load balancing, and automatic failover\n  - **Comprehensive TLS/mTLS Support**: Full client certificate authentication with CA validation and configurable trust stores\n  - **Health Check System**: Real-time broker health monitoring with JSON message exchange and automatic cleanup of failed connections\n  - **Robust Reconnection Logic**: Exponential backoff with jitter, per-broker state tracking, and intelligent failure recovery\n  - **Producer API with Batching**: Intelligent message batching with configurable size/timeout thresholds and compression support\n  - **Extensive Statistics**: Connection metrics, health check tracking, error monitoring, traffic analytics, and reconnection statistics\n  - **Production-Ready Features**: Concurrent-safe operations, goroutine-based processing, configurable timeouts, and comprehensive error handling\n- **Build**: `go build ./...`\n- **Install**: `import \"github.com/rustmq/rustmq/sdk/go/rustmq\"`\n\n### Go SDK Connection Layer Highlights\n\nThe Go SDK features a sophisticated connection management system designed for production environments:\n\n#### TLS/mTLS Configuration\n```go\nconfig := \u0026rustmq.ClientConfig{\n    EnableTLS: true,\n    TLSConfig: \u0026rustmq.TLSConfig{\n        CACert:     \"/etc/ssl/certs/ca.pem\",\n        ClientCert: \"/etc/ssl/certs/client.pem\",\n        ClientKey:  
\"/etc/ssl/private/client.key\",\n        ServerName: \"rustmq.example.com\",\n    },\n}\n```\n\n#### Health Check \u0026 Reconnection\n```go\n// Automatic health monitoring with configurable intervals\nconfig.KeepAliveInterval = 30 * time.Second\n\n// Exponential backoff with jitter for reconnection\nconfig.RetryConfig = \u0026rustmq.RetryConfig{\n    MaxRetries: 10,\n    BaseDelay:  100 * time.Millisecond,\n    MaxDelay:   30 * time.Second,\n    Multiplier: 2.0,\n    Jitter:     true,\n}\n```\n\n#### Producer with Intelligent Batching\n```go\n// Create producer with batching configuration\nproducerConfig := \u0026rustmq.ProducerConfig{\n    BatchSize:    100,\n    BatchTimeout: 100 * time.Millisecond,\n    AckLevel:     rustmq.AckAll,\n    Idempotent:   true,\n}\n\nproducer, err := client.CreateProducer(\"topic\", producerConfig)\nif err != nil {\n    log.Fatal(err)\n}\n\n// Send message with automatic batching\nresult, err := producer.Send(ctx, message)\n```\n\n#### Connection Statistics\n```go\nstats := client.Stats()\nfmt.Printf(\"Active: %d/%d, Reconnects: %d, Health Checks: %d\", \n    stats.ActiveConnections, stats.TotalConnections,\n    stats.ReconnectAttempts, stats.HealthChecks)\n\n// Additional statistics available\nfmt.Printf(\"Bytes: Sent=%d, Received=%d, Errors=%d\", \n    stats.BytesSent, stats.BytesReceived, stats.Errors)\n```\n\n### Common SDK Features\n- **QUIC/HTTP3 Transport**: Low-latency, multiplexed connections\n- **Producer APIs**: Sync/async sending, batching, compression\n- **Consumer APIs**: Auto-commit, manual offset management, consumer groups\n- **Stream Processing**: Real-time message transformation pipelines\n- **Configuration**: Comprehensive client, producer, consumer settings\n- **Monitoring**: Built-in metrics, health checks, observability\n- **Error Handling**: Retry logic, circuit breakers, dead letter queues\n- **Security**: TLS/mTLS, authentication, authorization\n\n### Quick Start\n\n#### Rust SDK\n```bash\ncd sdk/rust\n\n# 
Basic producer example\ncargo run --example simple_producer\n\n# Advanced consumer with multi-partition support\ncargo run --example advanced_consumer\n\n# Stream processing example\ncargo run --example stream_processor\n```\n\n#### Go SDK\n```bash\ncd sdk/go\n\n# Basic producer example\ngo run examples/simple_producer.go\n\n# Basic consumer example\ngo run examples/simple_consumer.go\n\n# Advanced stream processing\ngo run examples/advanced_stream_processor.go\n```\n\nSee individual SDK READMEs for detailed usage, configuration, performance tuning, and API documentation.\n\n## 📚 Usage Examples\n\n### Client Examples\n\n**Note**: The following are examples of the intended client API. Current implementation is in early development stage and these clients are not yet available.\n\n#### Rust Client Example\n\n```rust\n// Cargo.toml\n[dependencies]\nrustmq-client = { path = \"sdk/rust\" }\ntokio = { version = \"1.0\", features = [\"full\"] }\nserde = { version = \"1.0\", features = [\"derive\"] }\n\n// main.rs\nuse rustmq_client::*;\nuse serde::{Serialize, Deserialize};\nuse std::time::Duration;\n\n#[derive(Serialize, Deserialize)]\nstruct OrderEvent {\n    order_id: String,\n    customer_id: String,\n    amount: f64,\n    timestamp: u64,\n}\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    // Create client connection\n    let config = ClientConfig {\n        brokers: vec![\"localhost:9092\".to_string()],\n        client_id: Some(\"order-processor\".to_string()),\n        ..Default::default()\n    };\n    \n    let client = RustMqClient::new(config).await?;\n    \n    // Create producer\n    let producer = ProducerBuilder::new()\n        .topic(\"orders\")\n        .client(client.clone())\n        .build()\n        .await?;\n\n    // Produce messages\n    let order = OrderEvent {\n        order_id: \"order-123\".to_string(),\n        customer_id: \"customer-456\".to_string(),\n        amount: 99.99,\n        
timestamp: std::time::SystemTime::now()\n            .duration_since(std::time::UNIX_EPOCH)?\n            .as_secs(),\n    };\n\n    let message = Message::builder()\n        .topic(\"orders\")\n        .key(\u0026order.order_id)\n        .payload(serde_json::to_vec(\u0026order)?)\n        .header(\"content-type\", \"application/json\")\n        .build()?;\n\n    let result = producer.send(message).await?;\n    println!(\"Message produced at offset: {}\", result.offset);\n\n    // Create consumer\n    let consumer = ConsumerBuilder::new()\n        .topic(\"orders\")\n        .consumer_group(\"order-processors\")\n        .client(client)\n        .build()\n        .await?;\n    \n    // Consume with automatic offset management\n    while let Some(consumer_message) = consumer.receive().await? {\n        let message = \u0026consumer_message.message;\n        let order: OrderEvent = serde_json::from_slice(\u0026message.payload)?;\n        \n        // Process the order\n        process_order(order).await?;\n        \n        // Acknowledge message\n        consumer_message.ack().await?;\n    }\n    \n    Ok(())\n}\n\nasync fn process_order(order: OrderEvent) -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    println!(\"Processing order {} for customer {} amount ${}\", \n             order.order_id, order.customer_id, order.amount);\n    \n    // Your business logic here\n    tokio::time::sleep(Duration::from_millis(100)).await;\n    \n    Ok(())\n}\n```\n\n#### Go Client Example\n\n```go\n// go.mod\nmodule rustmq-example\n\ngo 1.21\n\nrequire (\n    github.com/rustmq/rustmq/sdk/go v0.1.0\n    github.com/google/uuid v1.3.0\n)\n\n// main.go\npackage main\n\nimport (\n    \"context\"\n    \"encoding/json\"\n    \"fmt\"\n    \"log\"\n    \"time\"\n\n    \"github.com/google/uuid\"\n    \"github.com/rustmq/rustmq/sdk/go/rustmq\"\n)\n\ntype OrderEvent struct {\n    OrderID    string  `json:\"order_id\"`\n    CustomerID string  `json:\"customer_id\"`\n   
 Amount     float64 `json:\"amount\"`\n    Timestamp  int64   `json:\"timestamp\"`\n}\n\nfunc main() {\n    // Create client configuration\n    config := \u0026rustmq.ClientConfig{\n        Brokers:  []string{\"localhost:9092\"},\n        ClientID: \"order-processor\",\n    }\n    \n    client, err := rustmq.NewClient(config)\n    if err != nil {\n        log.Fatal(\"Failed to create client:\", err)\n    }\n    defer client.Close()\n\n    // Producer example\n    producer, err := client.CreateProducer(\"orders\")\n    if err != nil {\n        log.Fatal(\"Failed to create producer:\", err)\n    }\n    defer producer.Close()\n\n    // Send some orders\n    for i := 0; i \u003c 10; i++ {\n        order := OrderEvent{\n            OrderID:    uuid.New().String(),\n            CustomerID: fmt.Sprintf(\"customer-%d\", i%5),\n            Amount:     float64((i + 1) * 25),\n            Timestamp:  time.Now().UnixMilli(),\n        }\n\n        orderBytes, _ := json.Marshal(order)\n        \n        message := rustmq.NewMessage().\n            Topic(\"orders\").\n            KeyString(order.OrderID).\n            Payload(orderBytes).\n            Header(\"content-type\", \"application/json\").\n            Build()\n\n        ctx := context.Background()\n        result, err := producer.Send(ctx, message)\n        if err != nil {\n            log.Printf(\"Failed to send message: %v\", err)\n            continue\n        }\n        \n        fmt.Printf(\"Message sent at offset: %d, partition: %d\\n\", \n            result.Offset, result.Partition)\n    }\n\n    // Consumer example\n    consumer, err := client.CreateConsumer(\"orders\", \"order-processors\")\n    if err != nil {\n        log.Fatal(\"Failed to create consumer:\", err)\n    }\n    defer consumer.Close()\n\n    // Consume messages\n    for i := 0; i \u003c 10; i++ {\n        ctx := context.Background()\n        message, err := consumer.Receive(ctx)\n        if err != nil {\n            log.Printf(\"Receive error: 
%v\", err)\n            continue\n        }\n\n        var order OrderEvent\n        if err := json.Unmarshal(message.Message.Payload, \u0026order); err != nil {\n            log.Printf(\"Failed to unmarshal order: %v\", err)\n            message.Ack()\n            continue\n        }\n\n        // Process the order\n        if err := processOrder(order); err != nil {\n            log.Printf(\"Failed to process order %s: %v\", order.OrderID, err)\n            message.Nack() // Retry\n            continue\n        }\n\n        fmt.Printf(\"Processed order %s for customer %s amount $%.2f\\n\",\n            order.OrderID, order.CustomerID, order.Amount)\n            \n        // Acknowledge successful processing\n        message.Ack()\n    }\n}\n\nfunc processOrder(order OrderEvent) error {\n    // Your business logic here\n    time.Sleep(100 * time.Millisecond)\n    return nil\n}\n```\n\n#### Admin Operations\n\n**✅ Fully Implemented**: The Admin REST API is production-ready with comprehensive cluster management capabilities.\n\n```bash\n# Create topic with custom configuration\ncurl -X POST http://localhost:8080/api/v1/topics \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"name\": \"user-events\",\n    \"partitions\": 24,\n    \"replication_factor\": 3,\n    \"retention_ms\": 604800000,\n    \"segment_bytes\": 1073741824,\n    \"compression_type\": \"lz4\"\n  }'\n\n# List topics\ncurl http://localhost:8080/api/v1/topics\n\n# Get topic details\ncurl http://localhost:8080/api/v1/topics/user-events\n\n# Delete topic\ncurl -X DELETE http://localhost:8080/api/v1/topics/user-events\n\n# Get cluster health and status\ncurl http://localhost:8080/api/v1/cluster\n\n# List brokers with health status\ncurl http://localhost:8080/api/v1/brokers\n\n# Check service health and uptime\ncurl http://localhost:8080/health\n\n# Advanced features (future implementation)\n# Partition rebalancing, ETL module management, and metrics endpoints\n# will be available in future 
releases\n```\n\n## 📊 Future Performance Tuning (Not Yet Implemented)\n\n### Planned Broker Optimization\n\n```toml\n# High-throughput configuration\n[wal]\ncapacity_bytes = 53687091200        # 50GB for high-volume topics\nfsync_on_write = false              # Disable for maximum throughput\nsegment_size_bytes = 2147483648     # 2GB segments\nbuffer_size = 1048576               # 1MB buffer\n\n[cache]\nwrite_cache_size_bytes = 8589934592  # 8GB hot cache\nread_cache_size_bytes = 17179869184  # 16GB cold cache\n\n[network]\nmax_connections = 50000             # Increase connection limit\nconnection_timeout_ms = 60000       # Longer timeout for slow clients\n\n[object_storage]\nmax_concurrent_uploads = 50         # More concurrent uploads\nmultipart_threshold = 52428800      # 50MB threshold\n\n# Low-latency configuration\n[wal]\nfsync_on_write = true               # Enable for durability\nbuffer_size = 4096                  # Smaller buffers for low latency\n\n[replication]\nmin_in_sync_replicas = 1            # Reduce for lower latency\nack_timeout_ms = 1000               # Faster timeouts\nheartbeat_timeout_ms = 10000        # Shorter heartbeat timeout for faster failover\n```\n\n### Planned Kubernetes Resource Tuning\n\n```yaml\n# High-performance broker configuration\nresources:\n  requests:\n    memory: \"16Gi\"\n    cpu: \"8\"\n    ephemeral-storage: \"100Gi\"\n  limits:\n    memory: \"32Gi\"\n    cpu: \"16\"\n    ephemeral-storage: \"200Gi\"\n\n# Node affinity for performance\nnodeSelector:\n  cloud.google.com/gke-nodepool: high-performance\n  \naffinity:\n  podAntiAffinity:\n    requiredDuringSchedulingIgnoredDuringExecution:\n    - labelSelector:\n        matchLabels:\n          app: rustmq-broker\n      topologyKey: kubernetes.io/hostname\n\n# Volume configuration for maximum IOPS\nvolumeClaimTemplates:\n- metadata:\n    name: wal-storage\n  spec:\n    accessModes: [\"ReadWriteOnce\"]\n    storageClassName: fast-ssd\n    resources:\n      requests:\n      
  storage: 500Gi\n```\n\n## 📈 Future Monitoring (Not Yet Implemented)\n\n### Planned Prometheus Configuration\n\n```yaml\n# prometheus-config.yaml - future monitoring setup\napiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: prometheus-config\ndata:\n  prometheus.yml: |\n    global:\n      scrape_interval: 15s\n      \n    scrape_configs:\n    - job_name: 'rustmq-brokers'\n      kubernetes_sd_configs:\n      - role: pod\n        namespaces:\n          names: [rustmq]\n      relabel_configs:\n      - source_labels: [__meta_kubernetes_pod_label_app]\n        action: keep\n        regex: rustmq-broker\n      - source_labels: [__meta_kubernetes_pod_ip]\n        target_label: __address__\n        replacement: ${1}:9642\n        \n    - job_name: 'rustmq-controllers'\n      kubernetes_sd_configs:\n      - role: pod\n        namespaces:\n          names: [rustmq]\n      relabel_configs:\n      - source_labels: [__meta_kubernetes_pod_label_app]\n        action: keep\n        regex: rustmq-controller\n      - source_labels: [__meta_kubernetes_pod_ip]\n        target_label: __address__\n        replacement: ${1}:9642\n```\n\n### Planned Key Metrics (Not Yet Implemented)\n\nPlanned metrics to monitor:\n\n- **Throughput**: `rate(messages_produced_total[5m])`, `rate(messages_consumed_total[5m])`\n- **Latency**: `produce_latency_seconds`, `consume_latency_seconds`\n- **Storage**: `wal_size_bytes`, `cache_hit_ratio`, `object_storage_upload_rate`\n- **Replication**: `replication_lag`, `in_sync_replicas_count`\n- **System**: `cpu_usage`, `memory_usage`, `disk_iops`, `network_throughput`\n\n### Future Alerting (Not Yet Implemented)\n\n```yaml\n# alerts.yaml - planned alerting rules\n# Assumes produce_latency_seconds is exported as a Prometheus histogram (_bucket series)\ngroups:\n- name: rustmq.rules\n  rules:\n  - alert: HighProduceLatency\n    expr: histogram_quantile(0.95, sum by (le) (rate(produce_latency_seconds_bucket[5m]))) \u003e 0.1\n    for: 2m\n    labels:\n      severity: warning\n    annotations:\n      summary: \"High produce latency detected\"\n      \n  - alert: ReplicationLagHigh\n    
expr: replication_lag \u003e 10000\n    for: 5m\n    labels:\n      severity: critical\n    annotations:\n      summary: \"Replication lag is too high\"\n      \n  - alert: BrokerDown\n    expr: up{job=\"rustmq-brokers\"} == 0\n    for: 1m\n    labels:\n      severity: critical\n    annotations:\n      summary: \"RustMQ broker is down\"\n```\n\n## 🔧 Development \u0026 Troubleshooting\n\n### 🚀 Environment Setup\n\nRustMQ provides a comprehensive environment setup script that can configure both development and production environments:\n\n#### Development Environment Setup\n\nSet up a complete local development environment with certificates, configurations, and services:\n\n```bash\n# Set up complete development environment (recommended)\n./generate-certs.sh develop\n\n# Force regenerate existing setup\n./generate-certs.sh develop --force\n\n# View available options\n./generate-certs.sh --help\n```\n\n**What the development setup provides:**\n- ✅ Self-signed development certificates with simplified CA chain\n- ✅ Development configuration files for broker, controller, and admin\n- ✅ Local data directories and startup scripts\n- ✅ Example applications and test clients\n- ✅ Ready-to-run local cluster\n\n**Quick start after setup:**\n```bash\n# Start the complete cluster\n./start-cluster-dev.sh\n\n# Or start services individually\n./start-controller-dev.sh  # Start controller first\n./start-broker-dev.sh      # Start broker\n\n# Test with examples\ncargo run --example secure_producer\ncargo run --example secure_consumer\n\n# Admin operations\ncargo run --bin rustmq-admin -- --config config/admin-dev.toml cluster status\n```\n\n#### Production Environment Setup\n\nGet comprehensive guidance for production deployment:\n\n```bash\n# Show production setup guidance\n./generate-certs.sh production\n```\n\n**What the production guidance provides:**\n- 📋 Production setup guidance and checklists\n- 📋 Security best practices and hardening\n- 📋 Deployment options (Kubernetes, 
Docker, systemd)\n- 📋 Monitoring and observability setup\n- 📋 Certificate management with external CA\n\n#### Environment Setup Features\n\n**Development Environment (`./generate-certs.sh develop`):**\n- **Complete Setup**: Creates certificates, configs, data directories, and startup scripts\n- **Certificate Chain**: Proper CA-signed certificates (fixes August 2025 certificate signing issues)\n- **Ready to Use**: Start developing immediately with `./start-cluster-dev.sh`\n- **Examples Included**: Secure producer/consumer examples with mTLS\n- **Validation**: Automatic setup validation and certificate verification\n\n**Production Environment (`./generate-certs.sh production`):**\n- **Security Guidance**: Enterprise-grade security setup instructions\n- **Certificate Management**: External CA integration and certificate lifecycle\n- **Deployment Options**: Kubernetes, Docker, and systemd deployment guides\n- **Monitoring Setup**: Comprehensive observability and alerting configuration\n- **Best Practices**: Production hardening and operational procedures\n\n### 🔐 Development Certificates\n\nThe development environment automatically generates certificates with simplified signing chains:\n\n- `certs/ca.pem` - Root CA certificate (self-signed for development)\n- `certs/server.pem` + `certs/server.key` - Server certificate and private key (CA-signed)\n- `certs/client.pem` + `certs/client.key` - Client certificate and private key (CA-signed)\n- `certs/admin.pem` + `certs/admin.key` - Admin certificate and private key (CA-signed)\n\n**⚠️ Security Notice**: These are development-only certificates. For production, follow the production setup guide!\n\n### Current Development Issues\n\n**Note**: Since RustMQ is in early development, most \"issues\" are actually missing implementations.\n\n1. 
**Services Not Responding**\n```bash\n# Both broker and controller services are now production-ready with full functionality\n# Check if they started successfully\ndocker-compose logs rustmq-broker-1\ndocker-compose logs rustmq-controller-1\n\n# Look for configuration loading messages\n# Services should log \"started successfully\" then sleep\n```\n\n2. **Build Issues**\n```bash\n# Ensure Rust toolchain is up to date\nrustup update\n\n# Clean build if needed\ncargo clean\ncargo build --release\n\n# Run tests to verify implementation\ncargo test\n```\n\n3. **Configuration Issues**\n```bash\n# Validate configuration\ncargo run --bin rustmq-broker -- --config config/broker.toml\n\n# Check configuration structure in src/config.rs\n# All fields must be present in TOML files\n```\n\n### Log Analysis\n\n```bash\n# View service logs (from docker/ directory)\ncd docker\ndocker-compose logs rustmq-broker-1\ndocker-compose logs rustmq-controller-1\n\n# Check for configuration validation errors\ndocker-compose logs | grep ERROR\n\n# Monitor BigQuery subscriber demo\ndocker-compose logs rustmq-bigquery-subscriber\n```\n\nFor complete Docker and Kubernetes deployment guides, troubleshooting, and configuration details, see [docker/README.md](docker/README.md).\n\n## 🤝 Contributing\n\nWe welcome contributions to help implement the remaining features! \n\n### Current Development Priorities\n\n1. **Message Broker Core**: Implement actual produce/consume functionality\n2. **Network Layer**: Complete QUIC/gRPC server implementations\n3. **Distributed Coordination**: Implement Raft consensus and metadata management\n4. **Client Libraries**: Build Rust and Go client libraries\n5. 
**Admin API**: Implement a REST API for cluster management\n\n### Development Setup\n\n```bash\n# Clone and set up\ngit clone https://github.com/cloudymoma/rustmq.git\ncd rustmq\n\n# Install development dependencies\ncargo install cargo-watch cargo-audit cargo-tarpaulin\n\n# Run tests with coverage\ncargo tarpaulin --out Html\n\n# Watch for changes during development\ncargo watch -x test -x clippy\n```\n\n### Testing\n\n```bash\n# Unit tests (currently 88 tests passing)\ncargo test --lib\n\n# Integration tests (9 broker core tests + others)\ncargo test --test integration_broker_core\n\n# Run specific module tests\ncargo test storage::\ncargo test scaling::\ncargo test broker::core\n\n# Run with features\ncargo test --features \"io-uring,wasm\"\n\n# All tests\ncargo test\n```\n\n## 📄 License\n\nThis project is licensed under The Bindiego License (BDL), Version 1.0 - see the [LICENSE](LICENSE) file for details.\n\n### License Summary\n\n- ✅ **Academic Use**: Freely available for teaching, research, and educational purposes\n- ✅ **Contributions**: Contributions back to the original project are welcome\n- ❌ **Commercial Use**: Prohibited without a separate commercial license\n- ❌ **Managed Services**: Cannot offer RustMQ as a hosted service\n\nFor commercial licensing inquiries, please contact the license holder through the official repository.\n\n## 🔗 Links\n\n- [Issue Tracker](https://github.com/cloudymoma/rustmq/issues)\n\n---\n\n**RustMQ** - Built with ❤️ in Rust for the cloud-native future. Optimized for Google Cloud Platform.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudymoma%2Frustmq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcloudymoma%2Frustmq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudymoma%2Frustmq/lists"}