{"id":39584726,"url":"https://github.com/Mahir101/Rafka","last_synced_at":"2026-01-26T15:01:15.168Z","repository":{"id":320499099,"uuid":"1082314470","full_name":"Mahir101/Rafka","owner":"Mahir101","description":"Rafka is a blazing-fast, experimental distributed asynchronous message broker inspired by Apache Kafka. Built with Rust and leveraging Tokio's async runtime, it delivers exceptional performance through its peer-to-peer mesh architecture and custom in-memory database for unparalleled scalability and low-latency message processing.","archived":false,"fork":false,"pushed_at":"2025-11-26T09:00:02.000Z","size":239,"stargazers_count":32,"open_issues_count":6,"forks_count":4,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-13T14:33:48.695Z","etag":null,"topics":["distributed-systems","engineering-marble","kafka","memory-safe","message-broker","networking","optimized","p2p","rafka","rust","rust-lang"],"latest_commit_sha":null,"homepage":"https://crates.io/crates/rafka-rs","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Mahir101.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-24T04:02:55.000Z","updated_at":"2025-12-09T18:21:58.000Z","dependencies_parsed_at":"2025-10-24T06:19:01.167Z","dependency_job_id":null,"html_url":"https://github.com/Mahir101/Rafka","commit_stats":null,"previous_names":["mahir101/rafka"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:
github/Mahir101/Rafka","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mahir101%2FRafka","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mahir101%2FRafka/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mahir101%2FRafka/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mahir101%2FRafka/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Mahir101","download_url":"https://codeload.github.com/Mahir101/Rafka/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mahir101%2FRafka/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28781308,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-26T13:55:28.044Z","status":"ssl_error","status_checked_at":"2026-01-26T13:55:26.068Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributed-systems","engineering-marble","kafka","memory-safe","message-broker","networking","optimized","p2p","rafka","rust","rust-lang"],"created_at":"2026-01-18T07:35:27.099Z","updated_at":"2026-01-26T15:01:15.162Z","avatar_url":"https://github.com/Mahir101.png","language":"Rust","funding_links":[],"categories":["\u003ca name=\"Rust\"\u003e\u003c/a\u003eRust"],"sub_categories":[],"readme":"# Rafka\n\n**A High-Performance 
Distributed Message Broker Built in Rust**\n\nRafka is a blazing-fast, experimental distributed asynchronous message broker inspired by Apache Kafka. Built with Rust and leveraging Tokio's async runtime, it delivers exceptional performance through its peer-to-peer mesh architecture and custom in-memory database for unparalleled scalability and low-latency message processing.\n\n## 🚀 Key Features\n\n- **High-Performance Async Architecture**: Built on Tokio for maximum concurrency and throughput\n- **gRPC Communication**: Modern protocol buffers for efficient inter-service communication\n- **Partitioned Message Processing**: Hash-based partitioning for horizontal scalability\n- **Disk-based Persistence**: Write-Ahead Log (WAL) for message durability\n- **Consumer Groups**: Load-balanced message consumption with partition assignment\n- **Replication**: Multi-replica partitions with ISR tracking for high availability\n- **Log Compaction**: Multiple strategies (KeepLatest, TimeWindow, Hybrid) for storage optimization\n- **Transactions**: Two-Phase Commit (2PC) with idempotent producer support\n- **Comprehensive Monitoring**: Health checks, heartbeat tracking, and circuit breakers\n- **Real-time Metrics**: Prometheus-compatible metrics export with latency histograms\n- **Stream Processing**: Kafka Streams-like API for message transformation and aggregation\n- **Offset Tracking**: Consumer offset management for reliable message delivery\n- **Retention Policies**: Configurable message retention based on age and size\n- **Modular Design**: Clean separation of concerns across multiple crates\n\n## 🆚 Rafka vs Apache Kafka Feature Comparison\n\n| Feature | Apache Kafka | Rafka (Current) | Status |\n|---------|--------------|-----------------|--------|\n| **Storage** | Disk-based (Persistent) | Disk-based WAL (Persistent) | ✅ Implemented |\n| **Architecture** | Leader/Follower (Zookeeper/KRaft) | P2P Mesh / Distributed | 🔄 Different Approach |\n| **Consumption Model** | 
Consumer Groups (Load Balancing) | Consumer Groups + Pub/Sub | ✅ Implemented |\n| **Replication** | Multi-replica with ISR | Multi-replica with ISR | ✅ Implemented |\n| **Message Safety** | WAL (Write Ahead Log) | WAL (Write Ahead Log) | ✅ Implemented |\n| **Transactions** | Exactly-once semantics | 2PC with Idempotent Producers | ✅ Implemented |\n| **Compaction** | Log Compaction | Log Compaction (Multiple Strategies) | ✅ Implemented |\n| **Ecosystem** | Connect, Streams, Schema Registry | Core Broker only | ❌ Missing |\n\n### 🔍 Feature Implementation Status\n\n#### ✅ Implemented Features\n\n1.  **Disk-based Persistence (WAL)**: Rafka now implements a Write-Ahead Log (WAL) for message durability. Messages are persisted to disk and survive broker restarts.\n2.  **Consumer Groups**: Rafka supports consumer groups with load balancing. Multiple consumers can share the load of a topic, with each partition being consumed by only one member of the group. Both Range and RoundRobin partition assignment strategies are supported.\n3.  **Replication \u0026 High Availability**: Rafka implements multi-replica partitions with In-Sync Replica (ISR) tracking and leader election for high availability.\n4.  **Log Compaction**: Rafka supports log compaction with multiple strategies (KeepLatest, TimeWindow, Hybrid) to optimize storage by keeping only the latest value for a key.\n5.  **Transactions**: Rafka implements atomic writes across multiple partitions/topics using Two-Phase Commit (2PC) protocol with idempotent producer support.\n\n#### ❌ Missing Features\n\n1.  **Ecosystem Tools**: Unlike Apache Kafka, Rafka currently lacks ecosystem tools like Kafka Connect (for data integration), Kafka Streams (for stream processing), and Schema Registry (for schema management). 
These would need to be developed separately to provide a complete data streaming platform.\n\n\n## 🏗️ Architecture Overview\n\n### System Architecture Diagram\n\n```mermaid\ngraph TB\n    subgraph \"Client Layer\"\n        P[Producer]\n        C[Consumer]\n    end\n    \n    subgraph \"Broker Cluster\"\n        B1[Broker 1\u003cbr/\u003ePartition 0]\n        B2[Broker 2\u003cbr/\u003ePartition 1]\n        B3[Broker 3\u003cbr/\u003ePartition 2]\n    end\n    \n    subgraph \"Storage Layer\"\n        S1[In-Memory DB\u003cbr/\u003ePartition 0]\n        S2[In-Memory DB\u003cbr/\u003ePartition 1]\n        S3[In-Memory DB\u003cbr/\u003ePartition 2]\n    end\n    \n    P --\u003e|gRPC Publish| B1\n    P --\u003e|gRPC Publish| B2\n    P --\u003e|gRPC Publish| B3\n    \n    B1 --\u003e|Store Messages| S1\n    B2 --\u003e|Store Messages| S2\n    B3 --\u003e|Store Messages| S3\n    \n    C --\u003e|gRPC Consume| B1\n    C --\u003e|gRPC Consume| B2\n    C --\u003e|gRPC Consume| B3\n    \n    B1 --\u003e|Broadcast Stream| C\n    B2 --\u003e|Broadcast Stream| C\n    B3 --\u003e|Broadcast Stream| C\n```\n\n### Message Flow Sequence Diagram\n\n```mermaid\nsequenceDiagram\n    participant P as Producer\n    participant B as Broker\n    participant S as Storage\n    participant C as Consumer\n    \n    P-\u003e\u003eB: PublishRequest(topic, key, payload)\n    B-\u003e\u003eB: Hash key for partition\n    B-\u003e\u003eB: Check partition ownership\n    B-\u003e\u003eS: Store message with offset\n    S--\u003e\u003eB: Return offset\n    B-\u003e\u003eB: Broadcast to subscribers\n    B--\u003e\u003eP: PublishResponse(message_id, offset)\n    \n    C-\u003e\u003eB: ConsumeRequest(topic)\n    B-\u003e\u003eB: Create broadcast stream\n    B--\u003e\u003eC: ConsumeResponse stream\n    \n    loop Message Processing\n        B-\u003e\u003eC: ConsumeResponse(message)\n        C-\u003e\u003eB: AcknowledgeRequest(message_id)\n        C-\u003e\u003eB: UpdateOffsetRequest(offset)\n    
end\n```\n\n## 📁 Project Structure\n\n```\nrafka/\n├── Cargo.toml                 # Workspace manifest\n├── config/\n│   └── config.yml            # Configuration file\n├── scripts/                  # Demo and utility scripts\n│   ├── helloworld.sh         # Basic producer-consumer demo\n│   ├── partitioned_demo.sh   # Multi-broker partitioning demo\n│   ├── retention_demo.sh     # Message retention demo\n│   ├── offset_tracking_demo.sh # Consumer offset tracking demo\n│   └── kill.sh               # Process cleanup script\n├── src/\n│   └── bin/                  # Executable binaries\n│       ├── start_broker.rs   # Broker server\n│       ├── start_producer.rs # Producer client\n│       ├── start_consumer.rs # Consumer client\n│       └── check_metrics.rs  # Metrics monitoring\n├── crates/                   # Core library crates\n│   ├── core/                 # Core types and gRPC definitions\n│   │   ├── src/\n│   │   │   ├── lib.rs\n│   │   │   ├── message.rs    # Message structures\n│   │   │   └── proto/\n│   │   │       └── rafka.proto # gRPC service definitions\n│   │   └── build.rs          # Protocol buffer compilation\n│   ├── broker/               # Broker implementation\n│   │   └── src/\n│   │       ├── lib.rs\n│   │       └── broker.rs     # Core broker logic\n│   ├── producer/             # Producer implementation\n│   │   └── src/\n│   │       ├── lib.rs\n│   │       └── producer.rs   # Producer client\n│   ├── consumer/             # Consumer implementation\n│   │   └── src/\n│   │       ├── lib.rs\n│   │       └── consumer.rs   # Consumer client\n│   └── storage/              # Storage engine\n│       └── src/\n│           ├── lib.rs\n│           └── db.rs         # In-memory database\n├── docs/\n│   └── getting_started.md    # Getting started guide\n├── tasks/\n│   └── Roadmap.md           # Development roadmap\n├── Dockerfile               # Container configuration\n└── LICENSE                  # Apache-2.0 License\n```\n\n## 🚀 Quick Start\n\n### 
Prerequisites\n\n- **Rust**: Latest stable version (1.70+)\n- **Cargo**: Comes with Rust installation\n- **Protocol Buffers**: For gRPC compilation\n\n### Installation\n\n1. **Clone the repository**:\n```bash\ngit clone https://github.com/Mahir101/Rafka.git\ncd Rafka\n```\n\n2. **Build the project**:\n```bash\ncargo build --release\n```\n\n3. **Run the basic demo**:\n```bash\n./scripts/helloworld.sh\n```\n\n### Manual Setup\n\n1. **Start a broker**:\n```bash\ncargo run --bin start_broker -- --port 50051 --partition 0 --total-partitions 3\n```\n\n2. **Start a consumer**:\n```bash\ncargo run --bin start_consumer -- --port 50051\n```\n\n3. **Send messages**:\n```bash\ncargo run --bin start_producer -- --message \"Hello, Rafka!\" --key \"test-key\"\n```\n\n## 🔧 Configuration\n\n### Broker Configuration\n\nThe broker can be configured via command-line arguments:\n\n```bash\ncargo run --bin start_broker -- \\\n  --port 50051 \\\n  --partition 0 \\\n  --total-partitions 3 \\\n  --retention-seconds 604800\n```\n\n**Available Options**:\n- `--port`: Broker listening port (default: 50051)\n- `--partition`: Partition ID for this broker (default: 0)\n- `--total-partitions`: Total number of partitions (default: 1)\n- `--retention-seconds`: Message retention time in seconds (default: 604800, i.e. 7 days)\n\n### Configuration File\n\nEdit `config/config.yml` for persistent settings:\n\n```yaml\nserver:\n  host: \"127.0.0.1\"\n  port: 9092\n\nlog:\n  level: \"info\"  # debug, info, warn, error\n\nbroker:\n  replication_factor: 3\n  default_topic_partitions: 1\n\nstorage:\n  type: \"in_memory\"\n```\n\n## 🏛️ Core Components\n\n### 1. 
Core (`rafka-core`)\n\n**Purpose**: Defines fundamental types and gRPC service contracts.\n\n**Key Components**:\n- **Message Structures**: `Message`, `MessageAck`, `BenchmarkMetrics`\n- **gRPC Definitions**: Protocol buffer definitions for all services\n- **Serialization**: Serde-based serialization for message handling\n\n**Key Files**:\n- `message.rs`: Core message types and acknowledgment structures\n- `proto/rafka.proto`: gRPC service definitions\n\n### 2. Broker (`rafka-broker`)\n\n**Purpose**: Central message routing and coordination service.\n\n**Key Features**:\n- **Partition Management**: Hash-based message partitioning\n- **Topic Management**: Dynamic topic creation and subscription\n- **Broadcast Channels**: Efficient message distribution to consumers\n- **Offset Tracking**: Consumer offset management\n- **Retention Policies**: Configurable message retention\n- **Metrics Collection**: Real-time performance metrics\n\n**Key Operations**:\n- `publish()`: Accept messages from producers\n- `consume()`: Stream messages to consumers\n- `subscribe()`: Register consumer subscriptions\n- `acknowledge()`: Process message acknowledgments\n- `update_offset()`: Track consumer progress\n\n### 3. Producer (`rafka-producer`)\n\n**Purpose**: Client library for publishing messages to brokers.\n\n**Key Features**:\n- **Connection Management**: Automatic broker connection handling\n- **Message Publishing**: Reliable message delivery with acknowledgments\n- **Error Handling**: Comprehensive error reporting\n- **UUID Generation**: Unique message identification\n\n**Usage Example**:\n```rust\nlet mut producer = Producer::new(\"127.0.0.1:50051\").await?;\nproducer.publish(\"my-topic\".to_string(), \"Hello World\".to_string(), \"key-1\".to_string()).await?;\n```\n\n### 4. 
Consumer (`rafka-consumer`)\n\n**Purpose**: Client library for consuming messages from brokers.\n\n**Key Features**:\n- **Subscription Management**: Topic subscription handling\n- **Stream Processing**: Asynchronous message streaming\n- **Automatic Acknowledgment**: Built-in message acknowledgment\n- **Offset Tracking**: Automatic offset updates\n- **Channel-based API**: Clean async/await interface\n\n**Usage Example**:\n```rust\nlet mut consumer = Consumer::new(\"127.0.0.1:50051\").await?;\nconsumer.subscribe(\"my-topic\".to_string()).await?;\nlet mut rx = consumer.consume(\"my-topic\".to_string()).await?;\nwhile let Some(message) = rx.recv().await {\n    println!(\"Received: {}\", message);\n}\n```\n\n### 5. Storage (`rafka-storage`)\n\n**Purpose**: High-performance in-memory storage engine.\n\n**Key Features**:\n- **Partition-based Storage**: Separate queues per partition\n- **Retention Policies**: Age and size-based message retention\n- **Offset Management**: Efficient offset tracking and retrieval\n- **Acknowledgment Tracking**: Consumer acknowledgment management\n- **Metrics Collection**: Storage performance metrics\n- **Memory Optimization**: Efficient memory usage with cleanup\n\n**Storage Architecture**:\n```mermaid\ngraph LR\n    subgraph \"Storage Engine\"\n        T[Topic]\n        P1[Partition 0]\n        P2[Partition 1]\n        P3[Partition 2]\n        \n        T --\u003e P1\n        T --\u003e P2\n        T --\u003e P3\n        \n        P1 --\u003e Q1[Message Queue]\n        P2 --\u003e Q2[Message Queue]\n        P3 --\u003e Q3[Message Queue]\n    end\n```\n\n## 🔄 Message Flow\n\n### Publishing Flow\n\n1. **Producer** sends `PublishRequest` to **Broker**\n2. **Broker** hashes the message key to determine partition\n3. **Broker** checks partition ownership\n4. **Broker** stores message in **Storage** with unique offset\n5. **Broker** broadcasts message to subscribed consumers\n6. 
**Broker** returns `PublishResponse` with message ID and offset\n\n### Consumption Flow\n\n1. **Consumer** sends `ConsumeRequest` to **Broker**\n2. **Broker** creates broadcast stream for the topic\n3. **Broker** streams messages via gRPC to **Consumer**\n4. **Consumer** processes message and sends acknowledgment\n5. **Consumer** updates offset to track progress\n6. **Storage** cleans up acknowledged messages based on retention policy\n\n## 📊 Performance Features\n\n### Partitioning Strategy\n\nRafka uses hash-based partitioning for efficient message distribution:\n\n```rust\nfn hash_key(\u0026self, key: \u0026str) -\u003e u32 {\n    key.bytes().fold(0u32, |acc, b| acc.wrapping_add(b as u32))\n}\n\nfn owns_partition(\u0026self, message_key: \u0026str) -\u003e bool {\n    let hash = self.hash_key(message_key);\n    hash % self.total_partitions == self.partition_id\n}\n```\n\n### Retention Policies\n\nConfigurable message retention based on:\n- **Time-based**: Maximum age (default: 7 days)\n- **Size-based**: Maximum storage size (default: 1GB)\n\n### Metrics Collection\n\nBuilt-in metrics for monitoring:\n- Total messages stored\n- Total bytes consumed\n- Oldest message age\n- Consumer offset positions\n\n## 🧪 Demo Scripts\n\n### 1. Hello World Demo\n```bash\n./scripts/helloworld.sh\n```\nBasic producer-consumer interaction demonstration.\n\n### 2. Partitioned Demo\n```bash\n./scripts/partitioned_demo.sh\n```\nMulti-broker setup with hash-based partitioning.\n\n### 3. Retention Demo\n```bash\n./scripts/retention_demo.sh\n```\nDemonstrates message retention policies.\n\n### 4. 
Offset Tracking Demo\n```bash\n./scripts/offset_tracking_demo.sh\n```\nShows consumer offset management and recovery.\n\n## 🛠️ Development\n\n### Building from Source\n\n```bash\n# Clone repository\ngit clone https://github.com/Mahir101/Rafka.git\ncd Rafka\n\n# Build all crates\ncargo build\n\n# Run tests\ncargo test\n\n# Build release version\ncargo build --release\n```\n\n### Running Tests\n\n```bash\n# Run all tests\ncargo test\n\n# Run specific crate tests\ncargo test -p rafka-storage\ncargo test -p rafka-broker\n```\n\n### Code Structure\n\nThe project follows Rust best practices with:\n- **Workspace Organization**: Multiple crates in a single workspace\n- **Separation of Concerns**: Each component in its own crate\n- **Async/Await**: Modern async Rust with Tokio\n- **Error Handling**: Comprehensive error types and handling\n- **Testing**: Unit tests for all major components\n\n## 🚧 Current Status\n\n**⚠️ Early Development - Not Production Ready**\n\nRafka is currently in active development. The current implementation provides:\n\n✅ **Completed Features**:\n- Basic message publishing and consumption\n- Hash-based partitioning\n- In-memory storage with retention policies\n- Consumer offset tracking\n- gRPC-based communication\n- Metrics collection\n- Demo scripts and examples\n\n🔄 **In Progress**:\n- Peer-to-peer mesh networking\n- Distributed consensus algorithms\n- Kubernetes deployment configurations\n- Performance optimizations\n\n📋 **Planned Features**:\n- Replication across multiple brokers\n- Fault tolerance and recovery\n- Security and authentication\n- Client SDKs for multiple languages\n- Comprehensive monitoring and alerting\n\n## 🤝 Contributing\n\nWe welcome contributions! 
Here are some areas where you can help:\n\n### High Priority\n- **P2P Mesh Implementation**: Distributed node discovery and communication\n- **Consensus Algorithms**: Leader election and cluster coordination\n- **Replication**: Cross-broker message replication\n- **Fault Tolerance**: Node failure detection and recovery\n\n### Medium Priority\n- **Performance Optimization**: Message batching and compression\n- **Security**: TLS encryption and authentication\n- **Monitoring**: Prometheus metrics and Grafana dashboards\n- **Documentation**: API documentation and tutorials\n\n### Getting Started\n1. Fork the repository\n2. Create a feature branch\n3. Make your changes\n4. Add tests for new functionality\n5. Submit a pull request\n\n## 📄 License\n\nThis project is licensed under the Apache License 2.0 - see the [LICENSE](./LICENSE) file for details.\n\n## 🙏 Acknowledgments\n\n- [Apache Kafka](https://kafka.apache.org) for inspiration on messaging systems\n- [Tokio](https://tokio.rs) for the excellent async runtime\n- [Tonic](https://github.com/hyperium/tonic) for gRPC implementation\n- [@wyattgill9](https://github.com/wyattgill9) for the initial proof of concept\n- The Rust community for their excellent libraries and support\n\n---\n\n**Built with ❤️ in Rust**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMahir101%2FRafka","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMahir101%2FRafka","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMahir101%2FRafka/lists"}