https://github.com/Mahir101/Rafka
Rafka is a blazing-fast, experimental distributed asynchronous message broker inspired by Apache Kafka. Built with Rust and leveraging Tokio's async runtime, it delivers exceptional performance through its peer-to-peer mesh architecture and custom in-memory database for unparalleled scalability and low-latency message processing.
- Host: GitHub
- URL: https://github.com/Mahir101/Rafka
- Owner: Mahir101
- License: apache-2.0
- Created: 2025-10-24T04:02:55.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-11-26T09:00:02.000Z (3 months ago)
- Last Synced: 2025-12-13T14:33:48.695Z (2 months ago)
- Topics: distributed-systems, engineering-marble, kafka, memory-safe, message-broker, networking, optimized, p2p, rafka, rust, rust-lang
- Language: Rust
- Homepage: https://crates.io/crates/rafka-rs
- Size: 233 KB
- Stars: 32
- Watchers: 1
- Forks: 4
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
# Rafka
**A High-Performance Distributed Message Broker Built in Rust**
Rafka is a blazing-fast, experimental distributed asynchronous message broker inspired by Apache Kafka. Built with Rust and leveraging Tokio's async runtime, it delivers exceptional performance through its peer-to-peer mesh architecture and custom in-memory database for unparalleled scalability and low-latency message processing.
## 🚀 Key Features
- **High-Performance Async Architecture**: Built on Tokio for maximum concurrency and throughput
- **gRPC Communication**: Modern protocol buffers for efficient inter-service communication
- **Partitioned Message Processing**: Hash-based partitioning for horizontal scalability
- **Disk-based Persistence**: Write-Ahead Log (WAL) for message durability
- **Consumer Groups**: Load-balanced message consumption with partition assignment
- **Replication**: Multi-replica partitions with ISR tracking for high availability
- **Log Compaction**: Multiple strategies (KeepLatest, TimeWindow, Hybrid) for storage optimization
- **Transactions**: Two-Phase Commit (2PC) with idempotent producer support
- **Comprehensive Monitoring**: Health checks, heartbeat tracking, and circuit breakers
- **Real-time Metrics**: Prometheus-compatible metrics export with latency histograms
- **Stream Processing**: Kafka Streams-like API for message transformation and aggregation
- **Offset Tracking**: Consumer offset management for reliable message delivery
- **Retention Policies**: Configurable message retention based on age and size
- **Modular Design**: Clean separation of concerns across multiple crates
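To make the consumer-group load balancing above concrete, the two partition-assignment strategies the project names (Range and RoundRobin) can be sketched in plain Rust. The function names here are illustrative only, not Rafka's API:

```rust
// Range: give each consumer a contiguous block of partitions, with the
// remainder spread over the first consumers.
fn assign_range(partitions: usize, consumers: usize) -> Vec<Vec<usize>> {
    let mut out = vec![Vec::new(); consumers];
    let per = partitions / consumers;
    let extra = partitions % consumers;
    let mut p = 0;
    for (i, slot) in out.iter_mut().enumerate() {
        let take = per + if i < extra { 1 } else { 0 };
        for _ in 0..take {
            slot.push(p);
            p += 1;
        }
    }
    out
}

// RoundRobin: deal partitions out one at a time, like cards.
fn assign_round_robin(partitions: usize, consumers: usize) -> Vec<Vec<usize>> {
    let mut out = vec![Vec::new(); consumers];
    for p in 0..partitions {
        out[p % consumers].push(p);
    }
    out
}

fn main() {
    // 5 partitions across 2 consumers:
    println!("{:?}", assign_range(5, 2));       // [[0, 1, 2], [3, 4]]
    println!("{:?}", assign_round_robin(5, 2)); // [[0, 2, 4], [1, 3]]
}
```

Either way, each partition ends up owned by exactly one group member, which is what makes per-partition ordering compatible with load balancing.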
## 📊 Rafka vs Apache Kafka Feature Comparison
| Feature | Apache Kafka | Rafka (Current) | Status |
|---------|--------------|-----------------|--------|
| **Storage** | Disk-based (Persistent) | Disk-based WAL (Persistent) | ✅ Implemented |
| **Architecture** | Leader/Follower (Zookeeper/KRaft) | P2P Mesh / Distributed | 🔄 Different Approach |
| **Consumption Model** | Consumer Groups (Load Balancing) | Consumer Groups + Pub/Sub | ✅ Implemented |
| **Replication** | Multi-replica with ISR | Multi-replica with ISR | ✅ Implemented |
| **Message Safety** | WAL (Write-Ahead Log) | WAL (Write-Ahead Log) | ✅ Implemented |
| **Transactions** | Exactly-once semantics | 2PC with Idempotent Producers | ✅ Implemented |
| **Compaction** | Log Compaction | Log Compaction (Multiple Strategies) | ✅ Implemented |
| **Ecosystem** | Connect, Streams, Schema Registry | Core Broker only | ❌ Missing |
### 📋 Feature Implementation Status
#### ✅ Implemented Features
1. **Disk-based Persistence (WAL)**: Rafka now implements a Write-Ahead Log (WAL) for message durability. Messages are persisted to disk and survive broker restarts.
2. **Consumer Groups**: Rafka supports consumer groups with load balancing. Multiple consumers can share the load of a topic, with each partition being consumed by only one member of the group. Both Range and RoundRobin partition assignment strategies are supported.
3. **Replication & High Availability**: Rafka implements multi-replica partitions with In-Sync Replica (ISR) tracking and leader election for high availability.
4. **Log Compaction**: Rafka supports log compaction with multiple strategies (KeepLatest, TimeWindow, Hybrid) to optimize storage by keeping only the latest value for a key.
5. **Transactions**: Rafka implements atomic writes across multiple partitions/topics using Two-Phase Commit (2PC) protocol with idempotent producer support.
#### ❌ Missing Features
1. **Ecosystem Tools**: Unlike Apache Kafka, Rafka currently lacks ecosystem tools like Kafka Connect (for data integration), Kafka Streams (for stream processing), and Schema Registry (for schema management). These would need to be developed separately to provide a complete data streaming platform.
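The KeepLatest compaction strategy from item 4 amounts to a single pass that retains only the last record seen for each key. A minimal sketch, with illustrative types rather than Rafka's internals:

```rust
use std::collections::HashMap;

// Keep only the latest (key, value) record per key, preserving log order.
fn compact_keep_latest(log: &[(String, String)]) -> Vec<(String, String)> {
    let mut latest: HashMap<&str, usize> = HashMap::new();
    for (i, (key, _)) in log.iter().enumerate() {
        latest.insert(key.as_str(), i); // later entries overwrite earlier ones
    }
    let mut keep: Vec<usize> = latest.into_values().collect();
    keep.sort_unstable(); // restore original log order
    keep.into_iter().map(|i| log[i].clone()).collect()
}

fn main() {
    let log = vec![
        ("user-1".to_string(), "v1".to_string()),
        ("user-2".to_string(), "v1".to_string()),
        ("user-1".to_string(), "v2".to_string()),
    ];
    // Only the latest value per key survives compaction:
    assert_eq!(compact_keep_latest(&log).len(), 2);
}
```

The TimeWindow and Hybrid strategies would add an age predicate on top of the same scan, trading storage savings against how far back consumers can replay.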
## 🏗️ Architecture Overview
### System Architecture Diagram
```mermaid
graph TB
subgraph "Client Layer"
P[Producer]
C[Consumer]
end
subgraph "Broker Cluster"
B1[Broker 1<br/>Partition 0]
B2[Broker 2<br/>Partition 1]
B3[Broker 3<br/>Partition 2]
end
subgraph "Storage Layer"
S1[In-Memory DB<br/>Partition 0]
S2[In-Memory DB<br/>Partition 1]
S3[In-Memory DB<br/>Partition 2]
end
P -->|gRPC Publish| B1
P -->|gRPC Publish| B2
P -->|gRPC Publish| B3
B1 -->|Store Messages| S1
B2 -->|Store Messages| S2
B3 -->|Store Messages| S3
C -->|gRPC Consume| B1
C -->|gRPC Consume| B2
C -->|gRPC Consume| B3
B1 -->|Broadcast Stream| C
B2 -->|Broadcast Stream| C
B3 -->|Broadcast Stream| C
```
### Message Flow Sequence Diagram
```mermaid
sequenceDiagram
participant P as Producer
participant B as Broker
participant S as Storage
participant C as Consumer
P->>B: PublishRequest(topic, key, payload)
B->>B: Hash key for partition
B->>B: Check partition ownership
B->>S: Store message with offset
S-->>B: Return offset
B->>B: Broadcast to subscribers
B-->>P: PublishResponse(message_id, offset)
C->>B: ConsumeRequest(topic)
B->>B: Create broadcast stream
B-->>C: ConsumeResponse stream
loop Message Processing
B->>C: ConsumeResponse(message)
C->>B: AcknowledgeRequest(message_id)
C->>B: UpdateOffsetRequest(offset)
end
```
## 📁 Project Structure
```
rafka/
├── Cargo.toml                      # Workspace manifest
├── config/
│   └── config.yml                  # Configuration file
├── scripts/                        # Demo and utility scripts
│   ├── helloworld.sh               # Basic producer-consumer demo
│   ├── partitioned_demo.sh         # Multi-broker partitioning demo
│   ├── retention_demo.sh           # Message retention demo
│   ├── offset_tracking_demo.sh     # Consumer offset tracking demo
│   └── kill.sh                     # Process cleanup script
├── src/
│   └── bin/                        # Executable binaries
│       ├── start_broker.rs         # Broker server
│       ├── start_producer.rs       # Producer client
│       ├── start_consumer.rs       # Consumer client
│       └── check_metrics.rs        # Metrics monitoring
├── crates/                         # Core library crates
│   ├── core/                       # Core types and gRPC definitions
│   │   ├── src/
│   │   │   ├── lib.rs
│   │   │   ├── message.rs          # Message structures
│   │   │   └── proto/
│   │   │       └── rafka.proto     # gRPC service definitions
│   │   └── build.rs                # Protocol buffer compilation
│   ├── broker/                     # Broker implementation
│   │   └── src/
│   │       ├── lib.rs
│   │       └── broker.rs           # Core broker logic
│   ├── producer/                   # Producer implementation
│   │   └── src/
│   │       ├── lib.rs
│   │       └── producer.rs         # Producer client
│   ├── consumer/                   # Consumer implementation
│   │   └── src/
│   │       ├── lib.rs
│   │       └── consumer.rs         # Consumer client
│   └── storage/                    # Storage engine
│       └── src/
│           ├── lib.rs
│           └── db.rs               # In-memory database
├── docs/
│   └── getting_started.md          # Getting started guide
├── tasks/
│   └── Roadmap.md                  # Development roadmap
├── Dockerfile                      # Container configuration
└── LICENSE                         # MIT License
```
## 🚀 Quick Start
### Prerequisites
- **Rust**: Latest stable version (1.70+)
- **Cargo**: Comes with Rust installation
- **Protocol Buffers**: For gRPC compilation
### Installation
1. **Clone the repository**:
```bash
git clone https://github.com/Mahir101/Rafka.git
cd Rafka
```
2. **Build the project**:
```bash
cargo build --release
```
3. **Run the basic demo**:
```bash
./scripts/helloworld.sh
```
### Manual Setup
1. **Start a broker**:
```bash
cargo run --bin start_broker -- --port 50051 --partition 0 --total-partitions 3
```
2. **Start a consumer**:
```bash
cargo run --bin start_consumer -- --port 50051
```
3. **Send messages**:
```bash
cargo run --bin start_producer -- --message "Hello, Rafka!" --key "test-key"
```
## 🔧 Configuration
### Broker Configuration
The broker can be configured via command-line arguments:
```bash
cargo run --bin start_broker -- \
--port 50051 \
--partition 0 \
--total-partitions 3 \
--retention-seconds 604800
```
**Available Options**:
- `--port`: Broker listening port (default: 50051)
- `--partition`: Partition ID for this broker (default: 0)
- `--total-partitions`: Total number of partitions (default: 1)
- `--retention-seconds`: Message retention time in seconds (default: 7 days)
### Configuration File
Edit `config/config.yml` for persistent settings:
```yaml
server:
host: "127.0.0.1"
port: 9092
log:
level: "info" # debug, info, warn, error
broker:
replication_factor: 3
default_topic_partitions: 1
storage:
type: "in_memory"
```
## 🏗️ Core Components
### 1. Core (`rafka-core`)
**Purpose**: Defines fundamental types and gRPC service contracts.
**Key Components**:
- **Message Structures**: `Message`, `MessageAck`, `BenchmarkMetrics`
- **gRPC Definitions**: Protocol buffer definitions for all services
- **Serialization**: Serde-based serialization for message handling
**Key Files**:
- `message.rs`: Core message types and acknowledgment structures
- `proto/rafka.proto`: gRPC service definitions
### 2. Broker (`rafka-broker`)
**Purpose**: Central message routing and coordination service.
**Key Features**:
- **Partition Management**: Hash-based message partitioning
- **Topic Management**: Dynamic topic creation and subscription
- **Broadcast Channels**: Efficient message distribution to consumers
- **Offset Tracking**: Consumer offset management
- **Retention Policies**: Configurable message retention
- **Metrics Collection**: Real-time performance metrics
**Key Operations**:
- `publish()`: Accept messages from producers
- `consume()`: Stream messages to consumers
- `subscribe()`: Register consumer subscriptions
- `acknowledge()`: Process message acknowledgments
- `update_offset()`: Track consumer progress
### 3. Producer (`rafka-producer`)
**Purpose**: Client library for publishing messages to brokers.
**Key Features**:
- **Connection Management**: Automatic broker connection handling
- **Message Publishing**: Reliable message delivery with acknowledgments
- **Error Handling**: Comprehensive error reporting
- **UUID Generation**: Unique message identification
**Usage Example**:
```rust
// Inside an async context (e.g. #[tokio::main]) returning a Result:
let mut producer = Producer::new("127.0.0.1:50051").await?;
producer
    .publish("my-topic".to_string(), "Hello World".to_string(), "key-1".to_string())
    .await?;
```
### 4. Consumer (`rafka-consumer`)
**Purpose**: Client library for consuming messages from brokers.
**Key Features**:
- **Subscription Management**: Topic subscription handling
- **Stream Processing**: Asynchronous message streaming
- **Automatic Acknowledgment**: Built-in message acknowledgment
- **Offset Tracking**: Automatic offset updates
- **Channel-based API**: Clean async/await interface
**Usage Example**:
```rust
// Inside an async context (e.g. #[tokio::main]) returning a Result:
let mut consumer = Consumer::new("127.0.0.1:50051").await?;
consumer.subscribe("my-topic".to_string()).await?;
let mut rx = consumer.consume("my-topic".to_string()).await?;
while let Some(message) = rx.recv().await {
println!("Received: {}", message);
}
```
### 5. Storage (`rafka-storage`)
**Purpose**: High-performance in-memory storage engine.
**Key Features**:
- **Partition-based Storage**: Separate queues per partition
- **Retention Policies**: Age and size-based message retention
- **Offset Management**: Efficient offset tracking and retrieval
- **Acknowledgment Tracking**: Consumer acknowledgment management
- **Metrics Collection**: Storage performance metrics
- **Memory Optimization**: Efficient memory usage with cleanup
**Storage Architecture**:
```mermaid
graph LR
subgraph "Storage Engine"
T[Topic]
P1[Partition 0]
P2[Partition 1]
P3[Partition 2]
T --> P1
T --> P2
T --> P3
P1 --> Q1[Message Queue]
P2 --> Q2[Message Queue]
P3 --> Q3[Message Queue]
end
```
## 🔄 Message Flow
### Publishing Flow
1. **Producer** sends `PublishRequest` to **Broker**
2. **Broker** hashes the message key to determine partition
3. **Broker** checks partition ownership
4. **Broker** stores message in **Storage** with unique offset
5. **Broker** broadcasts message to subscribed consumers
6. **Broker** returns `PublishResponse` with message ID and offset
### Consumption Flow
1. **Consumer** sends `ConsumeRequest` to **Broker**
2. **Broker** creates broadcast stream for the topic
3. **Broker** streams messages via gRPC to **Consumer**
4. **Consumer** processes message and sends acknowledgment
5. **Consumer** updates offset to track progress
6. **Storage** cleans up acknowledged messages based on retention policy
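Step 5 of the consumption flow boils down to keeping a monotonically increasing committed offset per (topic, partition) pair. A minimal sketch under assumed names (`OffsetTracker` is illustrative, not a Rafka type):

```rust
use std::collections::HashMap;

struct OffsetTracker {
    // Highest acknowledged offset per (topic, partition).
    offsets: HashMap<(String, u32), u64>,
}

impl OffsetTracker {
    fn new() -> Self {
        Self { offsets: HashMap::new() }
    }

    /// Record an acknowledged offset; commits never move backwards,
    /// so a stale or reordered ack cannot lose progress.
    fn update(&mut self, topic: &str, partition: u32, offset: u64) {
        let entry = self.offsets.entry((topic.to_string(), partition)).or_insert(0);
        if offset > *entry {
            *entry = offset;
        }
    }

    fn committed(&self, topic: &str, partition: u32) -> u64 {
        self.offsets
            .get(&(topic.to_string(), partition))
            .copied()
            .unwrap_or(0)
    }
}

fn main() {
    let mut tracker = OffsetTracker::new();
    tracker.update("orders", 0, 5);
    tracker.update("orders", 0, 3); // stale ack: ignored
    assert_eq!(tracker.committed("orders", 0), 5);
}
```

On restart, a consumer resumes from the committed offset, which is what makes redelivery after a crash bounded rather than starting from scratch.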
## 📈 Performance Features
### Partitioning Strategy
Rafka uses hash-based partitioning for efficient message distribution:
```rust
fn hash_key(&self, key: &str) -> u32 {
key.bytes().fold(0u32, |acc, b| acc.wrapping_add(b as u32))
}
fn owns_partition(&self, message_key: &str) -> bool {
let hash = self.hash_key(message_key);
hash % self.total_partitions == self.partition_id
}
```
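The additive hash above is easy to exercise standalone. The sketch below reimplements it as free functions (the broker's methods take `&self`); note that a byte-sum hash maps anagram keys (e.g. `"ab"` and `"ba"`) to the same partition, so key choice affects distribution:

```rust
// Standalone reimplementation of the broker's additive key hash.
fn hash_key(key: &str) -> u32 {
    key.bytes().fold(0u32, |acc, b| acc.wrapping_add(b as u32))
}

fn partition_for(key: &str, total_partitions: u32) -> u32 {
    hash_key(key) % total_partitions
}

fn main() {
    for key in ["user-1", "user-2", "user-3"] {
        println!("{key} -> partition {}", partition_for(key, 3));
    }
    // Anagrams collide under a byte-sum hash:
    assert_eq!(hash_key("ab"), hash_key("ba"));
}
```

Sequential key suffixes spread evenly here because consecutive byte sums land in consecutive residue classes modulo the partition count.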
### Retention Policies
Configurable message retention based on:
- **Time-based**: Maximum age (default: 7 days)
- **Size-based**: Maximum storage size (default: 1GB)
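Both policies reduce to evicting from the front of a FIFO queue until neither limit is exceeded. A minimal sketch under assumed types (`RetentionQueue` is illustrative, not the `rafka-storage` API):

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

struct RetentionQueue {
    entries: VecDeque<(Instant, Vec<u8>)>, // (arrival time, payload), oldest first
    total_bytes: usize,
    max_age: Duration,
    max_bytes: usize,
}

impl RetentionQueue {
    fn new(max_age: Duration, max_bytes: usize) -> Self {
        Self { entries: VecDeque::new(), total_bytes: 0, max_age, max_bytes }
    }

    fn push(&mut self, payload: Vec<u8>) {
        self.total_bytes += payload.len();
        self.entries.push_back((Instant::now(), payload));
        self.enforce();
    }

    /// Drop the oldest entries while either the age or the size limit is exceeded.
    fn enforce(&mut self) {
        let now = Instant::now();
        while let Some((ts, payload)) = self.entries.front() {
            let too_old = now.duration_since(*ts) > self.max_age;
            let too_big = self.total_bytes > self.max_bytes;
            if !too_old && !too_big {
                break;
            }
            self.total_bytes -= payload.len();
            self.entries.pop_front();
        }
    }
}

fn main() {
    let mut q = RetentionQueue::new(Duration::from_secs(3600), 10);
    for _ in 0..3 {
        q.push(vec![0u8; 5]);
    }
    // 15 bytes exceeds the 10-byte cap, so the oldest entry was evicted.
    assert_eq!(q.entries.len(), 2);
}
```

Because messages arrive in order, age-based eviction only ever needs to look at the queue front, which keeps enforcement O(evicted) rather than a full scan.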
### Metrics Collection
Built-in metrics for monitoring:
- Total messages stored
- Total bytes consumed
- Oldest message age
- Consumer offset positions
## 🧪 Demo Scripts
### 1. Hello World Demo
```bash
./scripts/helloworld.sh
```
Basic producer-consumer interaction demonstration.
### 2. Partitioned Demo
```bash
./scripts/partitioned_demo.sh
```
Multi-broker setup with hash-based partitioning.
### 3. Retention Demo
```bash
./scripts/retention_demo.sh
```
Demonstrates message retention policies.
### 4. Offset Tracking Demo
```bash
./scripts/offset_tracking_demo.sh
```
Shows consumer offset management and recovery.
## 🛠️ Development
### Building from Source
```bash
# Clone repository
git clone https://github.com/Mahir101/Rafka.git
cd Rafka
# Build all crates
cargo build
# Run tests
cargo test
# Build release version
cargo build --release
```
### Running Tests
```bash
# Run all tests
cargo test
# Run specific crate tests
cargo test -p rafka-storage
cargo test -p rafka-broker
```
### Code Structure
The project follows Rust best practices with:
- **Workspace Organization**: Multiple crates in a single workspace
- **Separation of Concerns**: Each component in its own crate
- **Async/Await**: Modern async Rust with Tokio
- **Error Handling**: Comprehensive error types and handling
- **Testing**: Unit tests for all major components
## 🚧 Current Status
**⚠️ Early Development - Not Production Ready**
Rafka is currently in active development. The current implementation provides:
✅ **Completed Features**:
- Basic message publishing and consumption
- Hash-based partitioning
- In-memory storage with retention policies
- Consumer offset tracking
- gRPC-based communication
- Metrics collection
- Demo scripts and examples
🔄 **In Progress**:
- Peer-to-peer mesh networking
- Distributed consensus algorithms
- Kubernetes deployment configurations
- Performance optimizations
📋 **Planned Features**:
- Replication across multiple brokers
- Fault tolerance and recovery
- Security and authentication
- Client SDKs for multiple languages
- Comprehensive monitoring and alerting
## 🤝 Contributing
We welcome contributions! Here are some areas where you can help:
### High Priority
- **P2P Mesh Implementation**: Distributed node discovery and communication
- **Consensus Algorithms**: Leader election and cluster coordination
- **Replication**: Cross-broker message replication
- **Fault Tolerance**: Node failure detection and recovery
### Medium Priority
- **Performance Optimization**: Message batching and compression
- **Security**: TLS encryption and authentication
- **Monitoring**: Prometheus metrics and Grafana dashboards
- **Documentation**: API documentation and tutorials
### Getting Started
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](./LICENSE) file for details.
## 🙏 Acknowledgments
- [Apache Kafka](https://kafka.apache.org) for inspiration on messaging systems
- [Tokio](https://tokio.rs) for the excellent async runtime
- [Tonic](https://github.com/hyperium/tonic) for gRPC implementation
- [@wyattgill9](https://github.com/wyattgill9) for the initial proof of concept
- The Rust community for their excellent libraries and support
---
**Built with ❤️ in Rust**