https://github.com/arkflow-rs/arkflow
High-performance Rust stream processing engine, providing powerful data stream processing capabilities, supporting multiple input/output sources and processors.
arkflow datafusion duckdb flow kafka mysql postgresql rust rust-lang sql sqlite stream tokio tokio-rs
- Host: GitHub
- URL: https://github.com/arkflow-rs/arkflow
- Owner: arkflow-rs
- License: apache-2.0
- Created: 2025-03-01T03:02:55.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-05-11T02:47:27.000Z (14 days ago)
- Last Synced: 2025-05-11T03:29:38.502Z (14 days ago)
- Topics: arkflow, datafusion, duckdb, flow, kafka, mysql, postgresql, rust, rust-lang, sql, sqlite, stream, tokio, tokio-rs
- Language: Rust
- Homepage: https://arkflow-rs.com
- Size: 1.48 MB
- Stars: 953
- Watchers: 1
- Forks: 22
- Open Issues: 17
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-streaming - ArkFlow - High-performance Rust stream processing engine, providing powerful data stream processing capabilities, supporting multiple input/output sources and processors. (Table of Contents / Streaming Engine)
- trackawesomelist - ArkFlow (⭐478) - High-performance Rust stream processing engine, providing powerful data stream processing capabilities, supporting multiple input/output sources and processors. (Recently Updated / [May 11, 2025](/content/2025/05/11/README.md))
- awesome-rust - arkflow-rs/arkflow - High-performance Rust stream processing engine (Libraries / Data streaming)
README
# ArkFlow
English | [中文](README_zh.md)
High-performance Rust stream processing engine, providing powerful data stream processing capabilities, supporting
multiple input/output sources and processors.

## Features
- **High Performance**: Built on Rust and Tokio async runtime, offering excellent performance and low latency
- **Multiple Data Sources**: Support for Kafka, MQTT, HTTP, files, and other input/output sources
- **Powerful Processing Capabilities**: Built-in SQL queries, JSON processing, Protobuf encoding/decoding, batch
processing, and other processors
- **Extensible**: Modular design, easy to extend with new input, buffer, output, and processor components

## Installation
### Building from Source
```bash
# Clone the repository
git clone https://github.com/arkflow-rs/arkflow.git
cd arkflow

# Build the project
cargo build --release

# Run tests
cargo test
```

## Quick Start
1. Create a configuration file `config.yaml`:
```yaml
logging:
  level: info
streams:
  - input:
      type: "generate"
      context: '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }'
      interval: 1s
      batch_size: 10

    pipeline:
      thread_num: 4
      processors:
        - type: "json_to_arrow"
        - type: "sql"
          query: "SELECT * FROM flow WHERE value >= 10"

    output:
      type: "stdout"
    error_output:
      type: "stdout"
```

2. Run ArkFlow:
```bash
./target/release/arkflow --config config.yaml
```

## Configuration Guide
ArkFlow is configured through YAML files. The main configuration items are:
### Top-level Configuration
```yaml
logging:
  level: info  # Log level: debug, info, warn, error

streams:          # Stream definition list
  - input:        # Input configuration
      # ...
    pipeline:     # Processing pipeline configuration
      # ...
    output:       # Output configuration
      # ...
    error_output: # Error output configuration
      # ...
    buffer:       # Buffer configuration
      # ...
```

### Input Components
ArkFlow supports multiple input sources:
- **Kafka**: Read data from Kafka topics
- **MQTT**: Subscribe to messages from MQTT topics
- **HTTP**: Receive data via HTTP
- **File**: Read data from files (CSV, JSON, Parquet, Avro, Arrow) using SQL
- **Generator**: Generate test data
- **Database**: Query data from databases (MySQL, PostgreSQL, SQLite, DuckDB)
- **NATS**: Subscribe to messages from NATS subjects
- **Redis**: Subscribe to messages from Redis channels or lists
- **WebSocket**: Subscribe to messages from WebSocket connections

Example:
```yaml
input:
  type: kafka
  brokers:
    - localhost:9092
  topics:
    - test-topic
  consumer_group: test-group
  client_id: arkflow
  start_from_latest: true
```

### Processors
ArkFlow provides multiple data processors:
- **JSON**: JSON data processing and transformation
- **SQL**: Process data using SQL queries
- **Protobuf**: Protobuf encoding/decoding
- **Batch Processing**: Process messages in batches
- **VRL**: Process data using [VRL](https://vector.dev/docs/reference/vrl/)

Example:
```yaml
pipeline:
  thread_num: 4
  processors:
    - type: json_to_arrow
    - type: sql
      query: "SELECT * FROM flow WHERE value >= 10"
```

### Output Components
ArkFlow supports multiple output targets:
- **Kafka**: Write data to Kafka topics
- **MQTT**: Publish messages to MQTT topics
- **HTTP**: Send data via HTTP
- **Standard Output**: Output data to the console
- **Drop**: Discard data
- **NATS**: Publish messages to NATS subjects

Example:
```yaml
output:
  type: kafka
  brokers:
    - localhost:9092
  topic:
    type: value
    value: test-topic
  client_id: arkflow-producer
```

### Error Output Components
ArkFlow supports multiple error output targets:
- **Kafka**: Write error data to Kafka topics
- **MQTT**: Publish error messages to MQTT topics
- **HTTP**: Send error data via HTTP
- **Standard Output**: Output error data to the console
- **Drop**: Discard error data
- **NATS**: Publish error messages to NATS subjects

Example:
```yaml
error_output:
  type: kafka
  brokers:
    - localhost:9092
  topic:
    type: value
    value: error-topic
  client_id: error-arkflow-producer
```

### Buffer Components
ArkFlow provides buffer capabilities to handle backpressure and temporary storage of messages:
- **Memory Buffer**: An in-memory buffer for high-throughput scenarios and window aggregation.
- **Session Window**: Groups messages into sessions based on activity gaps; a session window closes after a configurable period of inactivity.
- **Sliding Window**: A time-based windowing mechanism for processing message batches, with configurable window size, slide interval, and slide size.
- **Tumbling Window**: A fixed-size, non-overlapping windowing mechanism for processing message batches, with a configurable interval.

Example:
```yaml
buffer:
  type: memory
  capacity: 10000  # Maximum number of messages to buffer
  timeout: 10s  # Maximum time to buffer messages
```

## Examples
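Before the full configurations, a conceptual sketch may help with the tumbling and session window buffers listed under Buffer Components. The Python below is not ArkFlow's implementation: the function names, the `(timestamp, payload)` message shape, and the `interval` and `gap` parameters are assumptions made for illustration, and timestamps are passed in explicitly rather than read from a clock.

```python
from typing import Dict, List, Tuple

Message = Tuple[float, str]  # (arrival time in seconds, payload); hypothetical shape


def tumbling_windows(messages: List[Message], interval: float) -> List[List[str]]:
    """Group messages into fixed-size, non-overlapping windows of `interval` seconds."""
    windows: Dict[int, List[str]] = {}
    for ts, payload in messages:
        # Each message falls into exactly one window, chosen by floor division.
        windows.setdefault(int(ts // interval), []).append(payload)
    return [windows[key] for key in sorted(windows)]


def session_windows(messages: List[Message], gap: float) -> List[List[str]]:
    """Group messages into sessions; a session closes after `gap` seconds of inactivity."""
    sessions: List[List[str]] = []
    last_ts = None
    for ts, payload in sorted(messages):
        if last_ts is None or ts - last_ts > gap:
            sessions.append([])  # inactivity exceeded the gap: start a new session
        sessions[-1].append(payload)
        last_ts = ts
    return sessions


events = [(0.0, "a"), (1.0, "b"), (4.5, "c"), (5.0, "d"), (12.0, "e")]
print(tumbling_windows(events, interval=5.0))  # [['a', 'b', 'c'], ['d'], ['e']]
print(session_windows(events, gap=2.0))        # [['a', 'b'], ['c', 'd'], ['e']]
```

The same five events split differently: the tumbling window cuts at fixed 5-second boundaries regardless of activity, while the session window cuts wherever more than 2 seconds pass without a message.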
### Kafka to Kafka Data Processing
```yaml
streams:
  - input:
      type: kafka
      brokers:
        - localhost:9092
      topics:
        - test-topic
      consumer_group: test-group

    pipeline:
      thread_num: 4
      processors:
        - type: json_to_arrow
        - type: sql
          query: "SELECT * FROM flow WHERE value > 100"

    output:
      type: kafka
      brokers:
        - localhost:9092
      topic:
        type: value
        value: test-topic
```

### Generate Test Data and Process
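This example generates batches of identical JSON records and counts, per sensor, the records whose `value` is at least 10. As a rough illustration of what the SQL step computes, the sketch below runs the same query with Python's built-in `sqlite3` module standing in for ArkFlow's SQL engine. The table schema and the batch of three records are assumptions chosen to keep the output small; SQL dialect details may differ in ArkFlow itself.

```python
import json
import sqlite3

# The record emitted by the `generate` input in this example.
record = json.loads('{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }')
batch = [record] * 3  # small stand-in for batch_size: 10000 (assumption)

conn = sqlite3.connect(":memory:")
# Hypothetical schema for the `flow` table that the query reads from.
conn.execute("CREATE TABLE flow (timestamp INTEGER, value INTEGER, sensor TEXT)")
conn.executemany("INSERT INTO flow VALUES (:timestamp, :value, :sensor)", batch)

rows = conn.execute(
    "SELECT count(*) FROM flow WHERE value >= 10 group by sensor"
).fetchall()
print(rows)  # [(3,)]
```

Every record passes the filter and shares one sensor, so each batch yields a single row holding the batch size. The ArkFlow configuration for this example follows.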
```yaml
streams:
  - input:
      type: "generate"
      context: '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }'
      interval: 1ms
      batch_size: 10000

    pipeline:
      thread_num: 4
      processors:
        - type: "json_to_arrow"
        - type: "sql"
          query: "SELECT count(*) FROM flow WHERE value >= 10 group by sensor"

    output:
      type: "stdout"
```

## ArkFlow Plugin
[ArkFlow Plugin Examples](https://github.com/arkflow-rs/arkflow-plugin-examples)
## License
ArkFlow is licensed under the [Apache License 2.0](LICENSE).
## Community
Discord: https://discord.gg/CwKhzb8pux
If you like this project, or are using it to learn or to start your own solution, please give it a star ⭐. Thanks!