https://github.com/arkflow-rs/arkflow
High-performance Rust stream processing engine, providing powerful data stream processing capabilities, supporting multiple input/output sources and processors.
- Host: GitHub
- URL: https://github.com/arkflow-rs/arkflow
- Owner: arkflow-rs
- License: apache-2.0
- Created: 2025-03-01T03:02:55.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2025-04-12T00:47:21.000Z (22 days ago)
- Last Synced: 2025-04-12T03:48:36.943Z (22 days ago)
- Topics: datafusion, duckdb, flow, kafka, mysql, postgresql, rust, rust-lang, sql, sqlite, stream, tokio, tokio-rs
- Language: Rust
- Homepage: https://arkflow-rs.github.io/arkflow/
- Size: 1.09 MB
- Stars: 453
- Watchers: 1
- Forks: 14
- Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- trackawesomelist - arkflow-rs/arkflow (⭐478) - High-performance Rust stream processing engine (Recently Updated / [Apr 28, 2025](/content/2025/04/28/README.md))
- awesome-rust - arkflow-rs/arkflow - High-performance Rust stream processing engine (Libraries / Data streaming)
README
# ArkFlow
English | [中文](README_zh.md)
High-performance Rust stream processing engine, providing powerful data stream processing capabilities, supporting multiple input/output sources and processors.

## Features
- **High Performance**: Built on Rust and Tokio async runtime, offering excellent performance and low latency
- **Multiple Data Sources**: Support for Kafka, MQTT, HTTP, files, and other input/output sources
- **Powerful Processing Capabilities**: Built-in SQL queries, JSON processing, Protobuf encoding/decoding, batch processing, and other processors
- **Extensible**: Modular design, easy to extend with new input, output, and processor components

## Installation
### Building from Source
```bash
# Clone the repository
git clone https://github.com/arkflow-rs/arkflow.git
cd arkflow

# Build the project
cargo build --release

# Run tests
cargo test
```

## Quick Start
1. Create a configuration file `config.yaml`:

```yaml
logging:
  level: info
streams:
  - input:
      type: "generate"
      context: '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }'
      interval: 1s
      batch_size: 10

    pipeline:
      thread_num: 4
      processors:
        - type: "json_to_arrow"
        - type: "sql"
          query: "SELECT * FROM flow WHERE value >= 10"

    output:
      type: "stdout"
```

2. Run ArkFlow:
```bash
./target/release/arkflow --config config.yaml
```

## Configuration Guide

ArkFlow is configured through YAML files, which support the following main configuration items:
### Top-level Configuration
```yaml
logging:
  level: info  # Log level: debug, info, warn, error

streams:       # Stream definition list
  - input:     # Input configuration
      # ...
    pipeline:  # Processing pipeline configuration
      # ...
    output:    # Output configuration
      # ...
    buffer:    # Buffer configuration
      # ...
```

### Input Components
ArkFlow supports multiple input sources:
- **Kafka**: Read data from Kafka topics
- **MQTT**: Subscribe to messages from MQTT topics
- **HTTP**: Receive data via HTTP
- **File**: Read data from files (CSV, JSON, Parquet, Avro, Arrow) using SQL
- **Generator**: Generate test data
- **Database**: Query data from databases (MySQL, PostgreSQL, SQLite, DuckDB)

Example:
```yaml
input:
  type: kafka
  brokers:
    - localhost:9092
  topics:
    - test-topic
  consumer_group: test-group
  client_id: arkflow
  start_from_latest: true
```

### Processors
ArkFlow provides multiple data processors:
- **JSON**: JSON data processing and transformation
- **SQL**: Process data using SQL queries
- **Protobuf**: Protobuf encoding/decoding
- **Batch Processing**: Process messages in batches

Example:
```yaml
pipeline:
  thread_num: 4
  processors:
    - type: json_to_arrow
    - type: sql
      query: "SELECT * FROM flow WHERE value >= 10"
```

### Output Components
ArkFlow supports multiple output targets:
- **Kafka**: Write data to Kafka topics
- **MQTT**: Publish messages to MQTT topics
- **HTTP**: Send data via HTTP
- **Standard Output**: Output data to the console
- **Drop**: Discard data

Example:
```yaml
output:
  type: kafka
  brokers:
    - localhost:9092
  topic: output-topic
  client_id: arkflow-producer
```

### Buffer Components
ArkFlow provides buffering to handle backpressure and to store messages temporarily:
- **Memory Buffer**: In-memory buffer for high-throughput scenarios and window aggregation
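
To make the idea of a bounded buffer concrete, here is a minimal Python sketch of a buffer that hands over its contents either when it is full or when the oldest message has waited too long. It is an illustration only, not ArkFlow's implementation: the class `MemoryBuffer`, its methods, and the flush-on-capacity-or-timeout behaviour are assumptions inferred from the `capacity` and `timeout` settings of the memory buffer.

```python
import time
from typing import Callable, List, Optional


class MemoryBuffer:
    """Hypothetical buffer: flushes when full or when the oldest message is too old."""

    def __init__(self, capacity: int, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity      # maximum number of messages to hold
        self.timeout_s = timeout_s    # maximum time to hold the oldest message
        self._clock = clock           # injectable clock, which makes testing easy
        self._items: List[str] = []
        self._first_at: Optional[float] = None  # arrival time of the oldest message

    def push(self, msg: str) -> Optional[List[str]]:
        """Add a message; return the flushed batch if capacity is reached, else None."""
        if not self._items:
            self._first_at = self._clock()
        self._items.append(msg)
        if len(self._items) >= self.capacity:
            return self._flush()
        return None

    def poll(self) -> Optional[List[str]]:
        """Return the flushed batch if the oldest message has waited timeout_s or longer."""
        if self._items and self._clock() - self._first_at >= self.timeout_s:
            return self._flush()
        return None

    def _flush(self) -> List[str]:
        # Hand the current batch to the caller and start with an empty buffer.
        batch, self._items, self._first_at = self._items, [], None
        return batch
```

A caller would `push` each incoming message and call `poll` periodically; whichever returns a batch first passes it downstream. Bounding the batch by both size and age keeps memory use predictable under load while still delivering messages when traffic is light.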
Example:
```yaml
buffer:
  type: memory
  capacity: 10000  # Maximum number of messages to buffer
  timeout: 10s     # Maximum time to buffer messages
```

## Examples
### Kafka to Kafka Data Processing
```yaml
streams:
  - input:
      type: kafka
      brokers:
        - localhost:9092
      topics:
        - test-topic
      consumer_group: test-group

    pipeline:
      thread_num: 4
      processors:
        - type: json_to_arrow
        - type: sql
          query: "SELECT count(*) FROM flow WHERE value > 100"

    output:
      type: kafka
      brokers:
        - localhost:9092
      topic: processed-topic
```

### Generate Test Data and Process
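
This example repeats a single JSON record in batches and counts, per sensor, the rows with `value >= 10`. As a rough preview of what that query returns for one batch, the sketch below runs the same SQL with Python's built-in `sqlite3` module. It is only a stand-in under stated assumptions: SQLite is not the engine ArkFlow uses, the table named `flow` is created by hand here, and `count_by_sensor` is a hypothetical helper.

```python
import json
import sqlite3

# One record, copied from the `context` field of this example's configuration.
CONTEXT = '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }'


def count_by_sensor(batch_size: int) -> list:
    """Build one batch of identical records and run the example's query on it."""
    rows = [json.loads(CONTEXT) for _ in range(batch_size)]
    conn = sqlite3.connect(":memory:")
    # A hand-made table named `flow` stands in for the batch the SQL processor queries.
    conn.execute("CREATE TABLE flow (timestamp INTEGER, value INTEGER, sensor TEXT)")
    conn.executemany(
        "INSERT INTO flow VALUES (:timestamp, :value, :sensor)", rows
    )
    result = conn.execute(
        "SELECT count(*) FROM flow WHERE value >= 10 GROUP BY sensor"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(count_by_sensor(5))
```

Because every generated record has `value` 10 and sensor `temp_1`, each batch produces a single group whose count equals the batch size. The ArkFlow configuration for this example: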
```yaml
streams:
  - input:
      type: "generate"
      context: '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }'
      interval: 1ms
      batch_size: 10000

    pipeline:
      thread_num: 4
      processors:
        - type: "json_to_arrow"
        - type: "sql"
          query: "SELECT count(*) FROM flow WHERE value >= 10 group by sensor"

    output:
      type: "stdout"
```

## License
ArkFlow is licensed under the [Apache License 2.0](LICENSE).
## Community
Discord: https://discord.gg/CwKhzb8pux
If you like this project, or are using it to learn or to start your own solution, please give it a star ⭐. Thanks!