https://github.com/arkflow-rs/arkflow
A high-performance Rust stream processing engine that seamlessly integrates AI capabilities, providing powerful real-time data processing and intelligent analysis.
- Host: GitHub
- URL: https://github.com/arkflow-rs/arkflow
- Owner: arkflow-rs
- License: apache-2.0
- Created: 2025-03-01T03:02:55.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2026-02-16T16:29:02.000Z (25 days ago)
- Last Synced: 2026-02-16T23:56:17.150Z (25 days ago)
- Topics: ai, arkflow, datafusion, deep-learning, duckdb, flow, kafka, machine-learning, mysql, nats, postgresql, redis, rust, rust-lang, sql, sqlite, stream, tokio, tokio-rs, websocket
- Language: Rust
- Homepage: https://arkflow-rs.com/
- Size: 4.26 MB
- Stars: 1,249
- Watchers: 2
- Forks: 41
- Open Issues: 27
- Metadata Files:
- Readme: README.md
- License: LICENSE
# ArkFlow
English | [中文](README_zh.md)
[Build status](https://github.com/arkflow-rs/arkflow/actions/workflows/rust.yml) | [License](LICENSE)
[Latest docs](https://arkflow-rs.com/docs/intro) | [Dev docs](https://arkflow-rs.com/docs/next/intro)
A high-performance Rust stream processing engine that seamlessly integrates AI capabilities, providing powerful real-time data processing and intelligent analysis.
It supports multiple input/output sources and processors, and makes it easy to load and execute machine learning models, enabling streaming inference, anomaly detection, and complex event processing.
## Cloud Native Landscape
ArkFlow is listed in the [CNCF Cloud Native Landscape](https://landscape.cncf.io/?item=app-definition-and-development--streaming-messaging--arkflow).
## Features
- **High Performance**: Built on Rust and Tokio async runtime, offering excellent performance and low latency
- **Multiple Data Sources**: Support for Kafka, MQTT, HTTP, files, and other input/output sources
- **Powerful Processing Capabilities**: Built-in SQL queries, Python scripting, JSON processing, Protobuf encoding/decoding, batch
  processing, and other processors
- **Extensible**: Modular design, easy to extend with new input, buffer, output, and processor components
## Installation
### Building from Source
```bash
# Clone the repository
git clone https://github.com/arkflow-rs/arkflow.git
cd arkflow
# Build the project
cargo build --release
# Run tests
cargo test
```
## Quick Start
1. Create a configuration file `config.yaml`:
```yaml
logging:
  level: info
streams:
  - input:
      type: "generate"
      context: '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }'
      interval: 1s
      batch_size: 10
    pipeline:
      thread_num: 4
      processors:
        - type: "json_to_arrow"
        - type: "sql"
          query: "SELECT * FROM flow WHERE value >= 10"
    output:
      type: "stdout"
    error_output:
      type: "stdout"
```
2. Run ArkFlow:
```bash
./target/release/arkflow --config config.yaml
```
## Configuration Guide
ArkFlow uses YAML format configuration files, supporting the following main configuration items:
### Top-level Configuration
```yaml
logging:
  level: info        # Log level: debug, info, warn, error
streams:             # Stream definition list
  - input:           # Input configuration
      # ...
    pipeline:        # Processing pipeline configuration
      # ...
    output:          # Output configuration
      # ...
    error_output:    # Error output configuration
      # ...
    buffer:          # Buffer configuration
      # ...
```
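The topology a stream definition encodes can be sketched in plain Python. This is not ArkFlow's implementation, just an illustration of the processing order the config implies: batches flow from the input through the pipeline's processors in sequence to the output, and batches that fail processing are diverted to the error output.

```python
# Hypothetical sketch of the dataflow one stream definition describes:
# input -> pipeline (processors in order) -> output, errors -> error_output.

def run_stream(input_batches, processors, output, error_output):
    """Pull batches from the input, apply each processor in order,
    and route the result to output (or error_output on failure)."""
    for batch in input_batches:
        try:
            for process in processors:
                batch = process(batch)
            output.append(batch)
        except Exception:
            error_output.append(batch)

# Toy run: a made-up processor that rejects non-dict batches.
def require_dict(batch):
    if not isinstance(batch, dict):
        raise ValueError("bad batch")
    return batch

ok, err = [], []
run_stream([{"value": 1}, "garbage", {"value": 2}], [require_dict], ok, err)
print(ok)   # [{'value': 1}, {'value': 2}]
print(err)  # ['garbage']
```

The `error_output` section in the config plays the role of the `error_output` list here: failed batches are not silently dropped but routed to a separately configured sink.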
### Input Components
ArkFlow supports multiple input sources:
- **Kafka**: Read data from Kafka topics
- **MQTT**: Subscribe to messages from MQTT topics
- **HTTP**: Receive data via HTTP
- **File**: Read data from files (CSV, JSON, Parquet, Avro, Arrow) using SQL
- **Generator**: Generate test data
- **Database**: Query data from databases (MySQL, PostgreSQL, SQLite, DuckDB)
- **NATS**: Subscribe to messages from NATS topics
- **Redis**: Subscribe to messages from Redis channels or lists
- **WebSocket**: Receive messages from WebSocket connections
- **Modbus**: Read data from Modbus devices
Example:
```yaml
input:
  type: kafka
  brokers:
    - localhost:9092
  topics:
    - test-topic
  consumer_group: test-group
  client_id: arkflow
  start_from_latest: true
```
### Processors
ArkFlow provides multiple data processors:
- **JSON**: JSON data processing and transformation
- **SQL**: Process data using SQL queries
- **Protobuf**: Protobuf encoding/decoding
- **Batch Processing**: Process messages in batches
- **VRL**: Process data using [VRL](https://vector.dev/docs/reference/vrl/)
Example:
```yaml
pipeline:
  thread_num: 4
  processors:
    - type: json_to_arrow
    - type: sql
      query: "SELECT * FROM flow WHERE value >= 10"
```
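The effect of the `json_to_arrow` + `sql` pair can be sketched with SQLite standing in for ArkFlow's query engine (an assumption purely for illustration; ArkFlow queries Arrow record batches, not SQLite tables): the JSON batch becomes a table exposed to the query under the name `flow`.

```python
# Hedged sketch of what the json_to_arrow + sql processors do to a batch,
# using sqlite3 in place of ArkFlow's actual engine.
import json
import sqlite3

batch = [
    '{"timestamp": 1625000000000, "value": 10, "sensor": "temp_1"}',
    '{"timestamp": 1625000001000, "value": 3,  "sensor": "temp_2"}',
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flow (timestamp INTEGER, value INTEGER, sensor TEXT)")
# json_to_arrow: parse each JSON message into a columnar row of the table.
rows = [tuple(json.loads(line).values()) for line in batch]
conn.executemany("INSERT INTO flow VALUES (?, ?, ?)", rows)

# sql: the batch is queried as a table named `flow`.
result = conn.execute("SELECT * FROM flow WHERE value >= 10").fetchall()
print(result)  # [(1625000000000, 10, 'temp_1')]
```

Only the first record survives the `value >= 10` filter; the second is dropped from the batch before it reaches the output.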
### Output Components
ArkFlow supports multiple output targets:
- **Kafka**: Write data to Kafka topics
- **MQTT**: Publish messages to MQTT topics
- **HTTP**: Send data via HTTP
- **Standard Output**: Output data to the console
- **Drop**: Discard data
- **NATS**: Publish messages to NATS topics
Example:
```yaml
output:
  type: kafka
  brokers:
    - localhost:9092
  topic:
    type: value
    value: test-topic
  client_id: arkflow-producer
```
### Error Output Components
ArkFlow supports multiple error output targets:
- **Kafka**: Write error data to Kafka topics
- **MQTT**: Publish error messages to MQTT topics
- **HTTP**: Send error data via HTTP
- **Standard Output**: Output error data to the console
- **Drop**: Discard error data
- **NATS**: Publish error messages to NATS topics
Example:
```yaml
error_output:
  type: kafka
  brokers:
    - localhost:9092
  topic:
    type: value
    value: error-topic
  client_id: error-arkflow-producer
```
### Buffer Components
ArkFlow provides buffer capabilities to handle backpressure and temporary storage of messages:
- **Memory Buffer**: An in-memory buffer for high-throughput scenarios and window aggregation.
- **Session Window**: The Session Window buffer component provides a session-based message grouping mechanism where
messages are grouped based on activity gaps. It implements a session window that closes after a configurable period of
inactivity.
- **Sliding Window**: The Sliding Window buffer component provides a time-based windowing mechanism for processing
message batches. It implements a sliding window algorithm with configurable window size, slide interval and slide
size.
- **Tumbling Window**: The Tumbling Window buffer component provides a fixed-size, non-overlapping windowing mechanism
for processing message batches. It implements a tumbling window algorithm with configurable interval settings.
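The difference between the tumbling and sliding windows above can be sketched in a few lines of Python. This is illustrative only (not ArkFlow code), with made-up window sizes: a tumbling window assigns each message to exactly one fixed, non-overlapping window, while a sliding window can assign one message to several overlapping windows.

```python
# Illustrative window-assignment semantics; timestamps are integers here
# and the size/slide values are arbitrary examples.

def tumbling_windows(events, size):
    """Assign each (timestamp, value) event to one non-overlapping window."""
    windows = {}
    for ts, value in events:
        start = (ts // size) * size  # the single window covering ts
        windows.setdefault(start, []).append(value)
    return windows

def sliding_windows(events, size, slide):
    """Assign each event to every `size`-wide window sliding by `slide`."""
    windows = {}
    for ts, value in events:
        start = (ts // slide) * slide
        # Walk back over every window start whose [start, start+size) covers ts.
        while start > ts - size:
            if start >= 0:
                windows.setdefault(start, []).append(value)
            start -= slide
    return windows

events = [(0, "a"), (4, "b"), (7, "c")]
print(tumbling_windows(events, size=5))          # {0: ['a', 'b'], 5: ['c']}
print(sliding_windows(events, size=10, slide=5)) # {0: ['a', 'b', 'c'], 5: ['c']}
```

Note how event `c` at t=7 lands in one tumbling window but in two sliding windows, since the 10-wide windows starting at 0 and 5 both cover it. A session window (not sketched) instead keys windows on gaps of inactivity rather than fixed boundaries.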
Example:
```yaml
buffer:
  type: memory
  capacity: 10000 # Maximum number of messages to buffer
  timeout: 10s    # Maximum time to buffer messages
```
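The interaction of `capacity` and `timeout` can be sketched as a buffer that flushes whichever limit is hit first. These are assumed semantics for illustration, not ArkFlow internals:

```python
# Minimal sketch of a memory buffer that flushes when it reaches
# `capacity` messages or when `timeout` seconds have elapsed.
import time

class MemoryBuffer:
    def __init__(self, capacity, timeout, flush):
        self.capacity, self.timeout, self.flush = capacity, timeout, flush
        self.items, self.started = [], time.monotonic()

    def push(self, item):
        self.items.append(item)
        if len(self.items) >= self.capacity:
            self._flush()

    def tick(self):
        # Called periodically: flush if the buffered batch is too old.
        if self.items and time.monotonic() - self.started >= self.timeout:
            self._flush()

    def _flush(self):
        self.flush(self.items)
        self.items, self.started = [], time.monotonic()

flushed = []
buf = MemoryBuffer(capacity=3, timeout=10.0, flush=flushed.append)
for i in range(7):
    buf.push(i)
print(flushed)  # [[0, 1, 2], [3, 4, 5]]  -- item 6 still waits for timeout
```

Capacity-based flushing bounds memory use under load, while the timeout bounds latency when traffic is sparse.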
## Examples
### Kafka to Kafka Data Processing
```yaml
streams:
  - input:
      type: kafka
      brokers:
        - localhost:9092
      topics:
        - test-topic
      consumer_group: test-group
    pipeline:
      thread_num: 4
      processors:
        - type: json_to_arrow
        - type: sql
          query: "SELECT * FROM flow WHERE value > 100"
    output:
      type: kafka
      brokers:
        - localhost:9092
      topic:
        type: value
        value: test-topic
```
### Generate Test Data and Process
```yaml
streams:
  - input:
      type: "generate"
      context: '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }'
      interval: 1ms
      batch_size: 10000
    pipeline:
      thread_num: 4
      processors:
        - type: "json_to_arrow"
        - type: "sql"
          query: "SELECT count(*) FROM flow WHERE value >= 10 GROUP BY sensor"
    output:
      type: "stdout"
```
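The aggregation this example's SQL performs amounts to a filtered group-by count, which can be shown in plain Python (ArkFlow runs it over Arrow batches; the records below are made up to match the generator's shape):

```python
# Rough sketch of `SELECT count(*) ... WHERE value >= 10 GROUP BY sensor`
# over one batch of generated records.
from collections import Counter

batch = [
    {"timestamp": 1625000000000, "value": 10, "sensor": "temp_1"},
    {"timestamp": 1625000000001, "value": 12, "sensor": "temp_1"},
    {"timestamp": 1625000000002, "value": 9,  "sensor": "temp_2"},
]

# WHERE filters rows first, then GROUP BY counts rows per sensor.
counts = Counter(r["sensor"] for r in batch if r["value"] >= 10)
print(dict(counts))  # {'temp_1': 2}
```

`temp_2` produces no group at all because its only row fails the `value >= 10` filter before grouping happens.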
## Users
- Conalog (South Korea)
## ArkFlow Plugin
[ArkFlow Plugin Examples](https://github.com/arkflow-rs/arkflow-plugin-examples)
## License
ArkFlow is licensed under the [Apache License 2.0](LICENSE).
## Community
Discord: https://discord.gg/CwKhzb8pux
If you like this project, or are using it to learn or to start your own solution, please give it a star ⭐. Thanks!