# ArkFlow

English | [中文](README_zh.md)

[![Rust](https://github.com/chenquan/arkflow/actions/workflows/rust.yml/badge.svg)](https://github.com/chenquan/arkflow/actions/workflows/rust.yml)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)

A high-performance Rust stream processing engine providing powerful data stream processing capabilities and supporting multiple input/output sources and processors.

## Features

- **High Performance**: Built on Rust and Tokio async runtime, offering excellent performance and low latency
- **Multiple Data Sources**: Support for Kafka, MQTT, HTTP, files, and other input/output sources
- **Powerful Processing Capabilities**: Built-in SQL queries, JSON processing, Protobuf encoding/decoding, batch
processing, and other processors
- **Extensible**: Modular design, easy to extend with new input, output, and processor components

## Installation

### Building from Source

```bash
# Clone the repository
git clone https://github.com/arkflow-rs/arkflow.git
cd arkflow

# Build the project
cargo build --release

# Run tests
cargo test
```

## Quick Start

1. Create a configuration file `config.yaml`:

```yaml
logging:
  level: info

streams:
  - input:
      type: "generate"
      context: '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }'
      interval: 1s
      batch_size: 10

    pipeline:
      thread_num: 4
      processors:
        - type: "json_to_arrow"
        - type: "sql"
          query: "SELECT * FROM flow WHERE value >= 10"

    output:
      type: "stdout"
```

2. Run ArkFlow:

```bash
./target/release/arkflow --config config.yaml
```

## Configuration Guide

ArkFlow uses YAML configuration files, which support the following main configuration items:

### Top-level Configuration

```yaml
logging:
  level: info # Log level: debug, info, warn, error

streams: # Stream definition list
  - input: # Input configuration
      # ...
    pipeline: # Processing pipeline configuration
      # ...
    output: # Output configuration
      # ...
    buffer: # Buffer configuration
      # ...
```
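
Since `streams` is a list, one configuration file can run several independent streams side by side. A minimal sketch reusing only the `generate` input and `stdout` output shown elsewhere in this README (the second stream's payload is an illustrative variation):

```yaml
streams:
  # First stream: filter readings with SQL
  - input:
      type: "generate"
      context: '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }'
      interval: 1s
      batch_size: 10
    pipeline:
      thread_num: 4
      processors:
        - type: "json_to_arrow"
        - type: "sql"
          query: "SELECT * FROM flow WHERE value >= 10"
    output:
      type: "stdout"

  # Second stream: runs independently of the first
  - input:
      type: "generate"
      context: '{ "timestamp": 1625000000000, "value": 20, "sensor": "temp_2" }'
      interval: 1s
      batch_size: 10
    pipeline:
      thread_num: 2
      processors:
        - type: "json_to_arrow"
    output:
      type: "stdout"
```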

### Input Components

ArkFlow supports multiple input sources:

- **Kafka**: Read data from Kafka topics
- **MQTT**: Subscribe to messages from MQTT topics
- **HTTP**: Receive data via HTTP
- **File**: Read data from files (CSV, JSON, Parquet, Avro, Arrow) using SQL
- **Generator**: Generate test data
- **Database**: Query data from databases (MySQL, PostgreSQL, SQLite, DuckDB)

Example:

```yaml
input:
  type: kafka
  brokers:
    - localhost:9092
  topics:
    - test-topic
  consumer_group: test-group
  client_id: arkflow
  start_from_latest: true
```
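
The `generate` input used in the Quick Start is handy for local testing without external infrastructure; its fields (taken from the Quick Start example in this README) are the JSON payload to emit, the emission interval, and the batch size:

```yaml
input:
  type: generate
  context: '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }'
  interval: 1s     # Emit a batch every second
  batch_size: 10   # Messages per batch
```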

### Processors

ArkFlow provides multiple data processors:

- **JSON**: JSON data processing and transformation
- **SQL**: Process data using SQL queries
- **Protobuf**: Protobuf encoding/decoding
- **Batch Processing**: Process messages in batches

Example:

```yaml
pipeline:
  thread_num: 4
  processors:
    - type: json_to_arrow
    - type: sql
      query: "SELECT * FROM flow WHERE value >= 10"
```

### Output Components

ArkFlow supports multiple output targets:

- **Kafka**: Write data to Kafka topics
- **MQTT**: Publish messages to MQTT topics
- **HTTP**: Send data via HTTP
- **Standard Output**: Output data to the console
- **Drop**: Discard data

Example:

```yaml
output:
  type: kafka
  brokers:
    - localhost:9092
  topic: output-topic
  client_id: arkflow-producer
```

### Buffer Components

ArkFlow provides buffer capabilities to handle backpressure and temporary storage of messages:

- **Memory Buffer**: In-memory buffering for high-throughput scenarios and window aggregation

Example:

```yaml
buffer:
  type: memory
  capacity: 10000 # Maximum number of messages to buffer
  timeout: 10s # Maximum time to buffer messages
```
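
As the top-level layout shows, `buffer` sits inside a stream alongside `input`, `pipeline`, and `output`. A sketch combining the memory buffer above with the `generate` input and `stdout` output used elsewhere in this document:

```yaml
streams:
  - input:
      type: "generate"
      context: '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }'
      interval: 1ms
      batch_size: 1000
    buffer:
      type: memory
      capacity: 10000 # Maximum number of messages to buffer
      timeout: 10s # Maximum time to buffer messages
    pipeline:
      thread_num: 4
      processors:
        - type: "json_to_arrow"
    output:
      type: "stdout"
```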

## Examples

### Kafka to Kafka Data Processing

```yaml
streams:
  - input:
      type: kafka
      brokers:
        - localhost:9092
      topics:
        - test-topic
      consumer_group: test-group

    pipeline:
      thread_num: 4
      processors:
        - type: json_to_arrow
        - type: sql
          query: "SELECT * FROM flow WHERE value > 100"

    output:
      type: kafka
      brokers:
        - localhost:9092
      topic: processed-topic
```

### Generate Test Data and Process

```yaml
streams:
  - input:
      type: "generate"
      context: '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }'
      interval: 1ms
      batch_size: 10000

    pipeline:
      thread_num: 4
      processors:
        - type: "json_to_arrow"
        - type: "sql"
          query: "SELECT count(*) FROM flow WHERE value >= 10 GROUP BY sensor"

    output:
      type: "stdout"
```

## License

ArkFlow is licensed under the [Apache License 2.0](LICENSE).

## Community

Discord: https://discord.gg/CwKhzb8pux

If you like this project, or are using it to learn or to build your own solution, please give it a star ⭐. Thanks!