https://github.com/argoproj-labs/old-argo-dataflow

Dataflow is a Kubernetes-native platform for executing large parallel data-processing pipelines.
data jetstream kafka kubernetes pipeline


# Dataflow

## NOTICE
Argo Dataflow has been reimplemented as part of a broader project focused on real-time data processing and analytics.
Please check out the new [numaflow project](https://github.com/numaproj/numaflow).

## Summary

Dataflow is a Kubernetes-native platform for executing large parallel data-processing pipelines.

Each pipeline is specified as a Kubernetes custom resource consisting of one or more steps. Each step sources messages
from, and sinks messages to, systems such as Kafka, NATS Streaming, or HTTP services.

Each step runs zero or more pods and can scale horizontally, either with the Horizontal Pod Autoscaler (HPA) or based
on queue length using built-in scaling rules. Steps can be scaled to zero, in which case they periodically and briefly
scale to one to measure queue length so that they can scale back up.
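
The scale-to-zero behaviour can be pictured with a small sketch. This is not Dataflow's actual scaling rule: the function `next_replicas`, its parameters (`peek_due`, `per_replica`, `max_replicas`) and the "one replica per 1000 pending messages" policy are all hypothetical, chosen only to illustrate why a step at zero replicas must briefly scale to one before it can decide whether to scale back up.

```python
import math
from typing import Optional


def next_replicas(current: int, pending: Optional[int], peek_due: bool,
                  per_replica: int = 1000, max_replicas: int = 4) -> int:
    """Hypothetical queue-length scaling rule (an illustration, not Dataflow's code).

    current:   replicas running now
    pending:   measured queue length, or None if it could not be measured
    peek_due:  True when the periodic "peek" timer has fired
    """
    if current == 0:
        # With no pod running, nothing can measure the queue, so the only
        # option is to wait for the timer and then start one pod to look.
        return 1 if peek_due else 0
    if pending is None:
        # A pod is running but has not reported a queue length yet: hold steady.
        return current
    # Assumed policy: one replica per `per_replica` pending messages, capped.
    # An empty queue gives ceil(0) == 0, i.e. scale back to zero.
    return min(max_replicas, math.ceil(pending / per_replica))


if __name__ == '__main__':
    print(next_replicas(current=0, pending=None, peek_due=True))   # start a pod to peek
    print(next_replicas(current=1, pending=2500, peek_due=False))  # scale up for the backlog
    print(next_replicas(current=1, pending=0, peek_due=False))     # nothing to do, back to zero
```

See [Scaling](docs/SCALING.md) for the rules Dataflow actually supports.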

Learn more about [features](docs/FEATURES.md).

[![Introduction to Dataflow](https://img.youtube.com/vi/afZT3aJ__jI/0.jpg)](https://youtu.be/afZT3aJ__jI)

## Use Cases

* Real-time "click" analytics
* Anomaly detection
* Fraud detection
* Operational (including IoT) analytics

## Screenshot

![Screenshot](docs/assets/screenshot.png)

## Example

```bash
pip install git+https://github.com/argoproj-labs/argo-dataflow#subdirectory=dsls/python
```

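```python
from argo_dataflow import cron, pipeline

if __name__ == '__main__':
    # Build a pipeline named 'hello' with a single step and run it.
    (pipeline('hello')
     .namespace('argo-dataflow-system')
     .step(
         # Cron source -> cat processor -> log sink.
         (cron('*/3 * * * * *')
          .cat()
          .log())
     )
     .run())
```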

## Documentation

Read in order:

Beginner:

* [Quick start](docs/QUICK_START.md)
* [Concepts](docs/CONCEPTS.md)
* [Sources](docs/SOURCES.md)
* [Processors](docs/PROCESSORS.md)
* [Sinks](docs/SINKS.md)
* [Examples](docs/EXAMPLES.md)

Intermediate:

* [Handlers](docs/CODE.md)
* [Git usage](docs/GIT.md)
* [Expression syntax](docs/EXPRESSIONS.md)
* [Garbage collection](docs/GC.md)
* [Scaling](docs/SCALING.md)
* [Command line](docs/CLI.md)
* [Kubectl](docs/KUBECTL.md)
* [Events interop](docs/EVENTS_INTEROP.md)
* [Workflow interop](docs/WORKFLOW_INTEROP.md)
* [Meta-data](docs/META.md)
* [Idempotence](docs/IDEMPOTENCE.md)

Advanced:

* [Configuration](docs/CONFIGURATION.md)
* [Features](docs/FEATURES.md)
* [Limitations](docs/LIMITATIONS.md)
* [Reliability](docs/RELIABILITY.md)
* [Metrics](docs/METRICS.md)
* [Image contract](docs/IMAGE_CONTRACT.md)
* [Jaeger tracing](docs/JAEGER.md)
* [Reading material](docs/READING.md)
* [Security](docs/SECURITY.md)
* [Dataflow vs X](docs/DATAFLOW_VS_X.md)
* [Contributing](docs/CONTRIBUTING.md)

### Architecture Diagram

[![Architecture](docs/assets/architecture.png)](https://docs.google.com/drawings/d/1Dk7mgZ3jKpBg_DQ3c8og04ULoKpGTGUt52pBE-Vet2o/edit)