{"id":38042392,"url":"https://github.com/elastiflow/pipelines","last_synced_at":"2026-01-16T19:55:26.348Z","repository":{"id":286257900,"uuid":"832972749","full_name":"elastiflow/pipelines","owner":"elastiflow","description":"A lightweight Go framework for building stateful, real-time data pipelines. It processes both streams and batches with a declarative API to create scalable data applications, all in pure Go.","archived":false,"fork":false,"pushed_at":"2025-12-29T15:42:22.000Z","size":1586,"stargazers_count":6,"open_issues_count":4,"forks_count":3,"subscribers_count":3,"default_branch":"main","last_synced_at":"2026-01-01T19:41:46.182Z","etag":null,"topics":["data-pipeline","stream-processing"],"latest_commit_sha":null,"homepage":"https://pkg.go.dev/github.com/elastiflow/pipelines","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elastiflow.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2024-07-24T05:34:30.000Z","updated_at":"2025-12-29T15:42:25.000Z","dependencies_parsed_at":"2025-12-15T21:16:05.548Z","dependency_job_id":null,"html_url":"https://github.com/elastiflow/pipelines","commit_stats":null,"previous_names":["elastiflow/pipelines"],"tags_count":13,"template":false,"template_full_name":null,"purl":"pkg:github/elastiflow/pipelines","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elastiflow%2Fpipelines","tags_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/elastiflow%2Fpipelines/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elastiflow%2Fpipelines/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elastiflow%2Fpipelines/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elastiflow","download_url":"https://codeload.github.com/elastiflow/pipelines/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elastiflow%2Fpipelines/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28482133,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T11:59:17.896Z","status":"ssl_error","status_checked_at":"2026-01-16T11:55:55.838Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-pipeline","stream-processing"],"created_at":"2026-01-16T19:55:25.481Z","updated_at":"2026-01-16T19:55:26.342Z","avatar_url":"https://github.com/elastiflow.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Pipelines logo](docs/img/pipelines.png)](#)\n\n[![Go checks](https://github.com/elastiflow/pipelines/actions/workflows/go_checks.yml/badge.svg)](https://github.com/elastiflow/pipelines/actions/workflows/go_checks.yml)\n[![Go 
Reference](https://pkg.go.dev/badge/github.com/elastiflow/pipelines.svg)](https://pkg.go.dev/github.com/elastiflow/pipelines)\n[![Go Report Card](https://goreportcard.com/badge/github.com/elastiflow/pipelines)](https://goreportcard.com/report/github.com/elastiflow/pipelines)\n[![Coverage](https://sonarcloud.io/api/project_badges/measure?project=elastiflow_pipelines\u0026metric=coverage\u0026token=ccd0571925ac27b1722b132131ef8a3906b93277)](https://sonarcloud.io/summary/new_code?id=elastiflow_pipelines)\n---\n\nThe `pipelines` module is a Go library designed to facilitate the creation and management of:\n\n1. Data processing pipelines\n2. Reactive streaming applications leveraging Go's concurrency primitives\n\nIt provides a set of tools for **flow control**, **error handling**, and **pipeline processes**. \nUnder the hood, it uses Go's channels and goroutines to enable concurrency at each stage of the pipeline.\n\n## Setup\n\nTo get started with the `pipelines` module, follow these steps:\n\n1. Install the `pipelines` module:\n\n    ```sh\n    go get github.com/elastiflow/pipelines\n    ```\n\n2. (Optional) To view local documentation via `godoc`:\n    ```sh\n    go install -v golang.org/x/tools/cmd/godoc@latest\n    make docs\n    ```\n   \n3. Once running, visit [GoDocs](http://localhost:6060/pkg/github.com/elastiflow/pipelines/) to view the \nlatest documentation locally.\n\n## Real World Applicability\n\n### When to Use\n\n- **ETL (Extract, Transform, Load)** style scenarios where data arrives in a stream, and you want to apply \ntransformations or filtering in a concurrent manner.\n- **Complex concurrency flows**: easily fan out, fan in, or broadcast data streams.\n- **Reactive Streaming Applications**: serves as a light framework for Go native reactive streaming applications.\n\n### Channel Management\n\n- **Sources** (e.g. `FromArray`, `FromChannel`) produce data into a channel.\n- **DataStream** transformations (e.g. 
`Run`, `Filter`, `Map`) read from inbound channels and write results to outbound channels.\n- **Sinks** (e.g. `ToChannel`) consume data from the final output channels.\n- Each method typically spins up **one or more goroutines** which connect these channels together, allowing parallel processing.\n\n## High-Level Details\n\n### Pipelines\n\nThe `Pipelines` type represents a collection of `Pipeline` instances. It is designed to simplify managing multiple concurrent pipelines, especially when you split a single data source into several parallel streams (i.e., using `Broadcast`). This allows you to control a group of pipelines as a single unit, for example, starting or stopping them all at once.\n\n### Pipeline\n\nA `Pipeline` is a series of data processing stages connected by channels. Each stage (`datastreams.DataStream`) is a function that performs a specific task and passes its output to the next stage. The `pipelines` module provides a flexible way to define and manage these stages.\n\n### DataStream\n\nThe `datastreams.DataStream` struct is the core of the `pipelines` module. It manages the flow of data through the pipeline stages and handles errors according to the provided parameters.\n\n### Key Components\n\n#### Functions\n\n- **ProcessFunc**\nA user-defined function type used in a given `DataStream` stage via the `DataStream.Run()` method.\nFor instance:\n    ```go \n    ds = ds.Run(func(v int) (int, error) {\n        if v \u003c 0 {\n            return 0, fmt.Errorf(\"negative number: %d\", v)\n        }\n        return v + 1, nil\n    })\n    ```\n\n- **TransformFunc**\nA user-defined function type `func(T) (U, error)` used with the `Map()` method to convert from type `T` to a different type `U`.  
\nFor instance:\n    ```go\n    ds = ds.Map(func(i int) (string, error) {\n        return fmt.Sprintf(\"Number: %d\", i), nil\n    })\n    ```\n  \n- **FilterFunc**\nA user-defined function type `func(T) (bool, error)` used with the `Filter()` method to decide if an item should pass through (`true`) or be dropped (`false`).\nFor instance:\n    ```go\n    ds = ds.Filter(func(i int) (bool, error) {\n        return i % 2 == 0, nil\n    })\n    ```\n\n**KeyByFunc**: A user-defined function type used to partition the data stream into different segments based on a key. This is useful for grouping data before applying transformations or aggregations.\nFor instance:\n```go\n    kds := ds.KeyBy[testStruct, int](\n        New[testStruct](ctx, input, errCh).WithWaitGroup(\u0026sync.WaitGroup{}),\n        func(i int) (int, error) {\n            return i % 2, nil\n        },\n        Params{\n            BufferSize: 50,\n            Num:        1, // only 1 output channel per key\n        },\n    )\n```\n\n**WindowFunc**: A user-defined function to process batched data in a window. 
This is useful for aggregating data over time or count-based windows.\nFor instance:\n```go\n    kds = ds.Window[testStruct, string, *testInference](\n        datastreams.KeyBy[*SensorReading, string](p, keyFunc),\n        TumblingWindowFunc,\n        partitionFactory,\n        datastreams.Params{\n            BufferSize: 50,\n        },\n    )\n```\n\n#### Sources\n- **FromArray([]T)**: Convert a Go slice/array into a Sourcer\n- **FromChannel(\u003c-chan T)**: Convert an existing channel into a Sourcer\n- **FromDataStream(DataStream[T])**: Convert an existing DataStream into a Sourcer\n\n#### Sinks\n- **ToChannel(chan\u003c- T)**: Write DataStream output into a channel\n\n#### Windows\nWindow performs time- or count-based aggregation on a partitioned stream.\n- **NewTumblingFactory[T]**: Creates fixed-size windows that do not overlap.\n- **NewSlidingFactory[T]**: Creates overlapping windows.\n- **NewIntervalFactory[T]**: Creates windows based on a time interval.\n  \n#### Methods\n- **Run(ProcessFunc[T]) DataStream[T]**: Process each item with a user function\n- **Filter(FilterFunc[T]) DataStream[T]**: Filter items by user-defined condition\n- **Map(TransformFunc[T,U]) DataStream[U]**: Transform each item from T to U\n- **KeyBy(KeyByFunc[T]) DataStream[T]**: Partition the stream by a key\n- **Window(WindowFunc[T]) DataStream[T]**: Apply a window function to the stream\n- **Expand(ExpandFunc[T]) DataStream[T]**: Explode each item into multiple items\n- **FanOut() DataStream[T]**: Create multiple parallel output channels\n- **FanIn() DataStream[T]**: Merge multiple channels into one\n- **Broadcast() DataStream[T]**: Duplicate each item to multiple outputs\n- **Tee() (DataStream[T], DataStream[T])**: Split into two DataStreams\n- **Take(Params{Num: N}) DataStream[T]**: Take only N items\n- **OrDone() DataStream[T]**: Terminates if upstream is closed\n- **Out() \u003c-chan T**: Underlying output channel\n- **Sink(Sinker[T]) DataStream[T]**: Push items to a 
sink\n\n#### Method Params\n- **Params**:\n  Used to pass arguments into `DataStream` methods.\n    - Options\n        - **SkipError (bool)**: If true, any error in ProcessFunc / TransformFunc / FilterFunc causes that item to be skipped rather than stopping the pipeline.\n        - **Num (int)**: Used by methods like FanOut, Broadcast, and Take to specify how many parallel channels or how many items to consume.\n        - **BufferSize (int)**: Controls the size of the buffered channels created for that stage. Larger buffers can reduce blocking but use more memory.\n        - **SegmentName (string)**: Tag a pipeline stage name, useful for logging or debugging errors (e.g. “segment: broadcast-2”).\n\n\n### Examples\n\nBelow is an example of how to use the `pipelines` module to create simple pipelines.\nAdditional examples can be found in the godocs.\n\n#### Squaring Numbers\n\nThis example demonstrates how to set up a pipeline that takes a stream of integers, squares each odd integer, and outputs the results.\n\n```go\npackage main\n\nimport (\n\t\"context\"\n\t\"fmt\"\n\t\"log/slog\"\n\n\t\"github.com/elastiflow/pipelines\"\n\t\"github.com/elastiflow/pipelines/datastreams\"\n\t\"github.com/elastiflow/pipelines/datastreams/sources\"\n)\n\nfunc createIntArr(num int) []int {\n\tvar arr []int\n\tfor i := 0; i \u003c num; i++ {\n\t\tarr = append(arr, i)\n\t}\n\treturn arr\n}\n\nfunc squareOdds(v int) (int, error) {\n\tif v%2 == 0 {\n\t\treturn v, fmt.Errorf(\"even number error: %v\", v)\n\t}\n\treturn v * v, nil\n}\n\nfunc exProcess(p datastreams.DataStream[int]) datastreams.DataStream[int] {\n\treturn p.OrDone().FanOut(\n\t\tdatastreams.Params{Num: 2},\n\t).Run(\n\t\tsquareOdds,\n\t)\n}\n\nfunc main() {\n\terrChan := make(chan error, 10)\n\tdefer close(errChan)\n\n\tpl := pipelines.New[int, int]( // Create a new Pipeline\n\t\tcontext.Background(),\n\t\tsources.FromArray(createIntArr(10)), // Create a source to start the 
pipeline\n\t\terrChan,\n\t).Start(exProcess)\n\n\tgo func(errReceiver \u003c-chan error) { // Handle Pipeline errors\n\t\tdefer pl.Close()\n\t\tfor err := range errReceiver {\n\t\t\tif err != nil {\n\t\t\t\tslog.Error(\"demo error: \" + err.Error())\n\t\t\t\t// return if you wanted to close the pipeline during error handling.\n\t\t\t}\n\t\t}\n\t}(pl.Errors())\n\tfor out := range pl.Out() { // Read Pipeline output\n\t\tslog.Info(\"received simple pipeline output\", slog.Int(\"out\", out))\n\t}\n}\n```\n\n## Contributing\nWe welcome your contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details on how to open issues, submit pull requests, and propose new features.\n\n## License\nThis project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felastiflow%2Fpipelines","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felastiflow%2Fpipelines","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felastiflow%2Fpipelines/lists"}