{"id":18840217,"url":"https://github.com/maxim2266/pump","last_synced_at":"2026-02-18T13:03:53.717Z","repository":{"id":153394350,"uuid":"628625455","full_name":"maxim2266/pump","owner":"maxim2266","description":"A minimalist framework for assembling data processing pipelines.","archived":false,"fork":false,"pushed_at":"2026-02-12T18:23:50.000Z","size":117,"stargazers_count":7,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-02-13T01:58:28.685Z","etag":null,"topics":["callback","etl-pipeline","functional-components","golang","iterator","pipelines"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maxim2266.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-04-16T14:36:23.000Z","updated_at":"2026-02-12T18:23:54.000Z","dependencies_parsed_at":null,"dependency_job_id":"e99a71cc-940a-4be2-8b85-3062d04e3206","html_url":"https://github.com/maxim2266/pump","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/maxim2266/pump","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim2266%2Fpump","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim2266%2Fpump/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim2266%2Fpump/releases","manifests_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/maxim2266%2Fpump/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maxim2266","download_url":"https://codeload.github.com/maxim2266/pump/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxim2266%2Fpump/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29580650,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-18T08:38:15.585Z","status":"ssl_error","status_checked_at":"2026-02-18T08:38:14.917Z","response_time":162,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["callback","etl-pipeline","functional-components","golang","iterator","pipelines"],"created_at":"2024-11-08T02:46:47.969Z","updated_at":"2026-02-18T13:03:53.704Z","avatar_url":"https://github.com/maxim2266.png","language":"Go","readme":"## pump: a minimalist framework for assembling data processing pipelines.\n\n[![GoDoc](https://godoc.org/github.com/maxim2266/pump?status.svg)](https://godoc.org/github.com/maxim2266/pump)\n[![Go Report Card](https://goreportcard.com/badge/github.com/maxim2266/pump)](https://goreportcard.com/report/github.com/maxim2266/pump)\n[![License: BSD 3-Clause](https://img.shields.io/badge/License-BSD_3--Clause-yellow.svg)](https://opensource.org/licenses/BSD-3-Clause)\n\nPackage `pump` provides a minimalist framework for composing data processing 
pipelines.\nThe pipelines are type-safe, impose little overhead, and can be composed either statically,\nor dynamically (for example, as a function of configuration). A running pipeline stops on and\nreturns the first error encountered.\n\nThe package defines two generic types:\n\n  - Data generator `Gen[T]`: a callback-based (\"push\") iterator that supplies a stream of data of\n    any type `T`, and\n  - Pipeline stage `Stage[T,U]`: a function that invokes input generator `Gen[T]`, does whatever processing\n    it is programmed to do, and feeds the supplied callback with data items of type `U`.\n\nThe package also provides a basic set of functions for composing pipeline stages and binding stages\nto generators, as well as support for pipelining and parallel execution.\n\nFor API details see [documentation](https://godoc.org/github.com/maxim2266/pump).\n\n#### Concept\nThe library is built around two data types: generator `Gen[T any]` and stage `Stage[T,U any]`.\nGenerator is a function that passes data items to its argument - a callback function `func(T) error`.\nIt is defined as\n```Go\ntype Gen[T any] func(func(T) error) error\n```\nThis is very similar to `iter.Seq` type from Go v1.23, except that the callback function\nreturns `error` instead of a boolean. Any implementation of the generator function should stop\non the first error returned from the callback, or on any internal error encountered during iteration.\nHere is a (simplified) example of a constructor that creates a generator iterating over the given slice:\n```Go\nfunc fromSlice[T any](src []T) Gen[T] {\n    return func(yield func(T) error) error {\n        for _, item := range src {\n            if err := yield(item); err != nil {\n                return err\n            }\n        }\n\n        return nil\n    }\n}\n```\n_Note_: the library provides its own `FromSlice` function implementation. 
Also, in practice\ngenerators are more likely to read data from more complex sources, such as files, sockets,\ndatabase queries, etc.\n\nThe second type, `Stage`, is a function that is expected to invoke the given generator,\nprocess each data item of type `T` and possibly forward each result (of type `U`) to the given\ncallback. The `Stage` type is defined as\n```Go\ntype Stage[T, U any] func(Gen[T], func(U) error) error\n```\nJust to give a simple example, this is a stage that increments every integer from its generator:\n```Go\nfunc increment(src Gen[int], yield func(int) error) error {\n    return src(func(x int) error {\n        return yield(x + 1)\n    })\n}\n```\nAs a note, the library provides a more succinct way of defining such a simple stage (see below).\n\nThe signature of the stage function is designed to allow for full control over when and how\nthe source generator is invoked. For example, suppose we want to have a pipeline stage where\nprocessing of each input item involves database queries, and we also want to establish a\ndatabase connection before the iteration, and close it afterwards. This can be achieved using\nthe following stage function (for some already defined types `T` and `U`):\n```Go\nfunc process(src pump.Gen[T], yield func(U) error) error {\n    conn, err := connectToDatabase()\n\n    if err != nil {\n        return err\n    }\n\n    defer conn.Close()\n\n    return src(func(item T) error { // this actually invokes the source generator\n        // produce a result of type U\n        result, err := produceResult(item, conn)\n\n        if err != nil {\n            return err\n        }\n\n        // pass the result further down the pipeline\n        return yield(result)\n    })\n}\n```\nThe rest of the library is essentially about constructing and composing stages. 
Multiple stages\ncan be composed into one using the `Chain*` family of functions, for example:\n```Go\npipe := Chain3(increment, times2, modulo5)\n```\nGiven a suitable generator, a pipe can be invoked directly:\n```Go\ngen := FromSlice([]int{ 1, 2, 3 }) // input data generator\nerr := pipe(gen, func(x int) error {\n    _, e := fmt.Println(x)\n    return e\n})\n\nif err != nil { ... }\n```\nOr it can be used in a for-range loop:\n```Go\nit := Bind(FromSlice([]int{ 1, 2, 3 }), pipe)\n\nvar err error\n\nsum := 0\n\nfor x := range it.All(\u0026err) {\n    sum += x\n}\n\nif err != nil { ... }\n```\n\nTo assist with writing simple stage functions (like `increment` above) the library provides\na number of constructors, for example:\n```Go\nincrement := Map(func(x int) int { return x + 1 })\ntimes2    := Map(func(x int) int { return x * 2 })\nmodulo5   := Map(func(x int) int { return x % 5 })\n\npipe := Chain3(increment, times2, modulo5)\n```\nOr, alternatively:\n```Go\npipe := Chain3(\n           Map(func(x int) int { return x + 1 }),\n           Map(func(x int) int { return x * 2 }),\n           Map(func(x int) int { return x % 5 }),\n        )\n```\n\nIn fact, a stage function can convert any input type `T` to any output type `U`, so the above\npipeline can be modified to produce strings instead of integers:\n```Go\npipe := Chain4(\n           increment,\n           times2,\n           modulo5,\n           Map(strconv.Itoa),\n        )\n```\n\nOr the input data can be filtered to skip odd numbers:\n```Go\npipe := Chain4(\n           Filter(func(x int) bool { return x \u0026 1 == 0 }),\n           increment,\n           times2,\n           modulo5,\n        )\n```\n\nTo deal with parallelisation the library provides two helpers: `Pipe` and `Parallel`.\n`Pipe` runs all stages before it in a separate goroutine, for example:\n```Go\npipe := Chain4(\n           increment,\n           Pipe,\n           times2,\n           modulo5,\n        )\n```\nWhen this pipeline is invoked, its 
generator and `increment` stage will be running\nin a dedicated goroutine, while the rest will be executed in the current goroutine.\n\n`Parallel` executes the given stage in the specified number of goroutines, in parallel.\nAll stages before `Parallel` are also run in a dedicated goroutine. Example:\n```Go\npipe := Chain3(\n           increment,\n           Parallel(5, times2),\n           modulo5,\n        )\n```\nUpon invocation of this pipeline, its generator and `increment` stage will be running\nin a dedicated goroutine, the `times2` stage will be running in 5 goroutines in parallel,\nand the last stage will run in the calling goroutine.\n\nThe above pipeline can also be rearranged to run all stages in parallel:\n```Go\npipe := Parallel(5, Chain3(\n           increment,\n           times2,\n           modulo5,\n        ))\n```\n\n_Note_: the `Parallel` stage does not preserve the order of data items.\n\nIn general, pipelines can be assembled either statically (i.e., when `pipe` is literally\na static variable), or dynamically, for example, as a function of configuration. Also,\nseparation between processing stages and their composition often reduces the number of\ncode modifications required to implement new requirements.\n\n#### Benchmarks\nAll benchmarks below simply pump integers through stages with no processing at all, thus only\nmeasuring the overhead associated with running stages themselves. The first two benchmarks show\nthat the iteration is generally quite efficient, especially when using a \"for-range\" loop. 
Results\nfor `Pipe` and `Parallel` stages show higher overhead because of the Go channels used internally\n(one channel for `Pipe` stage, and two for `Parallel`).\n```\n▶ go test -bench .\ngoos: linux\ngoarch: amd64\npkg: github.com/maxim2266/pump\ncpu: Intel(R) Core(TM) i5-8500T CPU @ 2.10GHz\nBenchmarkSimple-6       575758419         1.754 ns/op\nBenchmarkRangeFunc-6    1000000000        0.4368 ns/op\nBenchmarkPipe-6          7921405        157.2 ns/op\nBenchmarkParallel-6      2367049        522.8 ns/op\n```\nThe numbers are obtained with Go compiler version 1.26.0 on Linux Mint 22.3.\n\n#### License\nBSD-3-Clause\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxim2266%2Fpump","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaxim2266%2Fpump","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxim2266%2Fpump/lists"}