{"id":42833977,"url":"https://github.com/n0rdy/pippin","last_synced_at":"2026-01-30T11:37:56.658Z","repository":{"id":207964259,"uuid":"720510160","full_name":"n0rdy/pippin","owner":"n0rdy","description":"Go library to create and manage data pipelines on your machine","archived":false,"fork":false,"pushed_at":"2025-04-06T07:49:52.000Z","size":66,"stargazers_count":14,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-18T22:05:17.411Z","etag":null,"topics":["async","asynchronous","data","data-engineering","data-pipeline","data-processing","go","golang","golang-library","golang-package","goroutines","pipeline"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/n0rdy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-18T17:52:04.000Z","updated_at":"2024-12-10T21:28:51.000Z","dependencies_parsed_at":"2023-11-18T19:24:51.450Z","dependency_job_id":"550ad94f-69b4-4211-8a4d-6d42d02b8723","html_url":"https://github.com/n0rdy/pippin","commit_stats":{"total_commits":21,"total_committers":3,"mean_commits":7.0,"dds":"0.19047619047619047","last_synced_commit":"7b0b00d95d6c228a27996d6155003551c443932c"},"previous_names":["n0rdy/pippin"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/n0rdy/pippin","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/n0rdy%2Fpippin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/n0rdy%2Fpippin/tags","releases_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/n0rdy%2Fpippin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/n0rdy%2Fpippin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/n0rdy","download_url":"https://codeload.github.com/n0rdy/pippin/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/n0rdy%2Fpippin/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28911825,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-30T08:15:08.179Z","status":"ssl_error","status_checked_at":"2026-01-30T08:14:31.507Z","response_time":66,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["async","asynchronous","data","data-engineering","data-pipeline","data-processing","go","golang","golang-library","golang-package","goroutines","pipeline"],"created_at":"2026-01-30T11:37:55.939Z","updated_at":"2026-01-30T11:37:56.652Z","avatar_url":"https://github.com/n0rdy.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pippin\n\nPippin is a simple, lightweight, and (hopefully) easy to use Go library for creating and managing data pipelines on your machine.\n\nThe library heavily relies on goroutines and channels, but this complexity is hidden from the user behind a 
simple API.\nHiding that complexity is the main reason I implemented this library in the first place.\n\nIt has no external dependencies except for the two Go standard library experimental packages:\n- `golang.org/x/exp`\n- `golang.org/x/sync`\n\nand one external dependency for testing goroutine leaks:\n- `go.uber.org/goleak`\n\nPlease note that the library is still in the early development stage, so the API might change in the future.\nThere still might be some bugs, so please feel free to report them.\n\n### But we already have [insert library here]!\n\nWe've had one, yes. But what about second ~~breakfast~~ library?\n\n## Table of contents\n* [Installation](#installation)\n* [Usage](#usage)\n    * [Simple example](#simple-example)\n    * [More detailed example](#more-detailed-example)\n* [Documentation](#documentation)\n* [Concepts](#concepts)\n    * [Pipeline](#pipeline)\n        * [Creation](#creation)\n        * [Configuration](#configuration)\n        * [Manual start](#manual-start)\n        * [Interrupting](#interrupting)\n    * [Stage](#stage)\n        * [Transformation](#transformation)\n        * [Aggregation](#aggregation)\n        * [Future](#future)\n        * [Configuration](#configuration-1)\n\n\n## Installation\n\n```bash\ngo get github.com/n0rdy/pippin\n```\n\n## Usage\n\n### Simple example\n\n```go\n// creates a new pipeline from a slice of integers:\np := pipeline.FromSlice[int]([]int{1, 2, 3, 4, 5})\n\n// keeps only the even numbers (2 and 4):\nfilteringStage := transform.Filter[int](p.InitStage, func(i int) bool {\n    return i % 2 == 0\n})\n\n// multiplies each number by 2:\nmappingStage := transform.Map[int, int](filteringStage, func(i int) int {\n    return i * 2\n})\n\n// sums all numbers:\nres, err := aggregate.Sum[int](mappingStage)\nif err != nil {\n    fmt.Println(err)\n} else {\n    fmt.Println(*res)\n}\n\n// the output is:\n// 12\n```\n\n### More detailed example\n\n```go\n// creates a new pipeline from a slice of strings:\np := 
pipeline.FromSlice(\n\t[]string{\"1\", \"a\", \"2\", \"-3\", \"4\", \"5\", \"b\"},\n)\n// result:\n// \"1\", \"a\", \"2\", \"-3\", \"4\", \"5\", \"b\"\n\natoiStage := transform.MapWithError(\n\tp.InitStage,\n\tfunc(input string) (int, error) {\n\t\treturn strconv.Atoi(input)\n\t},\n\tfunc(err error) {\n\t\tfmt.Println(err)\n\t},\n)\n// result:\n// 1, 2, -3, 4, 5\n// printed to the console: \n// strconv.Atoi: parsing \"a\": invalid syntax \n// strconv.Atoi: parsing \"b\": invalid syntax\n\noddNumsStage := transform.Filter(atoiStage, func(input int) bool {\n\treturn input%2 != 0\n})\n// result:\n// 1, -3, 5\n\nmultipliedByTwoStage := transform.Map(oddNumsStage, func(input int) int {\n\treturn input * 2\n})\n// result:\n// 2, -6, 10\n\ntoMatrixStage := transform.MapWithErrorMapper(\n\tmultipliedByTwoStage,\n\tfunc(input int) ([]int, error) {\n\t\tif input \u003c 0 {\n\t\t\treturn nil, fmt.Errorf(\"negative number %d\", input)\n\t\t}\n\n\t\tres := make([]int, input)\n\t\tfor i := 0; i \u003c input; i++ {\n\t\t\tres[i] = input * i\n\t\t}\n\t\treturn res, nil\n\t},\n\tfunc(err error) []int {\n\t\treturn []int{42}\n\t},\n)\n// result:\n// [0, 2], [42], [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]\n\nplusOneStage := transform.FlatMapWithError(\n\ttoMatrixStage,\n\tfunc(input int) ([]int, error) {\n\t\tif input == 0 {\n\t\t\treturn nil, fmt.Errorf(\"zero\")\n\t\t}\n\n\t\treturn []int{input + 1}, nil\n\t},\n\tfunc(err error) {\n\t\tfmt.Println(err)\n\t},\n)\n// result:\n// [3], [43], [11], [21], [31], [41], [51], [61], [71], [81], [91]\n// printed to the console:\n// zero\n// zero\n\ngreaterThan42Stage := transform.FlatMapWithErrorMapper(\n\tplusOneStage,\n\tfunc(input int) ([]int, error) {\n\t\tif input \u003c= 42 {\n\t\t\treturn nil, fmt.Errorf(\"42\")\n\t\t}\n\t\treturn []int{input}, nil\n\t},\n\tfunc(err error) []int {\n\t\treturn []int{0}\n\t},\n)\n// result:\n// [0], [43], [0], [0], [0], [0], [51], [61], [71], [81], [91]\n\nflattenedStage := 
transform.FlatMap(greaterThan42Stage, func(input int) int {\n\treturn input\n})\n// result:\n// [0, 43, 0, 0, 0, 0, 51, 61, 71, 81, 91]\n\nfutureSum := asyncaggregate.Sum(flattenedStage)\n// result:\n// 398\n\nresult, err := futureSum.GetWithTimeout(time.Duration(10)*time.Second)\nif err != nil {\n\tfmt.Println(err)\n} else {\n\tfmt.Println(*result)\n}\n// printed to the console:\n// 398\n```\n\n## Documentation\n\nFind the full documentation [here](https://pkg.go.dev/github.com/n0rdy/pippin).\n\n## Concepts\n\nThe main concepts of Pippin are:\n- pipeline\n- stage\n\n### Pipeline\n\nPipeline is a sequence of stages.\nIt is the first and the key object in the library. Its API provides a possibility to interrupt the entire pipeline.\n\nThe user can see the pipeline current status by accessing the `Status` field. \nPlease, note that the status is updated asynchronously, so it may not be up-to-date right away after the change - eventual consistency.\n\n#### Creation\n\nIt is created from a variety of sources, such as:\n- slice\n- map\n- channel\n\nTo create a pipeline from a slice:\n```go\np := pipeline.FromSlice[int]([]int{1, 2, 3, 4, 5})\n```\n\nTo create a pipeline from a map:\n```go\nm := map[string]int{\n    \"one\": 1,\n    \"two\": 2,\n    \"three\": 3,\n}\np := pipeline.FromMap[string, int](m)\n```\n\nTo create a pipeline from a channel:\n```go\nch := make(chan int)\np := pipeline.FromChannel[int](ch)\n```\n\n#### Configuration\n\nIt is possible to configure the pipeline by using the `configs.PipelineConfig` struct, which contains the following config options:\n- `ManualStart` - is a boolean that indicates whether the pipeline should be started manually. \nIf it is passed as `true`, the pipeline will not start automatically on creation, and it's up to the user to start it by calling the `pipeline.Pipeline.Start` method.\n- `MaxGoroutinesTotal` - is an integer that indicates the maximum number of goroutines that can be spawned by the pipeline. 
\nIf it is passed as `0` or less, then there is no limit. \nPlease note that the real number of goroutines is always greater than the defined size, as there are service goroutines that are not limited by the rate limiter, \nand even if the pipeline rate limiter is full, the program will spawn a new goroutine if there are no workers for the current stage.\n- `MaxGoroutinesPerStage` - is an integer that indicates the maximum number of goroutines that can be spawned by each stage. \nIf it is passed as `0` or less, then there is no limit. \nIt is possible to change the limit for each stage individually - see `configs.StageConfig.MaxGoroutines`.\n- `Timeout` - indicates the timeout for the entire pipeline. If it is passed as `0` or less, then there is no timeout.\n- `Logger` - is a logger that will be used by the pipeline. \nIf it is passed as nil, then the `logging.NoOpsLogger`, which does nothing, will be used.\nCheck the `logging` package for more details and predefined loggers.\n- `InitStageConfig` - is a configuration for the initial stage. See `configs.StageConfig` for more details.\n\nIf your pipeline performs any network calls within its transformation/aggregation logic, I'd suggest configuring the maximum number of goroutines to avoid accidentally flooding the target server with requests (effectively a DoS attack) or reaching the maximum number of open files on the client machine.\n\nTo create a pipeline with a custom configuration:\n```go\np := pipeline.FromSlice[int]([]int{1, 2, 3, 4, 5}, configs.PipelineConfig{\n    ManualStart: true,\n    MaxGoroutinesTotal: 100,\n    MaxGoroutinesPerStage: 10,\n    Timeout: time.Duration(1000) * time.Millisecond,\n    Logger: logging.NewConsoleLogger(loglevels.DEBUG),\n})\n```\n\nPlease note that even though it is technically possible to pass more than one `configs.PipelineConfig`, only the first one will be used.\n\n#### Manual start\n\nIf the pipeline is configured to be started manually via the `ManualStart` option, it won't start automatically on creation. 
\nIn order to start it, do:\n```go\np := pipeline.FromSlice[int]([]int{1, 2, 3, 4, 5}, configs.PipelineConfig{\n    ManualStart: true,\n})\n\n// some code here\n\np.Start()\n```\n\n#### Interrupting\n\nIt is possible to interrupt the pipeline by:\n```go\np := pipeline.FromSlice[int]([]int{1, 2, 3, 4, 5})\n\n// some code here\n\np.Interrupt()\n```\n\nThis method gracefully tries to interrupt the pipeline. There is no guarantee that the pipeline will be interrupted immediately.\n\n### Stage\n\nStage is a single step in the pipeline. It is created either by a pipeline (the initial stage only), or by another stage. It contains no values within itself.\n\nThe high-level picture is the following: first the user needs to create a pipeline object and then perform some actions on it that will lead to the creation of stages.\n\nThere are two types of actions the user can perform:\n- transformation\n- aggregation\n\n#### Transformation\n\nTransformation is an intermediate action that transforms the data in the pipeline. It is performed by the `transform` package.\n\nIf you are coming from the JVM world, you can think of it as `Stream` transformations in Java 8+/Scala.\n\nTo create a transformation, provide a stage, a transformation function, and an optional configuration.\nAs a result, a new stage will be created with the type of the transformation function's output.\n\nPippin provides the following transformation functions:\n- `Map`\n- `MapWithError`\n- `MapWithErrorMapper`\n- `FlatMap`\n- `FlatMapWithError`\n- `FlatMapWithErrorMapper`\n- `Filter`\n\n`Map`, `FlatMap` and `Filter` are the same as in Java/Scala/Kotlin. 
`WithError` and `WithErrorMapper` functions are the same as their counterparts, but they also provide a possibility to handle errors:\n- `WithError` handles errors by performing a function with a side effect on each error\n- `WithErrorMapper` handles errors by mapping them to the output type\n\nTo simplify this, use `WithError` if you'd like to, for example, log the error and continue the pipeline by ignoring the input element that caused the error.\n```go\np := pipeline.FromSlice[string]([]string{\"1\", \"2\", \"a\", \"3\"})\n\n// converts each string to an integer\n// when an error happens, it will be logged to the console\natoiStage := transform.MapWithError(\n    p.InitStage,\n    func(s string) (int, error) {\n        return strconv.Atoi(s)\n    },\n    func(err error) {\n        fmt.Println(\"error happened\", err)\n    },\n)\n```\nUse `WithErrorMapper` if you'd like to provide a default output value for the error.\n```go\np := pipeline.FromSlice[string]([]string{\"1\", \"2\", \"a\", \"3\"})\n\n// converts each string to an integer\n// when an error happens, a default value of 42 will be used\natoiStage := transform.MapWithErrorMapper(\n    p.InitStage,\n    func(s string) (int, error) {\n    \treturn strconv.Atoi(s)\n    },\n    func(err error) int {\n    \treturn 42\n    },\n)\n```\n\n#### Aggregation\n\nAs mentioned above, the transformations are the intermediate actions. It means that it is possible to chain them together in one by one fashion in order to create a pipeline. However, transformation doesn't return any result, only a stage. 
\nIn order to get the result, the user needs to perform an aggregation, which is the last step in the pipeline.\n\nPippin provides the following aggregation functions:\n- `Sum`\n- `SumComplexType`\n- `Avg`\n- `AvgComplexType`\n- `Max`\n- `Min`\n- `Count`\n- `Sort`\n- `SortDesc`\n- `GroupBy`\n- `Reduce`\n- `AsSlice`\n- `AsMap`\n- `AsMultiMap`\n- `ForEach`\n- `Distinct`\n- `DistinctCount`\n\nHopefully, the names are self-explanatory. The only thing to note is that `Sum` and `Avg` functions are for numeric types only, while `SumComplexType` and `AvgComplexType` are for complex types such as `complex64` and `complex128`.\n\nTo create an aggregation, provide a stage and an optional configuration.\n```go\np := pipeline.FromSlice[int]([]int{1, 2, 3, 4, 5})\n\n// multiplies each number by 2:\nmappingStage := transform.Map[int, int](p.InitStage, func(i int) int {\n    return i * 2\n})\n\n// sums all numbers:\nres, err := aggregate.Sum[int](mappingStage)\n```\n\nPippin implements two types of aggregations: \n- synchronous - `aggregate` package\n- asynchronous - `asyncaggregate` package\n\nThe names of the functions and the arguments are the same for both packages, but the return types are different:\n- synchronous returns the pointer to the result and the error\n- asynchronous returns a `types.Future` object that contains either the pointer to the result or the error within\n\nThe key difference between the two is the fact that the synchronous aggregation blocks the current goroutine until the result is ready, while the asynchronous one doesn't.\nThat's why the async one returns a `Future` object.\n\nIf the pipeline is interrupted before the result is ready, the synchronous aggregation will return an error, while the asynchronous one will return a `Future` object with an error within.\n\nPlease note that it is not possible to set up the delayed manual start for the pipeline if the synchronous aggregation is used - the code will panic.\n\n#### Future\n\nThe `Future` object is a 
concept similar to Java/Scala Futures or JavaScript Promises. \nIt is an object that represents the result of an asynchronous computation that is going to be available in the future. \nThis is the way to avoid blocking the execution and a way to return early from a function.\n\nThere are two ways to obtain the value from a `Future`:\n- by calling the `Get()` method. This method will block until the value is available. It returns either the pointer to the value or an error.\nIn Pippin the error means that the pipeline was interrupted before it could complete, which is why the value is not available.\n- by calling the `GetWithTimeout(timeout time.Duration)` method. This method will block until the value is available or the timeout is reached.\n\nThe recommended way to obtain the value is by calling the `GetWithTimeout` method, as otherwise the execution might be blocked forever.\n\nIt is possible to manually check whether the future is done or not by calling the `IsDone` method.\nThis method returns a boolean value indicating whether the future is done or not. It is not blocking.\n\nPlease note that since it's an async operation, the value might not be available immediately.\n\n#### Configuration\n\nBoth transformation and aggregation functions accept an optional configuration argument similar to the pipeline configuration.\nIt is represented by the `configs.StageConfig` struct, which contains the following config options:\n- `MaxGoroutines` - is an integer that indicates the maximum number of goroutines that can be spawned by the stage. \nIf it is passed as `0` or less, then there is no limit. \nThis config option can be used to override, for a particular stage, the limit that comes from the `configs.PipelineConfig.MaxGoroutinesPerStage` option (if provided).\nPlease note that the real number of goroutines might be higher than the number specified here, as the library spawns additional goroutines for internal purposes.\n- `Timeout` - indicates the timeout for the stage. 
If it is passed as `0` or less, then there is no timeout.\n- `CustomId` - is a custom ID for the stage. If it is passed as 0, then the stage will be assigned an ID automatically. \nAuto-generated IDs are calculated as follows: 1 + the ID of the previous stage. \nThe initial stage (the one that is created first) has an ID of 1. It is recommended to either rely on the auto-generated IDs or to provide a custom ID for each stage, otherwise the IDs might be messed up due to the (1 + the ID of the previous stage) logic mentioned above.\n- `Logger` - is a logger that will be used by the stage.\nIf it is passed as nil, then the `logging.NoOpsLogger`, which does nothing, will be used.\nCheck the `logging` package for more details and predefined loggers.\nThis config option can be used to override, for a particular stage, the logger that comes from the `configs.PipelineConfig.Logger` option (if provided).\n\nTo create a transformation with a custom configuration:\n```go\np := pipeline.FromSlice[int]([]int{1, 2, 3, 4, 5})\n\n// multiplies each number by 2:\nmappingStage := transform.Map[int, int](p.InitStage, func(i int) int {\n    return i * 2\n}, configs.StageConfig{\n    MaxGoroutines: 10,\n    Timeout: time.Duration(1000) * time.Millisecond,\n    CustomId: 1,\n    Logger: logging.NewConsoleLogger(loglevels.INFO),\n})\n```\n\nPlease note that even though it is technically possible to pass more than one `configs.StageConfig`, only the first one will be used.\n\n\nHave fun =)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fn0rdy%2Fpippin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fn0rdy%2Fpippin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fn0rdy%2Fpippin/lists"}