Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/deliveryhero/pipeline
A library to help you create pipelines in Golang
- Host: GitHub
- URL: https://github.com/deliveryhero/pipeline
- Owner: deliveryhero
- License: mit
- Created: 2021-04-14T07:44:47.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-01-24T11:57:49.000Z (11 months ago)
- Last Synced: 2024-08-02T20:47:39.565Z (4 months ago)
- Language: Go
- Homepage:
- Size: 157 KB
- Stars: 188
- Watchers: 15
- Forks: 13
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: CODEOWNERS
- Security: SECURITY.md
Awesome Lists containing this project
- my-awesome - deliveryhero/pipeline - 01 star:0.2k fork:0.0k A library to help you create pipelines in Golang (Go)
README
# pipeline
[![Build](https://github.com/deliveryhero/pipeline/actions/workflows/ci.yml/badge.svg)](https://github.com/deliveryhero/pipeline/actions/workflows/ci.yml)
[![GoDoc](https://img.shields.io/badge/pkg.go.dev-doc-blue)](http://pkg.go.dev/github.com/deliveryhero/pipeline/v2)
[![Go Report Card](https://goreportcard.com/badge/deliveryhero/pipeline)](https://goreportcard.com/report/deliveryhero/pipeline)

Pipeline is a Go library that helps you build pipelines without worrying about channel management and concurrency.
It contains common fan-in and fan-out operations as well as useful utility funcs for batch processing and scaling.

If you have another common use case you would like to see covered by this package, please [open a feature request](https://github.com/deliveryhero/pipeline/issues).
## Cookbook
* [How to run a pipeline until the container is killed](https://github.com/deliveryhero/pipeline#PipelineShutsDownWhenContainerIsKilled)
* [How to shut down a pipeline when there is an error](https://github.com/deliveryhero/pipeline#PipelineShutsDownOnError)
* [How to shut down a pipeline after it has finished processing a batch of data](https://github.com/deliveryhero/pipeline#PipelineShutsDownWhenInputChannelIsClosed)

## Functions
### func [Apply](/apply.go#L34)
`func Apply[A, B, C any](a Processor[A, []B], b Processor[B, C]) Processor[A, []C]`
Apply connects two processors, applying the second to each item of the first's output
```golang
transform := pipeline.NewProcessor(func(_ context.Context, s string) ([]string, error) {
	return strings.Split(s, ","), nil
}, nil)

double := pipeline.NewProcessor(func(_ context.Context, s string) (string, error) {
	return s + s, nil
}, nil)

addLeadingZero := pipeline.NewProcessor(func(_ context.Context, s string) (string, error) {
	return "0" + s, nil
}, nil)

apply := pipeline.Apply(
	transform,
	pipeline.Sequence(
		double,
		addLeadingZero,
		double,
	),
)

input := "1,2,3,4,5"

for out := range pipeline.Process(context.Background(), apply, pipeline.Emit(input)) {
	for j := range out {
		fmt.Printf("process: %s\n", out[j])
	}
}
```
Output:
```
process: 011011
process: 022022
process: 033033
process: 044044
process: 055055
```
### func [Buffer](/buffer.go#L5)
`func Buffer[Item any](size int, in <-chan Item) <-chan Item`
Buffer creates a buffered channel that will close after the input
is closed and the buffer is fully drained.
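A minimal sketch of one possible use, based only on the signature above: buffering lets a slow reader lag behind the producer by up to `size` items.
```golang
// Buffer up to 3 items so the emitter is not blocked by the slow reader below
buffered := pipeline.Buffer(3, pipeline.Emit(1, 2, 3, 4, 5))

for i := range buffered {
	time.Sleep(100 * time.Millisecond) // simulate a slow consumer
	fmt.Printf("read: %d\n", i)
}
```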
### func [Cancel](/cancel.go#L9)
`func Cancel[Item any](ctx context.Context, cancel func(Item, error), in <-chan Item) <-chan Item`
Cancel passes an `Item any` from the `in <-chan Item` directly to the out `<-chan Item` until the `Context` is canceled.
After the context is canceled, everything from `in <-chan Item` is sent to the `cancel` func instead with the `ctx.Err()`.
```golang
// Create a context that lasts for 1 second
ctx, cancel := context.WithTimeout(context.Background(), time.Second)
defer cancel()

// Create a basic pipeline that emits one int every 250ms
p := pipeline.Delay(ctx, time.Second/4,
	pipeline.Emit(1, 2, 3, 4, 5),
)

// If the context is canceled, pass the ints to the cancel func for teardown
p = pipeline.Cancel(ctx, func(i int, err error) {
	fmt.Printf("%+v could not be processed, %s\n", i, err)
}, p)

// Otherwise, process the inputs
for out := range p {
	fmt.Printf("process: %+v\n", out)
}
```
Output:
```
process: 1
process: 2
process: 3
process: 4
5 could not be processed, context deadline exceeded
```
### func [Collect](/collect.go#L13)
`func Collect[Item any](ctx context.Context, maxSize int, maxDuration time.Duration, in <-chan Item) <-chan []Item`
Collect collects `[Item any]`s from its in channel and returns `[]Item` from its out channel.
It will collect up to `maxSize` inputs from the `in <-chan Item` over up to `maxDuration` before returning them as `[]Item`.
That means when `maxSize` is reached before `maxDuration`, `[maxSize]Item` will be passed to the out channel.
But if `maxDuration` is reached before `maxSize` inputs are collected, `[< maxSize]Item` will be passed to the out channel.
When the `context` is canceled, everything in the buffer will be flushed to the out channel.
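A small illustrative sketch, based on the signature above: batch ints into groups of at most 3, or whatever has arrived after 10ms.
```golang
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

// Group up to 3 inputs, or whatever arrived within 10ms, into a slice
for batch := range pipeline.Collect(ctx, 3, 10*time.Millisecond, pipeline.Emit(1, 2, 3, 4, 5)) {
	fmt.Printf("batch: %v\n", batch)
}
```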
### func [Delay](/delay.go#L10)
`func Delay[Item any](ctx context.Context, duration time.Duration, in <-chan Item) <-chan Item`
Delay delays reading each input by `duration`.
If the context is canceled, the delay will not be applied.
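For example, a short sketch of spacing out reads, based on the signature above:
```golang
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

// Read one value every 100ms from the emitted inputs
for i := range pipeline.Delay(ctx, 100*time.Millisecond, pipeline.Emit(1, 2, 3)) {
	fmt.Printf("got: %d\n", i)
}
```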
### func [Drain](/drain.go#L4)
`func Drain[Item any](in <-chan Item)`
Drain empties the input and blocks until the channel is closed
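A minimal sketch of one possible use, based on the signature above: discard whatever is left on a channel so upstream stages can finish and close it.
```golang
// Consume and discard everything until the channel is closed
pipeline.Drain(pipeline.Emit(1, 2, 3, 4, 5))
fmt.Println("input drained")
```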
### func [Emit](/emit.go#L6)
`func Emit[Item any](is ...Item) <-chan Item`
Emit fans `is ...Item` out to a `<-chan Item`
### func [Emitter](/emit.go#L19)
`func Emitter[Item any](ctx context.Context, next func() Item) <-chan Item`
Emitter continuously emits new items generated by the next func
until the context is canceled.
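An illustrative sketch, based on the signature above: generate an incrementing counter until the context times out.
```golang
ctx, cancel := context.WithTimeout(context.Background(), time.Millisecond)
defer cancel()

// Keep emitting incrementing ints until the context is canceled
var count int
for i := range pipeline.Emitter(ctx, func() int {
	count++
	return count
}) {
	fmt.Printf("emitted: %d\n", i)
}
```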
### func [Merge](/merge.go#L6)
`func Merge[Item any](ins ...<-chan Item) <-chan Item`
Merge fans multiple channels into a single channel
```golang
one := pipeline.Emit(1)
two := pipeline.Emit(2, 2)
three := pipeline.Emit(3, 3, 3)

for i := range pipeline.Merge(one, two, three) {
	fmt.Printf("output: %d\n", i)
}

fmt.Println("done")
```
Output:
```
output: 1
output: 3
output: 2
output: 2
output: 3
output: 3
done
```
### func [Process](/process.go#L13)
`func Process[Input, Output any](ctx context.Context, processor Processor[Input, Output], in <-chan Input) <-chan Output`
Process takes each input from the `in <-chan Input` and calls `Processor.Process` on it.
When `Processor.Process` returns an `Output`, it will be sent to the output `<-chan Output`.
If `Processor.Process` returns an error, `Processor.Cancel` will be called with the corresponding input and error message.
Finally, if the `Context` is canceled, all inputs remaining in the `in <-chan Input` will go directly to `Processor.Cancel`.
```golang
// Create a context that times out after 5 seconds
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

// Create a pipeline that emits 1-6 at a rate of one int per second
p := pipeline.Delay(ctx, time.Second, pipeline.Emit(1, 2, 3, 4, 5, 6))

// Multiply each number by 10
p = pipeline.Process(ctx, pipeline.NewProcessor(func(ctx context.Context, in int) (int, error) {
	return in * 10, nil
}, func(i int, err error) {
	fmt.Printf("error: could not multiply %v, %s\n", i, err)
}), p)

// Finally, let's print the results and see what happened
for result := range p {
	fmt.Printf("result: %d\n", result)
}
```
Output:
```
result: 10
result: 20
result: 30
result: 40
result: 50
error: could not multiply 6, context deadline exceeded
```
### func [ProcessBatch](/process_batch.go#L14)
`func ProcessBatch[Input, Output any](
ctx context.Context,
maxSize int,
maxDuration time.Duration,
processor Processor[[]Input, []Output],
in <-chan Input,
) <-chan Output`
ProcessBatch collects up to maxSize elements over maxDuration and processes them together as a slice of `Input`s.
It passes a `[]Input` to the `Processor.Process` method and expects a `[]Output` back.
It passes `[]Input` batches of inputs to the `Processor.Cancel` method.
If the receiver is backed up, ProcessBatch can hold up to 2x maxSize inputs.
```golang
// Create a context that times out after 5 seconds
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

// Create a pipeline that emits 1-6 at a rate of one int per second
p := pipeline.Delay(ctx, time.Second, pipeline.Emit(1, 2, 3, 4, 5, 6))

// Multiply every 2 adjacent numbers together
p = pipeline.ProcessBatch(ctx, 2, time.Minute, pipeline.NewProcessor(func(ctx context.Context, is []int) ([]int, error) {
	o := 1
	for _, i := range is {
		o *= i
	}
	return []int{o}, nil
}, func(is []int, err error) {
	fmt.Printf("error: could not multiply %v, %s\n", is, err)
}), p)

// Finally, let's print the results and see what happened
for result := range p {
	fmt.Printf("result: %d\n", result)
}
```
Output:
```
result: 2
result: 12
error: could not multiply [5 6], context deadline exceeded
```
### func [ProcessBatchConcurrently](/process_batch.go#L35)
`func ProcessBatchConcurrently[Input, Output any](
ctx context.Context,
concurrently,
maxSize int,
maxDuration time.Duration,
processor Processor[[]Input, []Output],
in <-chan Input,
) <-chan Output`
ProcessBatchConcurrently fans the in channel out to multiple batch Processors running concurrently,
then it fans the out channels of the batch Processors back into a single out chan.
```golang
// Create a context that times out after 5 seconds
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

// Create a pipeline that emits 1-9
p := pipeline.Emit(1, 2, 3, 4, 5, 6, 7, 8, 9)

// Add a 1 second delay to each number
p = pipeline.Delay(ctx, time.Second, p)

// Group two inputs at a time
p = pipeline.ProcessBatchConcurrently(ctx, 2, 2, time.Minute, pipeline.NewProcessor(func(ctx context.Context, ins []int) ([]int, error) {
	return ins, nil
}, func(i []int, err error) {
	fmt.Printf("error: could not process %v, %s\n", i, err)
}), p)

// Finally, let's print the results and see what happened
for result := range p {
	fmt.Printf("result: %d\n", result)
}
```
Output:
```
result: 1
result: 2
result: 3
result: 5
error: could not process [7 8], context deadline exceeded
error: could not process [4 6], context deadline exceeded
error: could not process [9], context deadline exceeded
```
### func [ProcessConcurrently](/process.go#L26)
`func ProcessConcurrently[Input, Output any](ctx context.Context, concurrently int, p Processor[Input, Output], in <-chan Input) <-chan Output`
ProcessConcurrently fans the in channel out to multiple Processors running concurrently,
then it fans the out channels of the Processors back into a single out chan.
```golang
// Create a context that times out after 5 seconds
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

// Create a pipeline that emits 1-7
p := pipeline.Emit(1, 2, 3, 4, 5, 6, 7)

// Add a two second delay to each number
p = pipeline.Delay(ctx, 2*time.Second, p)

// Add two concurrent processors that pass input numbers to the output
p = pipeline.ProcessConcurrently(ctx, 2, pipeline.NewProcessor(func(ctx context.Context, in int) (int, error) {
	return in, nil
}, func(i int, err error) {
	fmt.Printf("error: could not process %v, %s\n", i, err)
}), p)

// Finally, let's print the results and see what happened
for result := range p {
	fmt.Printf("result: %d\n", result)
}
```
Output:
```
result: 2
result: 1
result: 4
result: 3
error: could not process 6, process was canceled
error: could not process 5, process was canceled
error: could not process 7, context deadline exceeded
```
### func [Split](/split.go#L4)
`func Split[Item any](in <-chan []Item) <-chan Item`
Split takes a channel of `[]Item` (such as the output of Collect) and splits each slice back out into individual elements
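A small illustrative sketch, based on the signature above: collect ints into slices and then split them back into individual values.
```golang
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

// Batch the ints into slices of up to 2, then flatten them back out
batches := pipeline.Collect(ctx, 2, time.Second, pipeline.Emit(1, 2, 3, 4, 5))
for i := range pipeline.Split(batches) {
	fmt.Printf("split: %d\n", i)
}
```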
## Examples
### PipelineShutsDownOnError
The following example shows how you can shut down a pipeline
gracefully when it receives an error message.
```golang
// Create a context that can be canceled
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

// Create a pipeline that emits 1-10
p := pipeline.Emit(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

// A step that will shut down the pipeline if the number is greater than 1
p = pipeline.Process(ctx, pipeline.NewProcessor(func(ctx context.Context, i int) (int, error) {
	// Shut down the pipeline by canceling the context
	if i != 1 {
		cancel()
		return i, fmt.Errorf("%d caused the shutdown", i)
	}
	return i, nil
}, func(i int, err error) {
	// The cancel func is called when an error is returned by the process func or the context is canceled
	fmt.Printf("could not process %d: %s\n", i, err)
}), p)

// Finally, let's print the results and see what happened
for result := range p {
	fmt.Printf("result: %d\n", result)
}

fmt.Println("exiting the pipeline after all data is processed")
```
Output:
```
could not process 2: 2 caused the shutdown
result: 1
could not process 3: context canceled
could not process 4: context canceled
could not process 5: context canceled
could not process 6: context canceled
could not process 7: context canceled
could not process 8: context canceled
could not process 9: context canceled
could not process 10: context canceled
exiting the pipeline after all data is processed
```
### PipelineShutsDownWhenContainerIsKilled
This example demonstrates a pipeline
that runs until the OS / container the pipeline is running in kills it.
```golang
// Gracefully shut down the pipeline when the system is shutting down
// by canceling the context when an os.Kill or os.Interrupt signal is sent
ctx, cancel := signal.NotifyContext(context.Background(), os.Kill, os.Interrupt)
defer cancel()

// Create a pipeline that keeps emitting numbers sequentially until the context is canceled
var count int
p := pipeline.Emitter(ctx, func() int {
	count++
	return count
})

// Filter out only even numbers
p = pipeline.Process(ctx, pipeline.NewProcessor(func(ctx context.Context, i int) (int, error) {
	if i%2 == 0 {
		return i, nil
	}
	return i, fmt.Errorf("'%d' is an odd number", i)
}, func(i int, err error) {
	fmt.Printf("error processing '%v': %s\n", i, err)
}), p)

// Wait a fraction of a millisecond and simulate the os.Interrupt signal
go func() {
	time.Sleep(time.Millisecond / 10)
	fmt.Print("\n--- os kills the app ---\n\n")
	syscall.Kill(syscall.Getpid(), syscall.SIGINT)
}()

// Finally, let's print the results and see what happened
for result := range p {
	fmt.Printf("result: %d\n", result)
}

fmt.Println("exiting after the input channel is closed")
```
Output:
```
error processing '1': '1' is an odd number
result: 2

--- os kills the app ---

error processing '3': '3' is an odd number
error processing '4': context canceled
exiting after the input channel is closed
```
### PipelineShutsDownWhenInputChannelIsClosed
The following example demonstrates a pipeline
that naturally finishes its run when the input channel is closed.
```golang
// Create a context that can be canceled
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

// Create a pipeline that emits 1-10 and then closes its output channel
p := pipeline.Emit(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

// Multiply every number by 2
p = pipeline.Process(ctx, pipeline.NewProcessor(func(ctx context.Context, i int) (int, error) {
	return i * 2, nil
}, func(i int, err error) {
	fmt.Printf("could not multiply %d: %s\n", i, err)
}), p)

// Finally, let's print the results and see what happened
for result := range p {
	fmt.Printf("result: %d\n", result)
}

fmt.Println("exiting after the input channel is closed")
```
Output:
```
result: 2
result: 4
result: 6
result: 8
result: 10
result: 12
result: 14
result: 16
result: 18
result: 20
exiting after the input channel is closed
```