https://github.com/mikestefanello/batcher
Type-safe, automatic, asynchronous batch processing.
batch batch-processing concurrency data goroutines
- Host: GitHub
- URL: https://github.com/mikestefanello/batcher
- Owner: mikestefanello
- License: MIT
- Created: 2023-03-25T23:36:59.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-06-17T21:30:05.000Z (over 1 year ago)
- Last Synced: 2025-09-26T01:35:58.105Z (5 months ago)
- Topics: batch, batch-processing, concurrency, data, goroutines
- Language: Go
- Homepage:
- Size: 13.7 KB
- Stars: 18
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Batcher
[Go Report Card](https://goreportcard.com/report/github.com/mikestefanello/batcher)
[Test](https://github.com/mikestefanello/batcher/actions/workflows/test.yml)
[License: MIT](https://opensource.org/licenses/MIT)
[Go Reference](https://pkg.go.dev/github.com/mikestefanello/batcher)
[Go](https://go.dev)
## Overview
_Batcher_ is a Go library that provides a type-safe, easy way to batch together arbitrary groups of items to be automatically and asynchronously processed. An item can be of any type that you want to pass to your processor. Items are queued within groups, each identified by a _string_ name. If you do not need to group items separately, you can specify the same, or an empty, group name for all items. Each group of items is sent separately to the processor callback that you specify.
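To make the grouping model concrete, here is a minimal sketch of items being bucketed by group name and each group being handed to a callback on its own. It is not the library's implementation: the `queue` type and its `add` and `flush` methods are hypothetical names invented for illustration, and the sketch ignores concurrency and thresholds entirely.

```go
package main

import "fmt"

// queue is a hypothetical stand-in for a batcher's internal state:
// items of type T bucketed by group name.
type queue[T any] struct {
	groups map[string][]T
}

func newQueue[T any]() *queue[T] {
	return &queue[T]{groups: make(map[string][]T)}
}

// add appends an item to the named group, creating the group if needed.
func (q *queue[T]) add(group string, item T) {
	q.groups[group] = append(q.groups[group], item)
}

// flush hands each group to the processor separately, then empties the queue.
func (q *queue[T]) flush(process func(group string, items []T)) {
	for name, items := range q.groups {
		process(name, items)
	}
	q.groups = make(map[string][]T)
}

func main() {
	q := newQueue[int]()
	q.add("even", 2)
	q.add("odd", 1)
	q.add("even", 4)

	// Groups are visited in map order, which Go does not guarantee.
	q.flush(func(group string, items []int) {
		fmt.Println(group, items)
	})
}
```

Using one group name (or the empty string) for every item collapses this to a single bucket, which matches the "no grouping needed" case described above.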
The queue can be configured to automatically execute the batch operation under any of the following circumstances:
1) A specified duration has elapsed since the last batch was processed
2) The total number of items queued (across all groups) has exceeded a given threshold
3) The number of _groups_ of items has exceeded a given threshold
## Installation
`go get github.com/mikestefanello/batcher`
## Usage
As an example, we'll create a batcher to process items of type `Log`, which will be grouped by their `Level` and then written in bulk.
```go
b, err := batcher.NewBatcher[Log](batcher.Config[Log]{
	// Process the batch if we've queued 5 different Level values
	GroupCountThreshold: 5,
	// Process the batch if we've queued 100 Log items
	ItemCountThreshold: 100,
	// Process the queue every 30 seconds
	DelayThreshold: 30 * time.Second,
	// Use 3 Goroutines to process the queue groups
	NumGoroutines: 3,
	// Execute this func for each queued group
	Processor: func(group string, logs []Log) {
		writeLogs(logs)
	},
})
if err != nil {
	// Handle the error returned by NewBatcher
	panic(err)
}
```
Add a `Log` to the batch queue.
```go
b.Add(log.Level, log)
```
## Origin
This concept was originally devised to handle two challenges faced with PubSub messages, but was made entirely generic for this library so it could be used for any purpose. The original challenges were:
1) Group incoming messages by a unique identifier, e.g., a user ID, to deduplicate requests and ensure a given pod was only ever executing operations for a given ID in isolation.
2) Group all incoming messages to bulk-export to persistent storage for archiving.
To _roughly_ illustrate an example in code:
```go
func main() {
	b, err := batcher.NewBatcher[*pubsub.Message](batcher.Config[*pubsub.Message]{
		GroupCountThreshold: 10,
		ItemCountThreshold:  100,
		DelayThreshold:      10 * time.Second,
		NumGoroutines:       3,
		Processor: func(group string, items []*pubsub.Message) {
			// Run the operation once per user ID, then ack or nack every
			// message in the group based on the result
			err := doUserOperation(group)
			for _, m := range items {
				if err != nil {
					m.Nack()
				} else {
					m.Ack()
				}
			}
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	// Consume the messages from PubSub (ctx and subscription are assumed
	// to have been created elsewhere)
	err = subscription.Receive(ctx, func(ctx context.Context, message *pubsub.Message) {
		// For this example, Data is the user ID
		b.Add(string(message.Data), message)
	})
	if err != nil {
		log.Fatal(err)
	}
}
```