https://github.com/koykov/batch_query
Batch operations abstraction over various storages.
- Host: GitHub
- URL: https://github.com/koykov/batch_query
- Owner: koykov
- License: mit
- Created: 2023-06-06T18:57:03.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2026-02-06T21:00:41.000Z (4 months ago)
- Last Synced: 2026-02-07T05:44:15.212Z (4 months ago)
- Topics: batch-processing, system-design
- Language: Go
- Homepage:
- Size: 92.8 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: readme.md
- License: license.md
# Batch Query
This library helps to combine many small, similar queries to various resources (databases, caches, networks) into batches,
thereby reducing overhead on connections and data transport.
The library was initially developed in response to persistent issues between a high-load service and an Aerospike server.
Aerospike's own metrics showed it was underloaded, while the application's metrics indicated huge query latencies to Aerospike.
The library was also inspired by the
[singleflight](https://pkg.go.dev/golang.org/x/sync/singleflight) package: its main idea of reducing the number of small queries
led, as a logical continuation, to the thought of eliminating them entirely through batching.
The concept is simple: collect small queries until either the size limit (maximum batch size) is reached
or the batch collection time expires (counted from the moment the first small query enters the batch).
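
The collection rule described above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the library's actual implementation: the names `collect` and `demoSizes` are hypothetical, and keys are plain integers for brevity.

```go
package main

import (
	"fmt"
	"time"
)

// collect groups keys from in into batches (hypothetical helper). A batch is
// flushed when it holds maxSize keys or when interval has elapsed since its
// first key arrived, whichever happens first.
func collect(in <-chan int, out chan<- []int, maxSize int, interval time.Duration) {
	defer close(out)
	var (
		batch []int
		timer *time.Timer
		fire  <-chan time.Time // nil while the batch is empty, so select ignores it
	)
	flush := func() {
		if timer != nil {
			timer.Stop()
		}
		if len(batch) > 0 {
			out <- batch
		}
		batch, timer, fire = nil, nil, nil
	}
	for {
		select {
		case k, ok := <-in:
			if !ok {
				flush() // input closed: send whatever is left
				return
			}
			if len(batch) == 0 {
				// The countdown starts with the first key of a batch.
				timer = time.NewTimer(interval)
				fire = timer.C
			}
			batch = append(batch, k)
			if len(batch) >= maxSize {
				flush()
			}
		case <-fire:
			flush()
		}
	}
}

// demoSizes feeds n keys through collect and returns the size of every batch.
// The interval is set to one hour so that only the size limit triggers a flush.
func demoSizes(n, maxSize int) []int {
	in := make(chan int)
	out := make(chan []int)
	go collect(in, out, maxSize, time.Hour)
	go func() {
		for i := 0; i < n; i++ {
			in <- i
		}
		close(in)
	}()
	var sizes []int
	for b := range out {
		sizes = append(sizes, len(b))
	}
	return sizes
}

func main() {
	fmt.Println(demoSizes(10, 4)) // prints [4 4 2]
}
```

With a short interval and a slow producer, the timer branch would fire instead and emit smaller batches; that trade-off between latency and batch fill is what the two limits control.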
## Usage
The central component of the library is the `BatchQuery` structure, which does all the work. Before use, it must be configured,
and the [`Config`](config.go) structure serves this purpose. Let's examine its fields:
* `BatchSize` - how many small queries a batch can contain. Optional field, default value is `64`.
* `CollectInterval` - the maximum duration for collecting a batch. Starts counting from the moment the first query enters the batch. Default value is `1` second.
* `TimeoutInterval` - the overall time limit covering batch collection, sending the batch request, and post-processing. Must be greater than `CollectInterval`.
* `Batcher` - an abstraction for a specific storage, see description below. Mandatory parameter.
* `Buffer` - size of storage for collected batches, ready to be sent and processed.
* `Workers` - number of workers for sending/processing batches. They read from the buffer (see `Buffer`).
* `MetricsWriter` - abstraction for a specific TSDB solution.
* `Logger` - abstraction for an internal process logger. Useful for debugging, not recommended for production.
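
The documented defaults and constraints can be summarized as a small validation sketch. The `settings` type and `normalize` function below are hypothetical stand-ins written for illustration only; they are not part of the library, and how the library itself reacts to invalid values (for example, an unset `TimeoutInterval`) is an assumption here.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// settings is a hypothetical stand-in for the Config fields that have
// documented defaults or constraints.
type settings struct {
	BatchSize       int
	CollectInterval time.Duration
	TimeoutInterval time.Duration
	HasBatcher      bool // stands in for "a Batcher was provided"
}

// normalize applies the documented defaults and checks the documented
// constraints. Returning an error on violation is an assumption of this sketch.
func normalize(s settings) (settings, error) {
	if !s.HasBatcher {
		return s, errors.New("batcher is mandatory")
	}
	if s.BatchSize == 0 {
		s.BatchSize = 64 // documented default
	}
	if s.CollectInterval == 0 {
		s.CollectInterval = time.Second // documented default
	}
	if s.TimeoutInterval <= s.CollectInterval {
		return s, errors.New("timeout interval must be greater than collect interval")
	}
	return s, nil
}

func main() {
	s, err := normalize(settings{HasBatcher: true, TimeoutInterval: 5 * time.Second})
	fmt.Println(s.BatchSize, s.CollectInterval, err) // prints 64 1s <nil>
}
```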
Thus, a usage example looks like this:
```go
package main

import (
	"log"
	"math/rand"
	"os"
	"time"

	as "github.com/aerospike/aerospike-client-go"
	"github.com/koykov/batch_query"
	promw "github.com/koykov/batch_query/metrics/prometheus"
	"github.com/koykov/batch_query/mods/aerospike"
)

func main() {
	policy := as.NewBatchPolicy()
	client, err := as.NewClientWithPolicy(as.NewClientPolicy(), "localhost", 3000)
	if err != nil {
		log.Fatal(err)
	}

	// Prepare config for query.
	conf := batch_query.Config{
		BatchSize:       100,
		CollectInterval: 500 * time.Microsecond,
		TimeoutInterval: 5 * time.Millisecond,
		Workers:         10,
		Buffer:          4,
		// Declare Aerospike batcher with specific params.
		Batcher: aerospike.Batcher{
			Namespace: "my_ns",
			SetName:   "my_set",
			Bins:      []string{"bin01", "bin02", "binNN"},
			Policy:    policy,
			Client:    client,
		},
		// Declare writer to export metrics.
		MetricsWriter: promw.NewWriter("my_query", promw.WithPrecision(time.Millisecond)),
		// Declare logger for debugging purposes.
		Logger: log.New(os.Stderr, "", log.LstdFlags),
	}

	// Initialize the query.
	bq, err := batch_query.New(&conf)
	if err != nil {
		log.Fatal(err)
	}

	// Start 10k goroutines to fetch small keys.
	for i := 0; i < 10000; i++ {
		// Pass i as an argument so each goroutine works with its own copy.
		go func(i int) {
			for {
				// rand.Intn panics on a non-positive argument, hence i+1.
				resp, err := bq.Fetch(i + rand.Intn(i+1))
				if err != nil {
					continue
				}
				_ = resp.(*as.Record)
			}
		}(i)
	}

	// Block forever while the goroutines run.
	c := make(chan struct{})
	<-c
}
```
In this example, 10k goroutines read single keys, which will be combined into batches and processed in bulk by the query.
At the same time, each goroutine will receive a response specifically for the key it requested, or an error.
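
One way such per-caller delivery could work is sketched below: every request carries its own reply channel, and after a single bulk lookup each caller receives the result for its own key. This is an illustration under assumptions, not the library's internals; `request`, `result`, `dispatch`, and `demo` are hypothetical names, and the in-memory map stands in for a real storage.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type result struct {
	val string
	err error
}

// request pairs a key with the channel its caller is waiting on.
type request struct {
	key   int
	reply chan result // capacity 1, so the dispatcher never blocks on a caller
}

var errNotFound = errors.New("not found")

// dispatch performs one bulk lookup for the whole batch and hands every
// caller the value (or error) that belongs to its own key.
func dispatch(batch []request, bulk func(keys []int) map[int]string) {
	keys := make([]int, len(batch))
	for i, r := range batch {
		keys[i] = r.key
	}
	found := bulk(keys)
	for _, r := range batch {
		if v, ok := found[r.key]; ok {
			r.reply <- result{val: v}
		} else {
			r.reply <- result{err: errNotFound}
		}
	}
}

// demo simulates three callers asking for keys 2, 3 and 1.
func demo() []string {
	store := map[int]string{1: "one", 2: "two"}
	bulk := func(keys []int) map[int]string {
		out := make(map[int]string, len(keys))
		for _, k := range keys {
			if v, ok := store[k]; ok {
				out[k] = v
			}
		}
		return out
	}
	keys := []int{2, 3, 1}
	batch := make([]request, len(keys))
	for i, k := range keys {
		batch[i] = request{key: k, reply: make(chan result, 1)}
	}
	dispatch(batch, bulk)

	got := make([]string, len(keys))
	var wg sync.WaitGroup
	for i := range batch {
		wg.Add(1)
		go func(i int) { // each goroutine plays the role of one caller
			defer wg.Done()
			if res := <-batch[i].reply; res.err != nil {
				got[i] = res.err.Error()
			} else {
				got[i] = res.val
			}
		}(i)
	}
	wg.Wait()
	return got
}

func main() {
	fmt.Println(demo()) // prints [two not found one]
}
```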
## Modules
Currently, the library supports three data storages via the [Batcher](batcher.go) abstraction:
* [Aerospike](mods/aerospike)
* [Redis](mods/redis)
* [SQL](mods/sql)
The interface itself is quite simple, and if necessary, it's fairly straightforward to write your own version for the required storage.
## Metrics
To evaluate the query's efficiency and/or tune configuration parameters, you can set a component for writing and exporting
metrics via the [MetricsWriter](metrics.go) abstraction. Currently, the following TSDBs are supported:
* [Prometheus](https://github.com/koykov/batch_query/tree/master/metrics/prometheus)
* [VictoriaMetrics](https://github.com/koykov/batch_query/tree/master/metrics/victoria)
Using them is simple: set a unique query name and, optionally, the precision
(one nanosecond by default, although one millisecond is usually more reasonable; see the usage example).