https://github.com/bobheadxi/streamline
✏️ Transform and handle your data, line by line
go golang io streaming
Last synced: over 1 year ago
- Host: GitHub
- URL: https://github.com/bobheadxi/streamline
- Owner: bobheadxi
- License: mit
- Created: 2023-01-09T07:23:50.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-04-05T16:09:57.000Z (about 2 years ago)
- Last Synced: 2025-02-27T18:12:29.905Z (over 1 year ago)
- Topics: go, golang, io, streaming
- Language: Go
- Homepage: https://pkg.go.dev/go.bobheadxi.dev/streamline
- Size: 429 KB
- Stars: 10
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# streamline
Transform and handle your data, line by line.
```sh
go get go.bobheadxi.dev/streamline
```
## Overview
[`streamline`](https://pkg.go.dev/go.bobheadxi.dev/streamline) offers a variety of primitives to make working with data line by line a breeze:
- [`streamline.Stream`](https://pkg.go.dev/go.bobheadxi.dev/streamline#Stream) offers the ability to add hooks that handle an `io.Reader` line-by-line with `(*Stream).Stream`, `(*Stream).StreamBytes`, and other utilities.
- [`pipeline.Pipeline`](https://pkg.go.dev/go.bobheadxi.dev/streamline/pipeline#Pipeline) offers a way to build pipelines that transform the data in a `streamline.Stream`, such as cleaning, filtering, mapping, or sampling data.
  - [`jq.Pipeline`](https://pkg.go.dev/go.bobheadxi.dev/streamline/jq#Pipeline) can be used to map every line to the output of a JQ query, for example.
  - [`streamline.Stream` implements standard `io` interfaces like `io.Reader`](https://pkg.go.dev/go.bobheadxi.dev/streamline#Stream.Read), so `pipeline.Pipeline` can be used for general-purpose data manipulation as well.
- [`pipe.NewStream`](https://pkg.go.dev/go.bobheadxi.dev/streamline/pipe#NewStream) offers a way to create a buffered pipe between a writer and a `Stream`.
  - [`streamexec.Start`](https://pkg.go.dev/go.bobheadxi.dev/streamline/streamexec#Start) uses `pipe.NewStream` to attach a `Stream` to an `exec.Cmd` to work with command output.
When working with data streams in Go, you typically get an `io.Reader`, which is great for arbitrary data. In many cases, though, especially when scripting, the data and command output you deal with are structured line by line, or you want to handle them line by line, for example to send each line to a structured logging library. You can set up a `bufio.Reader` or `bufio.Scanner` to do this, but for cases like `exec.Cmd` you also need boilerplate to configure the command and set up pipes, and for additional functionality like transforming, filtering, or sampling output you need to write your own handlers. `streamline` aims to provide succinct ways to do all of the above and more.
### Add prefixes to command output
With `bufio.Scanner`:
```go
func PrefixOutput(cmd *exec.Cmd) error {
	// Send both stdout and stderr through a single pipe.
	reader, writer := io.Pipe()
	cmd.Stdout = writer
	cmd.Stderr = writer
	if err := cmd.Start(); err != nil {
		return err
	}
	// Wait for the command in the background, then close the pipe so
	// that the scanner below sees EOF.
	errC := make(chan error)
	go func() {
		err := cmd.Wait()
		writer.Close()
		errC <- err
	}()
	s := bufio.NewScanner(reader)
	for s.Scan() {
		println("PREFIX: ", s.Text())
	}
	if err := s.Err(); err != nil {
		return err
	}
	return <-errC
}
```
With `streamline/streamexec`:
```go
func PrefixOutput(cmd *exec.Cmd) error {
	stream, err := streamexec.Start(cmd)
	if err != nil {
		return err
	}
	return stream.Stream(func(line string) {
		println("PREFIX: ", line)
	})
}
```
### Process JSON on the fly
With `bufio.Scanner`:
```go
func GetMessages(r io.Reader) error {
	s := bufio.NewScanner(r)
	for s.Scan() {
		// Run a separate jq process for every line of input.
		var result bytes.Buffer
		cmd := exec.Command("jq", ".msg")
		cmd.Stdin = bytes.NewReader(s.Bytes())
		cmd.Stdout = &result
		if err := cmd.Run(); err != nil {
			return err
		}
		print(result.String())
	}
	return s.Err()
}
```
With `streamline`:
```go
func GetMessages(r io.Reader) error {
	return streamline.New(r).
		WithPipeline(jq.Pipeline(".msg")).
		Stream(func(line string) {
			println(line)
		})
}
```
### Sample noisy output
With `bufio.Scanner`:
```go
func PrintEvery10th(r io.Reader) error {
	s := bufio.NewScanner(r)
	var count int
	for s.Scan() {
		count++
		if count%10 != 0 {
			continue
		}
		println(s.Text())
	}
	return s.Err()
}
```
With `streamline`:
```go
func PrintEvery10th(r io.Reader) error {
	return streamline.New(r).
		WithPipeline(pipeline.Sample(10)).
		Stream(func(line string) {
			println(line)
		})
}
```
### Transform specific lines
This example is a somewhat realistic one: [GCP Cloud SQL cannot accept `pg_dump` output that contains certain `EXTENSION`-related statements](https://cloud.google.com/sql/docs/postgres/import-export/import-export-dmp#external-server), so to `pg_dump` a PostgreSQL database and upload the dump to a bucket for import into Cloud SQL, you must pre-process the dump to remove or comment out the offending statements.
With `bufio.Scanner`:
```go
var unwanted = []byte("COMMENT ON EXTENSION")

func Upload(pgdump *os.File, dst io.Writer) error {
	s := bufio.NewScanner(pgdump)
	for s.Scan() {
		line := s.Bytes()
		if bytes.Contains(line, unwanted) {
			// comment out this line
			line = append([]byte("-- "), line...)
		}
		// The scanner strips line endings, so write the newline back.
		if _, err := fmt.Fprintf(dst, "%s\n", line); err != nil {
			return err
		}
	}
	return s.Err()
}
```
With `streamline`:
```go
var unwanted = []byte("COMMENT ON EXTENSION")

func Upload(pgdump *os.File, dst io.Writer) error {
	_, err := streamline.New(pgdump).
		WithPipeline(pipeline.Map(func(line []byte) []byte {
			if bytes.Contains(line, unwanted) {
				// comment out this line
				return append([]byte("-- "), line...)
			}
			return line
		})).
		WriteTo(dst)
	return err
}
```
## Background
Some of the ideas in this package started in [`sourcegraph/run`](https://github.com/sourcegraph/run), which began as a project to build utilities that [make it easier to write bash-esque scripts using Go](https://github.com/sourcegraph/sourcegraph/blob/main/doc/dev/adr/1652433602-use-go-for-scripting.md), namely being able to do things you would often do in scripts, such as grepping and iterating over lines. `streamline` generalizes the ideas that `sourcegraph/run` uses for working with command output so that they apply to arbitrary inputs, and `sourcegraph/run` now uses `streamline` internally.