https://github.com/DanielMcSheehy/parallel-pipeline

Blazing fast parallel text data pipeline for large files


# ![example workflow](https://github.com/DanielMcSheehy/parallel-pipeline/actions/workflows/tests.yaml/badge.svg) Parallel Pipeline
A blazing fast Go library that runs text data pipelines in parallel. It can traverse and transform extremely large text files (100 GB or more) in seconds.

## Usage

```go
import "github.com/DanielMcSheehy/parallel-pipeline/pipeline"
```
Add some text transformations:
```go
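// Example text transformation: strips every "😀" from the input.
// Requires the standard library "strings" package to be imported.
func RemoveAllSmileyFaces() *pipeline.Transformer {
	return &pipeline.Transformer{
		Transform: func(input string) string {
			return strings.ReplaceAll(input, "😀", "")
		},
	}
}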
```
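To illustrate how several transformers could be combined, here is a minimal sketch. It uses a local stand-in `Transformer` type that only mirrors the shape shown above (a struct with a `Transform` function field); it is not the library's own type. `TrimWhitespace` and `ApplyAll` are hypothetical names, and applying transformers in registration order is an assumption about the behavior, not something the README states.

```go
package main

import "strings"

// Transformer is a local stand-in that mirrors the shape used in the README.
// It is NOT the library's pipeline.Transformer type.
type Transformer struct {
	Transform func(input string) string
}

// RemoveAllSmileyFaces mirrors the README example.
func RemoveAllSmileyFaces() *Transformer {
	return &Transformer{
		Transform: func(input string) string {
			return strings.ReplaceAll(input, "😀", "")
		},
	}
}

// TrimWhitespace is a hypothetical transformer that strips leading and
// trailing whitespace.
func TrimWhitespace() *Transformer {
	return &Transformer{Transform: strings.TrimSpace}
}

// ApplyAll is a hypothetical helper that runs each transformer on the
// input, one after another, in the order given (an assumed ordering).
func ApplyAll(input string, ts ...*Transformer) string {
	for _, t := range ts {
		input = t.Transform(input)
	}
	return input
}
```

Because each transformer is a plain `func(string) string`, it can be checked in isolation on a small string before being run over large files.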
Start the data pipeline:
```go
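func main() {
	// workerCount, directory, and outputDirectory are placeholders;
	// define them for your own setup before running.
	mainPipeline := pipeline.New(workerCount)
	mainPipeline.RegisterTransformers(
		RemoveAllSmileyFaces(),
	)
	mainPipeline.Execute(directory, outputDirectory)
}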
```
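For intuition about what a worker count might control, the following is a rough sketch of a worker pool that applies transformers to lines concurrently. It is not the library's implementation: `Transformer` is again a local stand-in, `TransformLines` is a hypothetical function, and processing line by line from an in-memory slice is an assumption made to keep the example short.

```go
package main

import "sync"

// Transformer is a local stand-in mirroring the README's struct shape.
// It is NOT the library's pipeline.Transformer type.
type Transformer struct {
	Transform func(input string) string
}

// TransformLines is a hypothetical helper: it applies every transformer to
// each line using workerCount goroutines and returns results in input order.
func TransformLines(lines []string, workerCount int, ts ...*Transformer) []string {
	if workerCount < 1 {
		workerCount = 1
	}
	out := make([]string, len(lines))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workerCount; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker pulls line indexes until the channel is closed.
			for i := range jobs {
				s := lines[i]
				for _, t := range ts {
					s = t.Transform(s)
				}
				// Each index is written by exactly one worker, so no lock is needed.
				out[i] = s
			}
		}()
	}

	for i := range lines {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}
```

Writing each result to its own slice index keeps the output in input order regardless of which worker finishes first. A real pipeline over 100 GB files would stream chunks from disk rather than hold every line in memory.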