Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/DanielMcSheehy/parallel-pipeline
Blazing fast parallel text data pipeline for large files
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/DanielMcSheehy/parallel-pipeline
- Owner: DanielMcSheehy
- Created: 2021-10-06T20:31:16.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-10-13T13:38:40.000Z (over 3 years ago)
- Last Synced: 2024-08-02T05:12:23.031Z (6 months ago)
- Language: Go
- Homepage:
- Size: 64.5 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-blazingly-fast - parallel-pipeline - Blazing fast parallel text data pipeline for large files (Go)
README
# ![example workflow](https://github.com/DanielMcSheehy/parallel-pipeline/actions/workflows/tests.yaml/badge.svg) Parallel Pipeline
A blazing fast library for running text data pipelines in parallel. It can traverse and transform extremely large text files (100 GB or more) in seconds.
## Usage
```go
import "github.com/DanielMcSheehy/parallel-pipeline/pipeline"
```
Add some text transformations:
```go
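// Example text transformation; it needs "strings" imported alongside the pipeline package.
func RemoveAllSmileyFaces() *pipeline.Transformer {
	return &pipeline.Transformer{
		// Transform strips every 😀 from the input string.
		Transform: func(input string) string {
			return strings.ReplaceAll(input, "😀", "")
		},
	}
}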
```
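Because a transformer is just a function from string to string, several of them can be chained. The following is a minimal sketch of that idea, assuming transformers run in the order they are registered (the README does not confirm this). The local `Transformer` type only mirrors the struct shape shown above, and `applyAll` and `trimWhitespace` are hypothetical names for illustration, not part of the library.

```go
package main

import (
	"fmt"
	"strings"
)

// Transformer is a local stand-in that mirrors the struct shape used above.
// It is NOT the library's type.
type Transformer struct {
	Transform func(input string) string
}

// applyAll is a hypothetical helper: it feeds the output of each
// transformer into the next one, in the order given.
func applyAll(input string, transformers ...*Transformer) string {
	for _, t := range transformers {
		input = t.Transform(input)
	}
	return input
}

// removeAllSmileyFaces strips every 😀 from the input.
func removeAllSmileyFaces() *Transformer {
	return &Transformer{
		Transform: func(input string) string {
			return strings.ReplaceAll(input, "😀", "")
		},
	}
}

// trimWhitespace removes leading and trailing whitespace.
func trimWhitespace() *Transformer {
	return &Transformer{Transform: strings.TrimSpace}
}

func main() {
	out := applyAll("  hi😀  ", removeAllSmileyFaces(), trimWhitespace())
	fmt.Printf("%q\n", out) // prints "hi"
}
```

Keeping each transformation small and single-purpose makes it easy to test in isolation before registering it with a pipeline.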
Start the data pipeline:
```go
func main() {
	// workerCount, directory, and outputDirectory are placeholders to be defined by the caller.
	mainPipeline := pipeline.New(workerCount)
	mainPipeline.RegisterTransformers(
		RemoveAllSmileyFaces(),
	)
	mainPipeline.Execute(directory, outputDirectory)
}
```