{"id":21620742,"url":"https://github.com/miku/parallel","last_synced_at":"2025-06-13T18:38:33.985Z","repository":{"id":57481414,"uuid":"95438025","full_name":"miku/parallel","owner":"miku","description":"Process lines in parallel.","archived":false,"fork":false,"pushed_at":"2025-01-23T14:52:36.000Z","size":8647,"stargazers_count":20,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-11T10:05:50.500Z","etag":null,"topics":["golang","io","parallel"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/miku.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-06-26T11:04:54.000Z","updated_at":"2025-01-23T14:52:42.000Z","dependencies_parsed_at":"2025-04-11T09:13:32.971Z","dependency_job_id":"8d31c54d-cb7b-479b-a3a8-8fbf506b473f","html_url":"https://github.com/miku/parallel","commit_stats":{"total_commits":14,"total_committers":1,"mean_commits":14.0,"dds":0.0,"last_synced_commit":"6020962e560aa3ee6332ccf73b69421c617f8c00"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/miku/parallel","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fparallel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fparallel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fparallel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fparallel/manifest
s","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/miku","download_url":"https://codeload.github.com/miku/parallel/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miku%2Fparallel/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259699967,"owners_count":22898361,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["golang","io","parallel"],"created_at":"2024-11-24T23:12:38.196Z","updated_at":"2025-06-13T18:38:33.926Z","avatar_url":"https://github.com/miku.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# parallel\n\nProcess lines or records in parallel.\n\nThis package helps to increase the performance of command line filters that\ntransform data read in a line or record oriented fashion.\n\nNote: The *order* of the input lines is not preserved in the output.\n\nThe main type is a\n[parallel.Processor](https://github.com/miku/parallel/blob/fa00b8c221050cc7a84a666f124c9a8c9f0cd471/processor.go#L68-L76),\nwhich reads from an [io.Reader](https://golang.org/pkg/io/#Reader), applies a\nfunction to each input line (separated by a newline by default) and writes the\nresult to an [io.Writer](https://golang.org/pkg/io/#Writer).\n\nThe [transformation function](https://github.com/miku/parallel/blob/fa00b8c221050cc7a84a666f124c9a8c9f0cd471/processor.go#L56-L58) takes a byte slice and therefore does not assume\nany specific format, so the input may be plain lines, CSV, newline delimited\nJSON or similar line oriented formats. 
The output is just bytes and can again\nassume any format.\n\nAn example of the identity transform:\n\n```go\nfunc Noop(b []byte) ([]byte, error) {\n\treturn b, nil\n}\n```\n\nWe can connect this function to IO and let it run:\n\n```go\np := parallel.NewProcessor(os.Stdin, os.Stdout, Noop)\nif err := p.Run(); err != nil {\n\tlog.Fatal(err)\n}\n```\n\nThat's all the setup needed. For details and self-contained programs, see\n[examples](https://github.com/miku/parallel/tree/master/examples).\n\n# Adjusting the processor\n\nThe processor has a few attributes that can be adjusted prior to running:\n\n```go\np := parallel.NewProcessor(os.Stdin, os.Stdout, parallel.ToTransformerFunc(bytes.ToUpper))\n\n// Adjust processor options.\np.NumWorkers = 4          // number of workers (defaults to runtime.NumCPU())\np.BatchSize = 10000       // how many records to batch before sending to a worker\np.RecordSeparator = '\\n'  // record separator (must be a byte at the moment)\n\nif err := p.Run(); err != nil {\n\tlog.Fatal(err)\n}\n```\n\nThe defaults should work for most cases. Batches are kept in memory, so\nlarger batch sizes need more memory but decrease the coordination\noverhead. Sometimes, a batch size of one can be [useful\ntoo](https://github.com/miku/parallel/blob/fa00b8c221050cc7a84a666f124c9a8c9f0cd471/examples/fetchall.go#L166).\n\n# Record support\n\nIt is possible to parallelize record oriented data, too. 
The\n[record.Processor](https://github.com/miku/parallel/blob/11f067737e71ef854339f14b25b83c2194234311/record/record.go#L25-L35)\nadditionally takes a\n[Split](https://github.com/miku/parallel/blob/11f067737e71ef854339f14b25b83c2194234311/record/record.go#L37-L40)\nfunction that is passed internally to a\n[bufio.Scanner](https://pkg.go.dev/bufio#Scanner), which parses the input\nand concatenates a number of records into a batch, which is then passed to\nthe conversion function.\n\nThe [bufio](https://pkg.go.dev/bufio) package contains a number of split\nfunctions, like [ScanWords](https://pkg.go.dev/bufio#ScanWords) and others.\nOriginally, we implemented record support for fast XML processing. For that, we\nadded a\n[TagSplitter](https://github.com/miku/parallel/blob/11f067737e71ef854339f14b25b83c2194234311/record/split.go#L28-L55)\nwhich can split input on XML tags.\n\n\n# Random performance data point\n\nCombining parallel with a fast JSON library, such as\n[jsoniter](https://github.com/json-iterator/go), one can process up to 100000\nJSON documents (of about 1K in size) per second. Here is an [example\nsnippet](https://gist.github.com/miku/62f64de2016dc38186e21270715e8016#file-main-go).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiku%2Fparallel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmiku%2Fparallel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiku%2Fparallel/lists"}