# Parallel

This is meant to be a replacement for GNU parallel, written in Go. It started
as a learning exercise in dealing with parallelism in Go, but has since become
a tool that I use regularly.

The tool starts one worker per CPU and works through the list of jobs that you
give it. The number of workers is configurable with the `-j` flag.
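
As a rough illustration of the worker model described above, here is a minimal
sketch of a worker pool sized to the CPU count. It is not the tool's actual
source: the names `process` and `runCommand`, the hard-coded inputs, and the
use of `echo` are assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"sync"
)

// process starts `workers` goroutines that pull inputs from a shared channel
// and call job for each one. It returns how many jobs reported an error.
// (Hypothetical helper, not part of the real tool.)
func process(workers int, inputs []string, job func(string) error) int {
	jobs := make(chan string)
	var wg sync.WaitGroup
	var mu sync.Mutex
	failed := 0

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for input := range jobs {
				if err := job(input); err != nil {
					mu.Lock()
					failed++
					mu.Unlock()
				}
			}
		}()
	}

	// Feed the queue, then close it so the workers exit once it drains.
	for _, input := range inputs {
		jobs <- input
	}
	close(jobs)
	wg.Wait()
	return failed
}

// runCommand runs `name args... input`, streaming output to the terminal.
// Output from concurrent commands may interleave.
func runCommand(name string, args []string, input string) error {
	cmd := exec.Command(name, append(append([]string{}, args...), input)...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	inputs := []string{"a.txt", "b.txt", "c.txt"} // stand-in for lines from stdin
	failed := process(runtime.NumCPU(), inputs, func(in string) error {
		return runCommand("echo", nil, in)
	})
	fmt.Fprintf(os.Stderr, "%d job(s) failed\n", failed)
}
```

Passing a smaller number instead of `runtime.NumCPU()` corresponds to limiting
the worker count, which is what `-j` does in the real tool.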

This tool strives to use only standard library packages.

## Usage

Install using `go get github.com/mylanconnolly/parallel` or some other means.

### Simple usage

The most straightforward usage would be:

```shell
# Want to calculate the MD5 sum of every file in /etc?
$ find /etc -type f | parallel md5sum

# Want to only use two workers for the same thing?
$ find /etc -type f | parallel -j 2 md5sum
```

### Command templating

You can use Go templates to build the command by passing the `-t` flag. When
using the `-t` flag, you do not need to specify the command (it will be ignored
if you do).

The following fields are available when using templates:

| Field | Definition |
| :----------- | :------------------------------------------------------------ |
| `{{.Cmd}}` | The path of the command specified, for example `echo` or `md5sum` |
| `{{.Input}}` | The current input, received via stdin or the input file |
| `{{.Start}}` | The time that parallel was started |
| `{{.Time}}` | The time that the current operation began |

In addition, the following functions are available in templates:

| Function | Help |
| :------------- | :------------------------------------- |
| `toUpper` | Transform the string to uppercase |
| `toLower` | Transform the string to lowercase |
| `absolutePath` | Get the absolute path of a filename |
| `basename` | Get the basename of a file path |
| `dirname` | Get the directory of a file path |
| `ext` | Get the extension of a file |
| `noExt` | Get the file path without an extension |

Some examples below:

```shell
# Copy some files up a level (utilizing template pipelines).
parallel -a ./files.txt -t 'cp {{.Input}} {{.Input | dirname | dirname}}'

# Create a directory named after the file (without extension).
parallel -a ./files.txt -t 'mkdir -p {{noExt .Input}}'

# Echo the base name of the file without the extension (utilizing template
# pipelines).
parallel -a ./files.txt -t 'echo {{.Input | basename | noExt}}'
```

For more general information about Go templates, see the
[`text/template` package overview](https://golang.org/pkg/text/template/#pkg-overview).
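
To make the fields and functions above more concrete, here is a minimal sketch
of how such a template could be rendered with `text/template`. It is an
illustration under assumptions, not the tool's real implementation: the `job`
struct, the `funcs` map, and the `render` helper are hypothetical, and the
real functions may treat edge cases (such as dot-files in `noExt`) differently.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"text/template"
	"time"
)

// job mirrors the documented template fields (hypothetical type).
type job struct {
	Cmd   string
	Input string
	Start time.Time
	Time  time.Time
}

// funcs maps the documented function names onto standard library calls.
var funcs = template.FuncMap{
	"toUpper":      strings.ToUpper,
	"toLower":      strings.ToLower,
	"absolutePath": filepath.Abs, // returns (string, error), which templates accept
	"basename":     filepath.Base,
	"dirname":      filepath.Dir,
	"ext":          filepath.Ext,
	"noExt": func(p string) string {
		return strings.TrimSuffix(p, filepath.Ext(p))
	},
}

// render parses text as a template and expands it for a single job.
func render(text string, j job) (string, error) {
	tmpl, err := template.New("cmd").Funcs(funcs).Parse(text)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, j); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	now := time.Now()
	j := job{Cmd: "echo", Input: "/tmp/photos/cat.jpeg", Start: now, Time: now}

	// In a pipeline, each function receives the previous value as its last argument.
	out, err := render(`{{.Cmd}} {{.Input | basename | noExt | toUpper}}`, j)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out) // prints "echo CAT"
}
```

The rendered string would then be the command line to run for that input.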

## Real world examples

Here are some benchmarks taken with the `time` command. The benchmark runs
`md5sum` on every file in the Go source repository as of commit
14bec27743.

Below is the timing for the GNU version:

```shell
$ time find ~/src/go -type f | parallel md5sum > /dev/null
noglob find ~/src/go -type f 0.01s user 0.07s system 0% cpu 22.580 total
parallel md5sum > /dev/null 22.65s user 42.48s system 246% cpu 26.432 total
```

Below is the timing for this version:

```shell
$ time find ~/src/go -type f | ./parallel md5sum > /dev/null
noglob find ~/src/go -type f 0.02s user 0.05s system 3% cpu 1.845 total
./parallel md5sum > /dev/null 7.46s user 2.72s system 396% cpu 2.569 total
```

In this example, GNU parallel took around 10 times longer to complete the same
work (26.4 s versus 2.6 s of wall-clock time).

A few notes on my test environment:

- Thinkpad A485
- AMD Ryzen Pro 2700U
- 16GB of RAM
- 256GB NVMe SSD (though I believe it might be a pretty low-quality one)
- Ubuntu 20.04 LTS (kernel version 5.4.0-21-generic)