{"id":13577746,"url":"https://github.com/rofl0r/jobflow","last_synced_at":"2026-03-08T08:33:27.325Z","repository":{"id":2237015,"uuid":"3191076","full_name":"rofl0r/jobflow","owner":"rofl0r","description":"distribute and coordinate work using parallel processes (like GNU parallel, but much faster and memory-efficient)","archived":false,"fork":false,"pushed_at":"2021-12-19T18:47:35.000Z","size":116,"stargazers_count":92,"open_issues_count":2,"forks_count":7,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-10-14T12:46:33.647Z","etag":null,"topics":["c","fast","gnu-parallel","lightweight","parallel","pipes","process","unix"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rofl0r.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-01-16T14:43:42.000Z","updated_at":"2024-08-22T16:28:08.000Z","dependencies_parsed_at":"2022-09-09T18:00:12.639Z","dependency_job_id":null,"html_url":"https://github.com/rofl0r/jobflow","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rofl0r%2Fjobflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rofl0r%2Fjobflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rofl0r%2Fjobflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rofl0r%2Fjobflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rofl0r","download_url":"https://codeload.github.com/rofl0r/jobflow/tar.gz/refs/heads/master"
,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221672307,"owners_count":16861433,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","fast","gnu-parallel","lightweight","parallel","pipes","process","unix"],"created_at":"2024-08-01T15:01:23.990Z","updated_at":"2026-03-08T08:33:27.277Z","avatar_url":"https://github.com/rofl0r.png","language":"C","funding_links":[],"categories":["C"],"sub_categories":[],"readme":"jobflow by rofl0r\n=================\n\nthis program is inspired by the functionality of GNU parallel, but tries\nto keep low overhead and follow the UNIX philosophy of doing one thing well.\n\nhow it works\n------------\n\nbasically, it works by processing stdin, launching one process per line.\nthe actual line can be passed to the started program as an argv.\nthis allows for easy parallelization of standard unix tasks.\n\nit is possible to save the current processed line, so when the task is killed\nit can be continued later.\n\nexample usage\n-------------\n\nyou have a list of things, and a tool that processes a single thing.\n\n    cat things.list | jobflow -threads=8 -exec ./mytask {}\n\n    seq 100 | jobflow -threads=100 -exec echo {}\n\n    cat urls.txt | jobflow -threads=32 -exec wget {}\n\n    find . 
-name '*.bmp' | jobflow -threads=8 -exec bmp2jpeg {.}.bmp {.}.jpg\n\nrun jobflow without arguments to see a list of possible command line options,\nand argument permutations.\n\nstarting from version 1.3.1, jobflow can also be used to extract a range of\nlines, e.g.:\n\n    seq 100 | jobflow -skip 10 -count 10  # print lines 11 to 20\n\nComparison with GNU parallel\n----------------------------\n\nGNU parallel is written in perl, which has the following disadvantages:\n- requires a perl installation\n  even though most people already have perl installed anyway, installing it\n  just for this purpose requires up to 50 MB storage (and potentially up to\n  several hours of time to compile it from source on slow devices)\n- requires a lot of time on startup (parsing sources, etc)\n- requires a lot of memory (typically between 5-60 MB)\n\njobflow OTOH is written in C, which has numerous advantages.\n- once compiled to a tiny static binary, can be used without 3rd party stuff\n- very little and constant memory usage (typically a few KB)\n- no startup overhead\n- much higher execution speed\n\napart from the chosen language and related performance differences, the\nfollowing other differences exist between GNU parallel and jobflow:\n\n+ supports rlimits passed to started processes\n- doesn't support ssh (usage of remote cpus)\n- doesn't support all kinds of argument permutations:\n  while GNU parallel has a rich set of options to permute the input,\n  this doesn't adhere to the UNIX philosophy.\n  jobflow can achieve the same result by passing the unmodified input\n  to a user-created script that does the required permutations with other\n  standard tools.\n\navailable command line options\n------------------------------\n\n    -skip N -threads N -resume -statefile=/tmp/state -delayedflush\n    -delayedspinup N -buffered -joinoutput -limits mem=16M,cpu=10\n    -eof=XXX\n    -exec ./mycommand {}\n\n-skip N\n\n    N=number of entries to skip\n-count N\n\n    N=only process 
count lines (after skipping)\n-threads N (alternative: -j N)\n\n    N=number of parallel processes to spawn\n-resume\n\n    resume from last jobnumber stored in statefile\n-eof XXX\n\n    use XXX as the EOF marker on stdin\n    if the marker is encountered, behave as if stdin was closed\n    not compatible with pipe/bulk mode\n-statefile XXX\n\n    XXX=filename\n    saves last launched jobnumber into a file\n-delayedflush\n\n    only write to statefile whenever all processes are busy,\n    and at program end\n-delayedspinup N\n\n    N=maximum amount of milliseconds\n    ...to wait when spinning up a fresh set of processes\n    a random value between 0 and the chosen amount is used to delay initial\n    spinup.\n    this can be handy to circumvent an I/O lockdown because of a burst of\n    activity on program startup\n-buffered\n\n    store the stdout and stderr of launched processes into a temporary file\n    which will be printed after a process has finished.\n    this prevents mixing up of output of different processes.\n-joinoutput\n\n    if -buffered, write both stdout and stderr into the same file.\n    this saves the chronological order of the output, and the combined output\n    will only be printed to stdout.\n-bulk N\n\n    do bulk copies with a buffer of N bytes. only usable in pipe mode.\n    this passes (almost) the entire buffer to the next scheduled job.\n    the passed buffer will be truncated to the last line break boundary,\n    so jobs always get entire lines to work with.\n    this option is useful when you have huge input files and relatively short\n    task runtimes. by using it, syscall overhead can be reduced to a minimum.\n    N must be a multiple of 4KB. 
the suffixes G/M/K are detected.\n    actual memory allocation will be twice the amount passed.\n    note that pipe buffer size is limited to 64K on linux, so anything higher\n    than that probably doesn't make sense.\n-limits [mem=N,cpu=N,stack=N,fsize=N,nofiles=N]\n\n    sets the rlimits of the newly created processes.\n    see \"man setrlimit\" for an explanation. the suffixes G/M/K are detected.\n-exec command with args\n\n    everything past -exec is treated as the command to execute on each line of\n    stdin received. the line can be passed as an argument using {}.\n    {.} passes everything before the last dot in a line as an argument.\n    it is possible to use multiple substitutions inside a single argument,\n    but currently only of one type.\n    if -exec is omitted, input will merely be dumped to stdout (like cat).\n\n\nBUILD\n-----\n\njust run `make`.\n\nyou may override variables used in the Makefile and set optimization\nCFLAGS and similar things using a file called `config.mak`, e.g.:\n\n    echo \"CFLAGS=-O2 -g\" \u003e config.mak\n    make -j2\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frofl0r%2Fjobflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frofl0r%2Fjobflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frofl0r%2Fjobflow/lists"}