Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/KholdStare/plumbingplusplus

"unix pipes" or automatic concurrent pipelining in C++
https://github.com/KholdStare/plumbingplusplus

Last synced: 3 months ago
JSON representation

"unix pipes" or automatic concurrent pipelining in C++

Awesome Lists containing this project

README

        

# Plumbing++

Most programmers are familiar with the concept of a Unix pipe. Besides allowing
easy function composition, they provide an easy way to implicitly run each task
concurrently. This library aims to do just that in C++, allowing for
type-safe, concurrent and parallelized pipelines of functions "for free".

## Synopsis (API Proposal)

As an example, say you have a nice set of composable functions, and you want to
feed data through them:

Image loadImage(std::string path);
Image processImage(Image const& image);
void saveImage(Imageconst& image);

To process several images, the simple imperative approach is a for loop:

std::vector paths{"tree.jpg", "mountain.jpg", "car.jpg"};

for (auto&& path : paths)
{
Image image = loadImage(path);
saveImage(processImage(image));
}

It gets the job done but is single threaded and not concurrent. Most of the time
the program will be blocked, waiting for IO to complete. Since it is single
threaded, parallelism cannot be exploited. Wouldn't it be nice if one image was
being read, while another was being processed, while another was being saved?

Enter Plumbing++, which automatically creates threads for each processing step,
and joins each with a concurrent FIFO:

std::vector paths{"tree.jpg", "mountain.jpg", "car.jpg"};

(Plumbing::makeSource(paths) >> loadImage >> processImage >> saveImage).wait();

Done! Each connection create a concurrent FIFO, called a Pipe. The read side of
this Pipe is returned in the form of an iterable object:

Plumbing::PipeSource> imageSource = (paths >> loadImage);

These can be freely iterated over, or used in STL algorithms:

std::vector> imageCopies;
std::copy(imageSource.begin(), imageSource.end(), std::back_inserter(imageCopies));

for (auto&& elem : anotherSource)
{
std::cout << elem << std::endl;
}

Each PipeSource (being iterable) thus becomes the input to the next
connection. If the last function in the pipeline returns void, the connect call
returns a future, so the caller can wait for the whole computation to complete:

std::future future = (source >> loadImage >> processImage >> saveImage);
future.wait();

Any exceptions thrown from processing functions are caught in their threads and
propagated through the pipeline, cleanly closing all threads, and leaving the
culprit exception in the future:

std::future future = (source >> loadImage >> processImage >> saveImage);
try {
future.get();
}
catch (std::runtime_exception& e)
{
// deal with exception
}

For those opposed to operator overloading, a variadic function "connect" that
does the same thing:

Plumbing::connect(Plumbing::makeSource(paths),
loadImage, processImage, saveImage).wait();

Any single input/single output callable object is supported, including
ordinary functions, lambdas, and any function object.

If one of your functions is required to be assymetric, such as combining
several input values into one output, this can be done using iterators. As an
example, to merge several images into a single HDR, you might have a function
object:

// As an example, creates one image for every three,
// effectively merging them
struct mergeHDR
{
template
void operator() (InputIterator first,
InputIterator last,
OutputIterator dest);
}

Such a function object can now be used in a pipeline:

std::futute fut =
Plumbing::makeSource(paths)
>> loadImage
>> Plumbing::makeIteratorFilter, Image>(mergeHDR())
>> saveImage;

fut.get();

Unfortunately, since working with iterators requires function templates,
input/return types cannot be deduced automatically as with other examples, and
have to be provided manually as template arguments to makeIteratorFilter.

## Progress/Flaws

The above features are currently implemented, but I am seeking constructive
criticism on the API and implementation. There is a reddit discussion thread
[here](http://www.reddit.com/r/programming/comments/14hgne/plumbing_small_library_that_automatically_makes/)
with lots of ideas and suggestions. Everything after this point is sort of a
"status report meets stream of conciousness" ramble.

Currently, several improvements can be made to the implementation regarding move
semantics, to save expensive copies. It should be possible to move objects all
the way through the pipeline.

### Operator overloading

A concern with the API is the use of operator overloading. The operator >>
template function may accept too broad a range of inputs, and also leaks into
the global name space. Not cool.

As mentioned above, a variadic template function "connect" exists that avoids the problem of
operator overloading. As suggested in the reddit thread, the influence of operator >>
can be narrowed by only allowing it to act on a special object, requiring the user to create
that object to jump-start the process, e.g.:

( Plumbing::MakeSink(vals) >> getFirstChar >> printLine ).wait();

This means operator >> would essentially only act on these Sinks:

template
// return type elided to spare you of ugly template voodoo
operator >> (Sink& sink, Func func);

This is nice, however this Sink type now has to wrap all types of iterable
objects (not just Pipes as it does currently). This can be done through type
erasure, but would incur an extra virtual method call for every dereference of
the underlying iterable (be it a Pipe<T>, or std::vector<int>).

The alternative is to have two distinct types:
* A Sink<T> that encapsulates the end of Pipe<T>
* An Input<Container<T>> that is a thin wrapper around an iterable
object in order to restrict when operator >> gets triggered.

This is less elegant requiring more types and more functions definitions, but
will be more efficient, incurring no calls to virtual methods, while restricting
the use of operator >>

*UPDATE* I have decided to kill two birds with one stone. The new Source class
encapsulates any iterable, including a Pipe- however, it is polymorphic at
compile time, so has zero overhead. A PipeSource is just a type alias for a
Source over Pipes.

## Future Work/Ideas

This section covers some "TODOs", and things to look into.

### Connecting non-trivial functions

Single input, single output functions are a small subset of the functions one
would want to pipeline, so here are a few ideas on how to incorporate those into
the library as well.

Some functions could take several arguments of different types, and return
something else:

/**
* Takes a background, and a foreground with transparency and overlays the
* two to make a new image
*/
Image overlay(Image& background, Image& foregroundWithMask);

Plumbing::combine(backgroundSink, foregroundSink) >> overlay;

### Other ways of connecting

User plhk in this [this comment on
reddit](http://www.reddit.com/r/programming/comments/14hgne/plumbing_small_library_that_automatically_makes/c7dp7x9)
suggests using the (>>=) operator to assemble the pipeline. This is very
fitting as it is the bind operator in Haskell (which this library is inspired
from), effectively porting Haskell syntax directly to C++.

However since this is C++, the (>>=) operator is right associative, so at
present you would have to write:

(((paths >>= loadImage) >>= processImage) >>= saveImage).wait();

Of course, this can be "remedied" by changing the semantics of the operator so
it aggregates functions from right to left until you finally bind to an
iterable, at which point it will trigger execution. This is useful in
"pre assembling" pipelines and then reusing/combining them later:

pipeline = (loadImage >>= processImage >>= saveImage);
(paths >>= pipeline);
(otherPaths >>= pipeline);

This could also allow aggregation of the execution graph of the whole pipeline,
that can be used for task scheduling, if/when work-stealing is implemented.