{"id":25053918,"url":"https://github.com/benman1/pipes","last_synced_at":"2025-03-31T07:25:00.475Z","repository":{"id":90551561,"uuid":"181351179","full_name":"benman1/pipes","owner":"benman1","description":"minimal workflow engine for data processing (POC)","archived":false,"fork":false,"pushed_at":"2019-04-26T21:41:13.000Z","size":34,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-06T11:59:14.384Z","etag":null,"topics":["c-plus-plus","cpp11","dataflow","feature-engineering","flow-based-programming","header-only","machine-learning","stream-processing","transformers"],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/benman1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-04-14T18:23:57.000Z","updated_at":"2019-04-26T21:41:15.000Z","dependencies_parsed_at":null,"dependency_job_id":"a0e98366-3c73-487f-849d-3b847d759516","html_url":"https://github.com/benman1/pipes","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benman1%2Fpipes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benman1%2Fpipes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benman1%2Fpipes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/benman1%2Fpipes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/benman1","download_url":"https://codeload.github.com/benman1/pipes/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246430774,"owners_count":20776088,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-plus-plus","cpp11","dataflow","feature-engineering","flow-based-programming","header-only","machine-learning","stream-processing","transformers"],"created_at":"2025-02-06T11:55:56.445Z","updated_at":"2025-03-31T07:25:00.467Z","avatar_url":"https://github.com/benman1.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pipelining\nCreate a dynamic workflow pipeline. Pipeline steps can be loaded as libraries (e.g. .so files).\n\nThe pipeline can be defined in terms of transformations that get a vector and pass a vector (by reference, obviously). In the pipeline.cpp file, a file, pipeline.lst, is read that contains a processing step in each line; shared library file name and a name for the transformation are separated by space (pipeline.lst):\n```bash\nstep1.so step1\n```\n\nThese steps are then applied one by one. I might add a more complex architecture later.\n\nPipeline steps, transformers, are classes; they can have a state including parameters that adapt based on data. 
I might add a more complete example for this later.\n\n## Dependencies\n* dlopen library\n\n## Walkthrough\nFor the dynamic loading to work, make sure you expose factory functions that return your transformer class using C linkage (using `extern \"C\"`).\n\nFor example, step1.cpp:\n```cpp\n#include \"pipeline.hpp\"\n\nclass multiplier : public transformer {\n   public:\n    DataPoint\u003c\u003e* transform(DataPoint\u003c\u003e* input) {\n        for (unsigned i = 0; i \u003c input-\u003ex.size(); i++) {\n            input-\u003ex[i] *= 5.0;\n        }\n        return input;\n    }\n    multiplier() {\n        name = \"Multiplier\";\n    }\n};\n\nextern \"C\" transformer* transformer_factory() { return new multiplier; }\n```\n\nCompile this into a shared library like so:\n```bash\ng++ -fPIC transformers/step1.cpp -shared -o step1.so -std=c++1z\n```\n\nIn order to run the pipeline, provide a configuration, define a vector, and execute (run_pipeline.cpp):\n```cpp\n#include \"pipeline.hpp\"\n\nint main() {\n    // initialize vector with some elements\n    Vector\u003cdouble\u003e my_vector(10, 0.0);\n    Vector\u003cdouble\u003e targets(5, 1.0);\n    DataPoint\u003cdouble\u003e* row = new DataPoint\u003cdouble\u003e(my_vector, targets);\n    for (unsigned i = 0; i \u003c 10; i++) {\n        my_vector.at(i) = 0.1 * i;\n    }\n\n    Pipeline* pipeline = new Pipeline(\"pipeline.lst\");\n    for(unsigned epoch=0; epoch\u003c100; epoch++)\n        pipeline-\u003eexe(row);\n    return 0;\n}\n```\n\nCompile run_pipeline.cpp as follows:\n```bash\ng++ run_pipeline.cpp -o run_pipeline -std=c++1z\n```\n\nRunning the executable initializes a vector and then applies the step(s):\n```\n./run_pipeline\nLoading transformer class from step1.so\n0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9\n\n0 0.5 1 1.5 2 2.5 3 3.5 4 
4.5\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenman1%2Fpipes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbenman1%2Fpipes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenman1%2Fpipes/lists"}