{"id":15047538,"url":"https://github.com/haptork/easylambda","last_synced_at":"2025-04-06T01:09:10.058Z","repository":{"id":176798042,"uuid":"53853200","full_name":"haptork/easyLambda","owner":"haptork","description":"distributed dataflows with functional list operations for data processing with C++14","archived":false,"fork":false,"pushed_at":"2019-11-16T09:37:25.000Z","size":2052,"stargazers_count":497,"open_issues_count":2,"forks_count":43,"subscribers_count":36,"default_branch":"master","last_synced_at":"2024-10-19T20:25:03.510Z","etag":null,"topics":["cpp14","dataflow-programming","distributed-computing","functional-programming","hpc","mpi","parallel"],"latest_commit_sha":null,"homepage":"https://haptork.github.io/easyLambda/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/haptork.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":"contributing.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-03-14T12:08:04.000Z","updated_at":"2024-10-19T18:30:20.000Z","dependencies_parsed_at":null,"dependency_job_id":"bea9e448-838b-4ae1-9ac5-8159e9a9d5f0","html_url":"https://github.com/haptork/easyLambda","commit_stats":null,"previous_names":["haptork/easylambda"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haptork%2FeasyLambda","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haptork%2FeasyLambda/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haptork%2FeasyLambda/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haptork%2FeasyLa
mbda/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/haptork","download_url":"https://codeload.github.com/haptork/easyLambda/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247419860,"owners_count":20936012,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp14","dataflow-programming","distributed-computing","functional-programming","hpc","mpi","parallel"],"created_at":"2024-09-24T20:59:54.191Z","updated_at":"2025-04-06T01:09:10.040Z","avatar_url":"https://github.com/haptork.png","language":"C++","readme":"[![Build Status](https://travis-ci.org/haptork/easyLambda.svg?branch=master)](https://travis-ci.org/haptork/easyLambda)\n[![codecov](https://codecov.io/gh/haptork/easyLambda/branch/master/graph/badge.svg)](https://codecov.io/gh/haptork/easyLambda)\n\n# ezl: easyLambda\n\u003e Parallel data processing made easy using functional and dataflow programming with modern C++\n\nWelcome to easyLambda and thanks for your interest. The site aims to be a\ncomprehensive guide for easyLambda.\n\n## What is easyLambda\n\nEasyLambda is a *header-only C++14* library for data processing in parallel with\n*functional list operations* (map, filter, reduce, scan, zip) that are tied\ntogether in a *type-safe dataflow*.\n\nEasyLambda is parallel: it scales from multiple cores to *hundreds of distributed*\nnodes *without any need to deal with parallelism* in user code.\n\nEasyLambda is fast. 
It has *minimal overhead in serial execution* and builds upon\nhigh-performance MPI parallelism that is known to be *more efficient than any\nother comparable work*\n[[1]](http://www.sciencedirect.com/science/article/pii/S1877050915017895).\n\nEasyLambda is *expressive and succinct*, thanks to the *column selection* for\ncomposition of functions and many *generic algorithms* such as a *configurable\nparallel file reader*, predicates, correlation, summary etc.\n\nEasyLambda is intuitive and easy to understand with its *uniform property based*\n(or [ExpressionBuilder](http://martinfowler.com/bliki/ExpressionBuilder.html))\ninterface for everything from configuring parallelism to changing the behavior of\ngeneric algorithms to routing dataflow.\n\nEasyLambda is easily *interoperable* with other libraries like the standard library or\nraw MPI code, since it uses *standard data types* and enforces no special\nstructure, data types or requirements on the user functions.\n\n## Why easyLambda\n\nEasyLambda is a good fit for the following tasks:\n+ table/list processing and analysis from CSV or flat text files.\n+ post-processing of scientific simulation results.\n+ running iterative machine learning algorithms.\n+ parallel type-safe data reading.\n+ playing with dataflow programming and functional list operations.\n\nSince it can smoothly interoperate with other libraries, it is possible to\nadd distributed parallelism using easyLambda to existing libraries or\ncodebases when its programming abstraction fits well, e.g. 
it can be used along\nwith bare MPI code or with a machine learning library to add distributed training\nand testing.\n\nEasyLambda will also interest you if you\n+ are a modern C++ enthusiast\n+ want to dabble with metaprogramming\n+ like functional and dataflow programming\n+ have cluster resources that you want to put to use in everyday tasks without much effort\n+ have always wanted a high-level MPI interface\n\n#### Benchmarks\n\nEasyLambda combines the efficiency of MPI with a high-level programming\nabstraction. With easyLambda you get easy-to-understand code with good\nrun-time performance. Check out the benchmarks and comparisons for performance\nand ease of use.\n\n[![benchmarks](doc/benchmarks.png)](https://haptork.github.io/easyLambda/docs/benchmarks/)\n\n\n## Getting Started\n\nCheck out the [Getting Started](https://haptork.github.io/easyLambda/docs/quick-start-guide/)\nsection of the library webpage to learn how to install and get started with easyLambda. The library\ncan also be used on AWS elastic cloud or on a single instance.\n\n\n# Examples\n\nA detailed walkthrough of the library is given [here](https://haptork.github.io/easyLambda/docs/hello-world/).\nThe [examples directory](examples) contains various examples and demonstrations with explanations\nof features and options.\n\nHere we mention some examples in short.\n\n\n## [Example wordcount](examples/wordcount.cpp)\n\nThe following program calculates the frequency of each word in the data files.\n\n```cpp\nauto reader = fromFile\u003cstring\u003e(argv[1]).rowSeparator('s').colSeparator(\"\");\nezl::rise(reader)\n  .reduce\u003c1\u003e(ezl::count(), 0).dump()\n  .run();\n```\n\nThe dataflow pipeline starts with `rise` and subsequent operations are added to it.\nIn the above example, the pipeline begins by reading in data from the specified\nfile(s). `fromFile` is a library function that takes column types and the specified\nfile(s) glob pattern as input and reads the file(s) in parallel. 
It has a lot of\nproperties for controlling data format, parallelism, denormalization etc.\n(shown in [demoFromFile](examples/demoFromFile.cpp)).\n\nIn `reduce` we pass the index of the key column to group by, the library function\nfor counting and the initial value of the result.\n\n\n\n## [Example pi (Monte-Carlo)](examples/pi.cpp)\n\nThe following dataflow calculates pi using the Monte Carlo method.\n\n```cpp\nezl::rise(ezl::kick(10000)) // 10000 trials shared over all processes\n  .map([] { \n    return pow(rnd(), 2) + pow(rnd(), 2);\n  })\n  .filter(ezl::lt(1.))\n  .reduce(ezl::count(), 0)\n  .map([](int inCircleCount) { \n    return (4.0 * inCircleCount / 10000); \n  }).dump()\n  .run();\n```\n\nThe dataflow starts with `rise`, to which we pass a library function that calls the\nnext unit a given number of times. The steps of the algorithm are expressed\nas a composition of small operations: some are common library functions\nlike `count()` and `lt()` (less-than), and some are user-defined functions specific\nto the problem.\n\n\n\n## [Example CSV stats](examples/cods2016.cpp)\n\nHere is another example from\n[cods2016](http://ikdd.acm.org/Site/CoDS2016/datachallenge.html). A stripped\nversion of the input data file is given with ezl\n[here](data/datachallenge_cods2016/train.csv). The data contains student\nprofiles with scores, gender, job salary, city etc.\n\n```cpp\nauto scores = ezl::fromFile\u003cchar, array\u003cfloat, 3\u003e\u003e(fileName)\n                .cols({\"Gender\", \"English\", \"Logical\", \"Domain\"})\n                .colSeparator(\"\\t\");\n\nezl::rise(scores)\n  .filter\u003c2\u003e(ezl::gtAr\u003c3\u003e(0.F))   // filter valid domain scores \u003e 0\n  .map\u003c1\u003e([] (char gender) {      // transforming with 0/1 for isMale\n    return float(gender == 'm');\n  }).colsTransform()\n  .reduceAll(ezl::corr\u003c1\u003e())\n    .dump(\"\", \"Corr. 
of gender with scores\\n(gender|E|L|D)\")\n  .run();\n```\n\nThe above example prints the correlation of English, logical and domain scores\nwith respect to gender. The code resembles the steps of\na spreadsheet analysis or an SQL query. We select the columns to work with,\nnamely gender and the three scores. We filter the rows based on a column and predicate.\nNext, we transform a selected column in-place and then find an aggregate property\n(correlation) for all the rows.\n\n----\n\n## Contributing\n\nSuggestions and feedback are welcome. Feel free to get in touch by mail or through issues\nwith any questions.\n\nSome of the possible directions of improvement:\n\n+ compile time optimization\n+ use of specialized data structures in various units like reduce etc.\n+ addition of more examples e.g. neural nets, simulations etc.\n+ design simplifications\n+ parallelism optimization\n+ code reviews\n+ documentation\n\nPossible ideas for future extensions:\n\n+ fault tolerance\n+ algorithms / functions to plot streaming and buffered data\n+ domain-specific algorithms\n+ MPI single-sided communications\n+ experiments to extend the current programming abstraction to cover more problems like domain decomposition etc.\n\nCheck [internals](https://haptork.github.io/easyLambda/docs/internals) and\n[blog](https://haptork.github.io/easyLambda/posts/) for design and\nimplementation details.\n\n\n## Acknowledgments\n\nA big thanks to cppcon, meetingc++ and other conferences and all C++ expert\nspeakers, committee members and compiler implementers for modernising C++ and\nteaching it with so much enthusiasm. I had fun implementing this and hope you\nwill have fun using it. 
Looking forward to learning more from the community.\n\nI wish to thank [eicossa](https://github.com/eicossa) and Nitesh for their\n(less online, more offline :P) contributions.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhaptork%2Feasylambda","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhaptork%2Feasylambda","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhaptork%2Feasylambda/lists"}