{"id":27042265,"url":"https://github.com/d2si-oss/ooso","last_synced_at":"2025-10-13T19:20:03.172Z","repository":{"id":68923876,"uuid":"90984229","full_name":"d2si-oss/ooso","owner":"d2si-oss","description":"Java library for running Serverless MapReduce jobs","archived":false,"fork":false,"pushed_at":"2017-08-10T11:18:59.000Z","size":86946,"stargazers_count":24,"open_issues_count":0,"forks_count":2,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-07-22T10:33:27.893Z","etag":null,"topics":["aws","java","lambda","library","mapreduce","serverless"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"isc","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/d2si-oss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2017-05-11T13:47:02.000Z","updated_at":"2025-01-25T08:04:07.000Z","dependencies_parsed_at":"2023-02-21T14:31:28.825Z","dependency_job_id":null,"html_url":"https://github.com/d2si-oss/ooso","commit_stats":null,"previous_names":["d2si-oss/demo-aws-lambda-mapreduce"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/d2si-oss/ooso","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d2si-oss%2Fooso","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d2si-oss%2Fooso/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d2si-oss%2Fooso/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d2si-oss%2Fooso/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/d2si-oss","download_url":"https://codel
oad.github.com/d2si-oss/ooso/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d2si-oss%2Fooso/sbom","scorecard":{"id":315647,"data":{"date":"2025-08-11","repo":{"name":"github.com/d2si-oss/ooso","commit":"cd7b300da4a1021fddbcc69f670bbe9c7f41c0c8"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":1.7,"checks":[{"name":"Code-Review","score":0,"reason":"Found 0/9 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows 
avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: ISC License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a 
license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":0,"reason":"Project has not signed or included provenance with any releases.","details":["Warn: release artifact v0.0.4 not signed: https://api.github.com/repos/d2si-oss/ooso/releases/6889478","Warn: release artifact v0.0.3 not signed: https://api.github.com/repos/d2si-oss/ooso/releases/6660377","Warn: release artifact v0.0.2 not signed: https://api.github.com/repos/d2si-oss/ooso/releases/6616801","Warn: release artifact 0.0.1 not signed: https://api.github.com/repos/d2si-oss/ooso/releases/6582314","Warn: release artifact v0.0.4 does not have provenance: https://api.github.com/repos/d2si-oss/ooso/releases/6889478","Warn: release artifact v0.0.3 does not have provenance: https://api.github.com/repos/d2si-oss/ooso/releases/6660377","Warn: release artifact v0.0.2 does not have provenance: https://api.github.com/repos/d2si-oss/ooso/releases/6616801","Warn: release artifact 0.0.1 does not have provenance: https://api.github.com/repos/d2si-oss/ooso/releases/6582314"],"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 27 are checked with a SAST tool"],"documentation":{"short":"Determines if 
the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Vulnerabilities","score":0,"reason":"58 existing vulnerabilities detected","details":["Warn: Project is vulnerable to: GHSA-c28r-hw5m-5gv3","Warn: Project is vulnerable to: GHSA-h46c-h94j-95f3","Warn: Project is vulnerable to: GHSA-wf8f-6423-gfxg","Warn: Project is vulnerable to: GHSA-288c-cq4h-88gq","Warn: Project is vulnerable to: GHSA-4gq5-ch57-c2mg","Warn: Project is vulnerable to: GHSA-4w82-r329-3q67","Warn: Project is vulnerable to: GHSA-57j2-w4cx-62h2","Warn: Project is vulnerable to: GHSA-5949-rw7g-wx7w","Warn: Project is vulnerable to: GHSA-5r5r-6hpj-8gg9","Warn: Project is vulnerable to: GHSA-5ww9-j83m-q7qx","Warn: Project is vulnerable to: GHSA-645p-88qh-w398","Warn: Project is vulnerable to: GHSA-6fpp-rgj9-8rwc","Warn: Project is vulnerable to: GHSA-85cw-hj65-qqv9","Warn: Project is vulnerable to: GHSA-89qr-369f-5m5x","Warn: Project is vulnerable to: GHSA-8c4j-34r4-xr8g","Warn: Project is vulnerable to: GHSA-8w26-6f25-cm9x","Warn: Project is vulnerable to: GHSA-9gph-22xh-8x98","Warn: Project is vulnerable to: GHSA-9m6f-7xcq-8vf8","Warn: Project is vulnerable to: GHSA-c8hm-7hpq-7jhg","Warn: Project is vulnerable to: GHSA-cf6r-3wgc-h863","Warn: Project is vulnerable to: GHSA-cggj-fvv3-cqwv","Warn: Project is vulnerable to: GHSA-cjjf-94ff-43w7","Warn: Project is vulnerable to: GHSA-cmfg-87vq-g5g4","Warn: Project is vulnerable to: GHSA-cvm9-fjm9-3572","Warn: Project is vulnerable to: GHSA-f3j5-rmmp-3fc5","Warn: Project is vulnerable to: GHSA-f9xh-2qgp-cq57","Warn: Project is 
vulnerable to: GHSA-fmmc-742q-jg75","Warn: Project is vulnerable to: GHSA-fqwf-pjwf-7vqv","Warn: Project is vulnerable to: GHSA-gjmw-vf9h-g25v","Warn: Project is vulnerable to: GHSA-gwp4-hfv6-p7hw","Warn: Project is vulnerable to: GHSA-gww7-p5w4-wrfv","Warn: Project is vulnerable to: GHSA-h3cw-g4mq-c5x2","Warn: Project is vulnerable to: GHSA-h592-38cm-4ggp","Warn: Project is vulnerable to: GHSA-h822-r4r5-v8jg","Warn: Project is vulnerable to: GHSA-jjjh-jjxp-wpff","Warn: Project is vulnerable to: GHSA-m6x4-97wx-4q27","Warn: Project is vulnerable to: GHSA-mph4-vhrx-mv67","Warn: Project is vulnerable to: GHSA-mx7p-6679-8g3q","Warn: Project is vulnerable to: GHSA-p43x-xfjf-5jhr","Warn: Project is vulnerable to: GHSA-q93h-jc49-78gg","Warn: Project is vulnerable to: GHSA-qjw2-hr98-qgfh","Warn: Project is vulnerable to: GHSA-qr7j-h6gg-jmgc","Warn: Project is vulnerable to: GHSA-qxxx-2pp7-5hmx","Warn: Project is vulnerable to: GHSA-r3gr-cxrf-hg25","Warn: Project is vulnerable to: GHSA-r695-7vr9-jgc2","Warn: Project is vulnerable to: GHSA-rfx6-vp9g-rh7v","Warn: Project is vulnerable to: GHSA-rgv9-q543-rqg4","Warn: Project is vulnerable to: GHSA-rpr3-cw39-3pxh","Warn: Project is vulnerable to: GHSA-v585-23hc-c647","Warn: Project is vulnerable to: GHSA-vfqx-33qm-g869","Warn: Project is vulnerable to: GHSA-w3f4-3q6j-rh82","Warn: Project is vulnerable to: GHSA-wh8g-3j2c-rqj5","Warn: Project is vulnerable to: GHSA-4jrv-ppp4-jm57","Warn: Project is vulnerable to: GHSA-5mg8-w23w-74h3","Warn: Project is vulnerable to: GHSA-7g45-4rm6-3mm3","Warn: Project is vulnerable to: GHSA-mvr2-9pj6-7w5j","Warn: Project is vulnerable to: GHSA-7r82-7xv7-xcpj","Warn: Project is vulnerable to: GHSA-264p-99wq-f4j6"],"documentation":{"short":"Determines if the project has open, known unfixed 
vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}}]},"last_synced_at":"2025-08-18T00:07:51.199Z","repository_id":68923876,"created_at":"2025-08-18T00:07:51.199Z","updated_at":"2025-08-18T00:07:51.199Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279016897,"owners_count":26085885,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-13T02:00:06.723Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","java","lambda","library","mapreduce","serverless"],"created_at":"2025-04-05T04:22:31.770Z","updated_at":"2025-10-13T19:20:03.125Z","avatar_url":"https://github.com/d2si-oss.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/library-logo.png\" width=\"35%\"/\u003e\n\u003c/p\u003e\n\n[![Build Status](https://travis-ci.org/d2si-oss/ooso.svg?branch=master)](https://travis-ci.org/d2si-oss/ooso)\n[![Maven Central](https://maven-badges.herokuapp.com/maven-central/fr.d2-si/ooso/badge.svg)](https://search.maven.org/#artifactdetails%7Cfr.d2-si%7Cooso%7C0.0.3%7Cjar)\n\nOoso lets you run MapReduce jobs in a serverless way.\nIt is based on managed cloud services, [Amazon S3](https://aws.amazon.com/s3/) and [AWS Lambda](https://aws.amazon.com/lambda/) and is mainly an 
alternative to standard ad-hoc querying and batch processing tools such as [Hadoop](http://hadoop.apache.org/) and [Spark](http://spark.apache.org/).\n\n## Table of contents\n\n  * [I. Architecture and workflow](#i-architecture-and-workflow)\n  * [II. How to use the library](#ii-how-to-use-the-library)\n    * [1. Project structure](#1-project-structure)\n    * [2. Library dependency](#2-library-dependency)\n    * [3. Classes to implement](#3-classes-to-implement)\n    * [4. Configuration file](#4-configuration-file)\n    * [5. Project packaging](#5-project-packaging)\n  * [III. AWS Infrastructure](#iii-aws-infrastructure)\n    * [1. S3 Buckets](#1-s3-buckets)\n    * [2. IAM Roles and policies](#2-iam-roles-and-policies)\n    * [3. Lambda functions](#3-lambda-functions)\n    * [4. Deployment](#4-deployment)\n  * [IV. Running the job](#iv-running-the-job)\n\n___\n\n## I. Architecture and workflow\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/MyArchitecture.png\"/\u003e\n\u003c/p\u003e\n\nThe library workflow is as follows:\n\n\u003col type=\"a\"\u003e\n  \u003cli\u003eThe workflow begins by invoking the \u003ccode\u003eMappers Driver\u003c/code\u003e lambda function\u003c/li\u003e\n  \u003cli\u003eThe \u003ccode\u003eMappers Driver\u003c/code\u003e does two things:\n    \u003col type=\"i\"\u003e\n        \u003cli\u003eIt computes batches of data splits and assigns each batch to a \u003ccode\u003eMapper\u003c/code\u003e\u003c/li\u003e\n        \u003cli\u003eIt invokes a \u003ccode\u003eMappers Listener\u003c/code\u003e lambda function which is responsible for detecting the end of the map phase\u003c/li\u003e\n    \u003c/ol\u003e\n  \u003c/li\u003e\n  \u003cli\u003eOnce the \u003ccode\u003eMappers Listener\u003c/code\u003e detects the end of the map phase, it invokes a first instance of the \u003ccode\u003eReducers Driver\u003c/code\u003e function\u003c/li\u003e\n  \u003cli\u003eThe \u003ccode\u003eReducers Driver\u003c/code\u003e is somewhat similar to the 
\u003ccode\u003eMappers Driver\u003c/code\u003e:\n    \u003col type=\"i\"\u003e\n        \u003cli\u003eIt computes batches either from the \u003ccode\u003eMap Output Bucket\u003c/code\u003e, if we are in the first step of the reduce phase, or from the previous reducers' outputs located in the \u003ccode\u003eReduce Output Bucket\u003c/code\u003e. It then assigns each batch to a \u003ccode\u003eReducer\u003c/code\u003e\u003c/li\u003e\n        \u003cli\u003eIt also invokes a \u003ccode\u003eReducers Listener\u003c/code\u003e for each step of the reduce phase.\u003c/li\u003e\n    \u003c/ol\u003e\n  \u003c/li\u003e\n\u003cli\u003eOnce the \u003ccode\u003eReducers Listener\u003c/code\u003e detects the end of a reduce step, it invokes the next \u003ccode\u003eReducers Driver\u003c/code\u003e if the previous reduce step produced more than one file. Otherwise, there is no need to invoke a \u003ccode\u003eReducers Driver\u003c/code\u003e, because the previous step produced a single file, which is the result of the job\u003c/li\u003e\n\u003c/ol\u003e\n\n___\n\n## II. How to use the library\n### 1. Project Structure\nThe easiest way to get started is to clone the repository and use the provided [example project](./examples/ad-hoc-example-1) directory, which has the following structure:\n\n```\n.\n├── package.sh\n├── provide_job_info.py\n├── pom.xml\n└── src\n    └── main\n        ├── java\n        │   ├── mapper\n        │   │   └── Mapper.java\n        │   └── reducer\n        │       └── Reducer.java\n        └── resources\n            └── jobInfo.json\n```\n\n### 2. 
Library dependency\nDeclare the library dependency in the `pom.xml` file\n\n```xml\n    \u003cdependencies\u003e\n    ...\n        \u003cdependency\u003e\n            \u003cgroupId\u003efr.d2-si\u003c/groupId\u003e\n            \u003cartifactId\u003eooso\u003c/artifactId\u003e\n            \u003cversion\u003e0.0.4\u003c/version\u003e\n        \u003c/dependency\u003e\n    ...\n    \u003c/dependencies\u003e\n```\n\n### 3. Classes to implement\nImplement your `Mapper`, `Reducer` and `Launcher`:\n- The class [Mapper](examples/ad-hoc-example-1/src/main/java/mapper/Mapper.java) is the implementation of your mappers. It must extend the `fr.d2si.ooso.mapper.MapperAbstract` class which looks like the following:\n    ```java\n    public abstract class MapperAbstract {\n        public abstract String map(BufferedReader objectBufferedReader);\n    }\n    ```\n    The `map` method receives a `BufferedReader` as a parameter which is a reader of the batch part that the mapper lambda processes. The Reader closing is done internally for you.\n\n- The class [Reducer](examples/ad-hoc-example-1/src/main/java/reducer/Reducer.java) is the implementation of your reducers. 
It must extend the `fr.d2si.ooso.reducer.ReducerAbstract` class, which looks like the following:\n    ```java\n    public abstract class ReducerAbstract {\n        public abstract String reduce(List\u003cObjectInfoSimple\u003e batch);\n    }\n    ```\n    The `reduce` method receives a list of `ObjectInfoSimple` instances, which encapsulate information about the objects to be reduced.\n    To get a reader from an `ObjectInfoSimple` instance, you can do something like this:\n    ```java\n    public String reduce(List\u003cObjectInfoSimple\u003e batch) {\n\n        for (ObjectInfoSimple objectInfo : batch) {\n\n            BufferedReader objectBufferedReader = Commons.getReaderFromObjectInfo(objectInfo);\n\n            //do something with the reader then close it\n            objectBufferedReader.close();\n        }\n    }\n    ```\n    **For the reducer, you are responsible for closing the opened readers.**\n- The [Launcher](examples/ad-hoc-example-1/src/main/java/job/jobLauncher.java) is responsible for starting your job.\n    Under the hood, it serializes your `Mapper` and `Reducer` and sends them to your `Mappers Driver`, which then propagates them to the rest of the lambdas.\n\n    All you need to do is create a class with a main method and instantiate a `Launcher` that points to your `Mapper` and `Reducer`. 
Your class should look like this:\n    ```java\n    public class JobLauncher {\n        public static void main(String[] args) {\n            //set up your launcher\n            Launcher myLauncher = new Launcher()\n                                            .withMapper(new Mapper())\n                                            .withReducer(new Reducer());\n            //launch your job\n            myLauncher.launchJob();\n        }\n    }\n    ```\n    To make your jar package executable, you need to set the main class that serves as the application entry point.\n\n    If you are using the Maven Shade plugin, you can do so as follows:\n    ```xml\n    \u003cbuild\u003e\n        ...\n            \u003cplugins\u003e\n                \u003cplugin\u003e\n                    ...\n                    \u003cconfiguration\u003e\n                        \u003ctransformers\u003e\n                            \u003ctransformer implementation=\"org.apache.maven.plugins.shade.resource.ManifestResourceTransformer\"\u003e\n                                \u003cmainClass\u003ejob.JobLauncher\u003c/mainClass\u003e\n                            \u003c/transformer\u003e\n                        \u003c/transformers\u003e\n                    \u003c/configuration\u003e\n                    ...\n                \u003c/plugin\u003e\n            \u003c/plugins\u003e\n        ...\n    \u003c/build\u003e\n    ```\n    Please take a look at one of our example [pom.xml](./examples/ad-hoc-example-1/pom.xml) files for further details on how to configure your Maven project.\n\n### 4. 
Configuration file\nEdit the `jobInfo.json` file located at `src/main/resources` to reflect your [infrastructure](#iii-aws-infrastructure) details:\n```json\n{\n  \"jobId\": \"your-job-id\",\n  \"jobInputBucket\": \"input\",\n  \"mapperOutputBucket\": \"mapper-output\",\n  \"reducerOutputBucket\": \"reducer-output\",\n  \"mapperFunctionName\": \"mapper\",\n  \"mappersDriverFunctionName\": \"mappers_driver\",\n  \"mappersListenerFunctionName\": \"mappers_listener\",\n  \"reducerFunctionName\": \"reducer\",\n  \"reducersDriverFunctionName\": \"reducers_driver\",\n  \"reducersListenerFunctionName\": \"reducers_listener\",\n  \"mapperForceBatchSize\": \"-1\",\n  \"reducerForceBatchSize\": \"-1\",\n  \"mapperMemory\": \"1536\",\n  \"reducerMemory\": \"1536\",\n  \"disableReducer\": \"false\"\n}\n```\n\nBelow is a description of some attributes (the rest are self-explanatory).\n\n| Attribute| Description|\n|-------------|-------------|\n|jobId|Used to identify a job and separate outputs in order to avoid overwriting data between jobs|\n|jobInputBucket|Contains the dataset splits that each `Mapper` will process|\n|mapperOutputBucket|The bucket where the mappers will put their results|\n|reducerOutputBucket|The bucket where the reducers will put their results|\n|reducerMemory and mapperMemory|The amount of memory (and therefore other resources) allocated to the lambda functions. They are used internally by the library to compute the batch size that each mapper/reducer will process.|\n|mapperForceBatchSize and reducerForceBatchSize|Used to force the library to use the specified batch size instead of automatically computing it. **`reducerForceBatchSize` must be greater than or equal to 2**|\n|disableReducer|If set to \"true\", disables the reducer|\n\n### 5. 
Project packaging\nTo generate the [jar](https://en.wikipedia.org/wiki/JAR_(file_format)) file used during the [deployment](#4-deployment) of the lambdas, you need to [install Maven](https://maven.apache.org/install.html).\n\nThen, run the `package.sh` script to create the project jar:\n```\n./package.sh\n```\n___\n\n## III. AWS Infrastructure\nBefore diving into the infrastructure details, please have a look at the [deployment](#4-deployment) section.\n### 1. S3 Buckets\nOur lambda functions use S3 buckets to fetch the files they need and to store the results of their processing.\n\nYou need three buckets:\n\n   - An input bucket containing your data splits\n   - Two intermediary buckets used by the mappers and reducers\n\nYou must use the same bucket names as in the [configuration](#4-configuration-file) step above.\n\nYou may create the buckets using the [console](http://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html), the [command line](http://docs.aws.amazon.com/cli/latest/reference/s3api/create-bucket.html) or our [Terraform template](./example-project/terraform/lambda.tf).\n\n### 2. IAM Roles and policies\n\na. Create an IAM role with the following trust policy:\n\n```json\n{\n    \"Version\": \"2012-10-17\",\n    \"Statement\": [\n        {\n            \"Effect\": \"Allow\",\n            \"Principal\": {\n                \"Service\": \"lambda.amazonaws.com\"\n            },\n            \"Action\": \"sts:AssumeRole\"\n        }\n    ]\n}\n```\n\nYou may create the IAM role using the [console](http://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-service.html#roles-creatingrole-service-console), the [command line](http://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-service.html#roles-creatingrole-service-cli) or our [Terraform template](./examples/ad-hoc-example-1/terraform/lambda.tf).\n\nb. 
Attach the following policies to your role\n\n- `arn:aws:iam::aws:policy/AWSLambdaFullAccess`\n- `arn:aws:iam::aws:policy/AmazonS3FullAccess`\n\nNote that these policies are too broad. You may use more fine-grained policies/roles for each lambda.\n\nYou may attach the policies using the [console](http://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_managed-using.html), the [command line](http://docs.aws.amazon.com/cli/latest/reference/iam/attach-role-policy.html) or our [Terraform template](./examples/ad-hoc-example-1/terraform/lambda.tf).\n\n### 3. Lambda functions\nCreate the required lambdas with the following details:\n\n| Lambda Name   | Handler       |Memory|Function package|Runtime|\n|-------------|-------------|----|----|----|\n| mappers_driver| fr.d2si.ooso.mappers_driver.MappersDriver | 1536 |example-project/target/job.jar|java8|\n| mappers_listener| fr.d2si.ooso.mappers_listener.MappersListener | 1536|example-project/target/job.jar|java8|\n| mapper     | fr.d2si.ooso.mapper_wrapper.MapperWrapper | same value as in the [configuration file](#4-configuration-file)  |example-project/target/job.jar|java8|\n| reducers_driver| fr.d2si.ooso.reducers_driver.ReducersDriver | 1536|example-project/target/job.jar|java8|\n| reducers_listener| fr.d2si.ooso.reducers_listener.ReducersListener | 1536|example-project/target/job.jar|java8|\n| reducer     | fr.d2si.ooso.reducer_wrapper.ReducerWrapper | same value as in the [configuration file](#4-configuration-file)  |example-project/target/job.jar|java8|\n\nWe assume that the project jar is located at `example-project/target/job.jar`.\n\nYou may create the lambda functions using the [console](http://docs.aws.amazon.com/lambda/latest/dg/getting-started-create-function.html), the [command line](http://docs.aws.amazon.com/cli/latest/reference/lambda/create-function.html) or our [Terraform template](./examples/ad-hoc-example-1/terraform/lambda.tf).\n\n**Note that you'll only need to deploy the lambdas once. 
You will be able to run all your jobs even if your business code changes without redeploying the infrastructure.**\n\n### 4. Deployment\na. The easy way\n\n   We provide a fully functional Terraform template that creates everything for you, except the input bucket. This template uses the job configuration file.\n   Here is how to use it:\n\n- [install Terraform](https://www.terraform.io/intro/getting-started/install.html)\n- make sure your job configuration file is correct\n- run the following commands\n    ```\n     cd terraform\n     terraform plan\n     terraform apply\n    ```\n\nFor more info about Terraform, check the [Terraform documentation](https://www.terraform.io/docs/).\n\nb. The less easy way\n\nYou may use any deployment method you are familiar with: the AWS console, the AWS CLI, Python scripts, and so on.\nHowever, we recommend using an Infrastructure-as-Code (IaC) tool such as [Terraform](https://www.terraform.io/) or [CloudFormation](https://aws.amazon.com/cloudformation).\n___\n\n## IV. Running the job\nTo run your job, execute the main method of the same jar that you used during the deployment. You may execute it either from your IDE or from the command line as follows:\n```bash\n    java -jar job.jar\n```\n\n___\n\n \u003cdiv\u003eThe logo is made by \u003ca href=\"http://www.freepik.com\" title=\"Freepik\"\u003eFreepik\u003c/a\u003e from \u003ca href=\"http://www.flaticon.com\" title=\"Flaticon\"\u003ewww.flaticon.com\u003c/a\u003e and is licensed by \u003ca href=\"http://creativecommons.org/licenses/by/3.0/\" title=\"Creative Commons BY 3.0\" target=\"_blank\"\u003eCC 3.0 BY\u003c/a\u003e\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fd2si-oss%2Fooso","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fd2si-oss%2Fooso","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fd2si-oss%2Fooso/lists"}