{"id":18825458,"url":"https://github.com/grycap/marla","last_synced_at":"2025-04-14T01:31:27.730Z","repository":{"id":52286847,"uuid":"91562548","full_name":"grycap/marla","owner":"grycap","description":"MApReduce on AWS LAmbda","archived":false,"fork":false,"pushed_at":"2021-05-01T17:10:31.000Z","size":17813,"stargazers_count":3,"open_issues_count":1,"forks_count":5,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-10T11:03:56.478Z","etag":null,"topics":["aws-lambda","lambda","mapreduce","python","serverless"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/grycap.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-05-17T10:07:50.000Z","updated_at":"2024-10-07T22:02:48.000Z","dependencies_parsed_at":"2022-09-07T04:41:51.782Z","dependency_job_id":null,"html_url":"https://github.com/grycap/marla","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grycap%2Fmarla","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grycap%2Fmarla/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grycap%2Fmarla/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grycap%2Fmarla/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/grycap","download_url":"https://codeload.github.com/grycap/marla/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248807571,"owners_count
":21164710,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws-lambda","lambda","mapreduce","python","serverless"],"created_at":"2024-11-08T00:59:35.766Z","updated_at":"2025-04-14T01:31:26.931Z","avatar_url":"https://github.com/grycap.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MARLA - MApReduce on AWS Lambda\n\nMARLA is a tool to create and configure a serverless MapReduce processor on AWS by means of a set of Lambda functions created on AWS Lambda. Files are uploaded to Amazon S3 and this triggers the execution of the functions using the user-supplied Mapper and Reduce functions.\n\n# Architecture\n![Alt text](docs/images/marla-arch.png?raw=true \"Architecture\")\n\n# Installation\n\nMARLA requires:\n\n* An AWS account\n* AWS CLI (version 1.11.76+), used to create the Lambda functions and S3 buckets\n* An IAM Role on AWS with permissions to create, delete and list keys on the used S3 buckets and permissions to invoke Lambda functions. See an example of such an IAM role in the [examples/iam-role.json](examples/iam-role.json) file.\n\nThe code of the Lambda functions and user-defined Mapper and Reduce functions is written in Python. \n\nMARLA can be retrieved by issuing this command:\n\n  `git clone https://github.com/grycap/marla`\n\n# Usage\n\nFirst you need to create your own Mapper and Reduce functions in the same file (as shown in the  [example/example_functions.py](example/example_functions.py) file). 
\n\nThese functions must satisfy the constraints explained below.\n\n## Mapper Function\n\nThe mapper function must adhere to the following signature:\n\n  `def mapper(chunk):`\n  \nwhere `chunk` is the raw text from the input file to be mapped.\n\nThe mapper function must return the name-value pairs it extracts from the chunk, as a list of two-element tuples called `Pairs` here (`Pairs[i][0]` is the name of element `i`, and `Pairs[i][1]` is the value of element `i`).\n \n \n## Reducer Function\n\nThe reducer function must adhere to the following signature:\n  \n  `def reducer(Pairs):`\n  \nwhere `Pairs` is a list of two-element name-value tuples, in the same format as the list returned by the mapper function. `Pairs` is sorted alphabetically by name. \n \n The reducer function must return a list of name-value pairs (`Results[i][0]` is the name of element `i`, and `Results[i][1]` is the value of element `i`).\n \n \n## Configuration\n \n In addition to the aforementioned functions, the user must specify some parameters in a configuration file. This configuration file must follow the structure of the provided example [examples/config.in](examples/config.in). The order of the keys is not important. Their meaning is explained here: \n \n  * ClusterName: An identifier for this \"Lambda cluster\".\n  \n  * FunctionsDir: The directory containing the file that defines the Mapper and Reduce functions.\n  \n  * FunctionsFile: The name of the file with the Mapper and Reduce functions.\n  \n  * Region: The AWS region where the AWS Lambda functions will be created.\n  \n  * BucketIn: The bucket for input files. It must exist.\n  \n  * BucketOut: The bucket for output files. 
We strongly recommend using different buckets for input and output to avoid unwanted recursion.\n  \n  * RoleARN: The ARN of the role under which the Lambda functions will be executed.\n  \n  * MapperNodes: The desired number of concurrent mapper functions.\n  \n  * MinBlockSize: The minimum size, in KB, of text that every mapper will process.\n  \n  * MaxBlockSize: The maximum size, in KB, of text that every mapper will process.\n   \n  * KMSKeyARN: The ARN of the KMS key used to encrypt environment variables. (Optional)\n  \n  * MapperMemory: The memory of the mapper Lambda functions. The maximum text size that each mapper can process is limited by this amount of memory.\n  \n  * ReducerMemory: The memory of the reduce Lambda functions.\n  \n  * TimeOut: The maximum time a Lambda function may run before it is terminated.\n  \n  * ReducersNumber: The number of reducers to use.\n \n \n## Creating and Processing the Data\n \n Once the previous steps are complete, and assuming that you modified the `config.in` file in the `example` directory, issue:\n\n `$ sh marla_create.sh example/config.in`\n \n where `example/config.in` is the path to the configuration file. \n \n The script will create and configure the Lambda functions and add permissions to the S3 buckets. If the script finishes successfully, you will find a folder with the cluster name in the input bucket specified in the configuration file, such as this one: `BucketIn/ClusterName`\n \nEvery file you upload to this folder will be processed via MapReduce. 
The output of the MapReduce process will be stored in the `BucketOut` S3 bucket in the following path: `BucketOut/ClusterName/NameFile/results`\n\nwhere `NameFile` is the name of the uploaded input file without the extension (for example .txt) and \"results\" is the file with the MapReduce results.\n\n## Deleting\n\nTo remove a \"Lambda cluster\", run the script \"marla_remove.sh\" with the name of the cluster:\n\n`$ sh marla_remove.sh ClusterName`\n\nThis will remove all the created Lambda functions, but not the files in S3.\n\n## Acknowledgement \nPlease acknowledge the use of MARLA by citing the following publication:\n```\nGiménez-Alventosa, V., Moltó, G., Caballer, M., 2019. A framework and a performance assessment for serverless MapReduce on AWS Lambda. Futur. Gener. Comput. Syst. 97, 259–274. https://doi.org/10.1016/j.future.2019.02.057\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrycap%2Fmarla","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgrycap%2Fmarla","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrycap%2Fmarla/lists"}