{"id":16567837,"url":"https://github.com/sylvainhalle/mrsim","last_synced_at":"2025-10-29T00:31:38.852Z","repository":{"id":5634569,"uuid":"6842817","full_name":"sylvainhalle/MrSim","owner":"sylvainhalle","description":"A simple MapReduce framework in Java","archived":false,"fork":false,"pushed_at":"2022-02-17T10:37:46.000Z","size":1680,"stargazers_count":14,"open_issues_count":1,"forks_count":13,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-02-01T20:44:56.908Z","etag":null,"topics":["hadoop","java","mapreduce","tuples"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sylvainhalle.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-11-24T18:21:42.000Z","updated_at":"2024-07-30T09:38:17.000Z","dependencies_parsed_at":"2022-08-24T20:51:02.182Z","dependency_job_id":null,"html_url":"https://github.com/sylvainhalle/MrSim","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sylvainhalle%2FMrSim","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sylvainhalle%2FMrSim/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sylvainhalle%2FMrSim/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sylvainhalle%2FMrSim/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sylvainhalle","download_url":"https://codeload.github.com/sylvainhalle/MrSim/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"githu
b","repositories_count":238751057,"owners_count":19524519,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hadoop","java","mapreduce","tuples"],"created_at":"2024-10-11T21:07:37.642Z","updated_at":"2025-10-29T00:31:37.957Z","avatar_url":"https://github.com/sylvainhalle.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"MrSim\n=====\n\nMrSim is a basic implementation of the map-reduce algorithm in Java.\n\n## What is MapReduce?\n\nMapReduce is a programming model for processing large data sets, and the\nname of an implementation of the model by Google. MapReduce is typically\nused to do distributed computing on clusters of computers. See\n[Wikipedia](http://en.wikipedia.org/wiki/MapReduce) for a detailed\ndescription of MapReduce.\n\nThere exist many implementations of the map-reduce model, the most\npopular probably being [Apache Hadoop](http://hadoop.apache.org/).\n\n## What is MrSim?\n\nMrSim is a simple implementation of map-reduce in Java, intended for a\n*pedagogical* illustration of the programming model. It originates from\nfrustrating experiences using other frameworks, which require a lengthy and\ncumbersome setup before running even the simplest example. 
In most cases\nthose examples are entangled with technical considerations (distributed file\nsystem, network configuration) that distract from learning the map-reduce\nprogramming model itself.\n\nMrSim aims at providing a simple framework to create and test map-reduce\njobs using a minimal setup (actually no setup at all), using\nstraightforward implementations of all necessary concepts. This entails some\npurposeful limitations to the system:\n\n- It is not optimized in any way, and should not be used to run serious\n  map-reduce computations\n- It only offers sequential processing of the map-reduce tuples in a single\n  process\n\nIn return, MrSim offers interesting features from a pedagogical point\nof view:\n\n- It runs out of the box: simply add the classes (or the jar) to your\n  classpath\n- The centralized processing makes it easy to perform step-by-step debugging\n  of a map-reduce job (down to the core implementations of the framework,\n  since all source code is provided)\n- The map-reduce environment itself is made of **fewer than 250 lines of\n  code**\n- The examples and underlying implementation are simple and easy to\n  understand\n  \nSurprisingly, MrSim also offers a few features that large-scale\nmap-reduce implementations (such as Hadoop) don't have:\n\n- Inheritance is fully supported when declaring the types for tuple keys and\n  values. This means that a mapper working with tuples of type (K,V) will\n  properly accept a tuple of type (K',V') if K' is a descendant of K and\n  V' is a descendant of V. [This does not work in\n  Hadoop.](http://stackoverflow.com/questions/8553461)\n- Tuples output by reducers can be sent directly as input to mappers, making\n  multiple iterations of map-reduce cycles possible. 
Again, Hadoop does not\n  support this: tuples produced by reducers must be sent serialized to an\n  output collector, and then be re-read from an input collector and\n  converted back into tuples.\n\nAs a rule, don't expect any fancy features to be introduced if they\ninterfere with the system's current simplicity.\n\n## Compiling and Installing MrSim\n\nFirst make sure you have the following installed:\n\n- The Java Development Kit (JDK) to compile. MrSim was developed and\n  tested on version 6 of the JDK, but it is probably safe to use any\n  later version. Moreover, it most probably compiles on the JDK 5, although\n  this was not tested.\n- [Ant](http://ant.apache.org) to automate the compilation and build process\n\nDownload the sources for MrSim from\n[GitHub](http://github.com/sylvainhalle/MrSim) or clone the repository\nusing Git:\n\n    git clone git@github.com:sylvainhalle/MrSim.git\n\nCompile the sources by simply typing:\n\n    ant\n\nThis will produce a file called `mrsim.jar` in the folder. This\nfile is stand-alone and can be used as a library, so it can be\nmoved around to the location of your choice and included in the build\npath of the project.\n\nIn addition, the script generates in the `doc` folder the Javadoc\ndocumentation for using MrSim. This documentation is also embedded in the\nJAR file. 
To show documentation in Eclipse, right-click on the jar, click\n\"Properties\", then fill the Javadoc location (which is the JAR itself).\n\n## How to use MrSim?\n\nSee the `Source/Examples` folder for some examples, and the\n`Source/MapReduce/doc` folder for detailed documentation of the code.\n\n## Who maintains MrSim?\n\nMrSim has been developed and is currently maintained by\n[Sylvain Hallé](http://leduotang.ca/sylvain), associate professor at\nUniversité du Québec à Chicoutimi (Canada).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsylvainhalle%2Fmrsim","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsylvainhalle%2Fmrsim","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsylvainhalle%2Fmrsim/lists"}