{"id":37109516,"url":"https://github.com/anpu9/mit6.824-mapreduce","last_synced_at":"2026-01-14T13:01:17.564Z","repository":{"id":236861783,"uuid":"793300419","full_name":"anpu9/MIT6.824-MapReduce","owner":"anpu9","description":"MapReduce Implementation - Distributed System ","archived":false,"fork":false,"pushed_at":"2024-05-07T18:27:01.000Z","size":22073,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-05-07T19:36:04.017Z","etag":null,"topics":["distributed-systems","mapreduce-java"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/anpu9.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-29T00:23:12.000Z","updated_at":"2024-06-19T10:56:56.048Z","dependencies_parsed_at":"2024-05-07T19:35:59.753Z","dependency_job_id":null,"html_url":"https://github.com/anpu9/MIT6.824-MapReduce","commit_stats":null,"previous_names":["anpu9/mit6.824-mapreduce"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/anpu9/MIT6.824-MapReduce","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anpu9%2FMIT6.824-MapReduce","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anpu9%2FMIT6.824-MapReduce/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anpu9%2FMIT6.824-MapReduce/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anpu9%2FMIT6.824-MapReduce/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/anpu9","download_url":"https://codeload.github.com/anpu9/MIT6.824-MapReduce/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anpu9%2FMIT6.824-MapReduce/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28420816,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T10:47:48.104Z","status":"ssl_error","status_checked_at":"2026-01-14T10:46:19.031Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributed-systems","mapreduce-java"],"created_at":"2026-01-14T13:01:16.993Z","updated_at":"2026-01-14T13:01:17.559Z","avatar_url":"https://github.com/anpu9.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MIT6.824-MapReduce\n\n## Introduction\n\nThis is the first lab of [MIT 6.824](http://nil.csail.mit.edu/6.824/2021/index.html), Distributed Systems.\n\nThis lab is an implementation of [MapReduce](http://research.google.com/archive/mapreduce-osdi04.pdf), a framework introduced by Google that automatically parallelizes programs written in a functional style and executes them on a cluster of commodity machines.\n\n## Key Concepts\n\nUnder the hood, the framework consists of one `master` and multiple `workers`, each of which can be either a `Map worker` or a `Reduce 
worker`.\n\nThe `Master` assigns as-yet-unstarted tasks and tracks their progress.\n\nFor `workers`, there are two phases:\n\n1. `Map`: the user-defined function receives an input file split, takes an input pair, and produces a set of intermediate key-value pairs: `Map (k1,v1) -\u003e list(k2,v2)`. These buffered pairs are written to local disk, partitioned into `R` partitions.\n2. `Reduce`: once the last map task has finished, the Master notifies the workers assigned reduce tasks of these locations. Each reduce worker remotely reads the buffered data from the map workers' local disks, sorts it by intermediate key, passes it to `reducef`, and finally appends the output to its output file (one of `R` in total).\n\n## Implementations\n\nWe're required to implement three major components: `Master`, `Worker`, and `RPC`.\n\n### Master\n\nThe master needs **data structures** that keep track of the state and type of each task. For each finished map task, it also stores the locations of the `R` intermediate files produced by the map worker.\n\nThe **responsibilities** of the `master` are:\n\n1. Assign each unstarted task to a worker. In particular, if the worker does not report the task as done within a timeout (10s here), reassign the task to another worker.\n```go\nfunc (m *Master) waitForTask(task *Task) {\n\tif task.Type != Map \u0026\u0026 task.Type != Reduce {\n\t\treturn\n\t}\n\t// Block until the timeout elapses, then check whether the worker reported back.\n\t\u003c-time.After(TaskTimeout * time.Second)\n\tm.Mu.Lock()\n\tdefer m.Mu.Unlock()\n\t// Still Assigned means no report arrived: make the task available again.\n\tif task.Status == Assigned {\n\t\ttask.Status = Idle\n\t\ttask.WorkerId = -1\n\t\tfmt.Println(\"Task timeout, reset task status: \", *task)\n\t}\n}\n```\n2. Monitor progress. Assign Reduce tasks only after all map tasks have finished. 
When all tasks are done, the master notifies the workers to exit.\n```go\nfunc (m *Master) ReportTaskDone(args *ReportTaskArgs, reply *ReportTaskReply) error {\n\tm.Mu.Lock()\n\tdefer m.Mu.Unlock()\n\ttaskType := args.TaskType\n\tvar task *Task\n\tif taskType == Map {\n\t\ttask = \u0026m.MapTasks[args.TaskId]\n\t} else {\n\t\ttask = \u0026m.ReduceTasks[args.TaskId]\n\t}\n\t// Only accept the report from the worker that currently owns the task.\n\tif task.WorkerId == args.WorkerId \u0026\u0026 task.Status == Assigned {\n\t\ttask.Status = Done\n\t\tif taskType == Map \u0026\u0026 m.nMap \u003e 0 {\n\t\t\t//fmt.Printf(\"Map Task %d finished! \\n\", args.TaskId)\n\t\t\tm.nMap--\n\t\t} else if taskType == Reduce \u0026\u0026 m.nReduce \u003e 0 {\n\t\t\t//fmt.Printf(\"Reduce Task %d finished! \\n\", args.TaskId)\n\t\t\tm.nReduce--\n\t\t}\n\t}\n\t// Workers may exit once no map or reduce tasks remain.\n\treply.CanExit = m.nMap == 0 \u0026\u0026 m.nReduce == 0\n\n\treturn nil\n}\n```\n3. Validate the output. Ensure that nobody observes partially written files in the presence of crashes. Only confirm an output file once it is completely written, here by renaming the finished file to its final name.\n```go\nnewPath := fmt.Sprintf(\"mr-out-%d\", index)\nerr = os.Rename(file.Name(), newPath)\n```\n### RPC\n\nRPC handles two **data flow directions** between worker and master:\n\n1. `Master -\u003e Worker`: the master assigns an idle task to a worker.\n2. `Worker -\u003e Master`: workers report a task's progress to the master.\n```go\n/*\n`Worker -\u003e Master` : Workers report the task's progress to the master\n*/\ntype ReportTaskArgs struct {\n\tWorkerId int\n\tTaskType TaskType\n\tTaskId   int\n}\ntype ReportTaskReply struct {\n\tCanExit bool\n}\n/*\n`Worker -\u003e Master` : Workers report the Reduce task's partition to the master\n*/\ntype BufferArgs struct {\n\tTaskId   int\n\tLocation string\n}\n/*\n`Master -\u003e Worker` : Master assigns an idle task for workers\n*/\ntype TaskArgs struct {\n\tWorkerId int\n}\ntype TaskReply struct {\n\tTask Task\n}\n```\n### Worker\n\nA `worker` is effectively single-threaded. 
It repeatedly requests a new task, processes it with either `mapf` or `reducef`, reports the result, and exits when the `master` signals it to exit.\n\n```go\nfor {\n\t\treply, succ := CallForTask()\n\t\tif !succ {\n\t\t\tfmt.Println(\"Failed to contact master, worker exiting.\")\n\t\t\treturn\n\t\t}\n\t\texit, succ := false, true\n\t\tif reply.Task.Type == Map {\n\t\t\tMapWorker(reply.Task, mapf)\n\t\t\texit, succ = ReportTaskDone(Map, reply.Task.Index)\n\t\t} else if reply.Task.Type == Reduce {\n\t\t\tReduceWorker(reply.Task, reducef)\n\t\t\texit, succ = ReportTaskDone(Reduce, reply.Task.Index)\n\t\t} else if reply.Task.Type == NoTask {\n\t\t\t// all (map or reduce) tasks have been assigned but some are still running; poll again\n\t\t} else {\n\t\t\t// exit, all tasks have finished\n\t\t\treturn\n\t\t}\n\t\tif exit || !succ {\n\t\t\tfmt.Println(\"Master exited or all tasks done, worker exiting.\")\n\t\t\treturn\n\t\t}\n\t\ttime.Sleep(TaskInterval * time.Millisecond)\n\t}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanpu9%2Fmit6.824-mapreduce","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanpu9%2Fmit6.824-mapreduce","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanpu9%2Fmit6.824-mapreduce/lists"}