{"id":34601967,"url":"https://github.com/prxssh/shard","last_synced_at":"2026-05-27T17:31:58.065Z","repository":{"id":333190601,"uuid":"1126342602","full_name":"prxssh/shard","owner":"prxssh","description":"simplified distributed data processing","archived":false,"fork":false,"pushed_at":"2026-01-17T20:52:53.000Z","size":47,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-01-18T07:19:16.235Z","etag":null,"topics":["distributed-systems","golang","grpc","mapreduce"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/prxssh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-01T17:47:46.000Z","updated_at":"2026-01-17T20:52:56.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/prxssh/shard","commit_stats":null,"previous_names":["prxssh/shard"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/prxssh/shard","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prxssh%2Fshard","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prxssh%2Fshard/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prxssh%2Fshard/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prxssh%2Fshard/manifests","owner_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/owners/prxssh","download_url":"https://codeload.github.com/prxssh/shard/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prxssh%2Fshard/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33577633,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-27T02:00:06.184Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributed-systems","golang","grpc","mapreduce"],"created_at":"2025-12-24T12:52:15.736Z","updated_at":"2026-05-27T17:31:58.058Z","avatar_url":"https://github.com/prxssh.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# shard\n\n`shard` is a lightweight, easy-to-use MapReduce framework for Go. 
It provides a\nsimple and flexible way to write and run distributed computations on a cluster\nof machines.\n\n## Features\n\n*   **Simple API:** `shard` provides a simple and intuitive API for writing\n    MapReduce programs.\n*   **Pluggable Components:** `shard` allows you to bring your own `Mapper`,\n    `Reducer`, `Combiner`, `Partitioner`, and `Filesystem` implementations.\n*   **Master-Worker Architecture:** `shard` uses a master-worker architecture\n    to distribute and manage tasks.\n*   **gRPC for Communication:** `shard` uses gRPC for efficient and reliable\n    communication between the master and worker nodes.\n\n## Installation\n\nTo install `shard`, use `go get`:\n\n```bash\ngo get github.com/prxssh/shard\n```\n\n## Configuration\n\n`shard` can be configured using environment variables or through the `Config`\nstruct.\n\n| Environment Variable | `Config` Field      | Description                               | Default                             |\n| -------------------- | ------------------- | ----------------------------------------- | ----------------------------------- |\n| `SHARD_MODE`         | -                   | The mode to run in (`master` or `worker`). | `master`                            |\n| `SHARD_MASTER_ADDR`  | `MasterAddress`     | The address of the master node.           | `localhost:6969`                    |\n| -                    | `InputPath`         | The path to the input file or directory.  | -                                   |\n| -                    | `OutputDir`         | The path to the output directory.         | `./shard`                           |\n| -                    | `NumReducers`       | The number of reduce tasks.               | `16`                                |\n| -                    | `ChunkSize`         | The size of each input split.             | `64MB`                              |\n| -                    | `MaxConcurrency`    | The maximum number of concurrent tasks.   
| `runtime.NumCPU() * 2`              |\n\nSee [config.go](https://github.com/prxssh/shard/blob/master/config.go)\nfor the complete configuration.\n\n## Usage\n\n\u003e [!WARNING] \n\u003e This project is written just for learning purposes and breaking changes are\n\u003e to be expected.\n\nHere is an example of how to use `shard` to implement a word count program:\n\n```go\npackage main\n\nimport (\n\t\"strconv\"\n\t\"strings\"\n\n\t\"github.com/prxssh/shard\"\n\t\"github.com/prxssh/shard/api\"\n\t\"github.com/prxssh/shard/pkg/filesystem\"\n)\n\nfunc main() {\n\t// Create a new shard config.\n\tcfg, err := shard.NewConfig(\n\t\tshard.WithInputPath(\"input.txt\"),\n\t\tshard.WithMapper(Map),\n\t\tshard.WithReducer(Reduce),\n\t\tshard.WithFilesystem(filesystem.NewLocal()),\n\t)\n\tif err != nil {\n\t\tpanic(err)\n\t}\n\n\t// Run the shard job.\n\tif err := shard.Run(cfg); err != nil {\n\t\tpanic(err)\n\t}\n}\n\n// Map is a mapper that emits a count for each word.\nfunc Map(key, value string, emit api.Emitter) error {\n\twords := strings.Fields(value)\n\tfor _, word := range words {\n\t\tif err := emit(word, \"1\"); err != nil {\n\t\t\treturn err\n\t\t}\n\t}\n\treturn nil\n}\n\n// Reduce is a reducer that sums the counts for each word.\nfunc Reduce(key string, values api.Iterator, emit api.Emitter) error {\n\tcount := 0\n\tfor {\n\t\t_, ok := values.Next()\n\t\tif !ok {\n\t\t\tbreak\n\t\t}\n\t\tcount++\n\t}\n\n\treturn emit(key, strconv.Itoa(count))\n}\n```\n\n## Development\n\nInformation for developers, including how to run tests and generate protobuf files.\n\n### Running Tests\n\nTo run the tests, use the following command:\n\n```bash\nmake test\n```\n\n### Generating Protobuf Files\n\nTo generate the protobuf files, use the following command:\n\n```bash\nmake gen-proto FILE=path/to/file.proto\n```\n\n## License\n\nThis project is licensed under the Apache License 2.0. 
See the [LICENSE](LICENSE) file\nfor details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprxssh%2Fshard","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprxssh%2Fshard","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprxssh%2Fshard/lists"}