{"id":18074966,"url":"https://github.com/lnsp/cloudsort","last_synced_at":"2025-04-05T19:19:05.883Z","repository":{"id":82587845,"uuid":"345409292","full_name":"lnsp/cloudsort","owner":"lnsp","description":"Attempt to beat Spark in the sortbenchmark.org cloudsort benchmark","archived":false,"fork":false,"pushed_at":"2022-01-13T22:18:11.000Z","size":63,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-11T16:41:42.211Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lnsp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-07T17:28:48.000Z","updated_at":"2023-03-05T04:59:36.000Z","dependencies_parsed_at":null,"dependency_job_id":"1ecd5686-25af-42f0-b05a-2248daa83c82","html_url":"https://github.com/lnsp/cloudsort","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lnsp%2Fcloudsort","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lnsp%2Fcloudsort/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lnsp%2Fcloudsort/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lnsp%2Fcloudsort/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lnsp","download_url":"https://codeload.github.com/lnsp/cloudsort/tar.gz/refs/heads/master","host":{"na
me":"GitHub","url":"https://github.com","kind":"github","repositories_count":247386362,"owners_count":20930630,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-31T10:43:59.335Z","updated_at":"2025-04-05T19:19:05.863Z","avatar_url":"https://github.com/lnsp.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# cloudsort\n\nThis repository contains all project material required to perform the benchmark.\n\n## Installation\n\n```bash\n$ git clone https://github.com/lnsp/cloudsort.git\n$ cd cloudsort\n$ make\n```\n\n## Usage\n\n\u003e You need a valid S3 configuration to be able to run a sort job. If you want to test it locally, please use a local `minio` instance.\n\n### Start a control server\n\n```bash\n$ ./cloudsort control\n```\n\n### Start one or more worker instances\n\n```bash\n# Make sure that you choose a different address for each worker\n$ ./cloudsort worker\n```\n\n### Submit a sort job\n```\n$ ./cloudsort run --control localhost:6000 --s3-endpoint localhost:9000 --s3-bucket-id cloudsort --s3-object-key gensort-10G\n```\n\n## Architecture\n\nThe project is composed of one control node and N worker nodes. The control node takes care of\n- receiving new job from a client\n- decomposing a job into tasks and handing them out to workers\n- tracking job and task state\n\nA typical run consists of\n1. starting a control node and worker nodes\n2. submitting a new job to the control node\n3. 
the control node looks at the file size and assigns a task composed of\n    - a key range, whose records get shuffled to the worker\n    - a byte range, which the worker should download, sort and shuffle to its peers\n4. the worker sends a regular heartbeat and pulls new tasks\n5. after receiving a new task, the worker downloads its chunk in multiple segments, each with a maximum size defined by the `--mem` flag (DOWNLOAD state)\n6. once a segment is downloaded, the worker starts to sort it immediately\n7. after all segments are downloaded, they are merged back together and an index file is generated (SORT state)\n8. the worker contacts each of its peers to allocate a new TCP port for receiving shuffled chunks (SHUFFLE state)\n9. the worker starts reading at N different positions in the merged chunk, sending each peer its respective data until a reader encounters a key that is outside the target peer's range\n10. the data received from peers is immediately merged to disk (MERGE state, although in practice it mostly runs concurrently with SHUFFLE)\n11. the merged data is then uploaded to S3 (UPLOAD state)\n12. 
after all of that, the worker reports the finished task back to the control node (DONE state)\n\n## Project layout\n\n| File                     | Description                                                                                       |\n| ------------------------ | ------------------------------------------------------------------------------------------------- |\n| `pkg/worker/worker.go`   | Worker code, responsible for pulling and running new tasks                                        |\n| `pkg/worker/task.go`     | Task code, executes the different stages like downloading, merging, shuffling and uploading       |\n| `pkg/worker/sort.go`     | External sort code, most of it is unused due to experimentation with different sorting approaches |\n| `pkg/control/control.go` | Control code, hands out tasks to workers and broadcasts events to the job client                  |\n| `pb/pb.go`               | gRPC API used for communication between client, control and worker except for shuffling           |\n\n## Optimizations and benchmarks\n\n- I experimented with different sorting approaches and algorithms (first sort.Slice, then TimSort, now a parallel merge sort) as well as ways to compare the 10-byte keys (bytes.Compare, using Cgo, and finally converting a key into one uint64 and one uint16 and comparing those)\n- I initially used gRPC for everything (including chunk shuffling); now shuffling is done over bare TCP connections, which reduces the number of memory allocations by using only a set of two buffers (one active, one backup, which are swapped when a chunk is sent/received)\n\n## Performance\n\n### Sorting 1TB with 10 workers and 1 control (8 vCPUs, 32GB RAM, 240GB SSD)\n\n```\nTIMESTAMP  PROGRESS   MESSAGE\n0.00       0.00       Job scheduled\n0.02       0.00       Target file has 1.0 TB of data\n0.30       0.00       state changed 10.0.0.15:6000=DOWNLOAD\n1.26       0.00       state changed 10.0.0.16:6000=DOWNLOAD\n3.03       0.00       state changed 10.0.0.12:6000=DOWNLOAD\n3.90       0.00       
state changed 10.0.0.13:6000=DOWNLOAD\n4.79       0.00       state changed 10.0.0.17:6000=DOWNLOAD\n5.69       0.00       state changed 10.0.0.18:6000=DOWNLOAD\n6.68       0.00       state changed 10.0.0.19:6000=DOWNLOAD\n7.23       0.00       state changed 10.0.0.11:6000=DOWNLOAD\n7.57       0.00       state changed 10.0.0.8:6000=DOWNLOAD\n9.40       0.00       state changed 10.0.0.14:6000=DOWNLOAD\n961.02     0.00       state changed 10.0.0.14:6000=SORT\n988.41     0.00       state changed 10.0.0.13:6000=SORT\n1000.99    0.00       state changed 10.0.0.15:6000=SORT\n1132.09    0.00       state changed 10.0.0.17:6000=SORT\n1283.86    0.00       state changed 10.0.0.14:6000=SHUFFLE\n1302.89    0.00       state changed 10.0.0.16:6000=SORT\n1308.28    0.00       state changed 10.0.0.13:6000=SHUFFLE\n1320.12    0.00       state changed 10.0.0.15:6000=SHUFFLE\n1339.71    0.00       state changed 10.0.0.18:6000=SORT\n1411.15    0.00       state changed 10.0.0.19:6000=SORT\n1420.98    0.00       state changed 10.0.0.8:6000=SORT\n1487.17    0.00       state changed 10.0.0.17:6000=SHUFFLE\n1512.22    0.00       state changed 10.0.0.12:6000=SORT\n1677.07    0.00       state changed 10.0.0.16:6000=SHUFFLE\n1680.51    0.00       state changed 10.0.0.11:6000=SORT\n1723.06    0.00       state changed 10.0.0.18:6000=SHUFFLE\n1792.86    0.00       state changed 10.0.0.19:6000=SHUFFLE\n1814.83    0.00       state changed 10.0.0.8:6000=SHUFFLE\n1927.40    0.00       state changed 10.0.0.12:6000=SHUFFLE\n2119.42    0.00       state changed 10.0.0.11:6000=SHUFFLE\n2550.57    0.00       state changed 10.0.0.18:6000=MERGE\n2550.57    0.00       state changed 10.0.0.18:6000=UPLOAD\n2550.60    0.00       state changed 10.0.0.11:6000=MERGE\n2550.79    0.00       state changed 10.0.0.15:6000=MERGE\n2550.79    0.00       state changed 10.0.0.15:6000=UPLOAD\n2550.87    0.00       state changed 10.0.0.19:6000=MERGE\n2550.87    0.00       state changed 10.0.0.19:6000=UPLOAD\n2550.91    0.00    
   state changed 10.0.0.12:6000=MERGE\n2550.92    0.00       state changed 10.0.0.12:6000=UPLOAD\n2550.93    0.00       state changed 10.0.0.16:6000=MERGE\n2550.93    0.00       state changed 10.0.0.16:6000=UPLOAD\n2551.01    0.00       state changed 10.0.0.17:6000=MERGE\n2551.02    0.00       state changed 10.0.0.17:6000=UPLOAD\n2551.03    0.00       state changed 10.0.0.14:6000=MERGE\n2551.04    0.00       state changed 10.0.0.14:6000=UPLOAD\n2551.06    0.00       state changed 10.0.0.8:6000=MERGE\n2551.07    0.00       state changed 10.0.0.8:6000=UPLOAD\n2551.10    0.00       state changed 10.0.0.13:6000=MERGE\n2551.10    0.00       state changed 10.0.0.13:6000=UPLOAD\n2559.28    0.00       state changed 10.0.0.11:6000=UPLOAD\n3588.45    0.00       state changed 10.0.0.19:6000=DONE\n3598.97    0.00       state changed 10.0.0.17:6000=DONE\n3607.11    0.00       state changed 10.0.0.13:6000=DONE\n3614.02    0.00       state changed 10.0.0.18:6000=DONE\n3623.84    0.00       state changed 10.0.0.16:6000=DONE\n3636.27    0.00       state changed 10.0.0.12:6000=DONE\n3646.11    0.00       state changed 10.0.0.15:6000=DONE\n3662.00    0.00       state changed 10.0.0.14:6000=DONE\n3694.73    0.00       state changed 10.0.0.11:6000=DONE\n3721.79    0.00       state changed 10.0.0.8:6000=DONE\n```\n\n### Example run using 1 controller (8 vCPUs, 16GB, 240GB SSD) and 5 workers (4 vCPUs, 8GB, 160GB SSD)\n\nIn this setting, each node has to sort 16GB by itself using about 6.5GB of buffer memory and then shuffle its data over the network.\n\nThe obvious problem here is that the single controller / data storage caps out on network/disk speed during data fetching.\nCan be solved by using an external service like S3 or do multiple instances of minio, which perform load balancing.\n\n```\nTIMESTAMP  PROGRESS   MESSAGE\n0.00       0.00       Job scheduled\n0.01       0.00       Target file has 80 GB of data\n0.36       0.00       state changed 10.0.0.4:6000=DOWNLOAD\n1.00       
0.00       state changed 10.0.0.6:6000=DOWNLOAD\n1.65       0.00       state changed 10.0.0.7:6000=DOWNLOAD\n2.24       0.00       state changed 10.0.0.5:6000=DOWNLOAD\n9.75       0.00       state changed 10.0.0.3:6000=DOWNLOAD\n166.78     0.00       state changed 10.0.0.4:6000=SORT\n178.92     0.00       state changed 10.0.0.7:6000=SORT\n187.43     0.00       state changed 10.0.0.5:6000=SORT\n222.27     0.00       state changed 10.0.0.4:6000=SHUFFLE\n236.67     0.00       state changed 10.0.0.7:6000=SHUFFLE\n243.47     0.00       state changed 10.0.0.5:6000=SHUFFLE\n245.19     0.00       state changed 10.0.0.3:6000=SORT\n261.11     0.00       state changed 10.0.0.6:6000=SORT\n305.01     0.00       state changed 10.0.0.3:6000=SHUFFLE\n330.77     0.00       state changed 10.0.0.6:6000=SHUFFLE\n396.68     0.00       state changed 10.0.0.6:6000=MERGE\n397.12     0.00       state changed 10.0.0.3:6000=MERGE\n397.13     0.00       state changed 10.0.0.3:6000=UPLOAD\n397.13     0.00       state changed 10.0.0.7:6000=MERGE\n397.14     0.00       state changed 10.0.0.7:6000=UPLOAD\n397.15     0.00       state changed 10.0.0.5:6000=MERGE\n397.15     0.00       state changed 10.0.0.5:6000=UPLOAD\n397.18     0.00       state changed 10.0.0.4:6000=MERGE\n397.18     0.00       state changed 10.0.0.4:6000=UPLOAD\n400.21     0.00       state changed 10.0.0.6:6000=UPLOAD\n439.07     0.00       state changed 10.0.0.4:6000=DONE\n439.88     0.00       state changed 10.0.0.5:6000=DONE\n446.41     0.00       state changed 10.0.0.3:6000=DONE\n446.44     0.00       state changed 10.0.0.7:6000=DONE\n455.38     0.00       state changed 10.0.0.6:6000=DONE\n```\n\nAs you can see, the total time it takes to upload/download is roughly 3 minutes.\nThe sort itself takes about 3 minutes. \n\nAt the moment, sorting performance is largely dominated by the run generation. The sort itself is very expensive due to the size of the entries. 
More tests may be needed to determine ways to optimize the external sort.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flnsp%2Fcloudsort","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flnsp%2Fcloudsort","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flnsp%2Fcloudsort/lists"}