{"id":28604013,"url":"https://github.com/uber/remoteshuffleservice","last_synced_at":"2025-06-11T17:40:19.764Z","repository":{"id":37403695,"uuid":"289070617","full_name":"uber/RemoteShuffleService","owner":"uber","description":"Remote shuffle service for Apache Spark to store shuffle data on remote servers. ","archived":false,"fork":false,"pushed_at":"2023-09-29T12:15:57.000Z","size":1542,"stargazers_count":318,"open_issues_count":31,"forks_count":98,"subscribers_count":19,"default_branch":"master","last_synced_at":"2024-05-09T07:59:23.677Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/uber.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-08-20T17:41:23.000Z","updated_at":"2024-05-02T03:20:23.000Z","dependencies_parsed_at":"2023-02-16T13:01:19.587Z","dependency_job_id":"cd74bece-a286-4c4e-b45e-fe3cac56475e","html_url":"https://github.com/uber/RemoteShuffleService","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/uber/RemoteShuffleService","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uber%2FRemoteShuffleService","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uber%2FRemoteShuffleService/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uber%2FRemoteShuffleService/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uber%2FRemoteShuffleService/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/uber","download
_url":"https://codeload.github.com/uber/RemoteShuffleService/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uber%2FRemoteShuffleService/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259308163,"owners_count":22837974,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-11T17:40:08.238Z","updated_at":"2025-06-11T17:40:19.753Z","avatar_url":"https://github.com/uber.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Uber Remote Shuffle Service (RSS)\n\nUber Remote Shuffle Service lets Apache Spark applications store shuffle data \non remote servers. For more details, see the Spark community document: \n[[SPARK-25299][DISCUSSION] Improving Spark Shuffle Reliability](https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit?ts=5e3c57b8).\n\nPlease contact us (**remoteshuffleservice@googlegroups.com**) with any questions or feedback.\n\n## Supported Spark Version\n\n- The **master** branch supports **Spark 2.4.x**. The **spark30** branch supports **Spark 3.0.x**.\n\n## How to Build\n\nMake sure JDK 8+ and Maven are installed on your machine.\n\n### Build RSS Server\n\n- Run: \n\n```\nmvn clean package -Pserver -DskipTests\n```\n\nThis command creates the **remote-shuffle-service-xxx-server.jar** file for the RSS server, e.g. 
target/remote-shuffle-service-0.0.9-server.jar.\n\n### Build RSS Client\n\n- Run: \n\n```\nmvn clean package -Pclient -DskipTests\n```\n\nThis command creates the **remote-shuffle-service-xxx-client.jar** file for the RSS client, e.g. target/remote-shuffle-service-0.0.9-client.jar.\n\n## How to Run\n\n### Step 1: Run RSS Server\n\n- Pick a server in your environment, e.g. `server1`. Run the RSS server jar file (**remote-shuffle-service-xxx-server.jar**) as a Java application, for example:\n\n```\njava -Dlog4j.configuration=log4j-rss-prod.properties -cp target/remote-shuffle-service-0.0.9-server.jar com.uber.rss.StreamServer -port 12222 -serviceRegistry standalone -dataCenter dc1\n```\n\n### Step 2: Run Spark application with RSS Client\n\n- Upload the client jar file (**remote-shuffle-service-xxx-client.jar**) to your HDFS, e.g. `hdfs:///file/path/remote-shuffle-service-0.0.9-client.jar`\n\n- Add configuration like the following to your Spark application (adjust the values for your environment):\n\n```\nspark.jars=hdfs:///file/path/remote-shuffle-service-0.0.9-client.jar\nspark.executor.extraClassPath=remote-shuffle-service-0.0.9-client.jar\nspark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager\nspark.shuffle.rss.serviceRegistry.type=standalone\nspark.shuffle.rss.serviceRegistry.server=server1:12222\nspark.shuffle.rss.dataCenter=dc1\n```\n\n- Run your Spark application\n\n## Run with High Availability\n\nRemote Shuffle Service can use an [Apache ZooKeeper](https://zookeeper.apache.org/) cluster and register live service \ninstances in ZooKeeper. Spark applications then query ZooKeeper to find and use active Remote Shuffle Service instances. \n\nIn this configuration, ZooKeeper serves as a **Service Registry** for Remote Shuffle Service, and you need to add the ZooKeeper \nparameters when starting both the RSS server and the Spark application.\n\n### Step 1: Run RSS Server with ZooKeeper as service registry\n\n- Assume there is a ZooKeeper server `zkServer1`. 
Pick a server in your environment, e.g. `server1`. Run the RSS server jar file (**remote-shuffle-service-xxx-server.jar**) as a Java application on `server1`, for example:\n\n```\njava -Dlog4j.configuration=log4j-rss-prod.properties -cp target/remote-shuffle-service-0.0.9-server.jar com.uber.rss.StreamServer -port 12222 -serviceRegistry zookeeper -zooKeeperServers zkServer1:2181 -dataCenter dc1\n```\n\n### Step 2: Run Spark application with RSS Client and ZooKeeper service registry\n\n- Upload the client jar file (**remote-shuffle-service-xxx-client.jar**) to your HDFS, e.g. `hdfs:///file/path/remote-shuffle-service-0.0.9-client.jar`\n\n- Add configuration like the following to your Spark application (adjust the values for your environment):\n\n```\nspark.jars=hdfs:///file/path/remote-shuffle-service-0.0.9-client.jar\nspark.executor.extraClassPath=remote-shuffle-service-0.0.9-client.jar\nspark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager\nspark.shuffle.rss.serviceRegistry.type=zookeeper\nspark.shuffle.rss.serviceRegistry.zookeeper.servers=zkServer1:2181\nspark.shuffle.rss.dataCenter=dc1\n```\n\n- Run your Spark application\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuber%2Fremoteshuffleservice","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fuber%2Fremoteshuffleservice","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuber%2Fremoteshuffleservice/lists"}