{"id":19439463,"url":"https://github.com/openucx/sparkucx","last_synced_at":"2025-04-24T22:32:38.977Z","repository":{"id":46602564,"uuid":"202551411","full_name":"openucx/sparkucx","owner":"openucx","description":"A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer","archived":false,"fork":false,"pushed_at":"2023-10-30T07:17:44.000Z","size":111,"stargazers_count":49,"open_issues_count":11,"forks_count":31,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-03T12:46:48.575Z","etag":null,"topics":["apache-spark","big-data","hadoop","hpc","rdma","spark"],"latest_commit_sha":null,"homepage":"https://www.sparkucx.org/","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openucx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-08-15T14:01:09.000Z","updated_at":"2025-01-01T17:13:11.000Z","dependencies_parsed_at":"2024-11-10T15:28:24.665Z","dependency_job_id":"9f4c83f9-33a9-476f-a984-07d831a95f0f","html_url":"https://github.com/openucx/sparkucx","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openucx%2Fsparkucx","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openucx%2Fsparkucx/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openucx%2Fsparkucx/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openucx%2Fsparkucx/
manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openucx","download_url":"https://codeload.github.com/openucx/sparkucx/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250719821,"owners_count":21476143,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-spark","big-data","hadoop","hpc","rdma","spark"],"created_at":"2024-11-10T15:23:05.994Z","updated_at":"2025-04-24T22:32:38.713Z","avatar_url":"https://github.com/openucx.png","language":"Scala","readme":"# SparkUCX ShuffleManager Plugin\nSparkUCX is a high-performance ShuffleManager plugin for Apache Spark that uses RDMA and other high-performance transports\nsupported by [UCX](https://github.com/openucx/ucx#supported-transports) to perform shuffle data transfers in Spark jobs.\n\nThis open-source project is developed, maintained and supported by the [UCF consortium](http://www.ucfconsortium.org/).\n\n## Runtime requirements\n* Apache Spark 2.3/2.4/3.0\n* Java 8+\n* UCX 1.10+ installed, and [UCX-supported transport hardware](https://github.com/openucx/ucx#supported-transports).\n\n## Installation\n\n### Obtain SparkUCX\nPlease use the [\"Releases\"](https://github.com/openucx/sparkucx/releases) page to download the SparkUCX jar file\nfor your Spark version (e.g. 
spark-ucx-1.0-for-spark-2.4.0-jar-with-dependencies.jar).\nPut the SparkUCX jar file in $SPARK_UCX_HOME on all the nodes in your cluster.\n\u003cbr\u003eIf you would like to build the project yourself, please refer to the [\"Build\"](https://github.com/openucx/sparkucx#build) section below.\n\nUCX binaries **must** be in the Spark classpath on every Spark Master and Worker.\nThey can be obtained by installing the latest version from the [UCX release page](https://github.com/openucx/ucx/releases).\n\n### Configuration\n\nProvide Spark with the location of the SparkUCX plugin jar and the UCX shared libraries by using the extraClassPath options.\n\n```\nspark.driver.extraClassPath     $SPARK_UCX_HOME/spark-ucx-1.0-for-spark-2.4.0-jar-with-dependencies.jar:$UCX_PREFIX/lib\nspark.executor.extraClassPath   $SPARK_UCX_HOME/spark-ucx-1.0-for-spark-2.4.0-jar-with-dependencies.jar:$UCX_PREFIX/lib\n```\nTo enable the SparkUCX ShuffleManager plugin, add the following configuration property\nto Spark (e.g. in $SPARK_HOME/conf/spark-defaults.conf):\n\n```\nspark.shuffle.manager   org.apache.spark.shuffle.UcxShuffleManager\n```\nFor Spark 3.0, also add the SparkUCX ShuffleIO plugin:\n```\nspark.shuffle.sort.io.plugin.class org.apache.spark.shuffle.compat.spark_3_0.UcxLocalDiskShuffleDataIO\n```\n\n### Build\n\nBuilding the SparkUCX plugin requires [Apache Maven](http://maven.apache.org/) and a Java 8+ JDK.\n\nBuild instructions:\n\n```\n% git clone https://github.com/openucx/sparkucx\n% cd sparkucx\n% mvn -DskipTests clean package -Pspark-2.4\n```\n\n### Performance\n\nThe SparkUCX plugin is built to provide the best performance out of the box, and provides multiple configuration options to further tune SparkUCX per job. 
For more information on how to set up the [HiBench](https://github.com/Intel-bigdata/HiBench) benchmark and reproduce the results, please refer to [Accelerated Apache SparkUCX 2.4/3.0 cluster deployment](https://docs.mellanox.com/pages/releaseview.action?pageId=19819236).\n\n![Performance results](https://docs.mellanox.com/download/attachments/19819236/image2020-1-23_15-39-14.png)\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenucx%2Fsparkucx","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenucx%2Fsparkucx","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenucx%2Fsparkucx/lists"}