{"id":13435582,"url":"https://github.com/kwai/blaze","last_synced_at":"2025-05-14T12:09:41.299Z","repository":{"id":36974961,"uuid":"380944275","full_name":"kwai/blaze","owner":"kwai","description":"Blazing-fast query execution engine speaks Apache Spark language and has Arrow-DataFusion at its core.","archived":false,"fork":false,"pushed_at":"2025-04-11T08:57:30.000Z","size":9831,"stargazers_count":1440,"open_issues_count":2,"forks_count":151,"subscribers_count":23,"default_branch":"master","last_synced_at":"2025-04-12T08:12:15.201Z","etag":null,"topics":["big-data","datafusion","rust-lang","spark"],"latest_commit_sha":null,"homepage":"https://blaze-project.github.io/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kwai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-28T07:29:43.000Z","updated_at":"2025-04-12T04:14:09.000Z","dependencies_parsed_at":"2024-11-08T09:27:13.248Z","dependency_job_id":"4fead363-490d-4cc1-9821-f2e670ed1f97","html_url":"https://github.com/kwai/blaze","commit_stats":{"total_commits":1002,"total_committers":23,"mean_commits":43.56521739130435,"dds":0.5099800399201597,"last_synced_commit":"f222016e4cc1ceb96467e8b92e2c0d735d8768cf"},"previous_names":["blaze-init/blaze"],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kwai%2Fblaze","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kwai%2Fblaze/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/kwai%2Fblaze/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kwai%2Fblaze/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kwai","download_url":"https://codeload.github.com/kwai/blaze/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248537144,"owners_count":21120711,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","datafusion","rust-lang","spark"],"created_at":"2024-07-31T03:00:37.096Z","updated_at":"2025-05-14T12:09:41.282Z","avatar_url":"https://github.com/kwai.png","language":"Rust","funding_links":[],"categories":["HarmonyOS","Rust","大数据"],"sub_categories":["Windows Manager"],"readme":"\u003c!---\n  Copyright 2022 The Blaze Authors\n  \n  Licensed under the Apache License, Version 2.0 (the \"License\");\n  you may not use this file except in compliance with the License.\n  You may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0 \n\n  Unless required by applicable law or agreed to in writing, software\n  distributed under the License is distributed on an \"AS IS\" BASIS,\n  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n  See the License for the specific language governing permissions and\n  limitations under the License.\n--\u003e\n\n# 
BLAZE\n\n[![TPC-DS](https://github.com/blaze-init/blaze/actions/workflows/tpcds.yml/badge.svg?branch=master)](https://github.com/blaze-init/blaze/actions/workflows/tpcds.yml)\n[![master-ce7-builds](https://github.com/blaze-init/blaze/actions/workflows/build-ce7-releases.yml/badge.svg?branch=master)](https://github.com/blaze-init/blaze/actions/workflows/build-ce7-releases.yml)\n\n![dev/blaze-logo.png](./dev/blaze-logo.png)\n\nThe Blaze accelerator for Apache Spark leverages native vectorized execution to accelerate query processing. It combines\nthe power of the [Apache DataFusion](https://arrow.apache.org/datafusion/) library and the scale of the Spark distributed\ncomputing framework.\n\nBlaze takes a fully optimized physical plan from Spark, maps it into a DataFusion execution plan, and executes that plan\nnatively in Spark executors.\n\nBlaze is composed of the following high-level components:\n\n- **Spark Extension**: hooks the whole accelerator into the Spark execution lifecycle.\n- **Spark Shims**: version-specific code for different Spark versions.\n- **Native Engine**: implements the native engine in Rust, including:\n  - ExecutionPlan protobuf specification\n  - JNI gateway\n  - Customized operators, expressions, functions\n\nThanks to the well-defined extensibility of DataFusion, Blaze can be easily extended to support:\n\n- Various object stores.\n- Operators.\n- Simple and aggregate functions.\n- File formats.\n\nWe encourage you to [extend DataFusion's](https://github.com/apache/arrow-datafusion) capabilities directly and add the\ncorresponding support in Blaze with simple modifications to plan-serde and the extension translation.\n\n## Build from source\n\nTo build Blaze, please follow the steps below:\n\n1. Install Rust\n\nThe native execution library is written in Rust, so you need to install Rust (nightly) before\ncompiling. We recommend using [rustup](https://rustup.rs/).\n\n2. 
Install Protobuf\n\nEnsure `protoc` is available on your `PATH`. Protobuf can be installed via your Linux system package\nmanager (or Homebrew on macOS), or downloaded and built manually from https://github.com/protocolbuffers/protobuf/releases.\n\n3. Install JDK+Maven\n\nBlaze has been well tested on JDK 8 and Maven 3.5, and should work fine with higher versions.\n\n4. Check out the source code.\n\n```shell\ngit clone git@github.com:kwai/blaze.git\ncd blaze\n```\n\n5. Build the project.\n\nSpecify the shims package for the Spark version you would like to run on.\n\nCurrently, the following shims are supported:\n\n* spark-3.0 - for spark3.0.x\n* spark-3.1 - for spark3.1.x\n* spark-3.2 - for spark3.2.x\n* spark-3.3 - for spark3.3.x\n* spark-3.4 - for spark3.4.x\n* spark-3.5 - for spark3.5.x\n\nYou can build Blaze either in pre mode for debugging or in release mode to unlock its full\nperformance.\n\n```shell\nSHIM=spark-3.3 # or spark-3.0/spark-3.1/spark-3.2/spark-3.3/spark-3.4/spark-3.5\nMODE=release # or pre\nmvn clean package -P\"${SHIM}\" -P\"${MODE}\"\n```\n\nTo skip building the native library (for example, when it is already built; you can find it in\n`native-engine/_build/${MODE}`), add `-DskipBuildNative`:\n\n```shell\nSHIM=spark-3.3 # or spark-3.0/spark-3.1/spark-3.2/spark-3.3/spark-3.4/spark-3.5\nMODE=release # or pre\nmvn clean package -P\"${SHIM}\" -P\"${MODE}\" -DskipBuildNative\n```\n\nAfter the build finishes, a fat JAR containing all dependencies is generated in the `target`\ndirectory.\n\n## Build with Docker\n\nYou can use the following command to build a CentOS 7-compatible release:\n\n```shell\nSHIM=spark-3.3 MODE=release ./release-docker.sh\n```\n\n## Run Spark Job with Blaze Accelerator\n\nThis section describes how to submit and configure a Spark job with Blaze support.\n\n1. Move the Blaze JAR package to the Spark client classpath (normally `spark-xx.xx.xx/jars/`).\n\n2. 
Add the following configurations to the Spark configuration file `spark-xx.xx.xx/conf/spark-defaults.conf`:\n\n```properties\nspark.blaze.enable true\nspark.sql.extensions org.apache.spark.sql.blaze.BlazeSparkSessionExtension\nspark.shuffle.manager org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleManager\nspark.memory.offHeap.enabled false\n\n# suggested executor memory configuration\nspark.executor.memory 4g\nspark.executor.memoryOverhead 4096\n```\n\n3. Submit a query with spark-sql or other tools such as spark-thriftserver:\n\n```shell\nspark-sql -f tpcds/q01.sql\n```\n\n## Integrate with Apache Celeborn\n\nBlaze supports Celeborn integration. Use the following configurations to enable shuffling with Celeborn:\n\n```properties\n# change the Celeborn endpoint and storage directory to the correct locations\nspark.shuffle.manager org.apache.spark.sql.execution.blaze.shuffle.celeborn.BlazeCelebornShuffleManager\nspark.serializer org.apache.spark.serializer.KryoSerializer\nspark.celeborn.master.endpoints localhost:9097\nspark.celeborn.client.spark.shuffle.writer hash\nspark.celeborn.client.push.replicate.enabled false\nspark.celeborn.storage.availableTypes HDFS\nspark.celeborn.storage.hdfs.dir hdfs:///home/celeborn\nspark.sql.adaptive.localShuffleReader.enabled false\n```\n\n## Integrate with Apache Uniffle\n\nBlaze supports integration with Apache Uniffle, a high-performance remote shuffle service for Apache Spark. 
\n\nTo enable Uniffle as the shuffle manager in Blaze, configure your Spark application with the following settings in\n`spark-defaults.conf` or via spark-submit options:\n\n```properties\nspark.shuffle.manager org.apache.spark.sql.execution.blaze.shuffle.uniffle.BlazeUniffleShuffleManager\nspark.serializer org.apache.spark.serializer.KryoSerializer\nspark.rss.coordinator.quorum \u003ccoordinatorIp1\u003e:19999,\u003ccoordinatorIp2\u003e:19999\nspark.rss.enabled true\n```\n\nNotes:\n\n* Uniffle client dependency: ensure the Uniffle client library (e.g., `rss-client-spark3-shaded-0.9.2.jar` for Uniffle 0.9.2 or later) is included in your Spark application's classpath.\n* Coordinator endpoints: replace `\u003ccoordinatorIp1\u003e:19999,\u003ccoordinatorIp2\u003e:19999` with the actual Uniffle coordinator addresses in your cluster.\n* For detailed setup and advanced configuration, refer to the [Apache Uniffle Documentation](https://uniffle.apache.org/docs/client-guide).\n\n## Performance\n\nSee the [TPC-H Benchmark Results](./benchmark-results/tpch.md).\nThe latest benchmark results show that Blaze reduced query time by more than 50% on the TPC-H 1TB dataset compared with vanilla Spark 3.5.\n\nStay tuned and join us for more exciting numbers to come.\n\nTPC-H query time:\n![tpch-blaze400-spark351.png](./benchmark-results/tpch-blaze400-spark351.png)\n\nWe also encourage you to benchmark Blaze and share the results with us. 🤗\n\n## Community\n\nWe're using [Discussions](https://github.com/blaze-init/blaze/discussions) to connect with other members\nof our community. We hope that you:\n\n- Ask questions you're wondering about.\n- Share ideas.\n- Engage with other community members.\n- Welcome others and are open-minded. Remember that this is a community we build together 💪.\n\n## License\n\nBlaze is licensed under the Apache 2.0 License. 
A copy of the license\n[can be found here.](LICENSE.txt)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkwai%2Fblaze","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkwai%2Fblaze","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkwai%2Fblaze/lists"}