{"id":13571224,"url":"https://github.com/apache/datafusion-comet","last_synced_at":"2026-04-01T18:55:49.991Z","repository":{"id":217348257,"uuid":"743651128","full_name":"apache/datafusion-comet","owner":"apache","description":"Apache DataFusion Comet Spark Accelerator","archived":false,"fork":false,"pushed_at":"2025-05-09T20:37:53.000Z","size":17429,"stargazers_count":944,"open_issues_count":243,"forks_count":202,"subscribers_count":56,"default_branch":"main","last_synced_at":"2025-05-10T17:16:23.819Z","etag":null,"topics":["arrow","datafusion","rust","spark"],"latest_commit_sha":null,"homepage":"https://datafusion.apache.org/comet","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apache.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-01-15T17:33:42.000Z","updated_at":"2025-05-10T03:55:43.000Z","dependencies_parsed_at":"2024-07-19T00:17:07.815Z","dependency_job_id":"db91aae2-65e2-4a3c-b6f1-ad9d6a7a12fd","html_url":"https://github.com/apache/datafusion-comet","commit_stats":{"total_commits":479,"total_committers":54,"mean_commits":8.87037037037037,"dds":0.7181628392484343,"last_synced_commit":"fa275f1c007cf3cce60272367d7c952bc867570c"},"previous_names":["apache/arrow-datafusion-comet","apache/datafusion-comet"],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fdatafusion-comet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fdatafusion-comet/tags","re
leases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fdatafusion-comet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fdatafusion-comet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apache","download_url":"https://codeload.github.com/apache/datafusion-comet/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254052658,"owners_count":22006716,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrow","datafusion","rust","spark"],"created_at":"2024-08-01T14:00:59.981Z","updated_at":"2026-04-01T18:55:49.978Z","avatar_url":"https://github.com/apache.png","language":"Rust","readme":"\u003c!--\nLicensed to the Apache Software Foundation (ASF) under one\nor more contributor license agreements.  See the NOTICE file\ndistributed with this work for additional information\nregarding copyright ownership.  The ASF licenses this file\nto you under the Apache License, Version 2.0 (the\n\"License\"); you may not use this file except in compliance\nwith the License.  You may obtain a copy of the License at\n\n  http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing,\nsoftware distributed under the License is distributed on an\n\"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\nKIND, either express or implied.  
See the License for the\nspecific language governing permissions and limitations\nunder the License.\n--\u003e\n\n# Apache DataFusion Comet\n\n[![Apache licensed][license-badge]][license-url]\n[![Discord chat][discord-badge]][discord-url]\n[![Pending PRs][pending-pr-badge]][pending-pr-url]\n[![Maven Central][maven-badge]][maven-url]\n\n[license-badge]: https://img.shields.io/badge/license-Apache%20v2-blue.svg\n[license-url]: https://github.com/apache/datafusion-comet/blob/main/LICENSE.txt\n[discord-badge]: https://img.shields.io/discord/885562378132000778.svg?logo=discord\u0026style=flat-square\n[discord-url]: https://discord.gg/3EAr4ZX6JK\n[pending-pr-badge]: https://img.shields.io/github/issues-search/apache/datafusion-comet?query=is%3Apr+is%3Aopen+draft%3Afalse+review%3Arequired+status%3Asuccess\u0026label=Pending%20PRs\u0026logo=github\n[pending-pr-url]: https://github.com/apache/datafusion-comet/pulls?q=is%3Apr+is%3Aopen+draft%3Afalse+review%3Arequired+status%3Asuccess+sort%3Aupdated-desc\n[maven-badge]: https://img.shields.io/maven-central/v/org.apache.datafusion/comet-spark-spark4.0_2.13\n[maven-url]: https://search.maven.org/search?q=g:org.apache.datafusion%20AND%20comet-spark\n\n\u003cimg src=\"docs/source/_static/images/DataFusionComet-Logo-Light.png\" width=\"512\" alt=\"logo\"/\u003e\n\nApache DataFusion Comet is a high-performance accelerator for Apache Spark, built on top of the powerful\n[Apache DataFusion] query engine. 
Comet is designed to significantly enhance the\nperformance of Apache Spark workloads while leveraging commodity hardware and seamlessly integrating with the\nSpark ecosystem without requiring any code changes.\n\nComet also accelerates Apache Iceberg when performing Parquet scans from Spark.\n\n[Apache DataFusion]: https://datafusion.apache.org\n\n# Benefits of Using Comet\n\n## Run Spark Queries at DataFusion Speeds\n\nComet delivers a performance speedup for many queries, enabling faster data processing and shorter time-to-insights.\n\nThe following chart shows the time it takes to run the 22 TPC-H queries against 100 GB of data in Parquet format\nusing a single executor with 8 cores. See the [Comet Benchmarking Guide](https://datafusion.apache.org/comet/contributor-guide/benchmarking.html)\nfor details of the environment used for these benchmarks.\n\nWhen using Comet, the overall run time is reduced from 687 seconds to 302 seconds, a 2.2x speedup.\n\n![](docs/source/_static/images/benchmark-results/0.11.0/tpch_allqueries.png)\n\nHere is a breakdown showing relative performance of Spark and Comet for each TPC-H query.\n\n![](docs/source/_static/images/benchmark-results/0.11.0/tpch_queries_compare.png)\n\nThe following charts show how much Comet currently accelerates each query from the benchmark.\n\n### Relative speedup\n\n![](docs/source/_static/images/benchmark-results/0.11.0/tpch_queries_speedup_rel.png)\n\n### Absolute speedup\n\n![](docs/source/_static/images/benchmark-results/0.11.0/tpch_queries_speedup_abs.png)\n\nThese benchmarks can be reproduced in any environment using the documentation in the\n[Comet Benchmarking Guide](https://datafusion.apache.org/comet/contributor-guide/benchmarking.html). 
We encourage\nyou to run your own benchmarks.\n\nResults for our benchmark derived from TPC-DS are available in the [benchmarking guide](https://datafusion.apache.org/comet/contributor-guide/benchmark-results/tpc-ds.html).\n\n## Use Commodity Hardware\n\nComet leverages commodity hardware, eliminating the need for costly hardware upgrades or\nspecialized hardware accelerators, such as GPUs or FPGAs. By maximizing the utilization of commodity hardware, Comet\nensures cost-effectiveness and scalability for your Spark deployments.\n\n## Spark Compatibility\n\nComet aims for 100% compatibility with all supported versions of Apache Spark, allowing you to integrate Comet into\nyour existing Spark deployments and workflows seamlessly. With no code changes required, you can immediately harness\nthe benefits of Comet's acceleration capabilities without disrupting your Spark applications.\n\n## Tight Integration with Apache DataFusion\n\nComet tightly integrates with the core Apache DataFusion project, leveraging its powerful execution engine. With\nseamless interoperability between Comet and DataFusion, you can achieve optimal performance and efficiency in your\nSpark workloads.\n\n## Active Community\n\nComet boasts a vibrant and active community of developers, contributors, and users dedicated to advancing the\ncapabilities of Apache DataFusion and accelerating the performance of Apache Spark.\n\n## Getting Started\n\nTo get started with Apache DataFusion Comet, follow the\n[installation instructions](https://datafusion.apache.org/comet/user-guide/installation.html). 
Join the\n[DataFusion Slack and Discord channels](https://datafusion.apache.org/contributor-guide/communication.html) to connect\nwith other users, ask questions, and share your experiences with Comet.\n\nSee the [Apache DataFusion Comet Overview](https://datafusion.apache.org/comet/about/index.html#comet-overview) for more detailed information.\n\n## Contributing\n\nWe welcome contributions from the community to help improve and enhance Apache DataFusion Comet. Whether it's fixing\nbugs, adding new features, writing documentation, or optimizing performance, your contributions are invaluable in\nshaping the future of Comet. Check out our\n[contributor guide](https://datafusion.apache.org/comet/contributor-guide/contributing.html) to get started.\n\n## License\n\nApache DataFusion Comet is licensed under the Apache License 2.0. See the [LICENSE.txt](LICENSE.txt) file for details.\n\n## Acknowledgments\n\nWe would like to express our gratitude to the Apache DataFusion community for their support and contributions to\nComet. Together, we're building a faster, more efficient future for big data processing with Apache Spark.\n","funding_links":[],"categories":["🔄 Data Plattform Tools","Scala","大数据"],"sub_categories":["🧠 Prompt Engineering \u0026 Memory Bank"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Fdatafusion-comet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapache%2Fdatafusion-comet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Fdatafusion-comet/lists"}