https://github.com/oracle/spark-oracle
On-the-fly translation of Spark programs to run natively on your Oracle DB. Your Spark programs require no changes.
- Host: GitHub
- URL: https://github.com/oracle/spark-oracle
- Owner: oracle
- License: other
- Created: 2022-01-05T18:12:13.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2025-03-10T12:26:51.000Z (about 1 year ago)
- Last Synced: 2025-04-09T23:38:00.434Z (12 months ago)
- Topics: oracle, spark, sql
- Language: Scala
- Homepage:
- Size: 2.87 MB
- Stars: 33
- Watchers: 9
- Forks: 10
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
- Security: SECURITY.md
# Spark_On_Oracle
Currently, data lakes that combine Oracle Data Warehouse and Apache Spark have these characteristics:
- They have **separate data catalogs,** even if they access the same data in an object store.
- Applications built entirely on Spark have to **compensate for gaps in data management.**
- Applications that federate across Spark and Oracle usually suffer from **inefficient data movement.**
- Spark clusters are expensive to operate because they lack administration tooling and have gaps in data management. **Therefore, the price-performance advantages of Spark are overstated.**

This project fixes those issues:
- It provides a single catalog: the Oracle Data Dictionary.
- Oracle is responsible for data management, including:
  - Consistency
  - Isolation
  - Security
  - Storage layout
  - Data lifecycle
  - Data in an object store, which Oracle manages as external tables
- It provides support for the full Spark programming model.
- **Spark on Oracle** has these characteristics:
  - Full pushdown of SQL workloads: queries, DML on all tables, and DDL for external tables.
  - Pushdown of SQL operations in other workloads.
  - Oracle capabilities, such as machine learning and streaming, surfaced in the Spark programming model.
  - Co-processors on Oracle instances that run certain kinds of Scala code. Because co-processors are isolated and limited, they are easy to manage.
  - Simpler, smaller Spark clusters.

**Feature summary:**
- Catalog integration. (See [this page](https://github.com/oracle/spark-oracle/wiki/Oracle-Catalog).)
- Significant support for SQL pushdown: more than 95 (of 99) [TPC-DS queries](https://github.com/oracle/spark-oracle/wiki/TPCDS-Queries)
  are completely pushed down to the Oracle instance. (See the [Operator](https://github.com/oracle/spark-oracle/wiki/Operator-Translation) and [Expression](https://github.com/oracle/spark-oracle/wiki/Expression-Translation) translation pages.)
- Deployable as a Spark extension jar for Spark 3 environments.
- [Language integration beyond SQL](https://github.com/oracle/spark-oracle/wiki/Language-Integration)
and [DML](https://github.com/oracle/spark-oracle/wiki/Write-Path-Flow) support.

See the [Project Wiki](https://github.com/oracle/spark-oracle/wiki/home) for complete documentation.
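The SQL pushdown described in the feature summary amounts to rendering Spark operators and expressions as Oracle SQL so that the whole query runs inside the database. The following is a toy sketch of that idea only: the small plan and expression classes (`Scan`, `Filter`, `Project`, `Limit`, `Col`, `Lit`, `BinOp`) and the `ToyOracleSql` object are hypothetical, and they are neither Spark's Catalyst classes nor this library's translator, which the Operator and Expression translation pages document. Each operator wraps its child as an inline view, and the row limit uses Oracle's `FETCH FIRST n ROWS ONLY` clause (Oracle Database 12c and later).

```scala
// Toy sketch of operator/expression translation to Oracle SQL text.
// HYPOTHETICAL classes: not Spark Catalyst, not spark-oracle's API.
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Lit(value: Any) extends Expr
final case class BinOp(op: String, left: Expr, right: Expr) extends Expr

sealed trait Plan
final case class Scan(table: String) extends Plan
final case class Filter(cond: Expr, child: Plan) extends Plan
final case class Project(cols: Seq[Expr], child: Plan) extends Plan
final case class Limit(n: Int, child: Plan) extends Plan

object ToyOracleSql {
  // Render an expression; string literals are quoted, doubling embedded quotes.
  def expr(e: Expr): String = e match {
    case Col(n)          => n
    case Lit(s: String)  => "'" + s.replace("'", "''") + "'"
    case Lit(v)          => v.toString
    case BinOp(op, l, r) => s"(${expr(l)} $op ${expr(r)})"
  }

  // Render a plan bottom-up; every child becomes an inline view.
  def plan(p: Plan): String = p match {
    case Scan(t)            => s"SELECT * FROM $t"
    case Filter(c, child)   => s"SELECT * FROM (${plan(child)}) WHERE ${expr(c)}"
    case Project(cs, child) => s"SELECT ${cs.map(expr).mkString(", ")} FROM (${plan(child)})"
    case Limit(n, child)    => s"SELECT * FROM (${plan(child)}) FETCH FIRST $n ROWS ONLY"
  }
}

object ToyPushdownDemo {
  def main(args: Array[String]): Unit = {
    // Roughly: sales.filter(amount > 100).select(item_id, amount).limit(10)
    val q = Limit(10, Project(Seq(Col("item_id"), Col("amount")),
      Filter(BinOp(">", Col("amount"), Lit(100)), Scan("sales"))))
    println(ToyOracleSql.plan(q))
  }
}
```

A real translator would typically collapse adjacent operators into a single query block and bind literals instead of inlining them; this sketch keeps the nesting explicit so each operator's contribution stays visible.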
## Installation
Spark on Oracle can be deployed in any Spark 3.1 or later environment.
See the [Quick Start Guide](https://github.com/oracle/spark-oracle/wiki/Quick-Start-Guide).
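Spark 3 extension jars are commonly wired into a session through the standard `spark.sql.extensions` property and, for catalog integration, a catalog plugin registered under `spark.sql.catalog.<name>`. The sketch below assumes, without confirmation from this README, that Spark on Oracle follows that pattern. The jar path, the catalog name `oracle`, and both class names are hypothetical placeholders; take the real values and the Oracle connection settings from the Quick Start Guide. The code only assembles `spark-submit` arguments as strings, so it runs without Spark on the classpath.

```scala
// Hypothetical sketch: builds spark-submit arguments for an extension jar.
// spark.sql.extensions and spark.sql.catalog.<name> are standard Spark 3
// properties; every VALUE below is a PLACEHOLDER, not spark-oracle's real one.
object ExtensionConfSketch {
  val extensionJar: String    = "/path/to/spark-oracle.jar"   // placeholder path
  val extensionsClass: String = "PLACEHOLDER_EXTENSIONS_CLASS" // placeholder
  val catalogName: String     = "oracle"                       // hypothetical catalog name
  val catalogClass: String    = "PLACEHOLDER_CATALOG_CLASS"    // placeholder

  // Spark properties as key/value pairs.
  val conf: Seq[(String, String)] = Seq(
    "spark.sql.extensions" -> extensionsClass,
    s"spark.sql.catalog.$catalogName" -> catalogClass
  )

  // --jars ships the extension jar; each property becomes a --conf key=value pair.
  def submitArgs: Seq[String] =
    Seq("--jars", extensionJar) ++
      conf.flatMap { case (k, v) => Seq("--conf", s"$k=$v") }

  def main(args: Array[String]): Unit =
    println(("spark-submit" +: submitArgs).mkString(" "))
}
```

Keeping the properties in one `conf` sequence means the same pairs could also be passed to a session builder's `config` calls instead of the command line.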
## Documentation
See the [wiki](https://github.com/oracle/spark-oracle/wiki/home).
## Examples
The [demo script](https://github.com/oracle/spark-oracle/wiki/Demo) walks you
through the features of the library.
## Help
Please file GitHub issues.
## Contributing
This project welcomes contributions from the community. Before submitting a pull
request, please [review our contribution guide](./CONTRIBUTING.md).
## Security
Please consult the [security guide](./SECURITY.md) for our responsible security
vulnerability disclosure process.
## License
Copyright (c) 2021, 2023 Oracle and/or its affiliates.
Released under the Universal Permissive License v1.0 as shown at
.