https://github.com/phact/dse-spark-rest-api
- Host: GitHub
- URL: https://github.com/phact/dse-spark-rest-api
- Owner: phact
- Created: 2017-11-27T23:31:03.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-10-18T18:15:22.000Z (over 7 years ago)
- Last Synced: 2025-02-22T15:58:27.684Z (over 1 year ago)
- Language: Shell
- Size: 215 KB
- Stars: 0
- Watchers: 2
- Forks: 2
- Open Issues: 6
# Spark Submit Options
This guide describes how to submit Spark jobs through the various available API options, and the pros and cons of each.
### Motivation
DSE Analytics jobs (both bulk and streaming) must be submitted to the cluster in order to be executed.
Depending on user requirements, there are a few different ways of doing so. This asset describes the pros and cons of each.
### What is included?
This field asset includes a simple batch analytics application and describes how to submit it to the cluster using:
* `dse spark-submit`
* The undocumented Spark REST API
Other alternatives for execution include:
* `dse spark-client`
* Spark Job Server
### Business Takeaways
In analytics use cases, business stakeholders depend on timely and trackable runs of their business logic to satisfy analytical requirements. This asset helps their technical counterparts support these needs.
### Technical Takeaways
The preferred method for submitting Spark applications (whether remotely using `dse client-tool` or from the cluster itself) is `dse spark-submit`. DSE automatically sets the required environment variables and identifies the Spark Master for application submission, which simplifies the availability requirements of Spark applications.
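A minimal Python sketch of scripting `dse spark-submit`, assuming the `dse` executable is on the `PATH` and the application is packaged as a JAR. The helper names, JAR path, and main class are hypothetical. `--class` and `--master` are standard `spark-submit` options; `--master` is left out by default because DSE finds the Spark Master on its own.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_dse_submit_command(app_jar: str, main_class: str,
                             app_args: Sequence[str] = (),
                             master: Optional[str] = None) -> List[str]:
    """Build the argv for `dse spark-submit` (hypothetical helper).

    --master is omitted by default because DSE locates the Spark Master itself.
    """
    cmd = ["dse", "spark-submit", "--class", main_class]
    if master is not None:
        # Optional explicit override, e.g. a dse:// URL on DSE 5.1+.
        cmd += ["--master", master]
    cmd.append(app_jar)   # the application JAR follows the options
    cmd.extend(app_args)  # everything after the JAR goes to the app's main()
    return cmd


def run_submit(cmd: List[str], dry_run: bool = True) -> int:
    """Echo the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if dry_run:
        return 0
    return subprocess.run(cmd, check=False).returncode
```

For example, `run_submit(build_dse_submit_command("/opt/jobs/batch-analytics.jar", "com.example.BatchJob"), dry_run=False)` would launch the job from any node (or configured client) where `dse` is available.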
However, some users need to submit Spark jobs remotely via REST. In these cases, customers often discover Spark Job Server and take on the complexity of running it just to get REST submission. Sometimes the undocumented Spark REST API is sufficient to meet this requirement.
Be aware, though, that the Spark REST API is not supported by DataStax or by the Spark community.
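A minimal Python sketch of the REST flow, assuming a Spark standalone-style Master that exposes the REST submission server on its default port 6066 with the `/v1/submissions/create` and `/v1/submissions/status/<id>` endpoints. Whether DSE exposes this endpoint, and on which port, is an assumption to verify for your version. The host, JAR path, main class, `clientSparkVersion` value, and helper names are placeholders.

```python
import json
import urllib.request
from typing import Any, Dict, Optional, Sequence

REST_PORT = 6066  # assumed default Spark REST submission port
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED", "ERROR"}


def rest_url(master_host: str, endpoint: str, submission_id: str = "",
             port: int = REST_PORT) -> str:
    """Build a URL such as http://host:6066/v1/submissions/status/<id>."""
    url = f"http://{master_host}:{port}/v1/submissions/{endpoint}"
    return f"{url}/{submission_id}" if submission_id else url


def build_create_request(master_host: str, app_resource: str, main_class: str,
                         app_name: str, app_args: Sequence[str] = (),
                         spark_version: str = "2.0.2",
                         port: int = REST_PORT) -> Dict[str, Any]:
    """JSON body of a CreateSubmissionRequest for a cluster-mode driver."""
    return {
        "action": "CreateSubmissionRequest",
        "appResource": app_resource,
        "mainClass": main_class,
        "appArgs": list(app_args),
        "clientSparkVersion": spark_version,  # placeholder version string
        "environmentVariables": {"SPARK_ENV_LOADED": "1"},
        "sparkProperties": {
            "spark.app.name": app_name,
            "spark.master": f"spark://{master_host}:{port}",
            "spark.jars": app_resource,
            "spark.submit.deployMode": "cluster",
            "spark.driver.supervise": "false",
        },
    }


def _call(url: str, payload: Optional[Dict[str, Any]] = None,
          timeout: float = 10.0) -> Dict[str, Any]:
    """POST payload as JSON (or GET when payload is None); decode the JSON reply."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, method="GET" if data is None else "POST",
        headers={"Content-Type": "application/json;charset=UTF-8"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit(master_host: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Send the create request; the reply carries a "submissionId"."""
    return _call(rest_url(master_host, "create"), payload)


def status(master_host: str, submission_id: str) -> Dict[str, Any]:
    """Fetch the driver status; the reply carries a "driverState"."""
    return _call(rest_url(master_host, "status", submission_id))


def is_terminal(status_response: Dict[str, Any]) -> bool:
    """True once the driver has reached a final state."""
    return status_response.get("driverState") in TERMINAL_STATES
```

In cluster mode the `appResource` must be reachable from whichever node launches the driver (for example, the same local path on every node, or a shared file system such as DSEFS). A caller would take `submissionId` from the `submit()` reply and poll `status()` until `is_terminal()` returns `True`.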
Note: Starting with DSE 5.1, DSE can automatically find the Master for job submissions and lets you select a local datacenter for a Spark job by using the `dse://` syntax in the `.master` property. However, the `dse://` implementation in DSE 5.1.0 broke compatibility with the Spark REST API.
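Because of the `dse://` incompatibility noted above, a REST client cannot lean on DSE's automatic Master discovery. It has to know which node to contact. Below is a minimal Python sketch of one possible workaround. It assumes that only the node running the active Master accepts connections on the REST port (6066 here, also an assumption). The sketch probes a list of candidate hosts and returns the first one that answers. The host names and helper names are hypothetical.

```python
import socket
from typing import Callable, Iterable, Optional


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_rest_master(candidates: Iterable[str], rest_port: int = 6066,
                     probe: Callable[[str, int], bool] = port_open) -> Optional[str]:
    """Return the first candidate host whose REST port accepts connections,
    or None if none do. `probe` is injectable to allow testing without a cluster."""
    for host in candidates:
        if probe(host, rest_port):
            return host
    return None
```

The probe only checks TCP reachability. A successful connection does not prove the endpoint will accept a submission, so the caller should still check the `success` field in the REST reply.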