https://github.com/bernhard-42/spark-yarn-rest-api
Demonstrates how to submit a job to Spark on HDP directly via YARN's REST API from any workstation
- Host: GitHub
- URL: https://github.com/bernhard-42/spark-yarn-rest-api
- Owner: bernhard-42
- Created: 2016-04-10T13:14:32.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2016-04-18T11:54:53.000Z (over 9 years ago)
- Last Synced: 2025-02-28T22:35:14.132Z (7 months ago)
- Language: Python
- Size: 19.5 KB
- Stars: 24
- Watchers: 6
- Forks: 21
- Open Issues: 2
Metadata Files:
- Readme: Readme-Knox-Kerberos.md
README
## 1 Kerberize cluster
- Use an existing KDC or set up an MIT KDC
- Enable Kerberos via Ambari: `http://<Ambari-Host>:8080/#/main/admin/kerberos`.
- Prepare the REST APIs for Kerberos
- Follow ["2. Configuring HTTP Authentication for HDFS, YARN, MapReduce2, HBase, Oozie, Falcon and Storm"](http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.0/bk_Ambari_Security_Guide/content/_configuring_http_authentication_for_HDFS_YARN_MapReduce2_HBase_Oozie_Falcon_and_Storm.html) to enable kerberos for YARN REST API
- Set `yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled` to `true` in `yarn-site.xml`
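
To verify that the YARN REST API is now kerberized, a quick check from any cluster host can help (a hedged sketch; `$RM_HOST` stands for your Resource Manager host and 8088 is assumed to be the default RM web port):

```bash
# Without SPNEGO the request should now be rejected (expect 401):
curl -s -o /dev/null -w "%{http_code}\n" "http://$RM_HOST:8088/ws/v1/cluster/info"

# With a valid Kerberos ticket (kinit first) and SPNEGO it should return 200:
curl -s -o /dev/null -w "%{http_code}\n" --negotiate -u : "http://$RM_HOST:8088/ws/v1/cluster/info"
```
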
## 2 Connect Knox to LDAP

- Use an existing LDAP server or start Knox's demo LDAP server (`/usr/hdp/current/knox-server/bin/ldap.sh start`)
- Add a new user to LDAP:
```bash
[root@LDAP-HOST ~]$ cat <<EOF > <username>.ldif
dn: uid=<username>,ou=people,dc=hadoop,dc=apache,dc=org
objectclass: top
objectclass: person
objectclass: organizationalPerson
objectclass: inetOrgPerson
cn: <username>
sn: <username>
uid: <username>
userPassword: <password>
EOF

[root@LDAP-HOST ~]$ ldapadd -p 33389 -h localhost -W -D "uid=admin,ou=people,dc=hadoop,dc=apache,dc=org" -f <username>.ldif
```
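
To confirm the entry landed, an `ldapsearch` against the same server should return the new user (a sanity check, not part of the original walkthrough):

```bash
[root@LDAP-HOST ~]$ ldapsearch -x -p 33389 -h localhost \
    -b "ou=people,dc=hadoop,dc=apache,dc=org" "uid=<username>"
```
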
## 3 Create a keytab for the user that should run the Spark job

- Create a keytab for `<username>` with its principal `<username>@<REALM>`
```bash
[root@KDC-HOST ~]$ kadmin
kadmin> xst -k /etc/security/keytabs/<username>.keytab <username>@<REALM>
```
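
The entries written to the keytab can be listed for a quick check (`klist -kt` prints KVNO, timestamp and principal for each entry):

```bash
[root@KDC-HOST ~]$ klist -kt /etc/security/keytabs/<username>.keytab
```
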
- Copy `/etc/security/keytabs/<username>.keytab` to **every** machine on the cluster and set permissions:

```bash
[root@CLUSTER-HOST ~]$ chown <username>:hadoop /etc/security/keytabs/<username>.keytab
[root@CLUSTER-HOST ~]$ chmod 400 /etc/security/keytabs/<username>.keytab
```
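
Copying and locking down the keytab on many nodes can be scripted; a minimal sketch, assuming passwordless root SSH and a hypothetical `HOSTS` list:

```bash
HOSTS="node1 node2 node3"   # placeholder host names
for h in $HOSTS; do
  scp /etc/security/keytabs/<username>.keytab root@$h:/etc/security/keytabs/
  ssh root@$h "chown <username>:hadoop /etc/security/keytabs/<username>.keytab && \
               chmod 400 /etc/security/keytabs/<username>.keytab"
done
```
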
- Test on every machine:

```bash
[root@CLUSTER-HOST ~]$ kinit <username>@<REALM> -k -t /etc/security/keytabs/<username>.keytab
```

There must be no password prompt!

```bash
[root@KDC-HOST ~]$ klist -l
# Principal name Cache name
# -------------- ----------
# <username>@<REALM>    FILE:/tmp/krb5cc_1020
```

## 4 Test the connection from a workstation outside the cluster

- **HDFS** (should work without further configuration)
```bash
[MacBook simple-project]$ curl -s -k -u '<username>:<password>' \
https://$KNOX_SERVER:8443/gateway/default/webhdfs/v1/?op=GETFILESTATUS | jq .
# {
# "FileStatus": {
# "accessTime": 0,
# "blockSize": 0,
# "childrenNum": 9,
# "fileId": 16385,
# "group": "hdfs",
# "length": 0,
# "modificationTime": 1458070072105,
# "owner": "hdfs",
# "pathSuffix": "",
# "permission": "755",
# "replication": 0,
# "storagePolicy": 0,
# "type": "DIRECTORY"
# }
# }
```
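
Uploading a file through the gateway works with the usual two-step WebHDFS `CREATE` (a hedged sketch; Knox rewrites the redirect `Location` header so the second request can be sent back through the gateway; `hello.txt` and the target path are placeholders):

```bash
# Step 1: ask for an upload location (no data sent yet, expect a 307 redirect)
LOCATION=$(curl -s -k -u '<username>:<password>' -X PUT -i \
    "https://$KNOX_SERVER:8443/gateway/default/webhdfs/v1/tmp/hello.txt?op=CREATE" \
    | awk '/^Location:/ {print $2}' | tr -d '\r')

# Step 2: send the file body to the returned location
curl -s -k -u '<username>:<password>' -X PUT -T hello.txt "$LOCATION"
```
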
- **YARN**

```bash
[MacBook simple-project]$ curl -s -k -u '<username>:<password>' -d '' \
https://$KNOX_SERVER:8443/gateway/default/resourcemanager/v1/cluster/apps/new-application
# {
# "application-id": "application_1460654399208_0004",
# "maximum-resource-capability": {
# "memory": 8192,
# "vCores": 3
# }
# }
```
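
Since the response is JSON, the fresh application id can be captured directly for later calls (a convenience sketch using the same `jq` as above):

```bash
APP_ID=$(curl -s -k -u '<username>:<password>' -d '' \
    https://$KNOX_SERVER:8443/gateway/default/resourcemanager/v1/cluster/apps/new-application \
    | jq -r '."application-id"')
echo $APP_ID
```
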
## 5 Edit project.cfg

Set at least:

```bash
clusterKerberized = True
hdfsAccessPrincipal = <username>@<REALM>
hdfsAccessKeytab = /etc/security/keytabs/<username>.keytab
```

## 6 Submit job

```bash
[Mac-Book]$ python bin/spark-remote-submit.py
```
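
Once submitted, the application's state can be polled through the same gateway (hedged; `$APP_ID` is the application id noted during submission):

```bash
curl -s -k -u '<username>:<password>' \
    https://$KNOX_SERVER:8443/gateway/default/resourcemanager/v1/cluster/apps/$APP_ID \
    | jq '.app.state'
```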