https://github.com/dacort/spark-tweeter
I know... you've always wanted your Spark jobs to be able to tweet, right?
- Host: GitHub
- URL: https://github.com/dacort/spark-tweeter
- Owner: dacort
- License: mit
- Created: 2021-05-21T05:53:38.000Z
- Default Branch: main
- Last Pushed: 2021-05-24T21:47:30.000Z
- Language: Go
- Size: 36.1 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# spark-tweeter
A demo sidecar container for use with [EMR on EKS pod templates](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/pod-templates.html).
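
Conceptually, a sidecar like this runs next to the Spark driver and watches the `emr-container-communicate` volume, which the pod template under Usage mounts at the path stored in `EMR_COMMS_MOUNT`. The shell function below is a minimal, hypothetical sketch of that waiting pattern, not the Go code in this repo: `wait_for_marker` and its file-name argument are illustrative, and the actual file EMR writes to that volume should be confirmed in the EMR on EKS pod template documentation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: block until a marker file appears in the EMR comms mount.
# EMR_COMMS_MOUNT comes from the pod template; /var/log/fluentd is the same
# path the template uses. The marker file name is passed in and is an assumption.
wait_for_marker() {
  local name="$1"
  local interval="${2:-5}"   # seconds between checks
  local dir="${EMR_COMMS_MOUNT:-/var/log/fluentd}"

  until [ -f "${dir}/${name}" ]; do
    sleep "$interval"
  done
  echo "found ${dir}/${name}"
}
```

Once the function returns, a real sidecar would go on to post its tweet, for example to announce that the job has finished.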

## Prerequisites
- A Twitter app and user credentials for it: a consumer key and secret, plus an access token and secret
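
The `kubectl create secret` command under Usage reads these four credentials from shell variables, so if one of them is unset the secret can end up with an empty value. A small guard like the sketch below can catch that first. It is bash-only because it uses `${!name}` indirect expansion, and the function name `check_twitter_creds` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical guard: report any Twitter credential variable that is unset or empty.
# Returns 0 when all four are present, 1 otherwise.
check_twitter_creds() {
  local name missing=0
  for name in CONSUMER_KEY CONSUMER_SECRET ACCESS_TOKEN ACCESS_TOKEN_SECRET; do
    # ${!name:-} expands the variable whose name is stored in $name (bash only).
    if [ -z "${!name:-}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run `check_twitter_creds && kubectl create secret generic -n emr-jobs twitter-creds ...` so the secret is only created when every value is set.
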
## Usage
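- Create a secret with your Twitter credentials in the namespace your EMR virtual cluster runs jobs in (`emr-jobs` here), since a pod can only read secrets from its own namespace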
```shell
kubectl create secret generic -n emr-jobs twitter-creds \
--from-literal=consumer_key=${CONSUMER_KEY} \
--from-literal=consumer_secret=${CONSUMER_SECRET} \
--from-literal=access_token=${ACCESS_TOKEN} \
--from-literal=access_token_secret=${ACCESS_TOKEN_SECRET}
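```
- Create a pod template file and upload it to S3
_Note that the pod template mounts two EMR-managed volumes: `emr-container-application-log-dir`, which holds the Spark application logs, and `emr-container-communicate`, which holds a heartbeat file._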
```yaml
# tweetcar.yaml
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: side-car-tweeter
      image: ghcr.io/dacort/spark-tweeter:latest
      env:
        - name: CONSUMER_KEY
          valueFrom:
            secretKeyRef:
              name: twitter-creds
              key: consumer_key
        - name: CONSUMER_SECRET
          valueFrom:
            secretKeyRef:
              name: twitter-creds
              key: consumer_secret
        - name: ACCESS_TOKEN
          valueFrom:
            secretKeyRef:
              name: twitter-creds
              key: access_token
        - name: ACCESS_TOKEN_SECRET
          valueFrom:
            secretKeyRef:
              name: twitter-creds
              key: access_token_secret
        - name: EMR_COMMS_MOUNT
          value: /var/log/fluentd
      resources: {}
      volumeMounts:
        - name: emr-container-application-log-dir
          mountPath: /var/log/spark/user
        - name: emr-container-communicate
          mountPath: /var/log/fluentd
```
```shell
aws s3 cp tweetcar.yaml "s3://${S3_BUCKET}/pod_templates/tweetcar.yaml"
```
- Run your Spark job with the pod template applied to the driver, so the sidecar runs next to the Spark driver container
```shell
aws emr-containers start-job-run \
--virtual-cluster-id ${EMR_EKS_CLUSTER_ID} \
--name dacort-tweeter \
--execution-role-arn ${EMR_EKS_EXECUTION_ARN} \
--release-label emr-5.33.0-latest \
--job-driver '{
"sparkSubmitJobDriver": {
"entryPoint": "s3://'${S3_BUCKET}'/code/pyspark/windy_city.py",
"sparkSubmitParameters": "--conf spark.executor.instances=20 --conf spark.executor.memory=2G --conf spark.executor.cores=2 --conf spark.driver.cores=1 --conf spark.kubernetes.driver.podTemplateFile=s3://'${S3_BUCKET}'/pod_templates/tweetcar.yaml"
}
}'
```
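
`start-job-run` returns as soon as the job is submitted. If you also want your terminal to wait for the job, one option is to poll `aws emr-containers describe-job-run` until the run reaches a terminal state. The function below is a sketch under that assumption: `wait_for_job` is a hypothetical name, and the job ID would come from the `id` field that `start-job-run` returns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll an EMR on EKS job run until it reaches a terminal state.
# Requires EMR_EKS_CLUSTER_ID (the virtual cluster ID used above).
# Prints the final state; returns 0 only if the job COMPLETED.
wait_for_job() {
  local job_id="$1"
  local interval="${2:-30}"   # seconds between polls
  local state

  while true; do
    state=$(aws emr-containers describe-job-run \
      --id "$job_id" \
      --virtual-cluster-id "$EMR_EKS_CLUSTER_ID" \
      --query 'jobRun.state' \
      --output text) || return 1

    case "$state" in
      COMPLETED) echo "$state"; return 0 ;;
      FAILED|CANCELLED) echo "$state"; return 1 ;;
      *) sleep "$interval" ;;  # PENDING, SUBMITTED, RUNNING, CANCEL_PENDING
    esac
  done
}
```

For example, append `--query id --output text` to the `start-job-run` command above, capture its output in `JOB_ID`, and then run `wait_for_job "$JOB_ID"`.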