Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dhanush-nferx/spark-on-k8s
Deployment of Apache Spark on Kubernetes.
Last synced: 12 days ago
- Host: GitHub
- URL: https://github.com/dhanush-nferx/spark-on-k8s
- Owner: dhanush-nferx
- Created: 2024-07-24T09:18:36.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-07-24T10:43:53.000Z (6 months ago)
- Last Synced: 2024-11-11T02:27:29.537Z (2 months ago)
- Size: 10.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Apache Spark on Kubernetes: Step-by-Step Setup and Troubleshooting
**This guide provides a comprehensive step-by-step approach to setting up Apache Spark on a Kubernetes cluster. It includes both the installation and common troubleshooting steps to ensure a smooth deployment.**
### Prerequisites
1. A Kubernetes cluster is already set up. (Refer to our previous [guide](https://github.com/dhanush-nferx/vgrt-fusion-k8s) for Kubernetes setup if needed.)
2. Helm is installed and configured on your local machine.
3. Docker is installed and running.
## Step 1: Add Helm Repository
First, add the Bitnami Helm repository to your Helm configuration.
```
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
```
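Optionally, you can confirm that the repository was added and the Spark chart is visible before continuing:
```
# Optional: confirm the Bitnami Spark chart is now available
helm search repo bitnami/spark
```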
## Step 2: Create Namespace for Spark
Create a dedicated namespace for the Spark deployment.
```
kubectl create namespace spark
```
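If you want to double-check that the namespace exists before installing the chart, this optional command does that:
```
# Optional: confirm the namespace was created
kubectl get namespace spark
```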
## Step 3: Use the Provided Values File
Use the values.yaml file from the repository to customize the Apache Spark deployment.
> [!NOTE]
> imagePullSecrets is necessary if you are using a private Docker registry or need to authenticate with Docker Hub.
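To see which settings the chart accepts before editing values.yaml, you can optionally dump the chart's default values; the output file name below is just an example.
```
# Optional: write the chart's default values to a file for reference
helm show values bitnami/spark > default-values.yaml
```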
## Step 4: Install Apache Spark with Helm
Use the Helm chart to install Apache Spark on your Kubernetes cluster.
```
helm install spark bitnami/spark --namespace spark -f values.yaml
```
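You can also ask Helm to report on the release it just created; these checks are optional:
```
# Optional: confirm the release deployed successfully
helm status spark -n spark
helm list -n spark
```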
## Step 5: Verify the Deployment
Check that the Spark master and workers are running correctly.
```
kubectl get pods -n spark
```
You should see output similar to:
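The exact pod names and the number of workers depend on the chart version and your values.yaml; the listing below is illustrative only.
```
NAME             READY   STATUS    RESTARTS   AGE
spark-master-0   1/1     Running   0          2m
spark-worker-0   1/1     Running   0          2m
spark-worker-1   1/1     Running   0          2m
```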
## Step 6: Access the Spark UI
**Port-Forward the Spark Master UI**
Since the Spark master UI is exposed on port 80 by default, use port-forwarding to access it locally.
```
kubectl port-forward svc/spark-master-svc -n spark 8080:80
```
Open your browser and navigate to http://localhost:8080 to access the Spark UI.
![image](https://github.com/user-attachments/assets/fb4e6717-bc7b-4ae2-9c42-3a90ffe04d18)
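If kubectl runs on a remote host (for example, on the control plane node) and you are browsing from another machine, you can optionally bind the port-forward to all interfaces. Note that this exposes the UI to anyone who can reach that host, so use it only on trusted networks.
```
# Optional: make the forwarded port reachable from other machines (use with care)
kubectl port-forward svc/spark-master-svc -n spark --address 0.0.0.0 8080:80
```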
# Troubleshooting Common Issues and Solutions
**Issue: Pod Scheduling Failures**
If pods are not being scheduled, you might see errors related to taints or node affinity.
1. Remove Unnecessary Taints:
```
kubectl taint nodes controlplane node-role.kubernetes.io/control-plane-
```
2. Verify Node Labels:
Ensure the node labels match the nodeSelector in your Helm values file. You can label your nodes with the commands below; optional commands to verify taints and labels follow this list.
```
kubectl label nodes controlplane kubernetes.io/hostname=controlplane
kubectl label nodes node01 kubernetes.io/hostname=node01
kubectl label nodes node02 kubernetes.io/hostname=node02
```
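To confirm that the taint was removed and the labels are in place, the following optional commands can help (the node names match the ones used above):
```
# Optional: verify no control-plane taint remains and labels match your nodeSelector
kubectl describe node controlplane | grep -i taint
kubectl get nodes --show-labels
```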
**Issue: Spark UI Not Accessible**
If the Spark UI is not accessible, ensure the correct port is used:
1. Verify Service Ports:
Check the Spark master service ports (an optional inspection command also follows this list):
```
kubectl get services -n spark
```
2. Port-Forward Correctly:
Use the correct service port for port-forwarding:
```
kubectl port-forward svc/spark-master-svc -n spark 8080:80
```
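For a more detailed look at which port the UI is served on, you can optionally describe the master service; the service name matches the one used in the port-forward step.
```
# Optional: inspect the ports exposed by the Spark master service
kubectl describe svc spark-master-svc -n spark
```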
***By following this guide, you should have a fully functional Apache Spark setup on your Kubernetes cluster.***
> [!IMPORTANT]
> ***If you have any queries, post a comment and we will look into it.***