https://github.com/dhanush-nferx/spark-on-k8s

Deployment of Apache Spark on Kubernetes.
# Apache Spark on Kubernetes: Step-by-Step Setup and Troubleshooting
**This guide provides a step-by-step approach to setting up Apache Spark on a Kubernetes cluster, covering both installation and common troubleshooting steps to ensure a smooth deployment.**

## Prerequisites

1. Kubernetes cluster is already set up. (Refer to our previous [guide](https://github.com/dhanush-nferx/vgrt-fusion-k8s) for Kubernetes setup if needed)
2. Helm is installed and configured on your local machine.
3. Docker is installed and running.

## Step 1: Add Helm Repository

First, add the Bitnami Helm repository to your Helm configuration.

```
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
```
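Before proceeding, you can confirm the chart is now available locally:

```
# Should list the bitnami/spark chart and its current version
helm search repo bitnami/spark
```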

## Step 2: Create Namespace for Spark

Create a dedicated namespace for the Spark deployment.

```
kubectl create namespace spark
```

## Step 3: Use the Provided Values File

Use the `values.yaml` file from this repository to customize the Apache Spark deployment.

> [!NOTE]
> `imagePullSecrets` is only necessary if you are using a private Docker registry or need to authenticate with Docker Hub.
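If you are starting from scratch, a minimal `values.yaml` can be sketched as below. The exact key names vary across chart versions (and `my-registry-secret` is a hypothetical secret name), so confirm against `helm show values bitnami/spark` before using it:

```
# Write a minimal sketch of values.yaml for the Bitnami Spark chart.
# Key names are assumptions; verify with `helm show values bitnami/spark`.
cat <<'EOF' > values.yaml
global:
  imagePullSecrets:
    - my-registry-secret   # hypothetical secret name; omit if not needed
worker:
  replicaCount: 2          # number of Spark worker pods
EOF
```

If you do need a pull secret, create it first with `kubectl create secret docker-registry` and reference its name here.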

## Step 4: Install Apache Spark with Helm

Use the Helm chart to install Apache Spark on your Kubernetes cluster.

```
helm install spark bitnami/spark --namespace spark -f values.yaml
```
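To confirm Helm created the release (named `spark`, as in the command above), you can check its status:

```
# The release should report STATUS: deployed
helm status spark --namespace spark
helm list --namespace spark
```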

## Step 5: Verify the Deployment

Check if the Spark master and workers are running correctly.

```
kubectl get pods -n spark
```

You should see the Spark master and worker pods in the `Running` state; with the Bitnami chart, they are named like `spark-master-0` and `spark-worker-0`.

## Step 6: Access the Spark UI

**Port-Forward the Spark Master UI**

Since the Spark master UI is exposed on port 80 by default, use port-forwarding to access it locally.

```
kubectl port-forward svc/spark-master-svc -n spark 8080:80
```

Open your browser and navigate to http://localhost:8080 to access the Spark UI.

![image](https://github.com/user-attachments/assets/fb4e6717-bc7b-4ae2-9c42-3a90ffe04d18)
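As a smoke test, you can submit the bundled SparkPi example from a throwaway client pod. This sketch assumes the Bitnami chart's default master service (`spark-master-svc` on port 7077); the image tag and examples jar path are assumptions and should match your deployed chart version:

```
# Run spark-submit from a temporary client pod inside the cluster.
# Image tag and jar filename are placeholders; align them with your deployment.
kubectl run spark-client --namespace spark --rm -it --restart='Never' \
  --image docker.io/bitnami/spark:3.5.1 -- \
  spark-submit \
    --master spark://spark-master-svc:7077 \
    --deploy-mode cluster \
    --class org.apache.spark.examples.SparkPi \
    /opt/bitnami/spark/examples/jars/spark-examples_2.12-3.5.1.jar 100
```

After the job completes, the finished application should appear in the Spark master UI.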

# Troubleshooting Common Issues and Solutions

**Issue: Pod Scheduling Failures**

If pods are not scheduling, you might see errors related to taints and node affinity.

1. Remove Unnecessary Taints:
```
kubectl taint nodes controlplane node-role.kubernetes.io/control-plane-
```
2. Verify Node Labels:

Ensure node labels match the nodeSelector in your Helm values file.

You can label your nodes with:

```
kubectl label nodes controlplane kubernetes.io/hostname=controlplane
kubectl label nodes node01 kubernetes.io/hostname=node01
kubectl label nodes node02 kubernetes.io/hostname=node02
```
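You can then confirm the labels are present before retrying the deployment (note that the kubelet normally sets `kubernetes.io/hostname` automatically on each node):

```
# Show each node's labels and check for the hostname label
kubectl get nodes --show-labels | grep kubernetes.io/hostname
```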

**Issue: Spark UI Not Accessible**

If the Spark UI is not accessible, ensure the correct port is used:

1. Verify Service Ports:

Check the Spark master service ports:

```
kubectl get services -n spark
```

2. Port-Forward Correctly:

Use the correct service port for port-forwarding:

```
kubectl port-forward svc/spark-master-svc -n spark 8080:80
```

***By following this guide, you should have a fully functional Apache Spark setup on your Kubernetes cluster.***

> [!IMPORTANT]
> ***If you have any questions, open an issue and we will look into it.***