https://github.com/jplane/pyspark-devcontainer
A simple VS Code devcontainer setup for local PySpark development
- Host: GitHub
- URL: https://github.com/jplane/pyspark-devcontainer
- Owner: jplane
- Created: 2023-03-09T20:18:55.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-07-11T16:26:45.000Z (almost 2 years ago)
- Last Synced: 2025-03-29T09:04:54.746Z (28 days ago)
- Topics: devcontainer, devcontainers, jupyter, jupyter-notebooks, pyspark, pyspark-notebook, python, spark, vscode
- Language: Jupyter Notebook
- Size: 318 KB
- Stars: 48
- Watchers: 2
- Forks: 29
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Local PySpark dev environment
This repo provides everything needed to run a self-contained, single-node PySpark "cluster" locally on your laptop, including a Jupyter notebook environment.
It uses [Visual Studio Code](https://code.visualstudio.com/) and the [devcontainer feature](https://code.visualstudio.com/docs/devcontainers/containers) to run the Spark/Jupyter server in Docker, with VS Code as the development frontend.
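
Under the covers, a single-node PySpark "cluster" is just a SparkSession running in local mode inside the container. As a rough sketch (not the repo's actual notebook code, and with a made-up app name), creating one looks like this:

```python
from pyspark.sql import SparkSession

# "local[*]" runs the driver and executors in one process on this machine,
# using all available CPU cores -- that's the single-node "cluster".
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("local-pyspark-demo")   # hypothetical app name
    .getOrCreate()
)

print(spark.version)                 # confirm the session is up
print(spark.sparkContext.uiWebUrl)   # address of the local Spark UI
```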
## Requirements
- Install [Docker Desktop](https://www.docker.com/products/docker-desktop/) (you don't have to be a Docker super-expert :-))
- Install [Visual Studio Code](https://code.visualstudio.com/download)
- Install the [VS Code Remote Development pack](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.vscode-remote-extensionpack)
## Setup
1. Install required tools
1. Git clone this repo to your laptop
1. Open the local repo folder in VS Code
1. Open the [VS Code command palette](https://code.visualstudio.com/docs/getstarted/userinterface#_command-palette) and run 'Dev Containers: Reopen in Container'
1. Wait while the devcontainer is built and initialized; this may take several minutes
1. Open [test.ipynb](./test.ipynb) in VS Code
1. If you get an HTTP warning, click 'Yes'

1. Wait a few moments for the Jupyter kernel to initialize. If the button in the upper right still says 'Select Kernel' after about 30 seconds, click it and choose the option that includes 'ipykernel'


1. Run the first cell; it takes a few seconds to initialize the Spark session and complete. You should see a message with a link to the Spark UI... click it to see how your Spark session executes the work defined in your notebook on your single-node Spark "cluster" (a sketch of this kind of workload appears after this list)

1. Run the remaining cells in the notebook in order, and check the output of cell 3

1. Have fun exploring [PySpark](https://sparkbyexamples.com/pyspark-tutorial/)!
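
To give a feel for the kind of work the notebook cells run, here is a small self-contained example; the data, column names, and transformations below are illustrative, not the actual contents of [test.ipynb](./test.ipynb):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Made-up sample data -- the real notebook defines its own.
df = spark.createDataFrame(
    [("alice", 34), ("bob", 45), ("carol", 29)],
    schema=["name", "age"],
)

# A transformation followed by an action; each action shows up
# as a job you can inspect in the Spark UI.
df.filter(F.col("age") > 30).orderBy("age").show()
```

Each action such as `.show()` triggers a Spark job, which is what you can watch in the Spark UI linked from the first cell.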