{"id":17449517,"url":"https://github.com/jplane/pyspark-devcontainer","last_synced_at":"2025-04-19T14:55:46.277Z","repository":{"id":179152476,"uuid":"611920379","full_name":"jplane/pyspark-devcontainer","owner":"jplane","description":"A simple VS Code devcontainer setup for local PySpark development","archived":false,"fork":false,"pushed_at":"2023-07-11T16:26:45.000Z","size":326,"stargazers_count":48,"open_issues_count":1,"forks_count":29,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-29T09:04:54.746Z","etag":null,"topics":["devcontainer","devcontainers","jupyter","jupyter-notebooks","pyspark","pyspark-notebook","python","spark","vscode"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jplane.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-09T20:18:55.000Z","updated_at":"2025-02-28T11:09:55.000Z","dependencies_parsed_at":null,"dependency_job_id":"319975c4-2455-4573-8360-ab18465f3b77","html_url":"https://github.com/jplane/pyspark-devcontainer","commit_stats":null,"previous_names":["jplane/pyspark-devcontainer"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jplane%2Fpyspark-devcontainer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jplane%2Fpyspark-devcontainer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jplane%2Fpyspark-devcontainer/releases","manifests_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/jplane%2Fpyspark-devcontainer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jplane","download_url":"https://codeload.github.com/jplane/pyspark-devcontainer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249718710,"owners_count":21315094,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["devcontainer","devcontainers","jupyter","jupyter-notebooks","pyspark","pyspark-notebook","python","spark","vscode"],"created_at":"2024-10-17T21:42:00.147Z","updated_at":"2025-04-19T14:55:46.272Z","avatar_url":"https://github.com/jplane.png","language":"Jupyter Notebook","readme":"# Local PySpark dev environment\n\nThis repo provides everything needed for a self-contained, local PySpark 1-node \"cluster\" running on your laptop, including a Jupyter notebook environment.\n\nIt uses [Visual Studio Code](https://code.visualstudio.com/) and the [devcontainer feature](https://code.visualstudio.com/docs/devcontainers/containers) to run the Spark/Jupyter server in Docker, connected to a VS Code dev environment frontend.\n\n## Requirements\n\n- Install [Docker Desktop](https://www.docker.com/products/docker-desktop/) (you don't have to be a Docker super-expert :-))\n\n- Install [Visual Studio Code](https://code.visualstudio.com/download)\n\n- Install the [VS Code Remote Development pack](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.vscode-remote-extensionpack)\n\n## Setup\n\n1. Install required tools \n\n1. Git clone this repo to your laptop\n\n1. 
Open the local repo folder in VS Code\n\n1. Open the [VS Code command palette](https://code.visualstudio.com/docs/getstarted/userinterface#_command-palette) and select/type 'Reopen in Container'\n\n1. Wait while the devcontainer is built and initialized; this may take several minutes\n\n1. Open [test.ipynb](./test.ipynb) in VS Code\n\n1. If you get an HTTP warning, click 'Yes'\n\n    ![HTTP warning](./media/http_warning.png)\n\n1. Wait a few moments for the Jupyter kernel to initialize... if the button in the upper right still says 'Select Kernel' after about 30 seconds, click it and select the option with 'ipykernel'\n\n    ![Choose kernel](./media/select_kernel.png)\n\n    ![ipykernel](./media/ipykernel.png)\n\n1. Run the first cell... it will take a few seconds to initialize the kernel and complete. You should see a message to browse to the Spark UI... click that for details of how your Spark session executes the work defined in your notebook on your 1-node Spark \"cluster\"\n\n    ![job output](./media/view_spark_job.png)\n\n1. Run the remaining cells in the notebook, in order... see the output of cell 3\n\n    ![output](./media/output.png)\n\n1. Have fun exploring [PySpark](https://sparkbyexamples.com/pyspark-tutorial/)!\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjplane%2Fpyspark-devcontainer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjplane%2Fpyspark-devcontainer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjplane%2Fpyspark-devcontainer/lists"}