{"id":13608816,"url":"https://github.com/itsjafer/jupyterlab-sparkmonitor","last_synced_at":"2025-04-13T06:37:38.778Z","repository":{"id":42520808,"uuid":"246929828","full_name":"itsjafer/jupyterlab-sparkmonitor","owner":"itsjafer","description":"JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook","archived":false,"fork":false,"pushed_at":"2022-12-27T15:34:10.000Z","size":4279,"stargazers_count":92,"open_issues_count":12,"forks_count":23,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-03-26T23:06:44.218Z","etag":null,"topics":["apache-spark","jupyter","jupyter-lab","jupyterlab","jupyterlab-extension","pyspark","spark"],"latest_commit_sha":null,"homepage":"https://krishnan-r.github.io/sparkmonitor/","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/itsjafer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-12T20:59:01.000Z","updated_at":"2024-11-06T08:51:20.000Z","dependencies_parsed_at":"2023-01-31T05:30:51.616Z","dependency_job_id":null,"html_url":"https://github.com/itsjafer/jupyterlab-sparkmonitor","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itsjafer%2Fjupyterlab-sparkmonitor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itsjafer%2Fjupyterlab-sparkmonitor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itsjafer%2Fjupyterlab-sparkmonitor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itsjafer%2Fj
upyterlab-sparkmonitor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/itsjafer","download_url":"https://codeload.github.com/itsjafer/jupyterlab-sparkmonitor/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248675353,"owners_count":21143763,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-spark","jupyter","jupyter-lab","jupyterlab","jupyterlab-extension","pyspark","spark"],"created_at":"2024-08-01T19:01:30.204Z","updated_at":"2025-04-13T06:37:38.735Z","avatar_url":"https://github.com/itsjafer.png","language":"JavaScript","funding_links":[],"categories":["JupyterLab扩展"],"sub_categories":[],"readme":"# Spark Monitor - An extension for Jupyter Lab\n\nThis project was originally written by krishnan-r as a Google Summer of Code project for Jupyter Notebook. 
[Check his website out here.](https://krishnan-r.github.io/sparkmonitor/)\n\nAs a part of my internship as a Software Engineer at Yelp, I created this fork to update the extension to be compatible with JupyterLab - Yelp's choice for sharing and collaborating on notebooks.\n\n## About\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003ca href=\"http://jupyter.org/\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/6822941/29750386-872556fe-8b5c-11e7-95e1-42b12d709017.png\" height=\"50\"/\u003e\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cb\u003e+\u003c/b\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=\"https://spark.apache.org/\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/6822941/29750352-e9807b36-8b5b-11e7-929a-249f56c7cf79.png\" height=\"80\"/\u003e\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003cb\u003e=\u003c/b\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=\"https://user-images.githubusercontent.com/6822941/29601568-d5e42934-87f9-11e7-9780-3cd3a0d8d86b.png\" title=\"The SparkMonitor Extension.\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/6822941/29601568-d5e42934-87f9-11e7-9780-3cd3a0d8d86b.png\" height=\"80\"/\u003e\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\nSparkMonitor is an extension for Jupyter Lab that enables the live monitoring of Apache Spark Jobs spawned from a notebook. The extension provides several features to monitor and debug a Spark job from within the notebook interface itself. 
\u003cbr\u003e\n\n---\n\n![jobdisplay](https://user-images.githubusercontent.com/6822941/29753710-ff8849b6-8b94-11e7-8f9c-bdc59bf72143.gif)\n\n### Requirements\n\n-   JupyterLab 3 or newer\n-   pyspark 3.X.X or newer (for compatibility with older pyspark versions, use jupyterlab-sparkmonitor 3.X)\n\n## Features\n\n-   Automatically displays a live monitoring tool below cells that run Spark jobs in a Jupyter notebook\n-   A table of jobs and stages with progress bars\n-   A timeline which shows jobs, stages, and tasks\n-   A graph showing the number of active tasks \u0026 executor cores over time\n-   A notebook server extension that proxies the Spark UI and displays it in an iframe popup for more details\n-   For a detailed list of features see the use case [notebooks](https://krishnan-r.github.io/sparkmonitor/#common-use-cases-and-tests)\n-   Support for multiple SparkSessions (default port is 4040)\n-   [How it Works](https://krishnan-r.github.io/sparkmonitor/how.html)\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003ca href=\"https://user-images.githubusercontent.com/6822941/29601990-d6256a1e-87fb-11e7-94cb-b4418c61d221.png\" title=\"Jobs and stages started from a cell.\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/6822941/29601990-d6256a1e-87fb-11e7-94cb-b4418c61d221.png\"\u003e\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=\"https://user-images.githubusercontent.com/6822941/29601769-d8e82a26-87fa-11e7-9b0e-91b1414e7821.png\" title=\"A graph of the number of active tasks and available executor cores.\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/6822941/29601769-d8e82a26-87fa-11e7-9b0e-91b1414e7821.png\" \u003e\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=\"https://user-images.githubusercontent.com/6822941/29601776-d919dae4-87fa-11e7-8939-a6c0d0072d90.png\" title=\"An event timeline with jobs, stages and tasks across various executors. 
The tasks are split into various coloured phases, providing insight into the nature of computation.\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/6822941/29601776-d919dae4-87fa-11e7-8939-a6c0d0072d90.png\"\u003e\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd\u003e\u003ca href=\"https://user-images.githubusercontent.com/6822941/29750236-be1f6b0c-8b59-11e7-9a36-92e04e3bf05b.png\" title=\"The Spark web UI as a popup within the notebook interface.\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/6822941/29750236-be1f6b0c-8b59-11e7-9a36-92e04e3bf05b.png\" \u003e\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=\"https://user-images.githubusercontent.com/6822941/29750177-ea2c18b8-8b58-11e7-955e-69ecf33a6284.png\" title=\"Details of a task.\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/6822941/29750177-ea2c18b8-8b58-11e7-955e-69ecf33a6284.png\" \u003e\u003c/a\u003e\u003c/td\u003e\n\u003ctd\u003e\u003ca href=\"https://user-images.githubusercontent.com/6822941/29601997-d6533840-87fb-11e7-90ce-daa0fe73b9e5.png\" title=\"An event timeline.\"\u003e\u003cimg src=\"https://user-images.githubusercontent.com/6822941/29601997-d6533840-87fb-11e7-90ce-daa0fe73b9e5.png\"\u003e\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n## Quick Start\n\n### To do a quick test of the extension\n\nThis docker image has pyspark and several other related packages installed alongside the sparkmonitor extension.\n\n```bash\ndocker run -it -p 8888:8888 itsjafer/sparkmonitor\n```\n\n### Setting up the extension\n\n```bash\npip install jupyterlab-sparkmonitor # install the extension\n\n# set up ipython profile and add our kernel extension to it\nipython profile create --ipython-dir=.ipython\necho \"c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')\" \u003e\u003e  .ipython/profile_default/ipython_config.py\n\n# run jupyter lab\nIPYTHONDIR=.ipython jupyter lab 
--watch\n```\n\nWith the extension installed, a SparkConf object called `conf` will be usable from your notebooks. You can use it as follows:\n\n```python\nfrom pyspark import SparkContext\n\n# start the spark context using the SparkConf the extension inserted\nsc = SparkContext.getOrCreate(conf=conf)\n\n# Monitor should spawn under the cell with 4 jobs\nsc.parallelize(range(0,100)).count()\nsc.parallelize(range(0,100)).count()\nsc.parallelize(range(0,100)).count()\nsc.parallelize(range(0,100)).count()\n```\n\nIf you already have your own Spark configuration, you will need to set `spark.extraListeners` to `sparkmonitor.listener.JupyterSparkMonitorListener` and `spark.driver.extraClassPath` to the path of `listener.jar` inside the installed sparkmonitor Python package (`path/to/package/sparkmonitor/listener.jar`).\n\n```python\nfrom pyspark.sql import SparkSession\nspark = SparkSession.builder\\\n        .config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener')\\\n        .config('spark.driver.extraClassPath', 'venv/lib/python3.7/site-packages/sparkmonitor/listener.jar')\\\n        .getOrCreate()\n\n# should spawn 4 jobs in a monitor below the cell\nspark.sparkContext.parallelize(range(0,100)).count()\nspark.sparkContext.parallelize(range(0,100)).count()\nspark.sparkContext.parallelize(range(0,100)).count()\nspark.sparkContext.parallelize(range(0,100)).count()\n```\n\n## Changelog\n\n* 1.0 - Initial Release\n* 2.0 - Migration to JupyterLab 2, Multiple Spark Sessions, and displaying monitors beneath the correct cell more accurately\n* 3.0 - Migrate to JupyterLab 3 as prebuilt extension\n* 4.0 - pyspark 3.X Compatibility; no longer compatible with PySpark 2.X or under\n\n## Development\n\nIf you'd like to develop the extension:\n\n```bash\nmake all # Clean the directory, build the extension, and run it 
locally\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fitsjafer%2Fjupyterlab-sparkmonitor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fitsjafer%2Fjupyterlab-sparkmonitor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fitsjafer%2Fjupyterlab-sparkmonitor/lists"}