Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mozilla/jupyter-spark
Jupyter Notebook extension for Apache Spark integration
- Host: GitHub
- URL: https://github.com/mozilla/jupyter-spark
- Owner: mozilla
- License: mpl-2.0
- Archived: true
- Created: 2016-01-27T15:38:50.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2020-12-01T19:43:11.000Z (about 4 years ago)
- Last Synced: 2024-09-19T02:26:27.056Z (4 months ago)
- Language: JavaScript
- Size: 274 KB
- Stars: 193
- Watchers: 34
- Forks: 34
- Open Issues: 16
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-jupyter-resources (GitHub · 48% open issues · ⏱️ 01.12.2020): Jupyter extensions
README
# jupyter-spark
[![No Maintenance Intended](http://unmaintained.tech/badge.svg)](http://unmaintained.tech/)
[![Build Status](https://travis-ci.org/mozilla/jupyter-spark.svg?branch=master)](https://travis-ci.org/mozilla/jupyter-spark)
[![codecov](https://codecov.io/gh/mozilla/jupyter-spark/branch/master/graph/badge.svg)](https://codecov.io/gh/mozilla/jupyter-spark)

**NOTE: This project is currently unmaintained; if anyone would like to take over maintenance, please [let us know](https://github.com/mozilla/jupyter-spark/issues/55).**
Jupyter Notebook extension for Apache Spark integration.
Includes a progress indicator for the current Notebook cell if it invokes a Spark job. Queries the Spark UI service on the backend to get the required Spark job information.

![Alt text](/screenshots/ProgressBar.png?raw=true "Spark progress bar")
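The per-job numbers come from Spark's monitoring REST API, whose job records include `numTasks` and `numCompletedTasks` fields. A minimal sketch of how a completion fraction could be derived from such a record (illustrative only, not the extension's actual code; `job_progress` is a hypothetical helper):

```python
def job_progress(job_info):
    """Completion fraction for one Spark job, given its REST API record.

    `numTasks` and `numCompletedTasks` are fields of Spark's
    /api/v1/applications/<app-id>/jobs response.
    """
    total = job_info.get("numTasks", 0)
    if total == 0:
        return 0.0  # avoid division by zero for jobs with no tasks yet
    return job_info["numCompletedTasks"] / total

# e.g. a half-finished job:
print(job_progress({"numTasks": 200, "numCompletedTasks": 100}))  # 0.5
```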
To view all currently running jobs, click the "show running Spark jobs" button, or press `Alt+S`.

![Alt text](/screenshots/SparkButton.png?raw=true "show running Spark jobs button")
![Alt text](/screenshots/Dialog.png?raw=true "Spark dialog")
A proxied version of the Spark UI can be accessed at http://localhost:8888/spark.

## Installation
To install, simply run:
```
pip install jupyter-spark
jupyter serverextension enable --py jupyter_spark
jupyter nbextension install --py jupyter_spark
jupyter nbextension enable --py jupyter_spark
jupyter nbextension enable --py widgetsnbextension
```

The last step is needed to enable the `widgetsnbextension` extension that Jupyter-Spark depends on. It may have been enabled before by a different extension.

You may want to append `--user` to the commands above if you're getting configuration errors upon invoking them.

To double-check that the extension was correctly installed, run:
```
jupyter nbextension list
jupyter serverextension list
```

Please feel free to install [lxml](http://lxml.de/) as well to improve performance of the server-side communication to Spark, using your favorite package manager, e.g.:

```
pip install lxml
```

For development and testing, clone the project and run from a shell in the project's root directory:

```
pip install -e .
jupyter serverextension enable --py jupyter_spark
jupyter nbextension install --py jupyter_spark
jupyter nbextension enable --py jupyter_spark
```

To uninstall the extension, run:
```
jupyter serverextension disable --py jupyter_spark
jupyter nbextension disable --py jupyter_spark
jupyter nbextension uninstall --py jupyter_spark
pip uninstall jupyter-spark
```

## Configuration
To change the URL of the Spark API that the job metadata is fetched from, override the `Spark.url` config value, e.g. on the command line:

```
jupyter notebook --Spark.url="http://localhost:4040"
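# Equivalently (a sketch based on standard Jupyter/traitlets config behavior,
# not the project's docs), set it in jupyter_notebook_config.py:
#   c.Spark.url = "http://localhost:4040"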
```

## Example
There is a simple `pyspark` example included in `examples` to confirm that your installation is working.

## Changelog
### 0.3.0 (2016-07-04)
- Rewrote proxy to use an async Tornado handler and HTTP client to fetch responses from Spark
- Simplified proxy processing to take Amazon EMR proxying into account
- Extended test suite to cover proxy handler, too.
- Removed requests as a dependency.
### 0.2.0 (2016-06-30)
- Refactored to fix a bunch of Python packaging and code quality issues
- Added test suite for Python code
- Set up continuous integration: https://travis-ci.org/mozilla/jupyter-spark
- Set up code coverage reports: https://codecov.io/gh/mozilla/jupyter-spark
- Added ability to override Spark API URL via command line option
- **IMPORTANT** Requires a manual step to enable after running `pip install` (see installation docs)!

To update:
1. Run `pip uninstall jupyter-spark`
2. Delete `spark.js` from your `nbextensions` folder.
3. Delete any references to `jupyter_spark.spark` in
`jupyter_notebook_config.json` (in your .jupyter directory)
4. Delete any references to `spark` in `notebook.json`
(in .jupyter/nbconfig)
5. Follow the installation instructions to reinstall

### 0.1.1 (2016-05-03)
- Initial release with a working prototype