https://github.com/ploomber/soorgeon

Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊

data-engineering data-science jupyter jupyter-notebooks machine-learning mlops workflow

# Soorgeon

> [!TIP]
> Deploy AI apps for free on [Ploomber Cloud!](https://ploomber.io/?utm_medium=github&utm_source=soorgeon)


![header](_static/header.png)

Convert monolithic Jupyter notebooks into [Ploomber](https://github.com/ploomber/ploomber) pipelines.

https://user-images.githubusercontent.com/989250/150660392-559eca67-b630-4ef2-b660-4f5ddb5a8d65.mp4

[3-minute video tutorial](https://www.youtube.com/watch?v=EJecqsZBr3Q).

*Note: Soorgeon is in alpha, [help us make it better](CONTRIBUTING.md).*

## Install

*Compatible with Python 3.7 and higher.*

```sh
pip install soorgeon
```

## Usage

### [Optional] Testing if the notebook runs

Before refactoring, you can optionally test if the original notebook or script runs without exceptions:

```sh
# works with ipynb files
soorgeon test path/to/notebook.ipynb

# and notebooks in percent format
soorgeon test path/to/notebook.py
```
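
For reference, a notebook in percent format is a plain Python script in which `# %%` comment lines mark where each cell starts, and `# %% [markdown]` marks a text cell. The sketch below is only an illustration of that layout: the file contents and the `split_cells` helper are hypothetical, and this is not how Soorgeon itself parses files.

```python
# Hypothetical contents of path/to/notebook.py in percent format.
SOURCE = '''\
# %% [markdown]
# # Load data

# %%
import math
values = [1, 4, 9]

# %%
roots = [math.sqrt(v) for v in values]
print(roots)
'''


def split_cells(source):
    """Split percent-format source into (cell_type, body) pairs.

    Simplified illustration: any line starting with "# %%" opens a new cell.
    """
    cells = []
    for line in source.splitlines():
        if line.startswith("# %%"):
            # A marker line starts a new cell; "[markdown]" flags a text cell.
            cell_type = "markdown" if "[markdown]" in line else "code"
            cells.append((cell_type, []))
        elif cells:
            # Every other line belongs to the most recently opened cell.
            cells[-1][1].append(line)
    return [(kind, "\n".join(body).strip()) for kind, body in cells]


cells = split_cells(SOURCE)
for kind, body in cells:
    print(f"--- {kind} cell ---")
    print(body)
```

Because the file is ordinary Python, it diffs cleanly under version control while still opening as a notebook in tools that understand the format.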

Optionally, set the path to the output notebook:

```sh
soorgeon test path/to/notebook.ipynb path/to/output.ipynb

soorgeon test path/to/notebook.py path/to/output.ipynb
```

### Refactoring

To refactor your notebook:

```sh
# refactor notebook
soorgeon refactor nb.ipynb

# all variables with the df prefix are stored in csv files
soorgeon refactor nb.ipynb --df-format csv
# all variables with the df prefix are stored in parquet files
soorgeon refactor nb.ipynb --df-format parquet

# store task output in 'some-directory' (if missing, this defaults to 'output')
soorgeon refactor nb.ipynb --product-prefix some-directory

# generate tasks in .py format
soorgeon refactor nb.ipynb --file-format py

# use alternative serializer (cloudpickle or dill) if notebook
# contains variables that cannot be serialized using pickle
soorgeon refactor nb.ipynb --serializer cloudpickle
soorgeon refactor nb.ipynb --serializer dill
```
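
The serializer options exist because the standard library's `pickle` cannot serialize every Python object. A minimal sketch of the kind of failure involved, using only the standard library (the variable names are hypothetical and not taken from any Soorgeon example):

```python
import pickle

# A plain data structure round-trips through pickle without trouble.
scores = {"model": "baseline", "accuracy": 0.91}
restored = pickle.loads(pickle.dumps(scores))
print(restored == scores)

# pickle stores functions by reference (module plus qualified name).
# A lambda's name is "<lambda>", which cannot be looked up again,
# so the standard pickler refuses to serialize it.
transform = lambda x: x * 2

try:
    pickle.dumps(transform)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as exc:
    lambda_pickled = False
    print(f"pickle failed: {exc}")
```

If a notebook passes objects like this between sections, `cloudpickle` or `dill` may be a better fit, since both can serialize such functions by value rather than by reference.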

To learn more, check out our [guide](doc/guide.md).

### Cleaning

Soorgeon has a `clean` command that formats `.ipynb` and `.py` files with
[black](https://github.com/psf/black):

```sh
soorgeon clean path/to/notebook.ipynb
```

or

```sh
soorgeon clean path/to/script.py
```

### Linting

Soorgeon has a `lint` command that runs [flake8](https://github.com/PyCQA/flake8) on `.ipynb` and `.py` files:

```sh
soorgeon lint path/to/notebook.ipynb
```

or

```sh
soorgeon lint path/to/script.py
```

## Examples

```sh
git clone https://github.com/ploomber/soorgeon
```

Exploratory data analysis notebook:

```sh
cd soorgeon/examples/exploratory
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build
```

Machine learning notebook:

```sh
cd soorgeon/examples/machine-learning
soorgeon refactor nb.ipynb

# to run the pipeline
pip install -r requirements.txt
ploomber build
```

To learn more, check out our [guide](doc/guide.md).

## Community

* [Join us on Slack](https://ploomber.io/community)
* [Newsletter](https://www.getrevue.co/profile/ploomber)
* [YouTube](https://www.youtube.com/channel/UCaIS5BMlmeNQE4-Gn0xTDXQ)
* [Contact the development team](mailto:[email protected])

## About Ploomber

Ploomber is a big community of data enthusiasts pushing the boundaries of Data Science and Machine Learning tooling.

Whatever your skillset is, you can contribute to our mission. So whether you're a beginner or an experienced professional, you're welcome to join us on this journey!

[Click here to learn how you can contribute to Ploomber.](https://github.com/ploomber/contributing/blob/main/README.md)

## Telemetry

We collect anonymous statistics to understand and improve usage. For details, [see here](https://docs.ploomber.io/en/latest/community/user-stats.html).