{"id":20101598,"url":"https://github.com/ploomber/soorgeon","last_synced_at":"2025-04-10T03:55:34.013Z","repository":{"id":37004193,"uuid":"427500064","full_name":"ploomber/soorgeon","owner":"ploomber","description":"Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊","archived":true,"fork":false,"pushed_at":"2024-09-18T20:56:06.000Z","size":529,"stargazers_count":78,"open_issues_count":15,"forks_count":20,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-10T03:55:28.741Z","etag":null,"topics":["data-engineering","data-science","jupyter","jupyter-notebooks","machine-learning","mlops","workflow"],"latest_commit_sha":null,"homepage":"https://ploomber.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ploomber.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-11-12T21:24:29.000Z","updated_at":"2025-02-18T22:07:14.000Z","dependencies_parsed_at":"2024-01-15T20:54:44.604Z","dependency_job_id":"1d71cd0c-ede4-4d7b-ae55-0f282ded6f5e","html_url":"https://github.com/ploomber/soorgeon","commit_stats":{"total_commits":272,"total_committers":14,"mean_commits":"19.428571428571427","dds":"0.20220588235294112","last_synced_commit":"a9153d43b904f072d1634b4ed27f07602e48feb5"},"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ploomber%2Fsoorgeon","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ploomber%2Fsoorgeon/tags","re
leases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ploomber%2Fsoorgeon/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ploomber%2Fsoorgeon/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ploomber","download_url":"https://codeload.github.com/ploomber/soorgeon/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248155002,"owners_count":21056542,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-engineering","data-science","jupyter","jupyter-notebooks","machine-learning","mlops","workflow"],"created_at":"2024-11-13T17:25:55.221Z","updated_at":"2025-04-10T03:55:33.993Z","avatar_url":"https://github.com/ploomber.png","language":"Python","funding_links":[],"categories":["Simplification Tools"],"sub_categories":[],"readme":"# Soorgeon\n\n\u003e [!TIP]\n\u003e Deploy AI apps for free on [Ploomber Cloud!](https://ploomber.io/?utm_medium=github\u0026utm_source=soorgeon)\n\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://ploomber.io/community\"\u003eJoin our community\u003c/a\u003e\n  |\n  \u003ca href=\"https://share.hsforms.com/1E7Qa_OpcRPi_MV-segFsaAe6c2g\"\u003eNewsletter\u003c/a\u003e\n  |\n  \u003ca href=\"mailto:contact@ploomber.io\"\u003eContact us\u003c/a\u003e\n  |\n  \u003ca href=\"https://ploomber.io/\"\u003eBlog\u003c/a\u003e\n  |  \n  \u003ca href=\"https://www.ploomber.io\"\u003eWebsite\u003c/a\u003e\n  |\n  \u003ca 
href=\"https://www.youtube.com/channel/UCaIS5BMlmeNQE4-Gn0xTDXQ\"\u003eYouTube\u003c/a\u003e\n\u003c/p\u003e\n\n\n![header](_static/header.png)\n\nConvert monolithic Jupyter notebooks into [Ploomber](https://github.com/ploomber/ploomber) pipelines.\n\nhttps://user-images.githubusercontent.com/989250/150660392-559eca67-b630-4ef2-b660-4f5ddb5a8d65.mp4\n\n[3-minute video tutorial](https://www.youtube.com/watch?v=EJecqsZBr3Q).\n\n*Note: Soorgeon is in alpha, [help us make it better](CONTRIBUTING.md).*\n\n## Install\n\n*Compatible with Python 3.7 and higher.*\n\n```sh\npip install soorgeon\n```\n\n## Usage\n\n### [Optional] Testing if the notebook runs\n\nBefore refactoring, you can optionally test if the original notebook or script runs without exceptions:\n\n```sh\n# works with ipynb files\nsoorgeon test path/to/notebook.ipynb\n\n# and notebooks in percent format\nsoorgeon test path/to/notebook.py\n```\n\nOptionally, set the path to the output notebook:\n\n```sh\nsoorgeon test path/to/notebook.ipynb path/to/output.ipynb\n\nsoorgeon test path/to/notebook.py path/to/output.ipynb\n```\n\n### Refactoring\n\nTo refactor your notebook:\n\n```sh\n# refactor notebook\nsoorgeon refactor nb.ipynb\n\n# all variables with the df prefix are stored in csv files\nsoorgeon refactor nb.ipynb --df-format csv\n# all variables with the df prefix are stored in parquet files\nsoorgeon refactor nb.ipynb --df-format parquet\n\n# store task output in 'some-directory' (if missing, this defaults to 'output')\nsoorgeon refactor nb.ipynb --product-prefix some-directory\n\n# generate tasks in .py format\nsoorgeon refactor nb.ipynb --file-format py\n\n# use alternative serializer (cloudpickle or dill) if notebook \n# contains variables that cannot be serialized using pickle \nsoorgeon refactor nb.ipynb --serializer cloudpickle\nsoorgeon refactor nb.ipynb --serializer dill\n```\n\nTo learn more, check out our [guide](doc/guide.md).\n\n### Cleaning\n\nSoorgeon has a `clean` command that 
applies\n[black](https://github.com/psf/black) \u003c!--and [isort](https://github.com/PyCQA/isort)--\u003efor `.ipynb` and `.py` files:\n\n```sh\nsoorgeon clean path/to/notebook.ipynb\n```\n\nor\n\n```sh\nsoorgeon clean path/to/script.py\n```\n\n### Linting\n\nSoorgeon has a `lint` command that applies [flake8](https://github.com/PyCQA/flake8):\n\n```sh\nsoorgeon lint path/to/notebook.ipynb\n```\n\nor\n\n```sh\nsoorgeon lint path/to/script.py\n```\n\n## Examples\n\n```sh\ngit clone https://github.com/ploomber/soorgeon\n```\n\nExploratory data analysis notebook:\n\n```sh\ncd soorgeon/examples/exploratory\nsoorgeon refactor nb.ipynb\n\n# to run the pipeline\npip install -r requirements.txt\nploomber build\n```\n\nMachine learning notebook:\n\n```sh\ncd soorgeon/examples/machine-learning\nsoorgeon refactor nb.ipynb\n\n# to run the pipeline\npip install -r requirements.txt\nploomber build\n```\n\nTo learn more, check out our [guide](doc/guide.md).\n\n## Community\n\n* [Join us on Slack](https://ploomber.io/community)\n* [Newsletter](https://www.getrevue.co/profile/ploomber)\n* [YouTube](https://www.youtube.com/channel/UCaIS5BMlmeNQE4-Gn0xTDXQ)\n* [Contact the development team](mailto:contact@ploomber.io)\n\n\n## About Ploomber\n\nPloomber is a large community of data enthusiasts pushing the boundaries of Data Science and Machine Learning tooling.\n\nWhatever your skill set, you can contribute to our mission. Whether you're a beginner or an experienced professional, you're welcome to join us on this journey!\n\n[Learn how you can contribute to Ploomber.](https://github.com/ploomber/contributing/blob/main/README.md)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fploomber%2Fsoorgeon","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fploomber%2Fsoorgeon","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fploomber%2Fsoorgeon/lists"}