{"id":15297380,"url":"https://github.com/winvector/wvpy","last_synced_at":"2025-04-13T23:15:55.105Z","repository":{"id":40775625,"uuid":"158005992","full_name":"WinVector/wvpy","owner":"WinVector","description":"Tools to convert from Jupyter notebooks to and from Python .py files, and render.","archived":false,"fork":false,"pushed_at":"2024-02-16T19:47:55.000Z","size":8669,"stargazers_count":10,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-13T23:15:42.262Z","etag":null,"topics":["datascience","machine-learning","python","python3"],"latest_commit_sha":null,"homepage":"https://winvector.github.io/wvpy/","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WinVector.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-11-17T17:02:34.000Z","updated_at":"2025-04-03T13:41:26.000Z","dependencies_parsed_at":"2024-01-16T03:45:26.636Z","dependency_job_id":"a7d977a5-481a-4f8e-a370-9f59a4e428ea","html_url":"https://github.com/WinVector/wvpy","commit_stats":{"total_commits":226,"total_committers":1,"mean_commits":226.0,"dds":0.0,"last_synced_commit":"c92aa12184cf854803853ba11cd798f017150d8a"},"previous_names":[],"tags_count":26,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WinVector%2Fwvpy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WinVector%2Fwvpy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WinVector%2Fwvpy/releases","manifests_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WinVector%2Fwvpy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WinVector","download_url":"https://codeload.github.com/WinVector/wvpy/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248794568,"owners_count":21162615,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["datascience","machine-learning","python","python3"],"created_at":"2024-09-30T19:16:59.828Z","updated_at":"2025-04-13T23:15:55.074Z","avatar_url":"https://github.com/WinVector.png","language":"HTML","readme":"\n\u003ca href=\"https://github.com/WinVector/wvpy\"\u003ewvpy\u003c/a\u003e tools for converting Jupyter notebooks to and from Python files.\n\nVideo tutorial here: \u003ca href=\"https://youtu.be/cQ-tCwD4moc\"\u003ehere\u003c/a\u003e.\n\nMany of the data science functions have been moved to wvu [https://github.com/WinVector/wvu](https://github.com/WinVector/wvu).\n\n\n\n\n\u003ca href=\"https://github.com/WinVector/wvpy\"\u003ewvpy\u003c/a\u003e is a very effective personal Jupyter workflow for data science development.\n\n\n\u003ca href=\"https://jupyter.org\"\u003eJupyter\u003c/a\u003e (nee IPython) workbooks are JSON documents that allow a data scientist to mix: code, markdown, results, images, and graphs. They are a great contribution to scientific reproducibility, as they can contain a number of steps that can all be re-run in batch. They serve a similar role to literate programming, SWEAVE, and rmarkdown/knitr. 
The main design difference is that Jupyter notebooks do not separate specification from presentation, which causes a number of friction points. They are not legible without a tool (such as JupyterLab, Jupyter Notebook, Visual Studio Code, PyCharm, or other IDEs), they are fairly incompatible with source control (as they may contain images as binary blobs, and many of the tools alter the notebook on opening), and they make \u003ccode\u003egrep\u003c/code\u003eing/searching difficult.\n\nThe above issues are fortunately all \u003cem\u003einessential difficulties\u003c/em\u003e. Python is a very code-oriented work environment, so most tools expose a succinct programmable interface. The tooling exposed by the Python packages \u003ca href=\"https://pypi.org/project/ipython/\"\u003eIPython\u003c/a\u003e, \u003ca href=\"https://pypi.org/project/nbformat/\"\u003enbformat\u003c/a\u003e, and \u003ca href=\"https://pypi.org/project/nbconvert/\"\u003enbconvert\u003c/a\u003e is very powerful and convenient. With only a little organizing code we were able to build a very powerful personal data science workflow that we have found works very well for clients.\n\nThe two features wvpy provides are:\n\n  * Converting Jupyter notebooks to/from vanilla Python files. We have an article on the technique [here](https://win-vector.com/2022/08/20/an-effective-personal-jupyter-data-science-workflow/), and a short video demonstration [here](https://youtu.be/cQ-tCwD4moc).\n  * Running Jupyter notebooks with many different inputs (*parameterized* running). We have a short video on the technique [here](w92jsKubNCk).\n\nThe first wvpy feature is: converting Jupyter notebooks (which are JSON files ending with a \u003ccode\u003e.ipynb\u003c/code\u003e suffix) to and from simple Python code that is more compatible with source control (such as Git). 
A video describing\n\nLet's start with a simple example Jupyter notebook: \u003ca href=\"https://github.com/WinVector/wvpy/blob/main/examples/worksheets/plot.ipynb\"\u003eplot.ipynb\u003c/a\u003e. First we install \u003ca href=\"https://github.com/WinVector/wvpy\"\u003ewvpy\u003c/a\u003e \u003ca href=\"https://pypi.org/project/wvpy/\"\u003efrom PyPI\u003c/a\u003e (using a shell such as bash or zsh).\n\n\u003ccode\u003e\n\u003cpre\u003e\npip install wvpy\n\u003c/pre\u003e\n\u003c/code\u003e\n\nNext we download \u003ca href=\"https://github.com/WinVector/wvpy/blob/main/examples/worksheets/plot.ipynb\"\u003eplot.ipynb\u003c/a\u003e.\n\n\u003ccode\u003e\n\u003cpre\u003e\nwget https://raw.githubusercontent.com/WinVector/wvpy/main/examples/worksheets/plot.ipynb\n\u003c/pre\u003e\n\u003c/code\u003e\n\nThen we can convert the Jupyter notebook to the Python formatted file as follows (we discuss this format a bit \u003ca href=\"https://win-vector.com/2022/04/30/separating-code-from-presentation-in-jupyter-notebooks/\"\u003ehere\u003c/a\u003e).\n\n\u003ccode\u003e\n\u003cpre\u003e\npython -m wvpy.pysheet --delete plot.ipynb\n\u003c/pre\u003e\n\u003c/code\u003e\n\nThe tool reports the steps it takes in the conversion.\n\n\u003ccode\u003e\n\u003cpre\u003e\nfrom \"plot.ipynb\" to \"plot.py\"\n   copying previous output target \"plot.py\" to \"plot.py~\"\n   converting Jupyter notebook \"plot.ipynb\" to Python \"plot.py\"\n   moving input plot.ipynb to plot.ipynb~\n\u003c/pre\u003e\n\u003c/code\u003e\n\nThe resulting Python file is shown \u003ca href=\"https://github.com/WinVector/wvpy/blob/main/examples/worksheets/plot.py\"\u003ehere\u003c/a\u003e. The idea is: the entire file is pure Python, with the non-Python blocks in multi-line strings. This file has all results and meta-data stripped out, and a small amount of whitespace regularization. This \".py\" format is exactly the right format for source control: we get reliable and legible differences. 
In my personal practice I don't always check \".ipynb\" files in to source control, but only the matching \".py\" files. This discipline makes \u003ccode\u003egrep\u003c/code\u003eing and searching for items in the project as easy as finding items in code.\n\nIn the \".py\" file \"begin text\", \"end text\", and \"end code\" markers show where the Jupyter cell boundaries are. This allows reliable conversion from the \".py\" file back to a Jupyter notebook. PyCharm and others have a similar notebook representation strategy.\n\nWe can convert back from \".py\" to \".ipynb\" as follows.\n\n\u003ccode\u003e\n\u003cpre\u003e\npython -m wvpy.pysheet --delete plot\n\u003c/pre\u003e\n\u003c/code\u003e\n\n\u003ccode\u003e\n\u003cpre\u003e\nfrom \"plot.py\" to \"plot.ipynb\"\n   converting Python plot.py to Jupyter notebook plot.ipynb\n   moving input plot.py to plot.py~\n\u003c/pre\u003e\n\u003c/code\u003e\n\nNotice this time we did not specify the file suffix (the \".py\" or \".ipynb\"). The tooling checks that exactly one of these exists and converts one to another. 
This allows easy conversion back and forth reusing command history.\n\nEither form of the worksheet can be executed and rendered to HTML from the command line as follows.\n\n\u003ccode\u003e\n\u003cpre\u003e\npython -m wvpy.render_workbook plot\n\u003c/pre\u003e\n\u003c/code\u003e\n\n\u003ccode\u003e\n\u003cpre\u003e\nstart render_as_html \"plot.ipynb\"  2022-08-20 12:19:06.669369\n\tdone render_as_html \"plot.html\" 2022-08-20 12:19:10.080226\n\u003c/pre\u003e\n\u003c/code\u003e\n\nThis produces what we expect to see from a Jupyter notebook as a presentation.\n\n\u003cimg style=\"display:block; margin-left:auto; margin-right:auto;\" src=\"https://win-vector.com/wp-content/uploads/2022/08/Screen-Shot-2022-08-20-at-12.45.27-PM.png\" alt=\"Screen Shot 2022 08 20 at 12 45 27 PM\" title=\"Screen Shot 2022-08-20 at 12.45.27 PM.png\" border=\"0\" width=\"338\" height=\"565\" /\u003e\n\nThere is also an option in the HTML renderer that strips out input blocks. This isn't fully presentation ready, but it makes for very good in-progress work reports.\n\n\u003ccode\u003e\n\u003cpre\u003e\npython -m wvpy.render_workbook --strip_input plot\n\u003c/pre\u003e\n\u003c/code\u003e\n\n\n\u003ccode\u003e\n\u003cpre\u003e\nstart render_as_html \"plot.ipynb\"  2022-08-20 12:19:35.251560\n\tdone render_as_html \"plot.html\" 2022-08-20 12:19:38.478107\n\u003c/pre\u003e\n\u003c/code\u003e\n\nThis gives a simplified output as below.\n\n\n\u003cimg style=\"display:block; margin-left:auto; margin-right:auto;\" src=\"https://win-vector.com/wp-content/uploads/2022/08/Screen-Shot-2022-08-20-at-12.43.40-PM.png\" alt=\"Screen Shot 2022 08 20 at 12 43 40 PM\" title=\"Screen Shot 2022-08-20 at 12.43.40 PM.png\" border=\"0\" width=\"335\" height=\"465\" /\u003e\n\nFor already executed sheets one would use the standard Jupyter supplied command \u003ccode\u003ejupyter nbconvert --to html plot.ipynb\u003c/code\u003e; the merit of the rendering here is parameterization of notebooks and stripping of 
input and prompt ids. The strategy here is to be lightweight stand-alone, and not a plug in such as the strategy pursued by \u003ca href=\"https://github.com/mwouts/jupytext\"\u003ejupytext\u003c/a\u003e or \u003ca href=\"https://www.fast.ai/2022/07/28/nbdev-v2/\"\u003enbdev\u003c/a\u003e, or targeting fully camera ready reports via \u003ca href=\"https://www.fast.ai/2022/07/28/nbdev-v2/\"\u003eQuarto\u003c/a\u003e. We feel the \u003ca href=\"https://github.com/WinVector/wvpy\"\u003ewvpy\u003c/a\u003e approach maximizes productivity during development, with minimal plug-in and install burdens.\n\nWe also supply a \u003ca href=\"https://github.com/WinVector/wvpy/blob/main/pkg/wvpy/jtools.py#L281\"\u003esimple class for holding render tasks\u003c/a\u003e, including inserting arbitrary initialization code for each run. This makes it very easy to render the same Jupyter workbook for different targets (say the same analysis for each city in a state) and even parallelize the rendering using standard Python tools such as \u003ccode\u003emultiprocessing.Pool\u003c/code\u003e. This parameterized running allows simple management of fairly large projects. If we need to run a great many variations of a notebook we use the \u003ca href=\"https://github.com/WinVector/wvpy/blob/main/pkg/wvpy/jtools.py#L281\"\u003eJTask container\u003c/a\u003e and either a for loop or \u003ccode\u003emultiprocessing.Pool\u003c/code\u003e over the tasks in Python (remember, when we have Python we don't have to perform all steps at the GUI or even in a shell!). 
A small example of the method is found \u003ca href=\"https://github.com/WinVector/wvpy/tree/main/examples/param_worksheet\"\u003ehere\u003c/a\u003e, where a single Jupyter notebook \u003ca href=\"https://github.com/WinVector/wvpy/blob/main/examples/param_worksheet/ParamExample.ipynb\"\u003eParamExample.ipynb\u003c/a\u003e is used by \u003ca href=\"https://github.com/WinVector/wvpy/blob/main/examples/param_worksheet/run_examples.py\"\u003erun_examples.py\u003c/a\u003e to produce the multiple per-date HTML, PDF, and PNG files found in the \u003ca href=\"https://github.com/WinVector/wvpy/tree/main/examples/param_worksheet\"\u003edirectory\u003c/a\u003e.\n\nWe have found the quickest development workflow is to work with the \".ipynb\" Jupyter notebooks (usually in Visual Studio Code, and setting any values that were supposed to come from \u003ccode\u003ewvpy.render_workbook\u003c/code\u003e by hand after checking they are not set in \u003ccode\u003eglobals()\u003c/code\u003e). Then when the worksheet is working we convert it to \".py\" using \u003ccode\u003ewvpy.pysheet\u003c/code\u003e and check that in to source control. \n\nAs a side-note, I find Python is a developer-first community, which is very refreshing. Capabilities (such as Jupyter, nbconvert, and nbformat) are released as code and documentation under generous open source licenses instead of being trapped in monolithic applications. This means one can take advantage of their capabilities using only a small amount of code. And under the mentioned assumption that Python is a developer-first community, small amounts of code are considered easy integrations. wvpy is offered in the same spirit: it is available for use from PyPI \u003ca href=\"https://pypi.org/project/wvpy/\"\u003ehere\u003c/a\u003e under a BSD 3-clause License and has its code available for re-use or adaptation \u003ca href=\"https://github.com/WinVector/wvpy\"\u003ehere\u003c/a\u003e under the same license. 
It isn't a big project, but it has made working on client projects and teaching data science a bit easier for me.\n\n\u003ca href=\"https://win-vector.com\"\u003eWin Vector LLC\u003c/a\u003e will be offering private (and hopefully someday public) training on the workflow (including notebook parameterization to run many jobs from a single source, use of \u003ccode\u003emultiprocessing.Pool\u003c/code\u003e for speedup, and \u003ccode\u003eIPython.display.display; IPython.display.Markdown\u003c/code\u003e for custom results) going forward.\n\nSome more examples and videos can be found at: https://github.com/WinVector/wvpy/tree/main/examples/declare_variables .\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwinvector%2Fwvpy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwinvector%2Fwvpy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwinvector%2Fwvpy/lists"}