{"id":13717809,"url":"https://github.com/recipy/recipy","last_synced_at":"2025-05-07T08:30:35.651Z","repository":{"id":29440880,"uuid":"32976962","full_name":"recipy/recipy","owner":"recipy","description":"Effortless method to record provenance in Python","archived":false,"fork":false,"pushed_at":"2022-01-12T23:24:38.000Z","size":1154,"stargazers_count":434,"open_issues_count":89,"forks_count":42,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-05-02T01:03:59.638Z","etag":null,"topics":["database","provenance","python","recipy","reproducible-research","science"],"latest_commit_sha":null,"homepage":"https://recipy.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/recipy.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":".zenodo.json"}},"created_at":"2015-03-27T09:07:42.000Z","updated_at":"2025-02-16T16:44:50.000Z","dependencies_parsed_at":"2022-09-01T22:50:59.443Z","dependency_job_id":null,"html_url":"https://github.com/recipy/recipy","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/recipy%2Frecipy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/recipy%2Frecipy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/recipy%2Frecipy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/recipy%2Frecipy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/owners/recipy","download_url":"https://codeload.github.com/recipy/recipy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252842369,"owners_count":21812656,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","provenance","python","recipy","reproducible-research","science"],"created_at":"2024-08-03T00:01:27.417Z","updated_at":"2025-05-07T08:30:34.864Z","avatar_url":"https://github.com/recipy.png","language":"Python","funding_links":[],"categories":["Identity, signing and provenance","Python"],"sub_categories":["Supply chain beyond libraries"],"readme":"# recipy\n\n## What is it and who cares?\nImagine the situation: You’ve written some wonderful Python code which produces a beautiful graph as an output. You save that graph, naturally enough, as `graph.png`. You run the code a couple of times, each time making minor modifications. You come back to it the next week/month/year. Do you know how you created that graph? What input data? What version of your code? If you’re anything like me then the answer will often, frustratingly, be “no”. Of course, you then waste lots of time trying to work out how you created it, or even give up and never use it in that journal paper that will win you a Nobel Prize…\n\nThis talk will introduce ReciPy (from *recipe* and *python*), a Python module that will save you from this situation! (Although it can’t guarantee that your resulting paper will win a Nobel Prize!) 
With the addition of a single line of code to the top of your Python files, ReciPy will log each run of your code to a database, keeping track of the input files, output files and the version of your code, and then let you query this database to find out how you actually did create `graph.png`.\n\n## Installation:\nThe easiest way to install is by simply running\n\n    pip install recipy\n\nAlternatively, you can clone this repository and run:\n\n\tpython setup.py install\n\nIf you want to install the dependencies manually (they should be installed automatically if you're following the instructions above) then run:\n\n\tpip install -r requirements.txt\n\nYou can upgrade from a previous release by running:\n\n\tpip install -U recipy\n\nTo find out what has changed since the last release, see the [changelog](https://github.com/recipy/recipy/blob/master/CHANGELOG.md)\n\n**Note:** Previous (unreleased) versions of recipy required MongoDB to be installed and set up manually. This is no longer required, as a pure Python database (TinyDB) is used instead. Also, the GUI is now integrated fully into recipy and does not require installing separately.\n\n## Usage\nSimply add the following line to the top of your Python script:\n\n``` python\nimport recipy\n```\n\nNote that this **must** be the **very top** line of your script, before you import anything else.\n\nThen just run your script as usual, and all of the data will be logged into the TinyDB database (don't worry, the database is automatically created if needed). You can then use the `recipy` script to quickly query the database to find out what run of your code produced what output file. 
So, for example, if you run some code like this:\n\n``` python\nimport recipy\nimport numpy\n\narr = numpy.arange(10)\narr = arr + 500\n\nnumpy.save('test.npy', arr)\n```\n\n(Note the addition of `import recipy` at the beginning of the script; there are no other changes from a standard script.)\n\nit will produce an output called `test.npy`.\n\nAlternatively, run an unmodified script with `python -m recipy SCRIPT [ARGS ...]` to enable recipy logging. This invokes recipy's module entry point, which takes care of `import recipy` for you before running your script.\n\nTo find out the details of the run that created `test.npy`, you can search using\n\n    recipy search test.npy\n\nand it will display information like the following:\n\n    Created by robin on 2015-05-25 19:00:15.631000\n\tRan /Users/robin/code/recipy/example_script.py using /usr/local/opt/python/bin/python2.7\n\tGit: commit 91a245e5ea82f33ae58380629b6586883cca3ac4, in repo /Users/robin/code/recipy, with origin git@github.com:recipy/recipy.git\n\tEnvironment: Darwin-14.3.0-x86_64-i386-64bit, python 2.7.9 (default, Feb 10 2015, 03:28:08)\n\tInputs:\n\n\tOutputs:\n\t  /Users/robin/code/recipy/test.npy\n\nAn alternative way to view this is to use the GUI. Just run `recipy gui` and a browser window will open with an interface that you can use to search all of your recipy 'runs':\n\n![Screenshot of GUI](http://rtwilson.com/images/RecipyGUI.png)\n\nIf you want to log inputs and outputs of files read or written with the built-in `open`, you need to do a little more work. Either use `recipy.open` (only requires `import recipy` at the top of your script), or add `from recipy import open` and just use `open`.\nThis workaround is required because many libraries use the built-in `open` internally, and you only want to record the files you explicitly opened yourself.\n\nIf you use Python 2, you can pass an `encoding` parameter to `recipy.open`. 
In this case `codecs` is used to open the file with the proper encoding.\n\nOnce you've got some runs in your database, you can 'annotate' these runs with any notes that you want to keep about them. This can be particularly useful for recording which runs worked well, or particular problems you ran into. This can be done from the 'details' page in the GUI, or by running\n\n\trecipy annotate\n\nwhich will open an editor to allow you to write notes that will be attached to the run. These will then be viewable via the command line and the GUI when searching for runs.\n\nThere are other features in the command-line interface too: run `recipy --help` to see the other options. You can view diffs, see all runs that created a file with a given name, search based on ids, show the latest entry and more:\n\n\trecipy - a frictionless provenance tool for Python\n\n\tUsage:\n\t  recipy search [options] \u003coutputfile\u003e\n\t  recipy latest [options]\n\t  recipy gui [options]\n\t  recipy annotate [\u003cidvalue\u003e]\n\t  recipy (-h | --help)\n\t  recipy --version\n\n\tOptions:\n\t  -h --help     Show this screen\n\t  --version     Show version\n\t  -a --all      Show all results (otherwise just latest result given)\n\t  -f --fuzzy    Use fuzzy searching on filename\n\t  -r --regex    Use regex searching on filename\n\t  -i --id       Search based on (a fragment of) the run ID\n\t  -v --verbose  Be verbose\n\t  -d --diff     Show diff\n\t  -j --json     Show output as JSON\n\t  --no-browser  Do not open browser window\n\t  --debug       Turn on debugging mode\n\n## Configuration\nRecipy stores all of its configuration and the database itself in `~/.recipy`. Recipy's main configuration file, called `recipyrc`, is inside this folder. 
The configuration file format is very simple and is based on Windows INI files. Having a configuration file is completely optional: the defaults will work fine without one.\n\nAn example configuration is:\n\n\t[ignored metadata]\n\tdiff\n\n\t[general]\n\tdebug\n\nThis simply instructs recipy not to save `git diff` information when it records metadata about a run, and also to print debug messages (which can be really useful if you're trying to work out why certain functions aren't patched). At the moment, the only possible options are:\n\n * `[general]`\n\t * `debug` - print debug messages\n\t * `editor = vi` - configure the default text editor that will be used when recipy needs you to type in a message (for example, use notepad on Windows)\n\t * `quiet` - don't print any messages\n\t * `port` - specify port to use for the GUI\n * `[data]`\n\t * `file_diff_outputs` - store diff between the old output and new output file, if the output file exists before the script is executed\n * `[database]`\n\t * `path = /path/to/file.json` - set the path to the database file\n * `[ignored metadata]`\n\t * `diff` - don't store the output of `git diff` in the metadata for a recipy run\n\t * `git` - don't store anything relating to git (origin, commit, repo etc.) in the metadata for a recipy run\n\t * `input_hashes` - don't compute and store SHA-1 hashes of input files\n\t * `output_hashes` - don't compute and store SHA-1 hashes of output files\n * `[ignored inputs]`\n\t * List any module here (e.g. `numpy`) to instruct recipy *not* to record inputs from this module, or `all` to ignore inputs from all modules\n * `[ignored outputs]`\n\t * List any module here (e.g. `numpy`) to instruct recipy *not* to record outputs from this module, or `all` to ignore outputs from all modules\n\nBy default all metadata is stored (i.e. no metadata is ignored) and debug messages are not shown. 
A `.recipyrc` file in the current directory takes precedence over the `~/.recipy/recipyrc` file, allowing per-project configurations to be easily handled.\n\n**Note:** No default configuration file is provided with recipy, so if you wish to configure anything you will need to create a properly-formatted file yourself.\n\n## How it works\nWhen you import recipy it adds a number of classes to `sys.meta_path`. These are then used by Python as part of the importing procedure for modules. The classes that we add are classes derived from `PatchImporter`, often using the easier interface provided by `PatchSimple`, which allow us to wrap functions that do input/output in a function that calls recipy first to log the information.\n\nGenerally, most of the complexity is hidden away in `PatchImporter` and `PatchSimple` (plus `utils.py`), so the actual code to wrap a module, such as `numpy` is fairly simple:\n\n``` python\n# Inherit from PatchSimple\nclass PatchNumpy(PatchSimple):\n    # Specify the full name of the module\n    modulename = 'numpy'\n\n    # List functions that are involved in input/output\n    # these can be anything that can go after \"modulename.\"\n    # so they could be something like \"pyplot.savefig\" for example\n    input_functions = ['genfromtxt', 'loadtxt', 'load', 'fromfile']\n    output_functions = ['save', 'savez', 'savez_compressed', 'savetxt']\n\n    # Define the functions that will be used to wrap the input/output\n    # functions.\n    # In this case we are calling the log_input function to log it to the DB\n    # and we are giving it the 0th argument from the function (because all of\n    # the functions above take the filename as the 0th argument), and telling\n    # it that it came from numpy.\n    input_wrapper = create_wrapper(log_input, 0, 'numpy')\n    output_wrapper = create_wrapper(log_output, 0, 'numpy')\n```\n\nA class like this must be implemented for each module whose input/output needs logging. 
At the moment the following input and output functions are patched:\n\nPatched modules\n===============\n\nThis table lists the modules recipy has patches for, and the input and output functions that are patched.\n\n\u003ctable\u003e\n\u003ccolgroup\u003e\n\u003ccol width=\"33%\" /\u003e\n\u003ccol width=\"33%\" /\u003e\n\u003ccol width=\"33%\" /\u003e\n\u003c/colgroup\u003e\n\u003cthead\u003e\n\u003ctr class=\"header\"\u003e\n\u003cth align=\"left\"\u003eModule\u003c/th\u003e\n\u003cth align=\"left\"\u003eInput functions\u003c/th\u003e\n\u003cth align=\"left\"\u003eOutput functions\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr class=\"odd\"\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003epandas\u003c/code\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003eread_csv\u003c/code\u003e, \u003ccode\u003eread_table\u003c/code\u003e, \u003ccode\u003eread_excel\u003c/code\u003e, \u003ccode\u003eread_hdf\u003c/code\u003e, \u003ccode\u003eread_pickle\u003c/code\u003e, \u003ccode\u003eread_stata\u003c/code\u003e, \u003ccode\u003eread_msgpack\u003c/code\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003eDataFrame.to_csv\u003c/code\u003e, \u003ccode\u003eDataFrame.to_excel\u003c/code\u003e, \u003ccode\u003eDataFrame.to_hdf\u003c/code\u003e, \u003ccode\u003eDataFrame.to_msgpack\u003c/code\u003e, \u003ccode\u003eDataFrame.to_stata\u003c/code\u003e, \u003ccode\u003eDataFrame.to_pickle\u003c/code\u003e, \u003ccode\u003ePanel.to_excel\u003c/code\u003e, \u003ccode\u003ePanel.to_hdf\u003c/code\u003e, \u003ccode\u003ePanel.to_msgpack\u003c/code\u003e, \u003ccode\u003ePanel.to_pickle\u003c/code\u003e, \u003ccode\u003eSeries.to_csv\u003c/code\u003e, \u003ccode\u003eSeries.to_hdf\u003c/code\u003e, \u003ccode\u003eSeries.to_msgpack\u003c/code\u003e, \u003ccode\u003eSeries.to_pickle\u003c/code\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr class=\"even\"\u003e\n\u003ctd 
align=\"left\"\u003e\u003ccode\u003ematplotlib.pyplot\u003c/code\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003esavefig\u003c/code\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr class=\"odd\"\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003enumpy\u003c/code\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003egenfromtxt\u003c/code\u003e, \u003ccode\u003eloadtxt\u003c/code\u003e, \u003ccode\u003efromfile\u003c/code\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003esave\u003c/code\u003e, \u003ccode\u003esavez\u003c/code\u003e, \u003ccode\u003esavez_compressed\u003c/code\u003e, \u003ccode\u003esavetxt\u003c/code\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr class=\"even\"\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003elxml.etree\u003c/code\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003eparse\u003c/code\u003e, \u003ccode\u003eiterparse\u003c/code\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr class=\"odd\"\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003ebs4\u003c/code\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003eBeautifulSoup\u003c/code\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr class=\"even\"\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003egdal\u003c/code\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003eOpen\u003c/code\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003eDriver.Create\u003c/code\u003e, \u003ccode\u003eDriver.CreateCopy\u003c/code\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr class=\"odd\"\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003esklearn\u003c/code\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003edatasets.load_svmlight_file\u003c/code\u003e\u003c/td\u003e\n\u003ctd 
align=\"left\"\u003e\u003ccode\u003edatasets.dump_svmlight_file\u003c/code\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr class=\"even\"\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003enibabel\u003c/code\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003enifti1.Nifti1Image.from_filename\u003c/code\u003e, \u003ccode\u003enifti2.Nifti2Image.from_filename\u003c/code\u003e, \u003ccode\u003efreesurfer.mghformat.MGHImage.from_filename\u003c/code\u003e, \u003ccode\u003espm99analyze.Spm99AnalyzeImage.from_filename\u003c/code\u003e, \u003ccode\u003eminc1.Minc1Image.from_filename\u003c/code\u003e, \u003ccode\u003eminc2.Minc2Image.from_filename\u003c/code\u003e, \u003ccode\u003eanalyze.AnalyzeImage.from_filename\u003c/code\u003e, \u003ccode\u003eparrec.PARRECImage.from_filename\u003c/code\u003e, \u003ccode\u003espm2analyze.Spm2AnalyzeImage.from_filename\u003c/code\u003e\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\u003ccode\u003enifti1.Nifti1Image.to_filename\u003c/code\u003e, \u003ccode\u003enifti2.Nifti2Image.to_filename\u003c/code\u003e, \u003ccode\u003efreesurfer.mghformat.MGHImage.to_filename\u003c/code\u003e, \u003ccode\u003espm99analyze.Spm99AnalyzeImage.to_filename\u003c/code\u003e, \u003ccode\u003eminc1.Minc1Image.to_filename\u003c/code\u003e, \u003ccode\u003eminc2.Minc2Image.to_filename\u003c/code\u003e, \u003ccode\u003eanalyze.AnalyzeImage.to_filename\u003c/code\u003e, \u003ccode\u003eparrec.PARRECImage.to_filename\u003c/code\u003e, \u003ccode\u003espm2analyze.Spm2AnalyzeImage.to_filename\u003c/code\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\nHowever, the code example above shows how easy it is to write a class to wrap a new module - so please feel free to submit a Pull Request to make recipy work with your favourite scientific modules!\n\n## Test framework\n\nrecipy's test framework is in `integration_test`. The test framework has been designed to run under both Python 2.7+ and Python 3+. 
For more information see [recipy test framework](./docs/TestFramework.md).\n\nThe test framework is run on the following platforms:\n\n* Travis CI: [![Integration test status image](https://travis-ci.org/recipy/recipy.svg)](https://travis-ci.org/recipy/recipy)\n* AppVeyor: [![Build status](https://ci.appveyor.com/api/projects/status/irvathkx02yigjfn?svg=true)](https://ci.appveyor.com/project/recipy/recipy)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frecipy%2Frecipy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frecipy%2Frecipy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frecipy%2Frecipy/lists"}