{"id":27231350,"url":"https://github.com/dtrckd/pymake","last_synced_at":"2025-04-10T13:44:24.772Z","repository":{"id":57453778,"uuid":"77100809","full_name":"dtrckd/pymake","owner":"dtrckd","description":"An experiment control system for reproducible research","archived":false,"fork":false,"pushed_at":"2024-03-05T02:40:54.000Z","size":1497,"stargazers_count":14,"open_issues_count":4,"forks_count":4,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-18T17:57:45.946Z","etag":null,"topics":["cli","cross-validation","machine-learning","model-driven-development","scientific"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dtrckd.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2016-12-22T01:51:36.000Z","updated_at":"2024-09-23T22:08:57.000Z","dependencies_parsed_at":"2022-08-29T06:51:20.941Z","dependency_job_id":null,"html_url":"https://github.com/dtrckd/pymake","commit_stats":{"total_commits":402,"total_committers":4,"mean_commits":100.5,"dds":0.04228855721393032,"last_synced_commit":"2aa9a27020a70df8af27fe5be237363386a79abc"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dtrckd%2Fpymake","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dtrckd%2Fpymake/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dtrckd%2Fpymake/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dtrckd%2Fpymake/manifests","owner_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/owners/dtrckd","download_url":"https://codeload.github.com/dtrckd/pymake/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248226499,"owners_count":21068214,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","cross-validation","machine-learning","model-driven-development","scientific"],"created_at":"2025-04-10T13:44:24.011Z","updated_at":"2025-04-10T13:44:24.756Z","avatar_url":"https://github.com/dtrckd.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PYMAKE\n\n\u003c!--[![Documentation Status](https://readthedocs.org/projects/pmk/badge/?version=latest)](https://pmk.readthedocs.io/en/latest/?badge=latest)--\u003e\n[![Documentation Status](https://readthedocs.org/projects/pmk/badge/?version=latest)](https://pymake.neocities.org/)\n[![Join the chat at https://gitter.im/pymake/Lobby](https://badges.gitter.im/pymake/Lobby.svg)](https://gitter.im/pymake/Lobby?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n\nPymake (**pmk**) is an environment for making reproducible research. \nIt provides tools adapted to ease the creation, maintenance, tracking and sharing of experiments. 
It has two main paradigms:\n\n* Manage and navigate your experiments, as a **command-line** interface.\n* Models and workflows for Machine Learning experiments, as a **framework**.\n\nIt follows a Don't-Repeat-Yourself ([DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself)) philosophy and proposes a workflow called Model-Spec-Action (MSA), which is in the spirit of the Model-View-Controller ([MVC](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller)) design pattern but adapted to computer simulations in general.\n\nIt can be represented as follows:\n\n![MSA Model](wiki/msa.png)\n\n\u003c!-- Build Powerful CLI | Create Beautiful UI | Browse your Experiments  --\u003e\n\n\n\u003c!--\n\n    * (future) Integration of datasets from [Zenodo](https://www.zenodo.org/)\n    * (future) Integration of Data drivers for major types of data:\n        * Text (candidate: Nltk, Spacy)\n        * Network (candidate: networkx, Graph-tool)\n        * Image: (candidate: OpenCV)\n\n\n\n--\u003e\n\n# Table of Contents\n1. [Features](#1)\n2. [Install](#2)\n3. [Examples](#3)\n4. [FAQ](#4)\n5. [Documentation](#5)\n6. 
[man pymake](#man)\n\n## Features [](#1)\n* Specification of experiment designs with a simple grammar,\n* Indexing of specs, models, scripts and corpora, powered by [Whoosh](https://whoosh.readthedocs.io/en/latest/),\n* Customisable command line for quick design and experiment testing, powered by [argparse](https://docs.python.org/3/library/argparse.html#module-argparse),\n* Command-line auto-completion for specs and scripts,\n* Simple grid search specification and navigation,\n* Support for experiment rule filtering (experimental),\n* Support for disk I/O management for training/input data and output results.\n    * Automatic filesystem I/O for data persistence.\n    * Automatic compression.\n    * Pickle and JSON formats are currently supported.\n* Support for plotting and table printing facilities powered by [matplotlib](https://matplotlib.org) and [pandas](https://pandas.pydata.org/),\n* Support for experiment parallelization powered by [gnu-parallel](https://www.gnu.org/software/parallel/),\n* Browse, design and test several models and corpora found in the literature.\n    * Integration of models from [scikit-learn](http://scikit-learn.org).\n\nPerspectives:\n\n* Web server UI and notebook automatic builder.\n* Better documentation (see also [wiki/](./wiki/)).\n\n\n## Install [](#2)\n\n#### Short\n\n    pip install pmk\n\n#### From source\n\n###### Linux dependencies\n\n    apt-get install python3-setuptools python3-pip python3-tk libopenblas-dev gfortran parallel\n\n###### macOS dependencies\n\n    brew install parallel\n\n###### Manual Installation\n\n```bash\ngit clone https://github.com/dtrckd/pymake\ncd pymake \u0026\u0026 make\n```\n\n## Examples [](#3)\n\nPublic projects that use Pymake:\n\n* [ml](https://github.com/dtrckd/ml): Machine Learning models and experiments.\n* [docsearch](https://github.com/dtrckd/docsearch): self-hosted search engine for your PDF documents.\n\n## Glossary and Types\n\n* *run* or *expe*: The term that designates a single 
experiment. It corresponds to an atomic, sequential code execution.\n* *model*: A class that has a method named `fit` and is located in `model/`.\n* *spec*: A spec is a design of experiments. It is defined by a subset of ExpSpaces, ExpTensors and ExpGroups.\n* *script*: A script is a file containing a list of actions (see *ExpeFormat*).\n* *actions*: An action is basically one method in a script that can be triggered by users. The term *script* is often used instead of *action* by abuse of language.\n* *ExpSpace*: A dict-like object used to store the settings of one *expe*.\n* *ExpTensor*: A dict-like object that represents a set of *expe* with common parameters. Every entry that is an instance of `list` or `set` is used to build the Cartesian product of those entries. It is used to define a grid search over parameters.\n* *ExpGroup*: A list-like object used to define a set of heterogeneous expes.\n* *ExpeFormat*: A base class used to create scripts. It acts like a sandbox for the runs. The classes that inherit ExpeFormat should be located in `script/`.\n* *ExpDesign*: A base class used to create designs of experiments. Experiments of type ExpSpace, ExpTensor and ExpGroup should be defined within a class that inherits ExpDesign and is located in `spec/`.\n* *pymake.cfg*: The pymake configuration file, where, for example, the names of the locations (model/, spec/, script/) can be changed, among other settings.\n* *gramarg*: It refers to a file, by default gramarg.py, where you can tune the command-line options of pmk by adding your own. The command-line option grammar is powered by the Python module argparse.\n\u003c!-- grammarg, --\u003e\n\n\n\n## FAQ [](#4)\n\n\n###### How to see the difference between two specs\n\n    pmk diff spec1 spec2\n\n###### How to tune the command-line options\n\nThe pymake.cfg file has a setting, by default `gramarg = project_name.grammarg`, which points to the Python file gramarg.py. 
Inside this file you can add command-line options, fully compatible with the `argparse` Python module. By default the file contains an empty list. If you want, for instance, to set a parameter in your expe from the command line like this: `pmk --my_key my_value`, you can add an element to the list as follows:\n\n```python\n_gram = [\n    '--my_key', dict(type=str, help='simple option from command-line'),\n]\n```\n\nNow suppose that you want to run several expe with different values for an argument: for example, `--my_key 10 20` will result in an ExpTensor with two expe, one with \"my_key\" at 10 and the other at 20. To activate this you can proceed as follows:\n\n```python\nfrom pymake.core.gram import exp_append\n\n_gram = [\n    '--my_key', dict(nargs='*', action=exp_append),\n]\n```\n\nWith this definition, the values you get for \"my_key\" are of type str. If you want an int instead, you can proceed as follows:\n\n```python\nfrom functools import partial\nfrom pymake.core.gram import exp_append\n\n_gram = [\n    '--my_key', dict(nargs='*', action=partial(exp_append, _t=int)),\n]\n```\n\nFinally, if your argument \"my_key\" should be a list of values (int here) and should not create several expe, you can proceed like this:\n\n```python\nfrom functools import partial\nfrom pymake.core.gram import exp_append_uniq\n\n_gram = [\n    '--my_key', dict(nargs='*', action=partial(exp_append_uniq, _t=int)),\n]\n```\n\n\n###### How to change a setting in a spec from the command line without specifying it in the grammarg file\n\nPymake provides a magic command-line argument to specify any field in an expe. Let's say you want to give the value `my_value` to the field `my_key` in your expe; then you can do `pmk [...] --pmk my_key=my_value`. 
You can chain as many key=value pairs as you like this way.\n\n\n###### How to share memory between all expe/run in a sandbox (ExpeFormat)\n\nIf a spec has several run/expe and they are launched sequentially (without the `--cores` option), then one can use a global container, defined in the ExpeFormat sandbox classes, in the variable `self.D`. Typically one would initialize a variable at the first experiment, update it at each run, and at the final run do some processing with that variable, as illustrated in the following example:\n\n```python\nclass MyScripts(ExpeFormat):\n\n    def my_action(self):\n        if self.is_first_expe():\n            # Initialize the shared variable at the first expe.\n            self.D.my_shared_var = 0\n\n        # Update the shared container itself: rebinding a local name\n        # would leave self.D.my_shared_var unchanged.\n        self.D.my_shared_var += 1\n\n        if self.is_last_expe():\n            print('Expe total: %d' % self.D.my_shared_var)\n```\n\n\nIf the runs are parallelized (with the `--cores` option), there is currently no implemented way to do it, although it is likely to be developed in the future.\n\n\n\n###### How to virtually remove a spec term from the command line\n\nIf a parameter is accessible from the command line, you can deactivate it by giving it the value `_null`, for example `pmk a_complex_spec --my_key _null`. The associated setting will then take no value (or its default value).\n\n\n###### How to activate Spec/Script auto-completion\n\nThe command `pymake update` builds the auto-completion file for bash. To enable it, put the following lines at the end of your `~/.bashrc`:\n\n```bash\nif [ -d \"$HOME/.bash_completion.d\" ]; then\n    if [ -n \"$(ls \"$HOME/.bash_completion.d\")\" ]; then\n        for bcfile in \"$HOME\"/.bash_completion.d/*; do\n            . \"$bcfile\"\n        done\n    fi\nfi\n```\n\nTo make the auto-completion available immediately, open a new terminal or just run `source ~/.bashrc`.\n\n\n## Documentation [](#5)\n\n\n1. Workflow / directory structure\n2. pymake commands\n3. Designing Experiments\n4. Track your data and results\n5. pymake.cfg\n6. 
Search and indexing\n\n(to be completed)\n\n----\n\n##### Workflow / Directory Structure\n\nIn a pymake project there are four main components, associated with four directories (you can change those names in the pymake.cfg):\n\n* `data/`: Where the inputs and outputs of experiments are stored,\n    + contains datasets (and saved results) \u003c!--  selection with the `-c` options and see frontendManager --\u003e,\n* `model/`: It represents our understanding of the data,\n    + contains models -- every class with a `fit` method \u003c!-- selection with the `-m` options and see ModelManager --\u003e,\n* `script/`: Code that operates on the data and models,\n    + contains scripts for actions -- every class that inherits `ExpeFormat` \u003c!-- selection with the `-x` options --\u003e,\n* `spec/`: The specification of the context of an experiment. In other words, the parameters of an experiment.\n    + contains specifications of (designs of) experiments (`ExpSpace`, `ExpTensor` and `ExpGroup`) -- can be given as an argument to pymake.\n\nAlong with those directories there are two system files:\n* pymake.cfg: at the root of a project (it basically defines a project), specifies the paths for the `data | model | script | spec` directories and other global options, \u003c!-- document each entry --\u003e\n* gramarg.py: defines the command-line options for a project.\n\n\n##### Pymake Commands\n\nInitialize a new project in the current directory:\n\n    pymake init\n\nIf new models or scripts are added to the project, you'll need to update the pymake index:\n\n    pymake update\n\n\nList/Search information:\n\n```bash\npmk -l spec   # show available designs of experiments\npmk -l model  # show available models\npmk -l script # show available scripts\npmk show expe_name # or just pymake expe_name\n```\n\nRun experiments:\n\n```bash\npmk run [expe_name] --script script_name [script options...]\n# Or shortly (alias):\npmk [expe_name] -x script_name\n# Run in parallel:\npmk [expe_name] -x script_name 
--cores N_CORES\n```\n\nShow paths for disk I/O:\n\n    pmk path [expe_name] [script options...]\n\nShow individual commands for asynchronous execution (@deprecated):\n\n    pmk cmd [expe_name] [script options...]\n\n##### Designing experiments\n\n##### Specifications\n\n\nA design of experiments is defined as one of the following types:\n* ExpSpace: A subclass of `dict` =\u003e 1 experiment\n* ExpTensor: A subclass of `dict` =\u003e many experiments (Cartesian product of all `list` entries of the dict)\n* ExpGroup: A subclass of `list` =\u003e group of ExpSpace or ExpTensor.\n\nA design of experiments (ExpSpace, ExpTensor or ExpGroup) must live inside a class that inherits `ExpDesign`. Those classes live in files inside the `spec/` directory. You'll need the following import:\n`from pymake import ExpDesign, ExpSpace, ExpTensor, ExpGroup`\n\n\nThe following examples need to be instantiated in a class that inherits `ExpDesign`: `class MyDesign(ExpDesign)`.\n\nTo specify a unique experiment, one can use the `ExpSpace` class:\n\n```python\nexp1 = ExpSpace(name = 'myexpe',\n        size = 42,\n        key1 = 100,\n        key2 = 'johndoe',\n        _format = '{name}-{size}-{key1}_{key2}'\n        )\n```\n\nTo specify a **grid search**, one can use the `ExpTensor` class:\n\n```python\nexp2 = ExpTensor(name = 'myexpe',\n        size = [42, 100],\n        key1 = list(range(20, 1000)),\n        key2 = 'johndoe',\n        _format = '{name}-{size}-{key1}_{key2}'\n        )\n```\n\nThis results in one experiment for each combination of the \"size\" and \"key1\" values (the Cartesian product of the two lists).\n\nThe third class is the `ExpGroup`, which lets you group several designs of experiments (for example if they have different setting names):\n\n```python\nexp3 = ExpGroup([exp1, exp2])\n```\n\nYou can then run `pmk -l` to see your designs of experiments.\n\n##### Designing a Model\n\nBasically, a model is a class inside `model/` that has a method `fit`.\n\n(Doc in progress for more fancy use cases of 
design.)\n\n##### Designing a Script\n\nA script is a piece of code that you execute and that is parameterized by a **specification**. More specifically, scripts are methods of a class that inherits `ExpeFormat` and lives inside the `script/` folder.\n\nOnce you have defined some scripts, you can list them with `pmk -l script` and run them by name with `pmk [specification_id] -x script_name`.\n\nEach experiment defined in your design (or `_default_expe` if no specification_id is given) will then go through the script method. Several facilities are available inside the method at run-time:\n\n* `self.expe`:  The settings of the current experiment,\n* `self._it`: The index of the current run (the ith expe running inside the script),\n* and more (doc in progress)\n\n\n##### Track your data and results\n\n\n\nIf your expe contains a model, it can be loaded and saved automatically, provided that your spec has a field named \"model\" whose value points to a valid model in your pmk path. You can then load the model in a script by calling `self.load_model()`.  If you give the argument `-w` on the command line or, equivalently, your expe has a pair `_write=True`, the model is automatically saved at the end of the expe, after it has been updated. You can then reload it from its file by calling `self.load_model(load=True)`.\n\n\nIn order to save and analyze your results, each unique experiment needs to be identified by a file. To do so, Pymake comes with its own mechanism to map the settings/specification to a unique \u003cfilename\u003e. 
Pymake uses the following conventions:\n\n* \u003cfilename\u003e.inf: CSV file where each line contains the state of the iterative process of an experiment (see \\_scv_format),\n* \u003cfilename\u003e.pk.gz: to save a compressed binary object, usually at the end of an experiment, and load it later for analysis/visualization,\n* \u003cfilename\u003e.json: to save information in JSON format.\n\nSeveral special spec parameters customize the behaviour of pymake; they are described in the following sections.\n\n\n###### Formatting the filename -- _format\n\nThe choice of the filename depends on the settings of the experiment. To specify the format of the filename, use the special setting `--format  str_fmt`, where `str_fmt` is a string template for the filename in which braces delimit the settings to be substituted.\n\nSuppose we have the following settings:\n\n```python\nsettings = ExpSpace(name = 'myexpe',\n        size = 42,\n        key1 = 100,\n        key2 = 'johndoe',\n        _format = '{name}-{size}-{key1}_{key2}'\n        )\n```\n\nThe filename for this unique experiment will be 'myexpe-42-100_johndoe'.\n\n\nTo give a unique identifier to an expe belonging to a group of experiments (`ExpTensor` or `ExpGroup`), one can use the special terms `{_id}` (unique counter identifier) and `${name}` (name of the experiment as given in the ExpDesign class) in the `_format` attribute.\n\n###### Formatting the path -- _refdir and _repeat\n\n\nThe path of the file identifying an expe is automatically inferred by pymake. If you want to better partition your results, two parameters control the output path. By default it is something like `.pmk/results/training/\u003crefdir\u003e/\u003crepeat\u003e/output_path`. The two levels of sub-directories can be customized in your spec with the keys `_refdir` and `_repeat`. 
If not given, the defaults are \"default\" for `_refdir` and '' (empty) for `_repeat`. Note that you can format them with the same syntax as explained for `_format`.\n\n\n###### Specifying what measures to save -- _measures\n\n\n\u003c!--\nSuppose you have a script `fit` that executes the fit method of a given model. You may want to observe the convergence of some quantities of your model through iterations. The way to achieve this with pmk is to use the special setting `_measures`, which contains the attributes of your model you want to track.\n\nOnce you provide this setting, you need to inject a function in the fit method of your model to track the results.\n\n--\u003e\n\nTo be completed:\n\n* explain the `_scv_typo` parameters,\n* the model needs to have a method injected at the end of its iterative process.\n\n\n## man pymake [](#man)\n\npymake (pmk) command-line reference.\n\n```bash\ninit = command$;\ncommand = 'pmk' [command_name] [expedesign_id]* [expe_id]* [pmk_options];\ncommand_name = 'run' | 'runpara' | 'path' | 'cmd' | 'update' | 'show' | 'hist' |  '' ;\nexpe_id = int; # int identifier of an expe, from 0 to size(exp) - 1.\nexpedesign_id = [exp id/name]; # string identifier to an exp\npmk_options = [pymake special options + project options];\n```\n\n### Command_name\nIf 'command_name' is empty and `-x` is given, pymake assumes the `run` command. If no design spec is given, then the parameters are empty unless the script defines a `_default_expe` expe setting. All settings undefined in a design but defined in `_default_expe` will take the value given there. Further, `_default_expe` can point to an existing spec in `spec/`; to do so, use the setting `_spec='my_expe_name'` inside it.\n\nRemark: the -l and -s (--simulate) options don't execute anything, they just display information.\n\n### Expedesign_id\nPick among all (designs of) experiments in {spec}. 
To list them, run `pmk -l spec`.\n\n### pmk_options\nThese are the special options built into pymake, such as --refdir, --format, --script, -w, -l, -h etc. Additionally, all the options for the current project should be added in the `grammarg.py` file.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdtrckd%2Fpymake","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdtrckd%2Fpymake","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdtrckd%2Fpymake/lists"}