{"id":16520962,"url":"https://github.com/jhrcook/pymc3-stan-comparison","last_synced_at":"2025-10-28T07:31:51.270Z","repository":{"id":44735270,"uuid":"427982356","full_name":"jhrcook/pymc3-stan-comparison","owner":"jhrcook","description":"Comparing the performance of the probabilistic programming languages PyMC3 and Stan.","archived":false,"fork":false,"pushed_at":"2022-01-27T21:51:01.000Z","size":16331,"stargazers_count":7,"open_issues_count":4,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-01T12:51:11.494Z","etag":null,"topics":["probabilistic-programming","pymc3","snakemake","stan"],"latest_commit_sha":null,"homepage":"https://jhrcook.github.io/pymc3-stan-comparison/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jhrcook.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-11-14T16:36:07.000Z","updated_at":"2025-01-02T12:48:23.000Z","dependencies_parsed_at":"2022-08-27T16:50:41.870Z","dependency_job_id":null,"html_url":"https://github.com/jhrcook/pymc3-stan-comparison","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhrcook%2Fpymc3-stan-comparison","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhrcook%2Fpymc3-stan-comparison/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhrcook%2Fpymc3-stan-comparison/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jhrcook%2Fpymc3-stan-comparison/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/owners/jhrcook","download_url":"https://codeload.github.com/jhrcook/pymc3-stan-comparison/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238614616,"owners_count":19501493,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["probabilistic-programming","pymc3","snakemake","stan"],"created_at":"2024-10-11T16:53:48.808Z","updated_at":"2025-10-28T07:31:42.767Z","avatar_url":"https://github.com/jhrcook.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Comparing performance of PyMC3 and Stan\n\nThe goal of this project is to compare the performance between two popular probabilistic programming languages, [Stan](https://mc-stan.org) and [PyMC3](https://docs.pymc.io/en/v3/).\n\n**The results can be found here: [jhrcook.github.io/pymc3-stan-comparison/](https://jhrcook.github.io/pymc3-stan-comparison/)**\n\n**Contributions are welcome!**\nTo add a new type of model, please see the guide below and feel free to ask for [help](https://github.com/jhrcook/pymc3-stan-comparison/issues).\nYou can also contribute to the data analysis by editing the analysis notebook: [docs/index.ipynb](docs/index.ipynb).\n\n\u003e This project is functional, but still a work in progress.\n\n## To-Do\n\n- finish documentation\n- more models and configurations (see GitHub Issues for requests for model types)\n\n## Table of Contents\n\n1. [Process overview](#process-overview)\n1. [Contributing](#contributing)\n1. 
[Running the pipeline](#running-the-pipeline)\n\n---\n\n## Process overview\n\n(TODO) - describe the pipeline and configuration system; using snakemake to profile which uses `psutil`.\n\n## Contributing\n\nAny contributions are welcome, particularly for different model types.\nOnce you have the development environment set up, there are just a few steps to add a new model to the pipeline.\n\n### Overview\n\nThe pipeline uses the configurations in [model-configs.yaml](model-configs.yaml) to know which models to run.\nEach model configuration has five parts:\n\n1. `name`: a unique, identifiable name for the configuration\n1. `model`: the model that will be run (has multiple configuration options)\n1. `mem`: memory (in bytes) to allocate for running the model\n1. `time`: time (in `HH:MM:SS`) to allocate for running the model - **max 12 hours**\n1. `config`: an arbitrary keyword argument dictionary for configuring the model\n\nThe `model` parameter determines which PyMC3 or Stan model to run, and the `config` dictionary is used to configure the data and model.\nThe `mem` and `time` parameters are for the pipeline to use when profiling the model-fitting processes.\n\nTo run an individual model configuration once, pass the name of the configuration to the `fit` command in the `fit.py` CLI.\nThe example below runs the simplest linear regression PyMC3 model:\n\n```bash\n./fit.py fit \"simple_pymc3_100\"\n```\n\n### Setup\n\nSet up your Python virtual environment using `conda` with the command below:\n\n```bash\nconda env create -f environment.yaml\n```\n\nIt is recommended to try running the two simplest PyMC3 and Stan models to check that your system is ready:\n\n```bash\n./fit.py fit \"simple_pymc3_100\"\n./fit.py fit \"simple_stan_100\"\n```\n\nIf either of these fails, please open an [issue](https://github.com/jhrcook/pymc3-stan-comparison/issues) on GitHub.\n\nI recommend creating a new git branch and working there.\nPlease give the branch a descriptive name 
(e.g. if you are adding Gaussian process models name it `gaussian-process`).\n\n```bash\ngit checkout -b \u003cnew-branch-name\u003e\n```\n\n### Define a new model\n\nIf you stick to a few design guidelines in coding your model, adding it to the pipeline is trivial.\nThe simplest example of a model is the [simple linear regression](models/simple_linear_regression.py) model – I recommend using this as a guide.\n\nEach Stan and PyMC3 model will have a configuration class and a function called to fit the model.\n\n#### Model configuration class\n\nI decided to use ['pydantic'](https://pydantic-docs.helpmanual.io) for all of the configuration classes to make data parsing and validation easy.\nThere are several ways to define the configuration classes, but I have found the following pattern to work well and adhere to the [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) principle.\n\nFirst, create a class with the adjustable parameters for your data.\nFor example, for the simple linear regression model, there is a single parameter `size` that determines the number of data points.\n\n```python3\nfrom pydantic import BaseModel, PositiveInt\n\nclass SimpleLinearRegressionDataConfig(BaseModel):\n    \"\"\"Configuration for the data for the simple linear regression model.\"\"\"\n\n    size: PositiveInt\n```\n\nThis is one class because the adjustable parameters will be used by both the PyMC3 and Stan models.\n\nThen, use this data configuration class to create configuration classes for each model.\nI have created two classes (one for each library) with the basic parameters already included (such as `tune` and `draws`).\nSub-classing from these means that the new configuration class automatically inherits those parameters.\n\n\nBelow are the configuration classes for the PyMC3 and Stan simple linear regression models.\nNote that the ellipses `...` are actually used in the code because there are no additional parameters to specify – everything is inherited from 
`BasePymc3Configuration` or `BaseStanConfiguration`, together with `SimpleLinearRegressionDataConfig`.\n\n```python3\nfrom .sampling_configurations import BasePymc3Configuration, BaseStanConfiguration\n\n\nclass SimplePymc3ModelConfiguration(\n    BasePymc3Configuration, SimpleLinearRegressionDataConfig\n):\n    \"\"\"Configuration for the Simple PyMC3 model.\"\"\"\n\n    ...\n\n\nclass SimpleStanModelConfiguration(\n    BaseStanConfiguration, SimpleLinearRegressionDataConfig\n):\n    \"\"\"Configuration for the Simple Stan model.\"\"\"\n\n    ...\n```\n\n## Running the pipeline\n\n### Setup\n\n```bash\nconda env create -f pipeline-environment.yaml\n```\n\nOn O2, I can run the following command:\n\n```bash\n# Made for O2, only.\nsbatch run-pipeline.sh\n```\n\nOr to run locally:\n\n```bash\nsnakemake --cores 1 --use-conda\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjhrcook%2Fpymc3-stan-comparison","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjhrcook%2Fpymc3-stan-comparison","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjhrcook%2Fpymc3-stan-comparison/lists"}