{"id":23131026,"url":"https://github.com/luiarthur/turingbnpbenchmarks","last_synced_at":"2025-07-17T18:35:36.040Z","repository":{"id":37653154,"uuid":"261811317","full_name":"luiarthur/TuringBnpBenchmarks","owner":"luiarthur","description":"Benchmarks of Bayesian Nonparametric models in Turing and other PPLs","archived":false,"fork":false,"pushed_at":"2024-07-18T23:46:37.000Z","size":38496,"stargazers_count":29,"open_issues_count":17,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-07-19T07:59:27.366Z","etag":null,"topics":["bayesian-inference","bayesian-nonparametric-models","benchmarks","julia-language","probabilistic-programming"],"latest_commit_sha":null,"homepage":"https://luiarthur.github.io/TuringBnpBenchmarks/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/luiarthur.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-06T16:05:24.000Z","updated_at":"2024-07-18T23:46:41.000Z","dependencies_parsed_at":"2023-01-25T10:15:15.483Z","dependency_job_id":null,"html_url":"https://github.com/luiarthur/TuringBnpBenchmarks","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luiarthur%2FTuringBnpBenchmarks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luiarthur%2FTuringBnpBenchmarks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luiarthur%2FTuringBnpBenchmarks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luiarthur%2FTuringBnpBenchmarks
/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/luiarthur","download_url":"https://codeload.github.com/luiarthur/TuringBnpBenchmarks/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230106021,"owners_count":18173954,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bayesian-inference","bayesian-nonparametric-models","benchmarks","julia-language","probabilistic-programming"],"created_at":"2024-12-17T11:11:59.444Z","updated_at":"2024-12-17T11:12:00.071Z","avatar_url":"https://github.com/luiarthur.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TuringBnpBenchmarks\nBenchmarks of Bayesian Nonparametric models in Turing and other PPLs.\n\nThis work is funded by [GSoC 2020][1].\n\nMy mentors for this project are [Hong Ge][3], [Martin Trapp][4], and\n[Cameron Pfiffer][5].\n\n## Abstract\nProbabilistic models, which more naturally quantify uncertainty when compared\nto their deterministic counterparts, are often difficult and tedious to\nimplement. Probabilistic programming languages (PPLs) have greatly increased\nthe productivity of probabilistic modelers, allowing practitioners to focus on\nmodeling, as opposed to implementing algorithms for probabilistic (e.g.\nBayesian) inference. Turing is a PPL developed entirely in Julia and is both\nexpressive and fast due partly to Julia’s just-in-time (JIT) compiler being\nimplemented in LLVM. 
Consequently, Turing has a more manageable code base and\nhas the potential to be more extensible when compared to more established PPLs\nlike STAN. One thing that may lead to the adoption of Turing is more benchmarks\nand feature comparisons of Turing to other mainstream PPLs. The aim of this\nproject is to provide a more systematic approach to comparing execution times\nand features among several PPLs, including STAN, Pyro, nimble, and TensorFlow\nProbability for a variety of Bayesian nonparametric (BNP) models, which are a\nclass of models that provide much modeling flexibility and often allow model\ncomplexity to increase with data size.\n\nThis project addresses the need for a more systematic approach to comparing the\nperformance of Turing and various PPLs (STAN, Pyro, nimble, TensorFlow\nProbability) under common Bayesian nonparametric (BNP) models, which are a\nclass of models that provide a great deal of modeling flexibility and allow the\nnumber of model parameters, and thus model complexity, to increase with the\nsize of the data. The following models will be implemented (if possible) and\ntimed (both compile times and execution times) in the various PPLs (links to\nminimum working examples will be provided):\n\n- Sampling (and variational) algorithms for Dirichlet process (DP) Gaussian /\n  non-Gaussian mixtures for different sample sizes\n    - E.g. Sampling via Chinese restaurant process (CRP) representations\n      (including collapsed Gibbs, sequential Monte Carlo, particle Gibbs),\n      HMC/NUTS for stick-breaking (SB) constructions, variational inference for\n      stick-breaking construction.\n    - **Note**: DPs are a popular choice of BNP model, typically used when density\n      estimation is of interest. 
They are also a popular prior for infinite\n      mixture models, where the number of clusters is not known in advance.\n- Sampling (and variational) algorithms for Pitman-Yor process (PYP) Gaussian /\n  non-Gaussian mixtures for different sample sizes\n    - E.g. Sampling via generalized CRP representations (including collapsed\n      Gibbs, sequential Monte Carlo, particle Gibbs), HMC/NUTS for\n      stick-breaking (SB) constructions, variational inference for\n      stick-breaking construction.\n    - **Note**: PYPs are generalizations of DPs. That is, DPs are a special\n      case of PYPs. PYPs exhibit power-law behavior, which enables them to\n      better model heavy-tailed distributions.\n- PYP / DP hierarchical models. Specific model to be determined.\n\nIn addition, the effective sample size and inference speed of a standardised\nsetup, e.g. HMC in truncated stick-breaking DP mixture models, for the\nrespective PPLs will be measured.\n\n## What this repo contains\nThis repository includes (or will include) tables and other visualizations\nthat compare the (compile and execution) speed and features of various PPLs\n(Turing, STAN, Pyro, nimble, TFP) with a repository containing the minimum\nworking examples (MWEs) for each implementation. Blog posts describing the\nbenchmarks will also be included.\n\n## Software / Hardware\nAll experiments for this project were run on a [c5.xlarge][2] AWS Spot\nInstance. As of this writing, here are the specs for this instance:\n\n- vCPU: 4 Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz\n- RAM: 8 GB\n- Storage: EBS only\n- Network Bandwidth: Up to 10 Gbps\n- EBS Bandwidth: Up to 4750 Mbps\n\nThe following software was used:\n- Julia-v1.4.1. 
See `Project.toml` and `Manifest.toml` for more info.\n\n[1]: https://summerofcode.withgoogle.com/projects/#5861616765108224\n[2]: https://aws.amazon.com/ec2/instance-types/c5/\n[3]: http://mlg.eng.cam.ac.uk/hong/ \n[4]: https://martintdotblog.wordpress.com/\n[5]: http://cameron.pfiffer.org/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluiarthur%2Fturingbnpbenchmarks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fluiarthur%2Fturingbnpbenchmarks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluiarthur%2Fturingbnpbenchmarks/lists"}