{"id":19705671,"url":"https://github.com/llnl/muygpys","last_synced_at":"2025-04-02T17:06:36.407Z","repository":{"id":41994525,"uuid":"390856696","full_name":"LLNL/MuyGPyS","owner":"LLNL","description":"A fast, pure python implementation of the MuyGPs Gaussian process realization and training algorithm. ","archived":false,"fork":false,"pushed_at":"2024-09-30T21:57:14.000Z","size":1884,"stargazers_count":27,"open_issues_count":16,"forks_count":10,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-02T16:11:37.417Z","etag":null,"topics":["machine-learning","math-physics","python","scientific-computing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LLNL.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE-MIT","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-07-29T21:39:26.000Z","updated_at":"2025-03-23T22:47:54.000Z","dependencies_parsed_at":"2022-08-11T01:21:07.465Z","dependency_job_id":"5bdd305f-c1da-4f03-bea8-b2bc4a9ea20e","html_url":"https://github.com/LLNL/MuyGPyS","commit_stats":{"total_commits":314,"total_committers":6,"mean_commits":"52.333333333333336","dds":"0.45222929936305734","last_synced_commit":"0dad6a882048bcf885c59a2a23ce09181b7e67f4"},"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2FMuyGPyS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2FMuyGPyS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LL
NL%2FMuyGPyS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2FMuyGPyS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LLNL","download_url":"https://codeload.github.com/LLNL/MuyGPyS/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246856671,"owners_count":20844973,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","math-physics","python","scientific-computing"],"created_at":"2024-11-11T21:29:33.558Z","updated_at":"2025-04-02T17:06:36.379Z","avatar_url":"https://github.com/LLNL.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Develop test](https://github.com/LLNL/MuyGPyS/actions/workflows/develop-test.yml/badge.svg)](https://github.com/LLNL/MuyGPyS/actions/workflows/develop-test.yml)\n[![Documentation Status](https://readthedocs.org/projects/muygpys/badge/?version=stable)](https://muygpys.readthedocs.io/en/stable/?badge=stable)\n# Fast implementation of the MuyGPs scalable Gaussian process algorithm\n\nMuyGPs is a scalable approximate Gaussian process (GP) model that achieves fast\nprediction and model optimization while retaining high-accuracy predictions and\nuncertainty quantification.\nThe MuyGPyS implementation allows the user to easily create GP models that can\nquickly train and predict on million-scale problems on a laptop or scale to\nbillions of observations on distributed memory systems using the same front-end\ncode.\n\n## What is MuyGPyS?\n\nMuyGPyS is a general-purpose 
Gaussian process library, similar to\n[GPy](https://github.com/SheffieldML/GPy),\n[GPyTorch](https://github.com/cornellius-gp/gpytorch), or\n[GPflow](https://github.com/GPflow/GPflow).\n\nMuyGPyS differs from these libraries in that it constructs approximate GP\nmodels using nearest neighbors sparsification, conditioning each prediction only\non the most relevant training data to drastically reduce training time and\ntime-to-solution on large-scale problems.\nIndeed, MuyGPyS is intended for GP problems with millions of observations or\nmore, and supports a distributed memory backend for smoothly scaling to\nbillion-scale problems.\n\nMuyGPs uses nearest neighbors sparsification and performs leave-one-out\ncross-validation using regularized loss functions to rapidly optimize a GP model\nwithout evaluating the much more expensive likelihood that similar scalable\nmethods require.\n\n## Getting Started\n\nSee the\n[illustration tutorial](https://muygpys.readthedocs.io/en/stable/examples/neighborhood_illustration.html)\nfor an illustration of why the neighborhood sparsification approach of MuyGPs\nworks.\n\nNext, see the\n[univariate regression tutorial](https://muygpys.readthedocs.io/en/stable/examples/univariate_regression_tutorial.html)\nfor a full description of the API and an end-to-end walkthrough of a simple\nregression problem.\n\nThe full documentation, including several additional tutorials with code\nexamples, can be found at\n[readthedocs.io](https://muygpys.readthedocs.io/en/stable/?).\n\nRead further in this document for installation instructions.\n\n## Backend Math Implementation Options\n\nIn addition to the default numpy backend, as of release v0.6.6 `MuyGPyS`\nsupports three additional backend implementations of all of its underlying math\nfunctions:\n\n- [MPI](https://github.com/mpi4py/mpi4py) - distributed memory acceleration\n- [PyTorch](https://github.com/pytorch/pytorch) - GPU acceleration and neural\nnetwork integration\n- 
[JAX](https://github.com/google/jax) - GPU acceleration\n\nIt is possible to include the dependencies of any, all, or none of these\nadditional backends at install time.\nPlease see the installation instructions below.\n\n`MuyGPyS` uses the `MUYGPYS_BACKEND` environment variable to determine which\nbackend to use at import time.\nIt is also possible to manipulate `MuyGPyS.config` to switch between backends\nprogrammatically.\nThis is not advisable unless the user knows exactly what they are doing, and it\nmust occur before importing any other `MuyGPyS` components.\n\n`MuyGPyS` defaults to the `numpy` backend.\nIt is possible to switch backends by setting the `MUYGPYS_BACKEND`\nenvironment variable in your shell, e.g.\n```\n$ export MUYGPYS_BACKEND=jax    # turn on JAX backend\n$ export MUYGPYS_BACKEND=torch  # turn on Torch backend\n$ export MUYGPYS_BACKEND=mpi    # turn on MPI backend\n```\n\n### Distributed memory support with MPI\n\nThe MPI version of `MuyGPyS` performs all tensor manipulation in distributed\nmemory.\nThe tensor creation functions create and distribute a chunk of each tensor to\neach MPI rank.\nThese data, and derived data such as posterior means and variances, remain\npartitioned, and most operations are embarrassingly parallel.\nGlobal operations such as loss function computation use MPI collectives such as\nallreduce.\nIf the user needs all products of an experiment, such as the full posterior\ndistribution, in local memory, it is necessary to employ a collective such as\ngather (e.g. `MPI.COMM_WORLD.gather` in mpi4py).\n\nThe wrapped KNN algorithms are not distributed, and `MuyGPyS` does not yet\nhave an internal distributed KNN implementation.\nFuture versions will support a distributed memory approximate KNN solution.\n\nThe user can run a script `myscript.py` with MPI using, e.g. 
`mpirun` (or `srun`\nif using Slurm) via\n```\n$ export MUYGPYS_BACKEND=mpi\n$ # mpirun version\n$ mpirun -n 4 python myscript.py\n$ # srun version\n$ srun -N 1 --tasks-per-node 4 -p pbatch python myscript.py\n```\n\n### PyTorch Integration\n\nThe `torch` version of `MuyGPyS` allows for construction and training of complex\nkernels, e.g., convolutional neural network kernels. All low-level math is done\non `torch.Tensor` objects. Because `PyTorch` lacks support for the modified\nBessel function of the second kind, we only support special cases of the Matern\nkernel, in particular when the smoothness parameter is $\\nu = 1/2, 3/2,$ or\n$5/2$. The RBF kernel is supported as the limiting case of the Matern kernel\nas $\\nu \\to \\infty$.\n\nThe `MuyGPyS` framework is implemented as a custom `PyTorch` layer. In the\nhigh-level API found in `examples/muygps_torch`, a `PyTorch` MuyGPs `model` is\nassumed to have two components: a `model.embedding` which deforms the original\nfeature data, and a `model.GP_layer` which performs Gaussian process regression\non the deformed feature space.\n\nMost users will want to use the `MuyGPyS.torch.muygps_layer` module to construct\na custom MuyGPs model. The model can then be calibrated using a standard\nPyTorch training loop.
An example of the approach based on the low-level API\nis provided in `docs/examples/torch_tutorial.ipynb`.\n\nIn order to use the `MuyGPyS` torch backend, run the following command in your\nshell environment.\n\n```\n$ export MUYGPYS_BACKEND=torch\n```\n\nOne can also use the following workflow to programmatically set the backend to\ntorch, although the environment variable method is preferred.\n\n```python\nfrom MuyGPyS import config\nconfig.update(\"muygpys_backend\", \"torch\")\n\n# ...subsequent imports from MuyGPyS\n```\n\n### Just-In-Time Compilation with JAX\n\n`MuyGPyS` supports just-in-time compilation of the\nunderlying math functions to CPU or GPU using\n[JAX](https://github.com/google/jax) since version v0.5.0.\nThe JAX-compiled versions of the code are significantly faster than numpy,\nespecially on GPUs.\nIn order to use the `MuyGPyS` JAX backend, run the following command in your\nshell environment.\n\n```\n$ export MUYGPYS_BACKEND=jax\n```\n\n\u003e **_NOTE_**: There is a known conflict between recent versions of `MuyGPyS` and\n`JAX` on Python $\\geq$ 3.9.\nThe current fix is to downgrade to Python 3.8.\n\n## Precision\n\nJAX and torch use 32-bit types by default, whereas numpy tends to promote\neverything to 64 bits.\nFor highly stable operations like matrix multiplication, this difference in\nprecision tends to result in a roughly `1e-8` disagreement between 64-bit and\n32-bit implementations.\nHowever, `MuyGPyS` depends upon matrix-vector solves, which can result in\ndisagreements up to `1e-2`.\nHence, `MuyGPyS` forces all backend implementations to use 64-bit types by\ndefault.\n\nHowever, 64-bit operations are slightly slower than their 32-bit counterparts\nand limit throughput on GPUs.\n`MuyGPyS` accordingly supports 32-bit types, but this feature is experimental\nand might have sharp edges.\nFor example, `MuyGPyS` might throw errors or otherwise behave strangely if the\nuser passes arrays of 64-bit types while in 32-bit mode.\nBe 
sure to set your data types appropriately.\n\nA user can have `MuyGPyS` use 32-bit types by setting the `MUYGPYS_FTYPE`\nenvironment variable to `\"32\"`, e.g.\n```\n$ export MUYGPYS_FTYPE=32  # use 32 bit types in MuyGPyS functions\n```\nIt is also possible to manipulate `MuyGPyS.config` to switch between types\nprogrammatically.\nThis is not advisable unless the user knows exactly what they are doing.\n\n## Installation\n\n### Installation using Pip: CPU\n\nThe `muygpys` package is maintained on PyPI and can be installed using `pip`.\n`muygpys` supports several optional extras flags, which install additional\ndependencies if specified.\nIf installing CPU-only with pip, you might want to consider the following\nextras:\n- `hnswlib` - install [hnswlib](https://github.com/nmslib/hnswlib) dependency to\nsupport fast approximate nearest neighbors indexing\n- `jax_cpu` - install [JAX](https://github.com/google/jax) dependencies to\nsupport just-in-time compilation of math functions on CPU (see below to install\non GPU CUDA architectures)\n- `torch` - install [PyTorch](https://github.com/pytorch/pytorch) dependencies\nto employ GPU acceleration and the `MuyGPyS.torch` submodule\n- `mpi` - install [MPI](https://github.com/mpi4py/mpi4py) dependencies to\nsupport distributed memory parallel computation. Requires that the user has\ninstalled an MPI implementation such as\n[mvapich](https://mvapich.cse.ohio-state.edu/) or\n[open-mpi](https://github.com/open-mpi/ompi).\n```\n$ # numpy-only installation. Functions will internally use numpy.\n$ pip install --upgrade muygpys\n\n$ # The same, but includes hnswlib.\n$ pip install --upgrade muygpys[hnswlib]\n\n$ # CPU-only JAX installation. Functions will be jit-compiled using JAX.\n$ pip install --upgrade muygpys[jax_cpu]\n\n$ # The same, but includes hnswlib.\n$ pip install --upgrade muygpys[jax_cpu,hnswlib]\n\n$ # MPI installation. 
Functions will operate in distributed memory.\n$ pip install --upgrade muygpys[mpi]\n\n$ # The same, but includes hnswlib.\n$ pip install --upgrade muygpys[mpi,hnswlib]\n\n$ # pytorch installation. MuyGPyS.torch will be usable.\n$ pip install --upgrade muygpys[torch]\n```\n\n### Installation using Pip: GPU (CUDA)\n\n#### JAX GPU Instructions\n\n[JAX](https://github.com/google/jax) also supports just-in-time compilation to\nCUDA, making the compiled math functions within `MuyGPyS` runnable on NVIDIA\nGPUs.\nThis requires you to install\n[CUDA](https://developer.nvidia.com/cuda-downloads) and\n[cuDNN](https://developer.nvidia.com/CUDNN)\nin your environment, if they are not already installed, and to ensure that they\nare on your environment's `$LD_LIBRARY_PATH`.\nSee [scripts](scripts/lc-setup/pascal.sh) for an example environment setup.\n\n`MuyGPyS` no longer supports automated GPU-supported JAX installation using pip\nextras.\nTo install JAX as a dependency for `MuyGPyS` to be deployed on CUDA-capable\nGPUs, please read and follow the\n[JAX installation instructions](https://github.com/google/jax#installation).\nAfter installing JAX, the user will also need to install\n[TensorFlow Probability](https://github.com/tensorflow/probability) with a JAX\nbackend via\n```\n$ pip install \"tensorflow-probability[jax]\u003e=0.16.0\"\n```\n\n#### PyTorch GPU Instructions\n\nMuyGPyS does not, and most likely will not, support installing CUDA PyTorch with\nan extras flag.\nPlease [install PyTorch separately](https://pytorch.org/get-started/locally/).\n\n### Installation From Source\n\nThis repository includes several `extras_require` optional dependencies.\n- `tests` - install dependencies necessary to run [tests](tests/)\n- `docs` - install dependencies necessary to build the docs\n- `dev` - install dependencies for maintaining code style, running performance\nbenchmarks, linting, and packaging\n\nFor example, follow these instructions to install from source for development 
\npurposes with CPU JAX support:\n```\n$ git clone git@github.com:LLNL/MuyGPyS.git\n$ cd MuyGPyS\n$ pip install -e .[dev,jax_cpu]\n```\n\nIf you would like to perform a GPU installation from source, you will need to\ninstall the JAX dependency directly.\n\nAdditionally, check out the `develop` branch to access the latest features\nbetween stable releases.\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for contribution rules.\n\n### Full list of extras flags\n\n- `hnswlib` - install [hnswlib](https://github.com/nmslib/hnswlib) dependency to\nsupport fast approximate nearest neighbors indexing\n- `jax_cpu` - install [JAX](https://github.com/google/jax) dependencies to\nsupport just-in-time compilation of math functions on CPU (see above to install\non GPU CUDA architectures)\n- `torch` - install [PyTorch](https://github.com/pytorch/pytorch)\n- `mpi` - install [MPI](https://github.com/mpi4py/mpi4py) dependency to support\ndistributed memory parallel computation\n- `tests` - install dependencies necessary to run [tests](tests/)\n- `docs` - install dependencies necessary to build the [docs](docs/)\n- `dev` - install dependencies for maintaining code style, running performance\nbenchmarks, linting, and packaging\n\n## Building Docs\n\nIn order to build the docs locally, first `pip` install from source using either\nthe `docs` or `dev` options and then execute:\n```\n$ sphinx-build -b html docs docs/_build/html\n```\nFinally, open the file `docs/_build/html/index.html` in your browser of choice.\n\n## Testing\n\nIn order to run tests locally, first `pip` install `MuyGPyS` from source using\nthe `tests` option.\nAll tests in the `tests/` directory are then runnable as python scripts, e.g.\n```\n$ python tests/kernels.py\n```\n\nIndividual `absl` unit test classes can be run in isolation, e.g.\n```\n$ python tests/kernels.py DistancesTest\n```\nIt is also possible to run a single method from a test case:\n```\n$ python tests/kernels.py DistancesTest.test_l2\n```\n\nThe user can run most tests with any backend.\nSome tests use 
backend-dependent features, and will fail with informative error\nmessages when run with an unsupported backend.\nThe user needs to set `MUYGPYS_BACKEND` and possibly `MUYGPYS_FTYPE` prior to\nrunning the desired test, e.g.,\n```\n$ export MUYGPYS_BACKEND=jax\n$ python tests/kernels.py\n```\nor\n```\n$ export MUYGPYS_BACKEND=torch\n$ export MUYGPYS_FTYPE=32\n$ python tests/backends/torch_correctness.py\n```\n\nIf the MPI dependencies are installed, the user can also run `absl` tests using\nMPI, e.g. using `mpirun`\n```\n$ export MUYGPYS_BACKEND=mpi\n$ mpirun -n 4 python tests/kernels.py\n```\nor using `srun`\n```\n$ export MUYGPYS_BACKEND=mpi\n$ srun -N 1 --tasks-per-node 4 -p pdebug python tests/kernels.py\n```\n\n# About\n\n## Authors\n\n* Benjamin W. Priest (priest2 at llnl dot gov)\n* Amanda L. Muyskens (muyskens1 at llnl dot gov)\n* Imène Goumiri (goumiri1 at llnl dot gov)\n\n## Papers\n\nMuyGPyS has been used in the following research papers (newest first):\n\n1. [A Robust Approach to Gaussian Process Implementation](https://arxiv.org/abs/2409.11577)\n1. [Enhancing Electrocardiography Data Classification Confidence: A Robust Gaussian Process Approach (MuyGPs)](https://arxiv.org/abs/2409.04642)\n1. [Stellar Blend Image Classification Using Computationally Efficient Gaussian Processes](https://arxiv.org/abs/2407.19297)\n1. [Closely-Spaced Object Classification Using MuyGPyS](https://arxiv.org/abs/2311.10904)\n1. [Light Curve Forecasting and Anomaly Detection Using Scalable, Anisotropic, and Heteroscedastic Gaussian Process Models](https://amostech.com/TechnicalPapers/2023/Poster/Goumiri.pdf)\n1. [Scalable Gaussian Process Hyperparameter Optimization via Coverage Regularization](http://export.arxiv.org/abs/2209.11280)\n1. [Bayesian Hyperparameter Optimization in Gaussian Processes using Statistical Coverage](https://www.osti.gov/biblio/1902019)\n1. 
[Light Curve Completion and Forecasting Using Fast and Scalable Gaussian Processes (MuyGPs)](https://arxiv.org/abs/2208.14592)\n1. [Fast Gaussian Process Posterior Mean Prediction via Local Cross Validation and Precomputation](https://arxiv.org/abs/2205.10879v1)\n1. [Gaussian Process Classification of Galaxy Blend Identification in LSST](https://arxiv.org/abs/2107.09246)\n1. [MuyGPs: Scalable Gaussian Process Hyperparameter Estimation Using Local Cross-validation](https://arxiv.org/abs/2104.14581)\n1. [Star-Galaxy Image Separation with Computationally Efficient Gaussian Process Classification](https://arxiv.org/abs/2105.01106)\n1. [Genetic Algorithm for Hyperparameter Optimization in Gaussian Process Modeling](https://www.osti.gov/biblio/1659396)\n1. [Star-Galaxy Separation via Gaussian Processes with Model Reduction](https://arxiv.org/abs/2010.06094)\n\n## Citation\n\nIf you use MuyGPyS in a research paper, please reference our article:\n\n```\n@article{muygps2021,\n  title={MuyGPs: Scalable Gaussian Process Hyperparameter Estimation Using Local Cross-Validation},\n  author={Muyskens, Amanda and Priest, Benjamin W. and Goumiri, Im{\\`e}ne and \n  Schneider, Michael},\n  journal={arXiv preprint arXiv:2104.14581},\n  year={2021}\n}\n```\n\n## License\n\nMuyGPyS is distributed under the terms of the MIT license.\nAll new contributions must be made under the MIT license.\n\nSee [LICENSE-MIT](LICENSE-MIT), [NOTICE](NOTICE), and [COPYRIGHT](COPYRIGHT) for \ndetails.\n\nSPDX-License-Identifier: MIT\n\n## Release\n\nLLNL-CODE-824804\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fllnl%2Fmuygpys","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fllnl%2Fmuygpys","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fllnl%2Fmuygpys/lists"}