{"id":13689396,"url":"https://github.com/skggm/skggm","last_synced_at":"2025-04-04T21:11:45.741Z","repository":{"id":11495916,"uuid":"60922229","full_name":"skggm/skggm","owner":"skggm","description":"Scikit-learn compatible estimation of general graphical models","archived":false,"fork":false,"pushed_at":"2024-03-20T15:29:38.000Z","size":9033,"stargazers_count":247,"open_issues_count":35,"forks_count":45,"subscribers_count":9,"default_branch":"develop","last_synced_at":"2025-04-04T20:11:21.559Z","etag":null,"topics":["concentration-graph","covariance-matrix","ensemble-learning","gaussian-graphical-models","general-graphical-models","graphical-lasso","graphical-models","machine-learning","nonparametric","precision-matrix","rank-correlation","scikit-learn","skggm"],"latest_commit_sha":null,"homepage":"https://skggm.github.io/skggm/tour","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/skggm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-06-11T18:35:56.000Z","updated_at":"2025-02-16T09:55:14.000Z","dependencies_parsed_at":"2024-01-18T07:14:30.699Z","dependency_job_id":"7efe8524-9df8-45d6-88d8-69d7be9b6e7e","html_url":"https://github.com/skggm/skggm","commit_stats":{"total_commits":640,"total_committers":9,"mean_commits":71.11111111111111,"dds":"0.17031249999999998","last_synced_commit":"b34bae80fa2405e8a4734ab5e65b6c87ce8a8d7d"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skggm%2Fskggm","tags_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skggm%2Fskggm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skggm%2Fskggm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skggm%2Fskggm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/skggm","download_url":"https://codeload.github.com/skggm/skggm/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247249536,"owners_count":20908212,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["concentration-graph","covariance-matrix","ensemble-learning","gaussian-graphical-models","general-graphical-models","graphical-lasso","graphical-models","machine-learning","nonparametric","precision-matrix","rank-correlation","scikit-learn","skggm"],"created_at":"2024-08-02T15:01:46.229Z","updated_at":"2025-04-04T21:11:45.713Z","avatar_url":"https://github.com/skggm.png","language":"Python","funding_links":[],"categories":["Sklearn实用程序","Python","Feature Extraction"],"sub_categories":["General Feature Extraction"],"readme":"[![Build Status](https://travis-ci.org/skggm/skggm.svg?branch=develop)](https://travis-ci.org/skggm/skggm)\n[![GitHub version](https://badge.fury.io/gh/skggm%2Fskggm.svg)](https://badge.fury.io/gh/skggm%2Fskggm)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.830033.svg)](https://doi.org/10.5281/zenodo.830033)\n\n\n# skggm : Gaussian graphical models using the scikit-learn API\nIn the last decade, learning networks that encode conditional independence 
relationships has become an important problem in machine learning and statistics. For many important probability distributions, such as multivariate Gaussians, this amounts to estimating inverse covariance matrices. Inverse covariance estimation is now widely used to infer gene regulatory networks in cellular biology and neural interactions in neuroscience.\n\nHowever, many statistical advances and best practices for fitting such models to data are not yet widely adopted and are not available in common Python packages for machine learning. Furthermore, inverse covariance estimation is an active area of research in which researchers continue to improve algorithms and estimators.\nWith `skggm` we seek to bring these new developments to a wider audience and to enable researchers to benchmark their methods effectively in regimes relevant to their applications of interest.\n\nWhile `skggm` is currently geared toward _Gaussian graphical models_, we hope to eventually evolve it to support _General graphical models_.  
Read more [here](https://skggm.github.io/skggm/tour).\n\n\n## Inverse Covariance Estimation\n\nGiven **n** independently drawn, **p**-dimensional Gaussian random samples \u003cimg src=\"images/X.png\" alt=\"X\" width=\"80\"\u003e with sample covariance \u003cimg src=\"images/sigma_hat.png\" alt=\"S\" width=\"13\"\u003e, a penalized maximum likelihood estimate of the inverse covariance matrix \u003cimg src=\"images/Theta.png\" alt=\"Theta\" width=\"12\"\u003e can be computed via the _graphical lasso_, i.e., the program\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/graphlasso_program.png\" alt=\"\\ell_1 penalized inverse covariance estimation\" width=\"480\"\u003e\u003c/p\u003e\n\nwhere \u003cimg src=\"images/Lambda.png\" alt=\"\\Lambda\" width=\"80\"\u003e is a symmetric matrix with non-negative entries and\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/penalty.png\" alt=\"penalty\" width=\"200\"\u003e\u003c/p\u003e\n\nTypically, the diagonal entries are left unpenalized, by setting \u003cimg src=\"images/lambda_diagonals.png\" alt=\"diagonals\" width=\"170\"\u003e, to ensure that \u003cimg src=\"images/Theta.png\" alt=\"Theta\" width=\"13\"\u003e remains positive definite. The objective reduces to the standard graphical lasso formulation of [Friedman et al.](http://statweb.stanford.edu/~tibs/ftp/glasso-bio.pdf) when all off-diagonal entries of the penalty matrix take a constant scalar value \u003cimg src=\"images/scalar_penalty.png\" alt=\"scalar_penalty\" width=\"170\"\u003e. The standard graphical lasso has been implemented in [scikit-learn](http://scikit-learn.org/stable/modules/generated/sklearn.covariance.GraphLassoCV.html).\n\nIn this package we provide a [scikit-learn](http://scikit-learn.org)-compatible implementation of the program above and a collection of modern best practices for working with the graphical lasso. 
A rough breakdown of how this package differs from scikit-learn's built-in `GraphLasso` is shown in this chart:\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/sklearn_skggm_compare.png\" alt=\"sklearn/skggm feature comparison\" width=\"600\"\u003e\u003c/p\u003e\n\n### Quick start\nTo get started, install the package (via pip, see below) and:\n\n- read the tour of skggm at [https://skggm.github.io/skggm/tour](https://skggm.github.io/skggm/tour)\n- read [@mnarayan](https://github.com/mnarayan)'s [talk](https://dx.doi.org/10.6084/m9.figshare.4003380), presented at HHMI, Janelia Farms, October 2016, and check out the companion examples [here](https://github.com/neuroquant/jf2016-skggm) (live via Binder [here](http://mybinder.org/repo/neuroquant/jf2016-skggm))\n- browse the basic usage examples in [examples/estimator_suite.py](https://github.com/skggm/skggm/blob/master/examples/estimator_suite.py)\n\n---\n\nThis is an ongoing effort. We'd love your feedback on which algorithms and techniques we should include and how you're using the package. We also welcome contributions.\n\n[@jasonlaska](https://github.com/jasonlaska) and [@mnarayan](https://github.com/mnarayan)\n\n---\n\n## Included in `inverse_covariance`\nAn overview of the skggm graphical lasso facilities is shown in the following diagram:\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/skggm_workflow.png\" alt=\"skggm workflow\" width=\"600\"\u003e\u003c/p\u003e\n\nInformation on basic usage can be found at [https://skggm.github.io/skggm/tour](https://skggm.github.io/skggm/tour).  
The package includes the following classes and submodules.\n\n- **QuicGraphicalLasso** [[doc]](https://github.com/skggm/skggm/blob/master/inverse_covariance/quic_graph_lasso.py#L165)\n\n    _QuicGraphicalLasso_ is an implementation of [QUIC](http://jmlr.org/papers/volume15/hsieh14a/hsieh14a.pdf) wrapped as a scikit-learn compatible estimator \\[[Hsieh et al.](http://jmlr.org/papers/volume15/hsieh14a/hsieh14a.pdf)\\]. The estimator can be run in `default` mode for a fixed penalty or in `path` mode to explore a sequence of penalties efficiently.  The penalty `lam` can be a scalar or a matrix.\n\n    The primary outputs of interest are `covariance_`, `precision_`, and `lam_`.\n\n    The interface largely mirrors scikit-learn's built-in _[GraphLasso](http://scikit-learn.org/stable/modules/generated/sklearn.covariance.GraphLasso.html)_, although some parameter names have been changed (e.g., `alpha` to `lam`). Notable advantages of this implementation over _GraphLasso_ are speed and support for a matrix penalization term.\n\n- **QuicGraphicalLassoCV** [[doc]](https://github.com/skggm/skggm/blob/master/inverse_covariance/quic_graph_lasso.py#L444)\n\n    _QuicGraphicalLassoCV_ is an optimized cross-validation model selection implementation similar to scikit-learn's _[GraphLassoCV](http://scikit-learn.org/stable/modules/generated/sklearn.covariance.GraphLassoCV.html)_. 
Like _QuicGraphicalLasso_, this implementation supports matrix penalization.\n\n- **QuicGraphicalLassoEBIC** [[doc]](https://github.com/skggm/skggm/blob/master/inverse_covariance/quic_graph_lasso.py#L809)\n\n    _QuicGraphicalLassoEBIC_ is provided as a convenience class to use the _Extended Bayesian Information Criteria_ (EBIC) for model selection \\[[Foygel et al.](https://papers.nips.cc/paper/4087-extended-bayesian-information-criteria-for-gaussian-graphical-models)\\].\n\n- **ModelAverage** [[doc]](https://github.com/skggm/skggm/blob/master/inverse_covariance/model_average.py#L180)\n\n    _ModelAverage_ is an ensemble meta-estimator that computes several fits with a user-specified `estimator` and averages the support of the resulting precision estimates.  The result is a `proportion_` matrix giving, for each index, the proportion of trials in which the entry was non-zero. This facility is similar to scikit-learn's _[RandomizedLasso](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RandomizedLasso.html)_, but for the graphical lasso.\n\n    In each trial, this class will:\n\n    1. Draw bootstrap samples by randomly subsampling **X**.\n\n    2. Draw a random matrix penalty.\n\n    The random penalty can be chosen in a variety of ways, specified by the `penalization` parameter.  This technique is also known as _stability selection_ or _random lasso_.\n\n- **AdaptiveGraphicalLasso** [[doc]](https://github.com/skggm/skggm/blob/master/inverse_covariance/adaptive_graph_lasso.py#L10)\n\n    _AdaptiveGraphicalLasso_ performs a two-step estimation procedure:\n\n    1. Obtain an initial sparse estimate.\n\n    2. Derive a new penalization matrix from the initial estimate.  We currently provide three methods for this: `binary`, `1/|coeffs|`, and `1/|coeffs|^2`.  
The `binary` method only requires the initial estimate's support (so it can be used with _ModelAverage_ above).\n\n    This technique works well to refine the non-zero precision values given a reasonable initial support estimate.\n\n- **inverse_covariance.plot_util.trace_plot**\n\n    Utility to plot `lam_` paths.\n\n- **inverse_covariance.profiling**\n\n    The `.profiling` submodule contains a `MonteCarloProfile` class for evaluating methods over different graphs and metrics.  We currently include the following graph types:\n\n        - LatticeGraph\n        - ClusterGraph\n        - ErdosRenyiGraph (via sklearn)\n\n    An example of how to use these tools can be found in `examples/profiling_example.py`.\n\n## Parallelization Support\n\n`skggm` supports parallel computation through [joblib](http://pythonhosted.org/joblib/) and [Apache Spark](http://spark.apache.org/).  Independent trials, cross-validation, and other _embarrassingly parallel_ operations can be farmed out to multiple processes, cores, or worker machines.  In particular,\n\n- `QuicGraphicalLassoCV`\n- `ModelAverage`\n- `profiling.MonteCarloProfile`\n\ncan make use of this through either the `n_jobs` or `sc` (sparkContext) parameters.\n\nSince these are naive implementations, it is not possible to enable parallel work on all three objects simultaneously when they are composed together. For example, in this snippet:\n\n    model = ModelAverage(\n        estimator=QuicGraphicalLassoCV(\n            cv=2,\n            n_refinements=6,\n        ),\n        penalization=penalization,\n        lam=lam,\n        sc=spark.sparkContext,\n    )\n    model.fit(X)\n\nonly one of `ModelAverage` or `QuicGraphicalLassoCV` can make use of the Spark context. The problem size and number of trials will determine which level of parallelization gives the fastest performance.\n\n\n## Installation\n\nBoth Python 2.7 and Python 3.6.x are supported. 
We use the [black autoformatter](https://github.com/ambv/black) to format our code. If contributing, please run this formatter; otherwise the checks will fail.\n\nClone this repo and run\n\n    python setup.py install\n\nor install via PyPI\n\n    pip install skggm\n\nor from a cloned repo\n\n    cd inverse_covariance/pyquic\n    make\n    make python3  # for python3\n\n**The package requires that `numpy`, `scipy`, and `cython` are installed independently into your environment first.**\n\nIf you would like to fork the pyquic bindings directly, use the Makefile provided in `inverse_covariance/pyquic`.\n\nThis package requires the `lapack` libraries to be installed on your system. A configuration example with these dependencies for Ubuntu and Anaconda 2 can be found [here](https://github.com/neuroquant/jf2016-skggm/blob/master/Dockerfile#L8-L13).\n\n## Tests\nTo run the tests, execute the following lines.\n\n    python -m pytest inverse_covariance  # or: python3 -m pytest inverse_covariance\n    black --check inverse_covariance\n    black --check examples\n\n# Examples\n\n## Usage\nIn `examples/estimator_suite.py` we reproduce the [plot_sparse_cov](http://scikit-learn.org/stable/auto_examples/covariance/plot_sparse_cov.html) example from the scikit-learn documentation for each method provided (the variations chosen are not exhaustive).\n\nAn example run for `n_examples=100` and `n_features=20` yielded the following results.\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/estimator_suite_scorecard_100x20.png\" alt=\"(n_examples, n_features) = (100, 20)\" width=\"650\"\u003e\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/estimator_suite_plots_page0_100x20.png\" alt=\"(n_examples, n_features) = (100, 20)\" width=\"600\"\u003e\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/estimator_suite_plots_page1_100x20.png\" alt=\"(n_examples, n_features) = (100, 20)\" width=\"600\"\u003e\u003c/p\u003e\n\nFor slightly higher dimensions of 
`n_examples=600` and `n_features=120` we obtained:\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/estimator_suite_scorecard_600x120.png\" alt=\"(n_examples, n_features) = (600, 120)\" width=\"650\"\u003e\u003c/p\u003e\n\n## Plotting the regularization path\nWe've provided a utility function `inverse_covariance.plot_util.trace_plot` that can be used to display the coefficients as a function of `lam_`.  This can be used with any estimator that returns a path.  The example in `examples/trace_plot_example.py` yields:\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"images/trace_plot.png\" alt=\"Trace plot\" width=\"400\"\u003e\u003c/p\u003e\n\n# Citation\n\nIf you use *skggm* or reference our blog post in a presentation or publication, we would appreciate a citation of our package.\n\n\u003eJason Laska, Manjari Narayan, 2017. _skggm 0.2.7: A scikit-learn compatible package for Gaussian and related Graphical Models._ doi:10.5281/zenodo.830033\n\nHere is the corresponding BibTeX entry:\n```\n@misc{laska_narayan_2017_830033,\n  author       = {Jason Laska and\n                  Manjari Narayan},\n  title        = {{skggm 0.2.7: A scikit-learn compatible package for\n                   Gaussian and related Graphical Models}},\n  month        = jul,\n  year         = 2017,\n  doi          = {10.5281/zenodo.830033},\n  url          = {https://doi.org/10.5281/zenodo.830033}\n}\n```\n\n# References\n\n### BIC / EBIC Model Selection\n\n* [\"Extended Bayesian Information Criteria for Gaussian Graphical Models\"](https://papers.nips.cc/paper/4087-extended-bayesian-information-criteria-for-gaussian-graphical-models) R. Foygel and M. Drton, NIPS 2010.\n\n### QuicGraphicalLasso / QuicGraphicalLassoCV\n\n* [\"QUIC: Quadratic Approximation for sparse inverse covariance estimation\"](http://jmlr.org/papers/volume15/hsieh14a/hsieh14a.pdf) C. Hsieh, M. A. Sustik, I. S. Dhillon, P. 
Ravikumar, Journal of Machine Learning Research (JMLR), October 2014.\n\n* QUIC implementation found [here](http://www.cs.utexas.edu/~sustik/QUIC/) and [here](http://bigdata.ices.utexas.edu/software/1035/), with Cython bindings forked from [pyquic](https://github.com/osdf/pyquic).\n\n### Adaptive refitting (two-step methods)\n\n* [\"High dimensional covariance estimation based on Gaussian graphical models\"](http://www.jmlr.org/papers/volume12/zhou11a/zhou11a.pdf) S. Zhou, P. Rütimann, M. Xu, and P. Bühlmann, 2011.\n\n* [\"Relaxed Lasso\"](http://stat.ethz.ch/~nicolai/relaxo.pdf) N. Meinshausen, December 2006.\n\n### Randomized model averaging\n\n* [\"Stability Selection\"](https://arxiv.org/pdf/0809.2932v2.pdf) N. Meinshausen and P. Bühlmann, May 2009.\n\n* [\"Random Lasso\"](https://arxiv.org/abs/1104.3398) S. Wang, B. Nan, S. Rosset, and J. Zhu, April 2011.\n\n* [\"Mixed effects models for resampled network statistics improves statistical power to find differences in multi-subject functional connectivity\"](http://biorxiv.org/content/early/2016/03/14/027516) M. Narayan and G. Allen, March 2016.\n\n### Convergence test\n\n* [\"The graphical lasso: New Insights and alternatives\"](https://web.stanford.edu/~hastie/Papers/glassoinsights.pdf) R. Mazumder and T. Hastie, 2012.\n\n### Repeated KFold cross-validation\n\n* [\"Cross-validation pitfalls when selecting and assessing regression and classification models\"](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3994246/) D. Krstajic, L. Buturovic, D. Leahy, and S. Thomas, 2014.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fskggm%2Fskggm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fskggm%2Fskggm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fskggm%2Fskggm/lists"}