{"id":19946124,"url":"https://github.com/quantco/glum","last_synced_at":"2026-03-16T12:08:30.759Z","repository":{"id":37076847,"uuid":"250074055","full_name":"Quantco/glum","owner":"Quantco","description":"High performance Python GLMs with all the features!","archived":false,"fork":false,"pushed_at":"2025-05-14T12:39:48.000Z","size":31499,"stargazers_count":334,"open_issues_count":36,"forks_count":28,"subscribers_count":16,"default_branch":"main","last_synced_at":"2025-05-16T13:01:33.748Z","etag":null,"topics":["elastic-net","gamma","glm","lasso","logit","poisson","ridge","tweedie"],"latest_commit_sha":null,"homepage":"https://glum.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Quantco.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.rst","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-03-25T19:37:22.000Z","updated_at":"2025-05-14T12:39:50.000Z","dependencies_parsed_at":"2023-12-16T05:10:58.277Z","dependency_job_id":"6118c42f-440e-4ec3-98c9-cb373e5475fb","html_url":"https://github.com/Quantco/glum","commit_stats":{"total_commits":587,"total_committers":34,"mean_commits":"17.264705882352942","dds":0.7325383304940375,"last_synced_commit":"89be1ac722e1fc3b819fb36b95e59ad219e152af"},"previous_names":[],"tags_count":53,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantco%2Fglum","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantco%2Fglum/tags","releases_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/Quantco%2Fglum/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantco%2Fglum/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Quantco","download_url":"https://codeload.github.com/Quantco/glum/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254535792,"owners_count":22087397,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elastic-net","gamma","glm","lasso","logit","poisson","ridge","tweedie"],"created_at":"2024-11-13T00:28:22.882Z","updated_at":"2026-03-16T12:08:30.753Z","avatar_url":"https://github.com/Quantco.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# glum\n\n[![CI](https://github.com/Quantco/glum/actions/workflows/ci.yml/badge.svg)](https://github.com/Quantco/glum/actions)\n[![Daily 
runs](https://github.com/Quantco/glum/actions/workflows/daily.yml/badge.svg)](https://github.com/Quantco/glum/actions/workflows/daily.yml)\n[![Docs](https://readthedocs.org/projects/pip/badge/?version=latest\u0026style=flat)](https://glum.readthedocs.io/)\n[![Conda-forge](https://img.shields.io/conda/vn/conda-forge/glum?logoColor=white\u0026logo=conda-forge)](https://anaconda.org/conda-forge/glum)\n[![PypiVersion](https://img.shields.io/pypi/v/glum.svg?logo=pypi\u0026logoColor=white)](https://pypi.org/project/glum)\n[![PythonVersion](https://img.shields.io/pypi/pyversions/glum?logoColor=white\u0026logo=python)](https://pypi.org/project/glum)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.14991108.svg)](https://doi.org/10.5281/zenodo.14991108)\n\n\n[Documentation](https://glum.readthedocs.io/en/latest/)\n\nGeneralized linear models (GLM) are a core statistical tool that include many common methods like least-squares regression, Poisson regression and logistic regression as special cases. At QuantCo, we have used GLMs in e-commerce pricing, insurance claims prediction and more. We have developed `glum`, a fast Python-first GLM library. The development was based on [a fork of scikit-learn](https://github.com/scikit-learn/scikit-learn/pull/9405), so it has a scikit-learn-like API. We are thankful for the starting point provided by Christian Lorentzen in that PR!\n\nWe believe that for GLM development, broad support for distributions, regularization, and statistical inference, along with fast formula-based specification, is key. 
`glum` supports\n\n* Built-in cross-validation for optimal regularization, efficiently exploiting a “regularization path”\n* L1 regularization, which produces sparse and easily interpretable solutions\n* L2 regularization, including variable matrix-valued (Tikhonov) penalties, which are useful in modeling correlated effects\n* Elastic net regularization\n* Normal, Poisson, logistic, gamma, and Tweedie distributions, plus varied and customizable link functions\n* Built-in formula-based model specification using `formulaic`\n* Classical statistical inference for unregularized models\n* Box constraints, linear inequality constraints, sample weights, offsets\n\nPerformance also matters, so we conducted extensive benchmarks against other modern libraries. Although performance depends on the specific problem, we find that when N \u003e\u003e K (there are many more observations than predictors), `glum` is consistently much faster for a wide range of problems. This repo includes the benchmarking tools in the `glum_benchmarks` module. For details, [see here](glum_benchmarks/README.md).\n\n\u003c!-- BENCHMARK_FIGURES_START --\u003e\n\u003cimg src=\"docs/_static/wide-insurance-gamma-normalized.png#gh-light-mode-only\" alt=\"Benchmark results\" width=\"600\"\u003e\n\u003cimg src=\"docs/_static/wide-insurance-gamma-normalized_dark.png#gh-dark-mode-only\" alt=\"Benchmark results\" width=\"600\"\u003e\n\u003c!-- BENCHMARK_FIGURES_END --\u003e\n\nFor more information on `glum`, including tutorials and API reference, please see [the documentation](https://glum.readthedocs.io/en/latest/).\n\nWhy did we choose the name `glum`? We wanted a name that had the letters GLM and wasn't easily confused with any existing implementation. And we thought glum sounded like a funny name (and not glum at all!). If you need a more professional-sounding name, feel free to pronounce it as G-L-um. Or maybe it stands for \"Generalized linear... ummm... 
modeling?\"\n\n# A classic example predicting housing prices\n\n```python\n\u003e\u003e\u003e import pandas as pd\n\u003e\u003e\u003e from sklearn.datasets import fetch_openml\n\u003e\u003e\u003e from glum import GeneralizedLinearRegressor\n\u003e\u003e\u003e\n\u003e\u003e\u003e # This dataset contains house sale prices for King County, which includes\n\u003e\u003e\u003e # Seattle. It includes homes sold between May 2014 and May 2015.\n\u003e\u003e\u003e # The full version of this dataset can be found at:\n\u003e\u003e\u003e # https://www.openml.org/search?type=data\u0026status=active\u0026id=42092\n\u003e\u003e\u003e house_data = pd.read_parquet(\"data/housing.parquet\")\n\u003e\u003e\u003e\n\u003e\u003e\u003e # Use only select features\n\u003e\u003e\u003e X = house_data[\n...     [\n...         \"bedrooms\",\n...         \"bathrooms\",\n...         \"sqft_living\",\n...         \"floors\",\n...         \"waterfront\",\n...         \"view\",\n...         \"condition\",\n...         \"grade\",\n...         \"yr_built\",\n...         \"yr_renovated\",\n...     ]\n... ].copy()\n\u003e\u003e\u003e\n\u003e\u003e\u003e\n\u003e\u003e\u003e # Model whether a house had an above or below median price via a Binomial\n\u003e\u003e\u003e # distribution. We'll be doing L1-regularized logistic regression.\n\u003e\u003e\u003e price = house_data[\"price\"]\n\u003e\u003e\u003e y = (price \u003c price.median()).values.astype(int)\n\u003e\u003e\u003e model = GeneralizedLinearRegressor(\n...     family='binomial',\n...     l1_ratio=1.0,\n...     alpha=0.001\n... 
)\n\u003e\u003e\u003e\n\u003e\u003e\u003e _ = model.fit(X=X, y=y)\n\u003e\u003e\u003e\n\u003e\u003e\u003e # get_formatted_diagnostics returns details about the steps taken by the iterative solver.\n\u003e\u003e\u003e diags = model.get_formatted_diagnostics(full_report=True)\n\u003e\u003e\u003e diags[['objective_fct']]\n        objective_fct\nn_iter               \n0            0.693091\n1            0.489500\n2            0.449585\n3            0.443681\n4            0.443498\n5            0.443497\n\u003e\u003e\u003e\n\u003e\u003e\u003e # Models can also be built with formulas from formulaic.\n\u003e\u003e\u003e model_formula = GeneralizedLinearRegressor(\n...     family='binomial',\n...     l1_ratio=1.0,\n...     alpha=0.001,\n...     formula=\"bedrooms + np.log(bathrooms + 1) + bs(sqft_living, 3) + C(waterfront)\"\n... )\n\u003e\u003e\u003e _ = model_formula.fit(X=house_data, y=y)\n\n```\n\n# Installation\n\nPlease install the package through conda-forge:\n```bash\nconda install glum -c conda-forge\n```\n\n# Performance\n\nFor optimal performance on an x86_64 architecture, we recommend using the MKL library\n(`conda install mkl`). By default, conda usually installs the OpenBLAS version, which\nis slower but is supported on all major architectures and operating systems.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantco%2Fglum","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquantco%2Fglum","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantco%2Fglum/lists"}