# glum

[![CI](https://github.com/Quantco/glum/actions/workflows/ci.yml/badge.svg)](https://github.com/Quantco/glum/actions)
[![Daily runs](https://github.com/Quantco/glum/actions/workflows/daily.yml/badge.svg)](https://github.com/Quantco/glum/actions/workflows/daily.yml)
[![Docs](https://readthedocs.org/projects/pip/badge/?version=latest&style=flat)](https://glum.readthedocs.io/)
[![Conda-forge](https://img.shields.io/conda/vn/conda-forge/glum?logoColor=white&logo=conda-forge)](https://anaconda.org/conda-forge/glum)
[![PypiVersion](https://img.shields.io/pypi/v/glum.svg?logo=pypi&logoColor=white)](https://pypi.org/project/glum)
[![PythonVersion](https://img.shields.io/pypi/pyversions/glum?logoColor=white&logo=python)](https://pypi.org/project/glum)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.14991108.svg)](https://doi.org/10.5281/zenodo.14991108)

[Documentation](https://glum.readthedocs.io/en/latest/)

Generalized linear models (GLMs) are a core statistical tool that includes many common methods, such as least-squares regression, Poisson regression, and logistic regression, as special cases. At QuantCo, we have used GLMs in e-commerce pricing, insurance claims prediction, and more. We have developed `glum`, a fast, Python-first GLM library. Development started from [a fork of scikit-learn](https://github.com/scikit-learn/scikit-learn/pull/9405), so it has a scikit-learn-like API. We are thankful for the starting point provided by Christian Lorentzen in that PR!

The goal of `glum` is to be at least as feature-complete as existing GLM libraries like `glmnet` or `h2o`. It supports:

* Built-in cross validation for optimal regularization, efficiently exploiting a "regularization path"
* L1 regularization, which produces sparse and easily interpretable solutions
* L2 regularization, including variable matrix-valued (Tikhonov) penalties, which are useful in modeling correlated effects
* Elastic net regularization
* Normal, Poisson, logistic, gamma, and Tweedie distributions, plus varied and customizable link functions
* Box constraints, linear inequality constraints, sample weights, offsets

This repo also includes tools for benchmarking GLM implementations in the `glum_benchmarks` module. For details on the benchmarking, [see here](src/glum_benchmarks/README.md). Although the performance of `glum` relative to `glmnet` and `h2o` depends on the specific problem, we find that when N >> K (there are many more observations than predictors), it is consistently much faster across a wide range of problems.

![Performance benchmarks](docs/_static/headline_benchmark.png#gh-light-mode-only)
![Performance benchmarks](docs/_static/headline_benchmark_dark.png#gh-dark-mode-only)

For more information on `glum`, including tutorials and an API reference, please see [the documentation](https://glum.readthedocs.io/en/latest/).

Why did we choose the name `glum`? We wanted a name that contained the letters GLM and wasn't easily confused with any existing implementation. And we thought glum sounded like a funny name (and not glum at all!). If you need a more professional-sounding name, feel free to pronounce it as G-L-um. Or maybe it stands for "Generalized linear... ummm... modeling?"

# A classic example predicting housing prices

```python
>>> import pandas as pd
>>> from sklearn.datasets import fetch_openml
>>> from glum import GeneralizedLinearRegressor
>>>
>>> # This dataset contains house sale prices for King County, which includes
>>> # Seattle. It includes homes sold between May 2014 and May 2015.
>>> # The full version of this dataset can be found at:
>>> # https://www.openml.org/search?type=data&status=active&id=42092
>>> house_data = pd.read_parquet("data/housing.parquet")
>>>
>>> # Use only select features
>>> X = house_data[
...     [
...         "bedrooms",
...         "bathrooms",
...         "sqft_living",
...         "floors",
...         "waterfront",
...         "view",
...         "condition",
...         "grade",
...         "yr_built",
...         "yr_renovated",
...     ]
... ].copy()
>>>
>>>
>>> # Model whether a house had an above or below median price via a Binomial
>>> # distribution. We'll be doing L1-regularized logistic regression.
>>> price = house_data["price"]
>>> y = (price < price.median()).values.astype(int)
>>> model = GeneralizedLinearRegressor(
...     family='binomial',
...     l1_ratio=1.0,
...     alpha=0.001
... )
>>>
>>> _ = model.fit(X=X, y=y)
>>>
>>> # get_formatted_diagnostics shows details about the steps taken by the iterative solver.
>>> diags = model.get_formatted_diagnostics(full_report=True)
>>> diags[['objective_fct']]
        objective_fct
n_iter
0            0.693091
1            0.489500
2            0.449585
3            0.443681
4            0.443498
5            0.443497
>>>
>>> # Models can also be built with formulas from formulaic.
>>> model_formula = GeneralizedLinearRegressor(
...     family='binomial',
...     l1_ratio=1.0,
...     alpha=0.001,
...     formula="bedrooms + np.log(bathrooms + 1) + bs(sqft_living, 3) + C(waterfront)"
... )
>>> _ = model_formula.fit(X=house_data, y=y)
```

# Installation

Please install the package through conda-forge:

```bash
conda install glum -c conda-forge
```

# Performance

For optimal performance on an x86_64 architecture, we recommend using the MKL library (`conda install mkl`). By default, conda usually installs an OpenBLAS build, which is slower but is supported on all major architectures and operating systems.
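As a rough illustration of the kind of problem `glum` solves: fitting an unpenalized GLM amounts to iteratively reweighted least squares (IRLS). The sketch below is a minimal NumPy implementation of Poisson IRLS with a log link, on synthetic data with made-up coefficients. It is illustrative only and is not `glum`'s actual solver, which adds L1/L2 penalties and heavily optimized linear algebra.

```python
import numpy as np

# Synthetic Poisson data: intercept plus two standard-normal predictors.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.5, 0.3, -0.2])  # hypothetical true coefficients
y = rng.poisson(np.exp(X @ beta_true))

# IRLS for a Poisson GLM with log link: at each step, solve a weighted
# least-squares problem with weights W = mu and working response z.
beta = np.zeros(3)
for _ in range(25):
    eta = X @ beta
    mu = np.exp(eta)
    z = eta + (y - mu) / mu                 # working response
    W = mu                                  # IRLS weights for Poisson/log
    XtWX = X.T @ (W[:, None] * X)
    XtWz = X.T @ (W * z)
    beta_new = np.linalg.solve(XtWX, XtWz)
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new

print(np.round(beta, 3))
```

With n = 5000 observations, the fitted coefficients land close to `beta_true`. Regularized solvers like `glum`'s start from the same weighted least-squares structure but add penalty terms and solve each step by coordinate descent.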