https://github.com/Quantco/glum

High performance Python GLMs with all the features!
elastic-net gamma glm lasso logit poisson ridge tweedie

Host: GitHub
URL: https://github.com/Quantco/glum
Owner: Quantco
License: bsd-3-clause
Created: 2020-03-25T19:37:22.000Z (over 5 years ago)
Default Branch: main
Last Pushed: 2024-04-15T14:06:23.000Z (about 1 year ago)
Last Synced: 2024-04-17T13:17:42.184Z (about 1 year ago)
Topics: elastic-net, gamma, glm, lasso, logit, poisson, ridge, tweedie
Language: Python
Homepage: https://glum.readthedocs.io/
Size: 28.2 MB
Stars: 282
Watchers: 15
Forks: 23
Open Issues: 30
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.rst
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS

# glum

[![CI](https://github.com/Quantco/glum/actions/workflows/ci.yml/badge.svg)](https://github.com/Quantco/glum/actions)

[![Daily runs](https://github.com/Quantco/glum/actions/workflows/daily.yml/badge.svg)](https://github.com/Quantco/glum/actions/workflows/daily.yml)

[![Docs](https://readthedocs.org/projects/pip/badge/?version=latest&style=flat)](https://glum.readthedocs.io/)

[![Conda-forge](https://img.shields.io/conda/vn/conda-forge/glum?logoColor=white&logo=conda-forge)](https://anaconda.org/conda-forge/glum)

[![PypiVersion](https://img.shields.io/pypi/v/glum.svg?logo=pypi&logoColor=white)](https://pypi.org/project/glum)

[![PythonVersion](https://img.shields.io/pypi/pyversions/glum?logoColor=white&logo=python)](https://pypi.org/project/glum)

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.14991108.svg)](https://doi.org/10.5281/zenodo.14991108)

[Documentation](https://glum.readthedocs.io/en/latest/)

Generalized linear models (GLMs) are a core statistical tool that includes many common methods, such as least-squares regression, Poisson regression, and logistic regression, as special cases. At QuantCo, we have used GLMs in e-commerce pricing, insurance claims prediction and more. We have developed `glum`, a fast Python-first GLM library. The development was based on [a fork of scikit-learn](https://github.com/scikit-learn/scikit-learn/pull/9405), so it has a scikit-learn-like API. We are thankful for the starting point provided by Christian Lorentzen in that PR!
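
To make the "special cases" concrete, here is a minimal, library-free sketch (not `glum` code) of how the three models named above share one linear predictor and differ in the inverse link function that maps it to a mean. The helper `glm_mean` and the sample numbers are illustrative assumptions, and the sketch leaves out the distributional part of each model.

```python
import math

# Inverse link functions map the linear predictor eta = x . beta to the mean.
INVERSE_LINKS = {
    "identity": lambda eta: eta,                         # least-squares regression
    "log": lambda eta: math.exp(eta),                    # Poisson regression
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),   # logistic regression
}


def glm_mean(x, beta, link):
    """Return the predicted mean for one observation (hypothetical helper)."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    return INVERSE_LINKS[link](eta)


x = [1.0, 2.0]       # first entry acts as the intercept column
beta = [0.5, -0.25]  # made-up coefficients, chosen so that eta = 0.0

for link in INVERSE_LINKS:
    print(link, glm_mean(x, beta, link))
```

With a linear predictor of zero, the identity link predicts 0, the log link predicts 1, and the logit link predicts a probability of 0.5.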

The goal of `glum` is to be at least as feature-complete as existing GLM libraries like `glmnet` or `h2o`. It supports

* Built-in cross validation for optimal regularization, efficiently exploiting a “regularization path”

* L1 regularization, which produces sparse and easily interpretable solutions

* L2 regularization, including variable matrix-valued (Tikhonov) penalties, which are useful in modeling correlated effects

* Elastic net regularization

* Normal, Poisson, logistic, gamma, and Tweedie distributions, plus varied and customizable link functions

* Box constraints, linear inequality constraints, sample weights, offsets
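
As a rough illustration of why L1 regularization yields sparse solutions while L2 regularization only shrinks coefficients, the sketch below solves a one-coefficient penalized least-squares problem in closed form. This is textbook soft-thresholding, not `glum`'s solver, and the function names are hypothetical. The penalty is written as `alpha * (l1_ratio * |b| + 0.5 * (1 - l1_ratio) * b**2)`, which is assumed here to correspond to the `alpha` and `l1_ratio` arguments.

```python
def soft_threshold(z, threshold):
    """Minimizer of 0.5 * (b - z)**2 + threshold * abs(b)."""
    if z > threshold:
        return z - threshold
    if z < -threshold:
        return z + threshold
    return 0.0


def elastic_net_update(z, alpha, l1_ratio):
    """Minimizer of 0.5*(b - z)**2 + alpha*(l1_ratio*|b| + 0.5*(1 - l1_ratio)*b**2)."""
    shrunk = soft_threshold(z, alpha * l1_ratio)
    return shrunk / (1.0 + alpha * (1.0 - l1_ratio))


unpenalized = [3.0, 0.5, -2.0]  # made-up unpenalized estimates

lasso = [elastic_net_update(z, alpha=1.0, l1_ratio=1.0) for z in unpenalized]
ridge = [elastic_net_update(z, alpha=1.0, l1_ratio=0.0) for z in unpenalized]
mixed = [elastic_net_update(z, alpha=1.0, l1_ratio=0.5) for z in unpenalized]

print("lasso:", lasso)  # the small coefficient is set exactly to zero
print("ridge:", ridge)  # every coefficient shrinks, none becomes zero
print("mixed:", mixed)
```

In this toy setting the pure L1 penalty zeroes out the coefficient whose unpenalized value is below the threshold, whereas the pure L2 penalty halves every coefficient and leaves all of them nonzero.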

This repo also includes tools for benchmarking GLM implementations in the `glum_benchmarks` module. For details on the benchmarking, [see here](src/glum_benchmarks/README.md). Although the performance of `glum` relative to `glmnet` and `h2o` depends on the specific problem, we find that when N >> K (there are many more observations than predictors), `glum` is consistently much faster for a wide range of problems.

![Performance benchmarks](docs/_static/headline_benchmark.png#gh-light-mode-only)

![Performance benchmarks](docs/_static/headline_benchmark_dark.png#gh-dark-mode-only)

For more information on `glum`, including tutorials and API reference, please see [the documentation](https://glum.readthedocs.io/en/latest/).

Why did we choose the name `glum`? We wanted a name that had the letters GLM and wasn't easily confused with any existing implementation. And we thought glum sounded like a funny name (and not glum at all!). If you need a more professional-sounding name, feel free to pronounce it as G-L-um. Or maybe it stands for "Generalized linear... ummm... modeling?"

# A classic example predicting housing prices

```python
>>> import pandas as pd
>>> from sklearn.datasets import fetch_openml
>>> from glum import GeneralizedLinearRegressor
>>>
>>> # This dataset contains house sale prices for King County, which includes
>>> # Seattle. It includes homes sold between May 2014 and May 2015.
>>> # The full version of this dataset can be found at:
>>> # https://www.openml.org/search?type=data&status=active&id=42092
>>> house_data = pd.read_parquet("data/housing.parquet")
>>>
>>> # Use only select features
>>> X = house_data[
...     [
...         "bedrooms",
...         "bathrooms",
...         "sqft_living",
...         "floors",
...         "waterfront",
...         "view",
...         "condition",
...         "grade",
...         "yr_built",
...         "yr_renovated",
...     ]
... ].copy()
>>>
>>>
>>> # Model whether a house had an above or below median price via a Binomial
>>> # distribution. We'll be doing L1-regularized logistic regression.
>>> price = house_data["price"]
>>> y = (price < price.median()).values.astype(int)
>>> model = GeneralizedLinearRegressor(
...     family='binomial',
...     l1_ratio=1.0,
...     alpha=0.001
... )
>>>
>>> _ = model.fit(X=X, y=y)
>>>
>>> # .get_formatted_diagnostics returns details about the steps taken by the
>>> # iterative solver.
>>> diags = model.get_formatted_diagnostics(full_report=True)
>>> diags[['objective_fct']]
        objective_fct
n_iter               
0            0.693091
1            0.489500
2            0.449585
3            0.443681
4            0.443498
5            0.443497
>>>
>>> # Models can also be built with formulas from formulaic.
>>> model_formula = GeneralizedLinearRegressor(
...     family='binomial',
...     l1_ratio=1.0,
...     alpha=0.001,
...     formula="bedrooms + np.log(bathrooms + 1) + bs(sqft_living, 3) + C(waterfront)"
... )
>>> _ = model_formula.fit(X=house_data, y=y)
```
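
One way to sanity-check the diagnostics above: the first objective value, 0.693091, is very close to ln 2 ≈ 0.693147. That is the average binomial negative log-likelihood when every observation is predicted with probability 0.5, a natural starting point for a target split at the median. The sketch below assumes that the reported objective at the start is essentially the mean log loss; that reading is an assumption and is not stated in this README. `mean_log_loss` is a hypothetical helper and the labels are toy data.

```python
import math


def mean_log_loss(y, p):
    """Average binomial negative log-likelihood for labels y and probabilities p."""
    total = sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p)
    )
    return -total / len(y)


y = [0, 1, 1, 0, 1, 0]    # toy labels, balanced like a median split
p_start = [0.5] * len(y)  # uninformative starting prediction

start_loss = mean_log_loss(y, p_start)
print(start_loss, math.log(2))  # the two values agree up to rounding
```

Under that assumption, the drop from about 0.693 to 0.443 over five iterations measures how much the fitted model improves on a coin flip.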

# Installation

Please install the package through conda-forge:

```bash
conda install glum -c conda-forge
```

# Performance

For optimal performance on an x86_64 architecture, we recommend using the MKL library
(`conda install mkl`). By default, conda usually installs the OpenBLAS version, which
is slower but supported on all major architectures and operating systems.
