{"id":20166031,"url":"https://github.com/pabannier/sparseglm","last_synced_at":"2025-04-10T01:21:45.883Z","repository":{"id":90435707,"uuid":"442169941","full_name":"PABannier/sparseglm","owner":"PABannier","description":"Fast and modular solver for sparse generalized linear models","archived":false,"fork":false,"pushed_at":"2024-09-29T10:37:55.000Z","size":839,"stargazers_count":8,"open_issues_count":12,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-24T03:03:40.940Z","etag":null,"topics":["data-science","machine-learning","optimization"],"latest_commit_sha":null,"homepage":"https://crates.io/crates/sparseglm","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PABannier.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-27T13:35:46.000Z","updated_at":"2024-10-02T12:04:12.000Z","dependencies_parsed_at":null,"dependency_job_id":"d806495a-74d1-49c7-888f-031692d911cc","html_url":"https://github.com/PABannier/sparseglm","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PABannier%2Fsparseglm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PABannier%2Fsparseglm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PABannier%2Fsparseglm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PABannier%2Fsparseglm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/owners/PABannier","download_url":"https://codeload.github.com/PABannier/sparseglm/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248138238,"owners_count":21053837,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","machine-learning","optimization"],"created_at":"2024-11-14T00:42:25.731Z","updated_at":"2025-04-10T01:21:45.862Z","avatar_url":"https://github.com/PABannier.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# sparseglm\n\n![build](https://github.com/PABannier/sparseglm/actions/workflows/cargo.yml/badge.svg)\n![build](https://github.com/PABannier/sparseglm/actions/workflows/pytest.yml/badge.svg)\n![build](https://github.com/PABannier/sparseglm/actions/workflows/build_doc.yml/badge.svg)\n\nA fast and modular coordinate descent solver for sparse generalized linear models\nwith **convex** and **non-convex** penalties.\n\nThe optimization algorithm is explained [here](https://arxiv.org/abs/2204.07826).\nThis work has been accepted at NeurIPS 2022.\nIt offers theoretical guarantees of convergence and demonstrates the superiority\nof this solver over existing alternatives. The original package written in pure Python can be found\nhere: [skglm](https://github.com/scikit-learn-contrib/skglm).\n\n`sparseglm` leverages [Anderson acceleration](https://github.com/mathurinm/andersoncd)\nand [working sets](https://github.com/scikit-learn-contrib/skglm) to propose a **fast** and\n**memory-efficient** solver on a wide variety of algorithms. 
It can solve problems\nwith millions of samples and features in seconds. It supports **dense** and\n**sparse** matrices via CSC arrays.\n\nThe philosophy of `sparseglm` lies in providing a highly flexible API.\nBy supplying the datafit term and penalty term, one can implement any sparse Generalized Linear Model (GLM) in under 30 lines of code, making it effortless to introduce new estimators.\n\n```rust\n// Load the data and wrap it in a Dataset\nlet dataset = DatasetBase::from((x, y));\n\n// Define a datafit (here a quadratic datafit for regression)\nlet mut datafit = Quadratic::new();\n\n// Define a penalty (here an L1 penalty for Lasso)\nlet penalty = L1::new(0.7);\n\n// Instantiate a Solver with default parameters\nlet solver = Solver::new();\n\n// Solve the problem using coordinate descent\nlet coefficients = solver.solve(\u0026dataset, \u0026mut datafit, \u0026penalty).unwrap();\n```\n\nFor the most well-known models like `Lasso` or `ElasticNet`, `sparseglm` already has off-the-shelf\nimplementations.\n\n```rust\n// Load the data and wrap it in a Dataset\nlet dataset = DatasetBase::from((x, y));\n\n// Instantiate and fit the estimator\nlet estimator = Lasso::params()\n                  .alpha(2.)\n                  .fit(\u0026dataset)\n                  .unwrap();\n\n// Get the fitted coefficients\nlet coefficients = estimator.coefficients();\n```\n\n## Performance\n\n### Lasso\n\nWe provide below a comparison of `sparseglm` against other fast coordinate\ndescent solvers using the optimization benchmarking tool [Benchopt](https://github.com/benchopt/benchopt).\nThe benchmark below solves a Lasso optimization problem. We compare three solvers:\n[scikit-learn](https://github.com/scikit-learn/scikit-learn), [celer](https://github.com/mathurinm/celer)\nand `sparseglm`. 
The solvers are tested at different levels of regularization, from high sparsity to low\nsparsity.\n\n![](./docs/benchmark_lasso.png)\n\nThe simulations were run on two different datasets: one sparse and one dense.\n[rcv1](https://scikit-learn.org/0.18/datasets/rcv1.html) is a dataset made of 804,414 samples\nand 47,236 features. The data comes in the form of a sparse matrix.\nFor the dense dataset, we simulated a dense design matrix of 1,000 samples and 10,000 features.\n\n### Multi-Task Lasso\n\n![](./docs/benchmark_multi_task_lasso.png)\n\nThe simulations were run on two different datasets: one sparse and one dense.\nBoth datasets contain 100 samples, 3,000 features and 80 tasks.\n\n## Roadmap\n\nCurrently we support:\n\n| Model                      |    Single task     |     Multi task     | Convexity  |\n| -------------------------- | :----------------: | :----------------: | :--------: |\n| Lasso                      | :heavy_check_mark: | :heavy_check_mark: |   Convex   |\n| MCP                        | :heavy_check_mark: | :heavy_check_mark: | Non-convex |\n| Elastic-Net                | :heavy_check_mark: | :heavy_check_mark: |   Convex   |\n| L0.5                       | :heavy_check_mark: | :heavy_check_mark: | Non-convex |\n| Indicator box              |         -          |         -          |   Convex   |\n| Sparse logistic regression | :heavy_check_mark: |         -          |   Convex   |\n| Dual SVM with hinge loss   |         -          |         -          |   Convex   |\n\n## Building and installing the Python package locally\n\nThis repo includes Python bindings to run the existing estimators (in the `Estimators` crate)\nin a Python environment. 
To install it, run at the root of the repo:\n\n```bash\n# Install requirements\npip install -r requirements.txt\n\n# Compile and build Python wheel\ncd python\npython ./setup.py install\n```\n\n## Contributing\n\n### Testing\n\nTo run the tests, run:\n\n```shell\ncargo test\n```\n\n### Benchmarking\n\nThe crate also features benchmarks. To run them, run:\n\n```shell\ncargo bench\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpabannier%2Fsparseglm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpabannier%2Fsparseglm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpabannier%2Fsparseglm/lists"}