{"id":30509282,"url":"https://github.com/finite-sample/alsgls","last_synced_at":"2025-08-26T00:53:51.746Z","repository":{"id":307930372,"uuid":"1031119182","full_name":"finite-sample/alsgls","owner":"finite-sample","description":"Factor Analytic ALS for GLS","archived":false,"fork":false,"pushed_at":"2025-08-18T07:20:54.000Z","size":20,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-18T07:21:58.391Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/finite-sample.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-03T03:48:02.000Z","updated_at":"2025-08-18T07:20:28.000Z","dependencies_parsed_at":"2025-08-18T07:22:01.803Z","dependency_job_id":"dc8f0967-ce4c-4e58-a418-acf46fa1636f","html_url":"https://github.com/finite-sample/alsgls","commit_stats":null,"previous_names":["finite-sample/alssur","finite-sample/alsgls"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/finite-sample/alsgls","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Falsgls","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Falsgls/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Falsgls/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Falsgls/manifests","owner_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/owners/finite-sample","download_url":"https://codeload.github.com/finite-sample/alsgls/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Falsgls/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272154914,"owners_count":24882931,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-25T02:00:12.092Z","response_time":1107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-26T00:53:49.349Z","updated_at":"2025-08-26T00:53:51.721Z","avatar_url":"https://github.com/finite-sample.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## A Lightweight ALS Solver for Iterative GLS\n\n[![PyPI version](https://img.shields.io/pypi/v/alsgls.svg)](https://pypi.org/project/alsgls/)\n[![PyPI Downloads](https://static.pepy.tech/badge/alsgls)](https://pepy.tech/projects/alsgls)\n[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\n\nWhen a GLS problem involves hundreds of equations, the $K × K$ covariance matrix becomes the computational bottleneck.  
A simple statistical remedy is to assume that most of the cross‑equation dependence can be captured by a *handful of latent factors* plus equation‑specific noise.  This “low‑rank + diagonal” assumption slashes the number of unknowns from roughly $K^2$ to about $K×k$ parameters, where **k** (the latent factor rank) is much smaller than $K$.  The model alone, however, does **not** guarantee speed: we still have to fit the parameters.\n\n### Installation\n\nInstall the library from PyPI:\n\n```bash\npip install alsgls\n```\n\nFor local development, clone the repo and use an editable install:\n\n```bash\npip install -e .\n```\n\n### Usage\n\n```python\nfrom alsgls import als_gls, simulate_sur, nll_per_row, XB_from_Blist\n\n# Simulate a SUR system: 240 training rows, 120 test rows, K=60 equations, p=3 regressors, rank-4 factors\nXs_tr, Y_tr, Xs_te, Y_te = simulate_sur(N_tr=240, N_te=120, K=60, p=3, k=4)\n# Fit by ALS with latent factor rank k=4: B holds the coefficients, F the factor loadings, D the diagonal noise\nB, F, D, mem, _ = als_gls(Xs_tr, Y_tr, k=4)\n# Predict on the test split and score the residuals by negative log-likelihood per row\nYhat_te = XB_from_Blist(Xs_te, B)\nnll = nll_per_row(Y_te - Yhat_te, F, D)\n```\n\nSee `examples/compare_als_vs_em.py` for a complete ALS versus EM comparison.\n\n### Documentation and notebooks\n\nBackground material and reproducible experiments are available in the notebooks under [`als_sim/`](als_sim/), such as [`als_sim/als_comparison.ipynb`](als_sim/als_comparison.ipynb) and [`als_sim/als_sur.ipynb`](als_sim/als_sur.ipynb).\n\n### Solving low‑rank GLS: EM versus ALS\n\nThe classic EM algorithm alternates between updating the regression coefficients $\\beta$ and updating the factor loadings $F$ and the diagonal noise $D$.  Even though $\\hat{\\Sigma}$ is low‑rank plus diagonal, EM’s M‑step recreates the **full** $K × K$ inverse, wiping out the memory win.\n\nAn alternative is **Alternating‑Least‑Squares (ALS)**. The Woodbury identity reduces the expensive $K × K$ inverse to a small $k × k$ system, and the β‑update can be written without explicitly forming the dense matrix at all.  
In practice, ALS converges in 5–6 sweeps and never allocates more than $O(K k)$ memory, while EM allocates $O(K^2)$.\n\n**Rule of thumb:** if your GLS routine keeps looping between $\\beta$ and a fresh $\\hat{\\Sigma}$, replacing the $\\hat{\\Sigma}$‑update by a factor‑ALS step yields the same statistical fit with an order‑of‑magnitude smaller memory footprint.\n\n### Beyond SUR: where the idea travels\n\nRandom‑effects models, feasible GLS with estimated heteroskedastic weights, optimal‑weight GMM, and spatial autoregressive GLS all iterate β ↔ Σ̂.  Each can adopt the same ALS trick: treat the weight matrix as low‑rank + diagonal, invert only the k × k core, and avoid the dense K × K algebra.  Memory savings in published examples range from 5× to 20×, depending on k.\n\n### A concrete case‑study: Seemingly‑Unrelated Regressions\n\nTo show the magnitude of the savings, we ran a Monte‑Carlo experiment with N = 300 observations, three regressors, rank‑3 factors, and K set to 50, 80, and 120.  EM was given 45 iterations; ALS, six sweeps.  The largest array EM holds is the dense Σ⁻¹, whereas ALS’s largest is the skinny factor matrix F.  The table summarises six replications:\n\n|   K | β‑RMSE EM | β‑RMSE ALS | Peak MB EM | Peak MB ALS | Memory ratio |\n| --: | :-------: | :--------: | ---------: | ----------: | -----------: |\n|  50 |   0.021   |    0.021   |     0.020  |      0.002  |         10×  |\n|  80 |   0.020   |    0.020   |     0.051  |      0.003  |         17×  |\n| 120 |   0.020   |    0.020   |     0.115  |      0.004  |         29×  |\n\nStatistically, the two estimators are indistinguishable (paired‑test p ≥ 0.14).  Computationally, ALS’s largest array stays at a few kilobytes (0.002–0.004 MB), whereas EM’s grows from 0.020 MB to 0.115 MB as K rises from 50 to 120.\n\n### Choosing a solver in practice\n\nFor small systems ($K \u003c 50$), dense GLS or even separate OLS is fine.  Between 50 and 300 equations, a low‑rank **factor‑ALS** solver gives the same estimates at roughly one‑tenth the memory and runs happily on a GPU.  
Once K enters the hundreds, any dense inverse becomes prohibitive; structured approaches such as factor‑ALS or sparse/banded $\\hat{\\Sigma}$ are mandatory.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffinite-sample%2Falsgls","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffinite-sample%2Falsgls","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffinite-sample%2Falsgls/lists"}