{"id":29830813,"url":"https://github.com/finite-sample/onlinerake","last_synced_at":"2025-07-29T10:11:41.032Z","repository":{"id":305357945,"uuid":"1022488161","full_name":"finite-sample/onlinerake","owner":"finite-sample","description":"Online raking with SGD or MWU","archived":false,"fork":false,"pushed_at":"2025-07-26T19:02:32.000Z","size":287,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-26T23:31:44.421Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/finite-sample.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-19T07:15:17.000Z","updated_at":"2025-07-26T19:02:36.000Z","dependencies_parsed_at":"2025-07-19T19:29:09.876Z","dependency_job_id":"dd7b637c-ac6e-4da7-bdc6-ac8fdf6798eb","html_url":"https://github.com/finite-sample/onlinerake","commit_stats":null,"previous_names":["finite-sample/onlinerake"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/finite-sample/onlinerake","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fonlinerake","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fonlinerake/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fonlinerake/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fonlinerake/manifests","owner_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/finite-sample","download_url":"https://codeload.github.com/finite-sample/onlinerake/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fonlinerake/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267668843,"owners_count":24124972,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-29T10:11:34.168Z","updated_at":"2025-07-29T10:11:41.021Z","avatar_url":"https://github.com/finite-sample.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## onlinerake: Streaming Survey Raking Via MWU and SGD\n\n[![PyPI version](https://img.shields.io/pypi/v/onlinerake.svg)](https://pypi.org/project/onlinerake/)\n[![PyPI Downloads](https://static.pepy.tech/badge/onlinerake)](https://pepy.tech/projects/onlinerake)\n\nModern online surveys and passive data collection streams generate\nresponses one record at a time.  Classic weighting methods such as\niterative proportional fitting (IPF, or “raking”) and calibration\nweighting are inherently *batch* procedures: they reprocess the entire\ndataset whenever a new case arrives.  
The `onlinerake` package\nprovides **incremental**, per‑observation updates to survey weights so\nthat weighted margins track known population totals in real time.\n\nThe package implements two complementary algorithms:\n\n* **SGD raking** – an additive update that performs stochastic\n  gradient descent on a squared–error loss over the margins.  It\n  produces smooth weight trajectories and maintains high effective\n  sample size (ESS).\n* **MWU raking** – a multiplicative update inspired by the\n  multiplicative‑weights update rule.  It corresponds to mirror\n  descent under the Kullback–Leibler divergence and yields weight\n  distributions reminiscent of classic IPF.  However, it can produce\n  heavier tails when the learning rate is large.\n\nBoth methods share the same API: call `.partial_fit(obs)` for each\nincoming observation and inspect properties such as `.margins`, `.loss`\nand `.effective_sample_size` to monitor progress.\n\n## Installation\n\nClone or download this repository and install in editable mode:\n\n```bash\ngit clone \u003crepo-url\u003e\ncd onlinerake\npip install -e .\n```\n\nNo external dependencies are required beyond `numpy` and `pandas`.\n\n## Usage\n\n```python\nfrom onlinerake import OnlineRakingSGD, OnlineRakingMWU, Targets\n\n# define target population margins (proportion of the population with indicator = 1)\ntargets = Targets(age=0.5, gender=0.5, education=0.4, region=0.3)\n\n# instantiate a raker\nraker = OnlineRakingSGD(targets, learning_rate=5.0)\n\n# stream demographic observations\nfor obs in stream_of_dicts:\n    raker.partial_fit(obs)\n    print(raker.margins)  # current weighted margins\n\nprint(\"final effective sample size\", raker.effective_sample_size)\n```\n\nTo use the multiplicative‑weights version, replace\n`OnlineRakingSGD` with `OnlineRakingMWU` and adjust the\n`learning_rate` (a typical default is `1.0`).  
See the docstrings\nfor full parameter descriptions.\n\n## Simulation results\n\nTo understand the behaviour of the two update rules we simulated\nthree typical non‑stationary bias patterns: a **linear drift** in\ndemographic composition, a **sudden shift** halfway through the stream,\nand an **oscillation** around the target frame.  For each scenario we\ngenerated 300 observations per seed and averaged results over five\nrandom seeds.  SGD used a learning rate of 5.0 and MWU used a\nlearning rate of 1.0 with three update steps per observation.  The\ntable below summarises the mean improvement in absolute margin error\nrelative to the unweighted baseline (positive values indicate an\nimprovement), the final effective sample size (ESS) and the mean final\nloss (squared‑error on margins).  Higher ESS and larger improvements\nare better.\n\n| Scenario | Method | Age Imp (%) | Gender Imp (%) | Education Imp (%) | Region Imp (%) | Overall Imp (%) | Final ESS | Final Loss |\n|---------|--------|-------------|---------------|------------------|---------------|----------------|---------:|-----------:|\n| linear | SGD | 82.8 | 78.6 | 76.8 | 67.5 | 77.0 | 251.8 | 0.00147 |\n| linear | MWU | 57.2 | 53.6 | 46.9 | 34.6 | 48.8 | 240.9 | 0.00676 |\n| sudden | SGD | 82.9 | 82.3 | 79.6 | 63.5 | 79.5 | 225.5 | 0.00102 |\n| sudden | MWU | 52.6 | 51.2 | 46.3 | 26.3 | 47.3 | 175.9 | 0.01235 |\n| oscillating | SGD | 69.7 | 78.5 | 65.6 | 72.0 | 72.2 | 278.7 | 0.00023 |\n| oscillating | MWU | 49.6 | 57.3 | 48.3 | 50.1 | 52.0 | 276.0 | 0.00048 |\n\n**Interpretation**\n\n* In all scenarios the online rakers dramatically reduce the margin\n  errors relative to the unweighted baseline.  
For example, in the\n  sudden‑shift scenario the SGD raker reduces the average age error\n  from 0.20 to about 0.03 (an 83% improvement).\n* The SGD update consistently yields *higher* improvements and lower\n  final loss than the MWU update, albeit at the cost of choosing a\n  more aggressive learning rate.\n* The MWU update, while less accurate in these settings, maintains\n  comparable effective sample sizes in the linear and oscillating\n  scenarios (its ESS is lower after a sudden shift) and might be\n  preferable when multiplicative adjustments are desired (e.g., when\n  starting from unequal base weights).\n\nYou can reproduce these results or design new experiments by running\n\n```bash\npython -m onlinerake.simulation\n```\n\nfrom the repository root.  See the source of\n`onlinerake/simulation.py` for details.\n\n## Contributing\n\nPull requests are welcome!  Feel free to open issues if you find bugs\nor have suggestions for new features, such as support for multi‑level\ncontrols or adaptive learning‑rate schedules.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffinite-sample%2Fonlinerake","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffinite-sample%2Fonlinerake","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffinite-sample%2Fonlinerake/lists"}