{"id":29830809,"url":"https://github.com/finite-sample/leximin-matching","last_synced_at":"2025-07-29T10:11:38.632Z","repository":{"id":300244982,"uuid":"1005661607","full_name":"finite-sample/leximin-matching","owner":"finite-sample","description":"Matching Based on Leximin Objective","archived":false,"fork":false,"pushed_at":"2025-06-20T15:37:27.000Z","size":5,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-20T16:41:25.882Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/finite-sample.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-20T15:37:02.000Z","updated_at":"2025-06-20T15:37:30.000Z","dependencies_parsed_at":"2025-06-20T16:41:28.909Z","dependency_job_id":"165eefaa-d7dc-462e-9f71-b10cdd72b58b","html_url":"https://github.com/finite-sample/leximin-matching","commit_stats":null,"previous_names":["finite-sample/leximin-matching"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/finite-sample/leximin-matching","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fleximin-matching","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fleximin-matching/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fleximin-matching/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-s
ample%2Fleximin-matching/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/finite-sample","download_url":"https://codeload.github.com/finite-sample/leximin-matching/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/finite-sample%2Fleximin-matching/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267668843,"owners_count":24124972,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-29T10:11:33.119Z","updated_at":"2025-07-29T10:11:38.617Z","avatar_url":"https://github.com/finite-sample.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"## Leximin Matching: Minimizing Maximum Covariate Distance\n\nThe Hungarian algorithm minimizes total covariate distance between treated and control units, ensuring optimal aggregate match quality. But this can leave some treated units with particularly poor covariate matches. Leximin matching offers an alternative objective that minimizes the maximum covariate distance first, then the second-maximum, and so on.\n\n## Distance Measurement and Objective Functions\n\nWe measure covariate distance using Euclidean distance in standardized covariate space. 
For treated unit i with covariates X_i = [age_i, income_i, education_i] and control unit j with X_j = [age_j, income_j, education_j], the distance is:\n\nd(i,j) = √[(age_i - age_j)² + (income_i - income_j)² + (education_i - education_j)²]\n\nwhere covariates are standardized to have mean 0 and variance 1.\n\n**Hungarian:** Minimize ∑ᵢ d(i, matched_j) over all valid assignments\n\n**Leximin:** Minimize max{d(i, matched_j)} first, then second-max, then third-max, etc.\n\n## When Hungarian and Leximin Disagree\n\nConsider 3 treated units, 3 controls, with two possible complete assignments:\n\n**Assignment A:** distances [1.0, 1.0, 9.0] → total = 11.0, max = 9.0  \n**Assignment B:** distances [2.0, 3.0, 7.0] → total = 12.0, max = 7.0\n\n- **Hungarian chooses A** (minimizes total: 11.0 \u003c 12.0)\n- **Leximin chooses B** (minimizes max: 7.0 \u003c 9.0)\n\nHungarian accepts the terrible 9.0-distance match because it reduces total cost. Leximin rejects it to avoid leaving any unit with an extremely poor match. \"Lexicographically\" means optimizing distances in worst-to-best order: first minimize the worst distance, then among solutions achieving that minimum, minimize the second-worst, and so on.\n\n## Practical Application: Citizens' Assembly Selection\n\nA recent Nature paper applied leximin to selecting representative panels for democratic participation. Researchers developed algorithms that maximize the minimum probability any individual gets selected for a citizens' assembly, while maintaining demographic quotas. 
Their LEXIMIN algorithm has been deployed by organizations across multiple countries, selecting over 40 assemblies.\n\nThe parallel to matching is direct: ensure representativeness without systematically excluding any demographic combinations - analogous to achieving covariate balance without leaving treated units with extremely poor matches.\n\n## Empirical Evidence\n\nWe compared Hungarian and leximin across three covariate distribution scenarios:\n\n**Balanced distributions:** Both methods achieved similar maximum distances (2.0027) and treatment effect estimates. Method choice made minimal difference.\n\n**Clustered distributions:** Leximin reduced maximum covariate distance by 19.6% (2.05 → 1.64) compared to Hungarian, though with slightly worse average balance.\n\n**Sparse distributions with outliers:** Leximin achieved better treatment effect estimation (bias: -0.014 vs 0.330) despite having worse average covariate balance.\n\nThe consistent pattern: leximin sacrifices average balance to protect worst-matched units, which can improve causal inference when covariate distributions are challenging.\n\n## Implementation\n\nThe computational challenge is that naive LP formulations with fractional variables produce incorrect results. Partial assignments artificially reduce maximum distance constraints. Solutions include:\n\n1. **Integer programming** (computationally expensive)\n2. 
**Bottleneck assignment** (reformulate as minimum bottleneck matching)\n\nWe use bottleneck assignment, which finds the minimum possible maximum edge weight in a perfect matching. This fixes the worst-case distance, the first level of the leximin ordering, and is equivalent to leximin for most practical problems while remaining computationally tractable.\n\n## When to Use Leximin\n\n**Choose leximin when:**\n- Covariate distributions are clustered or contain outliers\n- Worst-case matches pose analytical risks\n- Individual match quality matters more than aggregate efficiency\n\n**Choose Hungarian when:**\n- Covariates are well-distributed\n- Computational speed is critical\n- A guaranteed minimum total distance is required\n\n## Limitations\n\nResults come from specific simulation settings. The 19.6% maximum distance improvement and better bias performance need validation across broader contexts. The relationship between individual fairness and causal inference quality requires deeper investigation.\n\nExtension to 1-to-k matching adds complexity. Our sequential approach provides a reasonable heuristic, but optimal k-matching under leximin criteria remains algorithmically challenging.\n\n## Conclusion\n\nLeximin matching minimizes maximum covariate distance rather than total distance. In challenging scenarios with clustered covariates or outlier units, protecting worst-matched individuals can improve treatment effect estimation even when average covariate balance appears worse.\n\nThe choice between Hungarian and leximin reflects competing statistical objectives: aggregate optimality versus individual match quality. Both have their place depending on the covariate distribution and research priorities.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffinite-sample%2Fleximin-matching","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffinite-sample%2Fleximin-matching","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffinite-sample%2Fleximin-matching/lists"}