{"id":21182628,"url":"https://github.com/fandreuz/minimum-covariate-imbalance","last_synced_at":"2025-03-14T19:44:06.973Z","repository":{"id":107377214,"uuid":"492191359","full_name":"fandreuz/minimum-covariate-imbalance","owner":"fandreuz","description":"Optimization algorithms for the minimum covariate imbalance problem. Used Gurobi(py), NetworkX and Google OR-Tools","archived":false,"fork":false,"pushed_at":"2022-06-16T21:19:30.000Z","size":40,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-21T12:33:13.275Z","etag":null,"topics":["gurobi","networkx","optimization","or-tools","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fandreuz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-05-14T11:05:11.000Z","updated_at":"2022-06-16T21:18:08.000Z","dependencies_parsed_at":"2023-04-08T15:21:16.802Z","dependency_job_id":null,"html_url":"https://github.com/fandreuz/minimum-covariate-imbalance","commit_stats":null,"previous_names":["fandreuz/minimum-covariate-imbalance"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fandreuz%2Fminimum-covariate-imbalance","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fandreuz%2Fminimum-covariate-imbalance/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fandreuz%2Fminimum-covariate-imbalance/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fandreuz%2Fminimum-covariate-imbalance/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fandreuz","download_url":"https://codeload.github.com/fandreuz/minimum-covariate-imbalance/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243639261,"owners_count":20323505,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gurobi","networkx","optimization","or-tools","python"],"created_at":"2024-11-20T17:57:32.787Z","updated_at":"2025-03-14T19:44:06.962Z","avatar_url":"https://github.com/fandreuz.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Optimization models for the minimum covariate imbalance problem\n\nIn this repository we implemented and experimented with several optimization\nmodels for the minimum covariate imbalance problem. The problem is proven\nto be NP-hard when $P$ (the number of covariates) is strictly greater than 2.\n\nWe focused on fast methods for the case $P = 2$, though some of the\nfunctions can also be used in the more general case.\n\n## Tools\n\nWe employed several different tools in order to find the most suitable one for the\nproblem. All the code is written in Python, with occasional use of NumPy\nfunctions. 
We used the following optimization engines:\n- Gurobi\n- Google OR-Tools\n- NetworkX (Minimum Cost Network Flow solver)\n\n## Overview\n\nWe consider a so-called *treatment sample* of size $n$, summarized by the counts\n$\\ell_{p,i}$, where $p \\in \\{1, \\dots, P\\}$ is the index of a covariate,\n$i \\in \\{1, \\dots, k_p\\}$ is one of the $k_p$ levels allowed for the $p$-th covariate, and\n$\\ell_{p,i}$ is the number of treatment units at level $i$ of covariate $p$.\nThe goal is to select, from a second set called the *control sample*, a subset $S$\nof size $n$ that minimizes the total imbalance over all covariate levels:\n\n$$S = \\arg\\min_{T: |T| = n} \\sum_{p=1}^P \\sum_{i=1}^{k_p} \\left| |T \\cap L'_{p,i}| - \\ell_{p,i} \\right|$$\n\nwhere $T$ ranges over the subsets of the control sample, $L'_{p,i}$ is the set of\ncontrol units at level $i$ of covariate $p$, and $T \\cap L'_{p,i}$ therefore contains\nthe elements of $T$ at that level.\n\n## Models implemented\n\n- MIP model ([1], Section 2)\n  - Gurobi\n- Alternative MIP model ([1], Section 3)\n  - Gurobi\n- MCNF (minimum cost network flow) model ([1], Section 4)\n  - Gurobi\n  - NetworkX\n  - Google OR-Tools\n- General (`q != n`) MCNF model ([1], Section 6)\n  - NetworkX\n  - Gurobi\n\n## Running the code\n\nFirst, we generate a random problem instance using the utility functions in\nthe `utils` module:\n\n```python\nfrom utils import generate_problems\n\nn = 5  # treatment sample size\nn_prime = 15  # control sample size\nk0 = k1 = 5  # number of levels of each of the two covariates\n\nl, L_prime = generate_problems(n, n_prime, k0, k1)\n```\n\nAny solver in the repository can then be applied to the instance:\n\n```python\n# brute force solver\nfrom brute_force import brute_force\nbrute_force(l, L_prime)\n\n# Gurobi models: MIP, alternative MIP, and MCNF formulated as a MIP\nfrom mip_formulation import min_imbalance_solver, min_imbalance_solver_alt, min_imbalance_solver_mcnf\nmin_imbalance_solver(l, L_prime)\nmin_imbalance_solver_alt(l, L_prime)\nmin_imbalance_solver_mcnf(l, L_prime)\n\n# MCNF model solved with NetworkX and Google OR-Tools\nfrom minimum_network_flow import min_imbalance_solver_networkx, min_imbalance_solver_google\nmin_imbalance_solver_networkx(l, L_prime)\nmin_imbalance_solver_google(l, L_prime)\n```\n\n## Benchmarks\n\n### Increasing `n` and `k1, k2` (`n' = 500`, 
`k1 = k2 = n/2`)\n![1](https://user-images.githubusercontent.com/8464342/173231385-72e6c808-6050-4203-a330-dd35437c62c0.png)\n\n### Increasing `n'` (`n = 50`, `k1 = k2 = 50`)\n![2](https://user-images.githubusercontent.com/8464342/173231392-fdc6dbe3-4568-4cc2-b0d3-2f6bbe2631fd.png)\n\n### Increasing `k1, k2` (`n = 100`, `n' = 1,000,000`)\n![3](https://user-images.githubusercontent.com/8464342/173231394-1d44401c-6b3d-47e0-9f50-996de39331ba.png)\n\n### Legend\n\n- Gurobi:\n  - `Integer`: MIP formulation in [1]\n  - `Integer`: Alternative MIP formulation in [1]\n  - `Integer MCNF`: MCNF formulation implemented as a MIP\n- Google OR-Tools:\n  - `MCNF OR`: MCNF formulation\n- NetworkX:\n  - `MCNF NX`: MCNF formulation\n\n## Reference\n\n[1] Dorit S. Hochbaum, Xu Rao, Jason Sauppe. *Network flow methods for the minimum covariate imbalance problem*. https://arxiv.org/pdf/2007.06828.pdf\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffandreuz%2Fminimum-covariate-imbalance","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffandreuz%2Fminimum-covariate-imbalance","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffandreuz%2Fminimum-covariate-imbalance/lists"}