https://github.com/fandreuz/minimum-covariate-imbalance
Optimization algorithms for the minimum covariate imbalance problem. Used Gurobi(py), NetworkX and Google OR-Tools
https://github.com/fandreuz/minimum-covariate-imbalance
gurobi networkx optimization or-tools python
Last synced: over 1 year ago
JSON representation
Optimization algorithms for the minimum covariate imbalance problem. Used Gurobi(py), NetworkX and Google OR-Tools
- Host: GitHub
- URL: https://github.com/fandreuz/minimum-covariate-imbalance
- Owner: fandreuz
- License: mit
- Created: 2022-05-14T11:05:11.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2022-06-16T21:19:30.000Z (about 4 years ago)
- Last Synced: 2025-01-21T12:33:13.275Z (over 1 year ago)
- Topics: gurobi, networkx, optimization, or-tools, python
- Language: Python
- Homepage:
- Size: 39.1 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Optimization models for the minimum covariate imbalance problem
In this repository we implemented and experimented several optimization
models for the minimum covariate imbalance problem. The problem is proven
to be NP-hard when $P$ (the number of covariates) is strictly greater than 2.
We focused on some fast methods for the case $P = 2$, though some of the
functions can be used also in a more general case.
## Tools
We employed several different tools in order to evaluate the best one for the
problem. All the code is written in Python, with the support of some NumPy
functions here and there. We used the following optimization engines:
- Gurobi
- Google OR-Tools
- NetworkX (Minimum Cost Network Flow solver)
## Overview
We consider a so-called *treatment sample* of size $n$ represented by
$\ell_{p,i}$ such that $p \in \{1, \dots, P\}$ is the index of a covariate and
$i \in k_p$ is the level among the allowed levels for the $p$-th covariate.
The objective of the problem is the identification, among another set called
*control sample*, of a subset $S$ of size $n$ such that the following holds:
$$S = \min_{T: |T| = n} \sum_{p=1}^P \sum_{i=1}^{k_p} ||T \cap L_{p,i}'| - \ell_{p,i}|$$
where $T \cap L'_{p,i}$ is the subset of $T$ such that all the elements are in
the $i$-th level with respect to the $p$-th covariate.
## Models implemented
- MIP model ([1], Section 2)
- Gurobi
- Alternative MIP model ([1], Section 3)
- Gurobi
- MCNF model ([1], Section 4)
- Gurobi
- NetworkX
- Google OR-Tools
- General (`q != n`) MCNF model ([1], Section 6)
- NetworkX
- Gurobi
## Running the code
First of all we generate randomly a problem using the utility functions inside
the module `utils`:
```python
from utils import generate_problems
n = 5
n_prime = 15
k0 = k1 = 5
l, L_prime = generate_problems(n, n_prime, k0, k1)
```
You can now use any method from the modules in the repository in order to solve
the problem:
```python
# brute force solver
from brute_force import brute_force
brute_force(l, L_prime)
# MIP model
from mip_formulation import min_imbalance_solver, min_imbalance_solver_alt, min_imbalance_solver_mcnf
min_imbalance_solver(l, L_prime)
min_imbalance_solver_alt(l, L_prime)
min_imbalance_solver_mcnf(l, L_prime)
# MCNF model
from minimum_network_flow import min_imbalance_solver_networkx, min_imbalance_solver_google
min_imbalance_solver_networkx(l, L_prime)
min_imbalance_solver_google(l, L_prime)
```
## Benchmarks
### Increasing `n` and `k1, k2` (`n' = 500`, `k1 = k2 = n/2`)

### Increasing `n'` (`n = 50`, `k1, k2 = 50`)

### Increasing `k1, k2` (`n = 100`, `n' = 1.000.000`)

### Legend
- Gurobi:
- `Integer` : MIP formulation in [1]
- `Integer` : Alternative MIP formulation in [1]
- `Integer MCNF` : MCNF formulation implemented like a MIP
- Google OR-Tools:
- `MCNF OR` : MCFN formulation
- NetworkX:
- `MCNF NX` : MCFN formulation
## Reference
[1] Network flow methods for the minimum covariate imbalance problem
Dorit S. Hochbaum, Xu Rao, Jason Sauppe
https://arxiv.org/pdf/2007.06828.pdf