https://github.com/phydev/mice
Multiple imputation with chained equation implemented from scratch. This is a low performance implementation meant for pedagogical purposes only.
- Host: GitHub
- URL: https://github.com/phydev/mice
- Owner: phydev
- License: gpl-3.0
- Created: 2022-06-03T11:12:12.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-01-23T22:16:28.000Z (over 2 years ago)
- Last Synced: 2025-01-22T10:11:32.463Z (9 months ago)
- Topics: data-cleaning, data-science, imputation, mice-algorithm, missingness, multiple-imputation
- Language: Python
- Homepage:
- Size: 137 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# MICE - Multiple Imputation by Chained Equations
Multiple imputation by chained equations, implemented from scratch.

## Example 1: iris dataset
Load the iris data from sklearn and introduce missing values with the [pyampute package](https://github.com/RianneSchouten/pyampute):
```python
from sklearn.datasets import load_iris
from pyampute.ampute import MultivariateAmputation

iris = load_iris(as_frame=True, return_X_y=False)["data"]
ma = MultivariateAmputation()
X_amp = ma.fit_transform(iris.to_numpy())  # pyampute requires the input as a numpy array
```
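Before imputing, it can be worth confirming how much missingness was actually introduced. The following is a minimal sketch, assuming the amputed array marks missing entries with `np.nan`. The toy array stands in for `X_amp`, and `missing_summary` is a hypothetical helper, not part of this repository or of pyampute:

```python
import numpy as np

def missing_summary(X):
    """Return the missing fraction per column and the fraction of incomplete rows."""
    mask = np.isnan(X)                         # True where a value is missing
    per_column = mask.mean(axis=0)             # share of missing entries in each column
    incomplete_rows = mask.any(axis=1).mean()  # share of rows with at least one gap
    return per_column, incomplete_rows

# Toy stand-in for X_amp (assumption: missing values are encoded as NaN)
X_toy = np.array([[5.1, 3.5, np.nan, 0.2],
                  [4.9, np.nan, 1.4, 0.2],
                  [4.7, 3.2, 1.3, 0.2],
                  [4.6, 3.1, 1.5, np.nan]])

per_column, incomplete_rows = missing_summary(X_toy)
print("missing fraction per column:", per_column)
print("fraction of incomplete rows:", incomplete_rows)
```

The same function can be called on `X_amp` directly to see which columns were amputed and how heavily.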
Now we can apply MICE to the amputed dataset:
```python
from src import mice
imp = mice.mice(X_amp, n_iterations=20, m_imputations=10, seed=42)
```

## Example 2: distribution plot for the sample data
After imputation, you should make diagnostic plots and compare the distribution of the multiply imputed datasets with the complete-case data. Below is the plotting code for the example we provide in the /tests directory:

```python
import seaborn as sns
import matplotlib.pyplot as plt

p = 3  # column to be plotted

# Proxy artists for the legend, one per curve type
custom_lines = [plt.Line2D([0], [0], color="red", lw=4),
                plt.Line2D([0], [0], color="grey", lw=4),
                plt.Line2D([0], [0], color="blue", lw=4)]

fig, ax = plt.subplots()
# One thin density curve per imputed dataset
for m in range(len(imp)):
    sns.kdeplot(imp[m][:, p], label="Imputed", color="black", lw=0.2, ax=ax)
# Observed values of the dataset with missing entries, then the complete data.
# `df` is the complete dataset as a DataFrame; it is not defined in this snippet.
sns.kdeplot(X_amp[:, p], label="Missing", color="blue", ax=ax)
sns.kdeplot(df.to_numpy()[:, p], label="Complete", color="red", ax=ax)
plt.xlabel("Age (years)")
ax.legend(custom_lines, ['Complete', 'Imputed', 'Missing'], loc="upper left")
plt.savefig("qol_distribution_mice.png")
```
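A numerical check can complement the density plot. The sketch below compares the mean and standard deviation of one column across the imputed datasets with those of the observed values. It assumes the imputations are a list of 2D numpy arrays, as the indexing `imp[m][:, p]` above suggests. `compare_moments` is a hypothetical helper, and the toy imputations are filled by random draws from observed values purely for illustration; they are not produced by MICE:

```python
import numpy as np

def compare_moments(imputations, X_missing, column):
    """Summarise one column: observed values vs. each imputed dataset."""
    observed = X_missing[:, column]
    observed = observed[~np.isnan(observed)]  # keep only the observed entries
    return {
        "observed_mean": observed.mean(),
        "observed_std": observed.std(ddof=1),
        "imputed_means": np.array([X_m[:, column].mean() for X_m in imputations]),
        "imputed_stds": np.array([X_m[:, column].std(ddof=1) for X_m in imputations]),
    }

# Toy data standing in for the amputed dataset
rng = np.random.default_rng(0)
X_miss = rng.normal(50, 10, size=(200, 4))
X_miss[rng.random(200) < 0.3, 3] = np.nan

# Stand-in "imputations": fill gaps with random observed values (not MICE)
toy_imputations = []
for m in range(5):
    X_m = X_miss.copy()
    holes = np.isnan(X_m[:, 3])
    X_m[holes, 3] = rng.choice(X_m[~holes, 3], size=holes.sum())
    toy_imputations.append(X_m)

summary = compare_moments(toy_imputations, X_miss, column=3)
print("observed mean/std:", summary["observed_mean"], summary["observed_std"])
print("imputed means:", summary["imputed_means"])
print("imputed stds:", summary["imputed_stds"])
```

Large gaps between the observed and imputed summaries are not necessarily wrong, since values that are not missing completely at random can legitimately shift the distribution, but they are a signal to inspect the imputation model.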
## Beware
This is a low-performance implementation meant for pedagogical purposes only. It has several limitations and leaves room for improvement. For research, please use one of the available packages for multiple imputation:
- [mice](https://cran.r-project.org/web/packages/mice/index.html)
- [miceRanger](https://github.com/FarrellDay/miceRanger)
- [sklearn.impute](https://scikit-learn.org/stable/modules/impute.html)
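As an illustration of the last option, here is a rough sketch of how several imputed datasets could be drawn with scikit-learn's `IterativeImputer`. It assumes a numeric array with `np.nan` for missing values. `multiple_impute` is a hypothetical wrapper whose parameter names only mirror the call used earlier in this README, and the toy data is synthetic:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, enables the import below
from sklearn.impute import IterativeImputer

def multiple_impute(X, m_imputations=5, n_iterations=10, seed=42):
    """Return a list of imputed copies of X, one per random seed."""
    imputations = []
    for m in range(m_imputations):
        # sample_posterior=True draws imputed values from the predictive
        # distribution, so different seeds give different completed datasets
        imputer = IterativeImputer(max_iter=n_iterations,
                                   sample_posterior=True,
                                   random_state=seed + m)
        imputations.append(imputer.fit_transform(X))
    return imputations

# Synthetic data with missing values in the last column
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 3))
X_toy[:, 2] += X_toy[:, 0]                  # make column 2 depend on column 0
X_toy[rng.random(100) < 0.2, 2] = np.nan

imputed = multiple_impute(X_toy, m_imputations=3)
print(len(imputed), imputed[0].shape)
```

Each element of the returned list is a completed copy of the input, so the diagnostic plot from Example 2 can be applied to it in the same way.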