{"id":21312539,"url":"https://github.com/phydev/mice","last_synced_at":"2025-03-15T20:46:29.710Z","repository":{"id":151473223,"uuid":"499481408","full_name":"phydev/mice","owner":"phydev","description":"Multiple imputation with chained equation implemented from scratch. This is a low performance implementation meant for pedagogical purposes only. ","archived":false,"fork":false,"pushed_at":"2023-01-23T22:16:28.000Z","size":140,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-22T10:11:32.463Z","etag":null,"topics":["data-cleaning","data-science","imputation","mice-algorithm","missingness","multiple-imputation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/phydev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-06-03T11:12:12.000Z","updated_at":"2022-06-08T08:44:28.000Z","dependencies_parsed_at":"2023-05-23T21:15:43.554Z","dependency_job_id":null,"html_url":"https://github.com/phydev/mice","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phydev%2Fmice","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phydev%2Fmice/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phydev%2Fmice/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phydev%2Fmice/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/phydev","download_url":"https://codeload.github.com/phydev/mice/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243790949,"owners_count":20348379,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-cleaning","data-science","imputation","mice-algorithm","missingness","multiple-imputation"],"created_at":"2024-11-21T17:34:19.374Z","updated_at":"2025-03-15T20:46:29.665Z","avatar_url":"https://github.com/phydev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MICE - Multiple Imputation by Chained Equations\nMultiple imputation by chained equations (MICE), implemented from scratch.\n\n## Example 1: iris dataset\n\nLoad the iris data from scikit-learn and introduce missing values with the [pyampute package](https://github.com/RianneSchouten/pyampute):\n```python\nfrom sklearn.datasets import load_iris\nfrom pyampute.ampute import MultivariateAmputation\n\niris = load_iris(as_frame=True, return_X_y=False)[\"data\"]  # pandas DataFrame with the 4 iris features\nma = MultivariateAmputation()\nX_amp = ma.fit_transform(iris.to_numpy())  # pyampute requires the input as numpy array\n```\nNow we can apply MICE to the amputed dataset:\n```python\nfrom src import mice\n\n# imp holds the m_imputations completed datasets; imp[m] is the m-th one\nimp = mice.mice(X_amp, n_iterations=20, m_imputations=10, seed=42)\n```\n\n## Example 2: distribution plot for the sample data\nAfter imputation, you should make diagnostic plots and compare the distribution of each multiply imputed dataset with that of the complete data. Below you can find the plot for the example we provide in the /tests directory:\n\n```python\nimport seaborn as sns\nimport matplotlib.pyplot as plt\n\ndf = iris  # complete data before amputation (here: the iris DataFrame from Example 1)\np = 3  # index of the column to be plotted\ncustom_lines = [plt.Line2D([0], [0], color=\"red\", lw=4),\n                plt.Line2D([0], [0], color=\"black\", lw=4),\n                plt.Line2D([0], [0], color=\"blue\", lw=4)]\n\nfig, ax = plt.subplots()\n\n# one thin density curve per imputed dataset\nfor m in range(len(imp)):\n    sns.kdeplot(imp[m][:, p], label=\"Imputed\", color=\"black\", lw=0.2, ax=ax)\n# amputed data: only the observed (non-NaN) values contribute to the density\nsns.kdeplot(X_amp[:, p], label=\"Missing\", color=\"blue\", ax=ax)\nsns.kdeplot(df.to_numpy()[:, p], label=\"Complete\", color=\"red\", ax=ax)\nplt.xlabel(df.columns[p])  # \"Age (years)\" in the /tests example\nax.legend(custom_lines, ['Complete', 'Imputed', 'Missing'], loc=\"upper left\")\nplt.savefig(\"qol_distribution_mice.png\")\n```\n\n![Figure showing the distribution lines for 10 imputed datasets, the original dataset and the amputed dataset with missing values.](https://github.com/phydev/mice/blob/main/tests/qol_distribution_mice.png)\n\n## Beware\nThis is a low-performance implementation meant for pedagogical purposes only. It has several limitations and could be improved in many ways; for research, please use one of the available packages for multiple imputation:\n- [mice](https://cran.r-project.org/web/packages/mice/index.html)\n- [miceRanger](https://github.com/FarrellDay/miceRanger)\n- [sklearn.impute](https://scikit-learn.org/stable/modules/impute.html)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphydev%2Fmice","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphydev%2Fmice","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphydev%2Fmice/lists"}