{"id":17131846,"url":"https://github.com/raamana/missingdata","last_synced_at":"2025-04-13T07:55:18.137Z","repository":{"id":57441983,"uuid":"167450417","full_name":"raamana/missingdata","owner":"raamana","description":"missing data handing: visualize and impute","archived":false,"fork":false,"pushed_at":"2019-07-31T13:33:24.000Z","size":1594,"stargazers_count":18,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-13T07:55:12.659Z","etag":null,"topics":["biostatistics","data-science","dirty-data","epidemiology","imputation","machine-learning","missing-data","missing-values","neuroscience","visualization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/raamana.png","metadata":{"files":{"readme":"README.rst","changelog":"HISTORY.rst","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-01-24T22:54:53.000Z","updated_at":"2024-08-12T00:23:32.000Z","dependencies_parsed_at":"2022-09-26T17:21:00.690Z","dependency_job_id":null,"html_url":"https://github.com/raamana/missingdata","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raamana%2Fmissingdata","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raamana%2Fmissingdata/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raamana%2Fmissingdata/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/raamana%2Fmissingdata/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owne
rs/raamana","download_url":"https://codeload.github.com/raamana/missingdata/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248681491,"owners_count":21144700,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["biostatistics","data-science","dirty-data","epidemiology","imputation","machine-learning","missing-data","missing-values","neuroscience","visualization"],"created_at":"2024-10-14T19:25:05.608Z","updated_at":"2025-04-13T07:55:18.106Z","avatar_url":"https://github.com/raamana.png","language":"Python","readme":"===========\nmissingdata\n===========\n\n\n.. image:: https://img.shields.io/pypi/v/missingdata.svg\n        :target: https://pypi.python.org/pypi/missingdata\n\n.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.3352336.svg\n        :target: https://doi.org/10.5281/zenodo.3352336\n        :alt: citation\n\n.. image:: https://img.shields.io/travis/raamana/missingdata.svg\n        :target: https://travis-ci.org/raamana/missingdata\n\n.. 
image:: https://readthedocs.org/projects/missingdata/badge/?version=latest\n        :target: https://missingdata.readthedocs.io/en/latest/?badge=latest\n        :alt: Documentation Status\n\n\nmissing data visualization and imputation\n\nGoals\n--------\n\nTo provide an easy-to-use yet thorough assessment of missing values in one's dataset:\n\n - in addition to the blackholes plot below,\n - show the variable-to-variable and subject-to-subject co-missingness, and\n - quantify the TYPE of missingness, etc.\n\n\nNote\n~~~~~~~~~~~~~\n\n    To easily manage data with missing values, I strongly recommend moving away from CSV files and managing your data in self-contained, flexible data structures like `pyradigm \u003chttp://github.com/raamana/pyradigm\u003e`_, as your data, as well as your needs, will only get bigger and more complicated, e.g. with mixed types, missing values and a large number of groups.\n\n\nThe goals listed above would make great contributions, if you have time.\n\n\nFeatures\n--------\n\n* visualization\n* imputation (coming!)\n* other handling\n\n\nblackholes plot\n~~~~~~~~~~~~~~~~\n\n.. image:: docs/flyer.png\n\n\nState\n-------\n - The software is in beta and under development. **Update regularly and often!**\n - Contributions are most welcome, especially bug reports and usability improvements.\n\n\nInstallation\n--------------\n\n.. code-block:: bash\n\n    pip install -U missingdata\n\n\nWe encourage you to update often, especially when you run into issues.\n\nUsage\n------------\n\nTake a look at the help text before diving in, using the following code:\n\n.. code-block:: python\n\n    from missingdata import blackholes\n    help(blackholes)\n\n\nI encourage you to read the description of each parameter carefully to understand the behaviour of this plotting mechanism.\n\n.. 
note::\n\n    If you don't see any labels (for rows or columns) when you first try the blackholes plot, it may be because the effective number of rows/columns being displayed, after applying ``filter_spec_*``, exceeded a preset limit (60/80), and the labels were removed to avoid them being occluded or becoming illegible. You can use the parameter ``freq_thresh_show_labels`` to reduce the effective number of rows/columns to display, or pass ``show_all_labels=True`` to force the display of labels. If the number of subjects or variables is large, you may want to increase ``figsize`` (width or height) to minimize occlusion and improve label readability.\n\n    **Also**, the default settings may not work for you, so I strongly encourage you to control as many parameters as needed to customize the plot to your liking. If a feature you need is not currently supported, send a PR with improvements or open an issue. Thanks.\n\n\nLet's say you have all the data in a pandas ``DataFrame``, where subject IDs are in a ``'sub_ids'`` column and variable names are in a ``'var_names'`` column, and they belong to groups identified by ``sub_class`` and ``var_group``. You can then use the following code to produce the ``blackholes`` plot:\n\n.. code-block:: python\n\n    from missingdata import blackholes\n\n    blackholes(data_frame,\n               label_rows_with='sub_ids', label_cols_with='var_names',\n               group_rows_by=sub_class, group_cols_by=var_group)\n\n\nIf you are interested in seeing the subjects/variables with the least amount of missing data, you can control the window of missing-data fraction with ``filter_spec_samples`` and/or ``filter_spec_variables`` by passing a tuple of two floats, e.g. ``(0, 0.1)``, which will filter out those with more than 10% missing data.\n\n.. 
code-block:: python\n\n    from missingdata import blackholes\n\n    blackholes(data_frame,\n               label_rows_with='sub_ids', label_cols_with='var_names',\n               filter_spec_samples=(0, 0.1))\n\n\nThe remaining parameters are described in the help text (see ``help(blackholes)`` above).\n\nPlease open an `issue \u003chttps://github.com/raamana/missingdata/issues/new\u003e`_\nif you find something confusing, have feedback for improvement, or identify a bug. **Thanks**.\n\n\nCitation\n----------------\n\nIf you find this package useful, I'd greatly appreciate it if you cite it as:\n\n.. parsed-literal::\n\n    Pradeep Reddy Raamana, (2019), \"missingdata python library for visualization and handling of missing values\" (Version v0.1). Zenodo. http://doi.org/10.5281/zenodo.3352336\n    DOI: 10.5281/zenodo.3352336\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraamana%2Fmissingdata","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fraamana%2Fmissingdata","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraamana%2Fmissingdata/lists"}