{"id":20986342,"url":"https://github.com/mgobeaalcoba/missing_mga","last_synced_at":"2025-05-14T17:32:49.608Z","repository":{"id":239435388,"uuid":"799541207","full_name":"Mgobeaalcoba/missing_mga","owner":"Mgobeaalcoba","description":"A python package that extends the Pandas API and allows us to work with multiple tabulation and graphing methods with null values.","archived":false,"fork":false,"pushed_at":"2024-06-27T20:43:17.000Z","size":48,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-11-10T07:08:08.406Z","etag":null,"topics":["deployment-automation","extensions","package","pandas","pip","pypi","pypi-package","python","workflow-automation"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/missing-mga/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Mgobeaalcoba.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-12T13:04:11.000Z","updated_at":"2024-06-27T20:43:21.000Z","dependencies_parsed_at":null,"dependency_job_id":"9e58fe72-96ea-464f-a2f3-98de3a592442","html_url":"https://github.com/Mgobeaalcoba/missing_mga","commit_stats":null,"previous_names":["mgobeaalcoba/missing_mga"],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mgobeaalcoba%2Fmissing_mga","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mgobeaalcoba%2Fmissing_mga/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/Mgobeaalcoba%2Fmissing_mga/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mgobeaalcoba%2Fmissing_mga/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Mgobeaalcoba","download_url":"https://codeload.github.com/Mgobeaalcoba/missing_mga/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225304013,"owners_count":17453037,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deployment-automation","extensions","package","pandas","pip","pypi","pypi-package","python","workflow-automation"],"created_at":"2024-11-19T06:13:09.809Z","updated_at":"2024-11-19T06:13:10.341Z","avatar_url":"https://github.com/Mgobeaalcoba.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Extends Pandas DataFrame with a new method to work with missing values\n\n![Visitors](https://api.visitorbadge.io/api/visitors?path=https%3A%2F%2Fgithub.com%2FMgobeaalcoba%2Fmissing_mga\u0026label=Visitors\u0026countColor=%23263759)\n\n## Introduction\n\nThis package extends the Pandas DataFrame with new methods for working with missing values. The methods live in the extension class MissingMethods and are reached through the `missing` accessor, which makes working with missing values more intuitive.\n\nThe class provides several methods for handling missing values in a DataFrame. Here's a brief explanation of each method:\n\n1. **number_missing**: Returns the total number of missing values in the DataFrame.\n2. 
**number_missing_by_column**: Returns the number of missing values for each column.\n3. **number_complete**: Returns the total number of complete (non-missing) values in the DataFrame.\n4. **number_complete_by_column**: Returns the number of complete values for each column.\n5. **impute_mean**: Imputes the missing values of the DataFrame using the mean of each column.\n6. **impute_median**: Imputes the missing values of the DataFrame using the median of each column.\n7. **impute_mode**: Imputes the missing values of the DataFrame using the mode of each column.\n8. **impute_knn(n_neighbors=5)**: Imputes the missing values of the DataFrame using the K-Nearest Neighbors algorithm.\n9. **missing_value_heatmap**: Generates a heatmap showing the distribution of missing values in the DataFrame.\n10. **drop_missing_rows(thresh=0.5)**: Drops the rows whose share of missing values is above the specified threshold.\n11. **drop_missing_columns(thresh=0.5)**: Drops the columns whose share of missing values is above the specified threshold.\n12. **missing_variable_summary**: Generates a summary table showing the count and percentage of missing values for each variable (column).\n13. **missing_case_summary**: Generates a summary table showing the count and percentage of missing values for each case (row).\n14. **missing_variable_table**: Generates a table showing the distribution of missing values across variables.\n15. **missing_case_table**: Generates a table showing the distribution of missing values across cases.\n16. **missing_variable_span**: Analyzes the missing values in a variable over a specified span and returns a DataFrame summarizing the percentage of missing and complete values.\n17. **missing_variable_run**: Identifies runs of missing and complete values in a specified variable and returns a DataFrame summarizing their lengths.\n18. 
**sort_variables_by_missingness**: Sorts the DataFrame columns based on the number of missing values in each column.\n19. **create_shadow_matrix**: Creates a shadow matrix indicating missing values with a specified string.\n20. **bind_shadow_matrix**: Binds the original DataFrame with its shadow matrix indicating missing values.\n21. **missing_scan_count**: Counts occurrences of specified values in the DataFrame and returns the counts per variable.\n22. **missing_variable_plot**: Plots a horizontal bar chart showing the number of missing values for each variable.\n23. **missing_case_plot**: Plots a histogram showing the distribution of missing values across cases.\n24. **missing_variable_span_plot**: Plots a stacked bar chart showing the percentage of missing and complete values over a repeating span for a specified variable.\n25. **missing_upsetplot**: Generates an UpSet plot to visualize the combinations of missing values across variables.\n\nThese methods provide comprehensive tools for analyzing and visualizing missing values in a DataFrame. 
They can be used to gain insights into the patterns and distribution of missing values, as well as to inform data cleaning and imputation strategies.\n\n## Installation\n\nTo install the package, you can use pip:\n\n```shell\npip install missing-mga\n```\n\n## Usage\n\nTo use the package, import `missing_mga`:\n\n```python\nimport missing_mga as missing\n```\n\nThen, you can create a DataFrame and use the `missing` accessor to reach the missing value handling methods:\n\n```python\nimport pandas as pd\n\n# Create a DataFrame with one missing value in column A and one in column B\ndata = {\n    'A': [1, 2, None, 4, 5],\n    'B': [None, 2, 3, 4, 5],\n    'C': [1, 2, 3, 4, 5],\n    'D': [1, 2, 3, 4, 5],\n}\n\ndf = pd.DataFrame(data)\n\n# Use the missing accessor to reach the missing value handling methods\ndf.missing.number_missing()\n```\n\nThis will return the total number of missing values in the DataFrame.\n\n## Contributing\n\nIf you have any suggestions, bug reports, or feature requests, please open an issue on the GitHub repository. We welcome contributions from the community, and pull requests are always appreciated.\n\n## License\n\nThis package is licensed under the MIT License. See the [LICENSE]() file.\n\n## Acknowledgements\n\nThis package was inspired by the [naniar](https://naniar.njtierney.com/) package in R, which provides similar functionality for working with missing values in data frames. 
We would like to thank the authors of naniar for their work and for providing a valuable resource for the data science community.\n\n## References\n\n- [naniar: Data Structures, Summaries, and Visualisations for Missing Data](https://naniar.njtierney.com/)\n- [Handling Missing Data in Pandas](https://towardsdatascience.com/handling-missing-data-in-pandas-ba0b2ee0f4e4)\n- [Working with Missing Data in Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html)\n\n## Metrics\n\nYou can find the metrics of this package in the following link: [Metrics](https://lookerstudio.google.com/s/m-3EH05N9W8)\n\n## Contact\n\nIf you have any questions or need further assistance, please contact the package maintainer: gobeamariano@gmail.com\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmgobeaalcoba%2Fmissing_mga","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmgobeaalcoba%2Fmissing_mga","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmgobeaalcoba%2Fmissing_mga/lists"}