{"id":20070444,"url":"https://github.com/royruddle/vizdataquality","last_synced_at":"2025-05-05T19:33:18.203Z","repository":{"id":220098595,"uuid":"668105811","full_name":"royruddle/vizdataquality","owner":"royruddle","description":"Python package for visualizing data quality","archived":false,"fork":false,"pushed_at":"2025-03-06T14:23:37.000Z","size":8017,"stargazers_count":2,"open_issues_count":3,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-09T03:51:11.512Z","etag":null,"topics":["data","data-science","data-visualization","jupyter-notebook","missing-data","python"],"latest_commit_sha":null,"homepage":"https://vizdataquality.readthedocs.io","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/royruddle.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-19T03:32:36.000Z","updated_at":"2025-03-06T14:23:37.000Z","dependencies_parsed_at":null,"dependency_job_id":"2c73a7b0-e996-41e8-a5ad-fef8d0cac3a7","html_url":"https://github.com/royruddle/vizdataquality","commit_stats":null,"previous_names":["royruddle/vizdataquality"],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royruddle%2Fvizdataquality","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royruddle%2Fvizdataquality/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royruddle%2Fvizdataquality/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/royruddle%2Fvizdataquality/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/royruddle","download_url":"https://codeload.github.com/royruddle/vizdataquality/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252563071,"owners_count":21768395,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-science","data-visualization","jupyter-notebook","missing-data","python"],"created_at":"2024-11-13T14:23:00.571Z","updated_at":"2025-05-05T19:33:15.022Z","avatar_url":"https://github.com/royruddle.png","language":"Jupyter Notebook","readme":"[![Python Package](https://github.com/royruddle/vizdataquality/actions/workflows/main.yml/badge.svg)](https://github.com/royruddle/vizdataquality/actions/workflows/main.yml)\n# vizdataquality\nThis is a Python package for visualizing data quality, and has two main parts. One is software that helps you comprehensively profile and investigate data quality using this six-step workflow:\n1. Look at your data (is anything obviously wrong?)\n2. Watch out for special values\n3. Is any data missing?\n4. Check each variable\n5. Check combinations of variables\n6. Profile the cleaned data\n\nThe other is software for investigating patterns and structures of missing values in your data. When a given pattern of missing values has been found to be associated with other factors or attributes of the data then it becomes a \"structure of missingness\". 
Patterns and structures of missing values are part of Step 5 of the workflow, because they involve combinations of variables.\n\n## Documentation\n[The vizdataquality documentation](https://vizdataquality.readthedocs.io/en/latest/index.html) is hosted on Read the Docs.\n\n## Installation\nWe recommend installing vizdataquality in a Python virtual environment or Conda environment.\n\nTo install [vizdataquality](https://pypi.org/project/vizdataquality/), most users should run:\n\n```\npip install 'vizdataquality'\n```\n\n## Tutorials\nThe package includes notebooks that show you how to:\n- [Calculate a set of data quality attributes and output them to a file](https://github.com/royruddle/vizdataquality/blob/main/notebooks/Simple%20example.ipynb)\n- Use each type of plot, e.g., [datetime value distribution](https://github.com/royruddle/vizdataquality/blob/main/notebooks/Datetime%20value%20distribution.ipynb)\n- [Create a report](https://github.com/royruddle/vizdataquality/blob/main/notebooks/Report.ipynb) while you investigate data quality and profile a dataset\n- [Apply the six-step workflow to an open parking fines dataset](https://github.com/royruddle/vizdataquality/blob/main/notebooks/Workflow%20(parking%20fines).ipynb)\n\nAfter installing vizdataquality, to follow these tutorials interactively you will need to clone or download this repository. Then start Jupyter from within it:\n\n```\npython -m jupyter notebook notebooks\n```\n\n## Development\n- Documentation is built on Read the Docs from the main branch.\n- The package is published to PyPI when a release is created in the project repository on GitHub.\n\n## Notice\nThe vizdataquality software is released under the Apache Licence, version 2.0. See [LICENCE](./LICENCE) for details.\n\nThe file missing_data_functions.py contains some code that has been derived from [setvis](https://pypi.org/project/setvis/), which uses the same licence as vizdataquality. The same person leads the development of both packages. 
\n\n## Acknowledgements\nThe development of the vizdataquality software was supported by funding from the Engineering and Physical Sciences Research Council (EP/N013980/1; EP/R511717/1) and the Alan Turing Institute.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froyruddle%2Fvizdataquality","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Froyruddle%2Fvizdataquality","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froyruddle%2Fvizdataquality/lists"}