https://github.com/open-risk/dataqualitytoolkit
Python toolkit for evaluating and visualizing the data quality of excel spreadsheets
data-quality data-quality-measurement data-science excel spreadsheet
- Host: GitHub
- URL: https://github.com/open-risk/dataqualitytoolkit
- Owner: open-risk
- License: apache-2.0
- Created: 2018-06-21T19:38:27.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2024-09-05T19:30:14.000Z (8 months ago)
- Last Synced: 2025-03-24T12:21:39.108Z (about 1 month ago)
- Topics: data-quality, data-quality-measurement, data-science, excel, spreadsheet
- Language: Python
- Homepage: https://www.openriskmanagement.com
- Size: 438 KB
- Stars: 8
- Watchers: 1
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.rst
- License: LICENSE.txt
README
DataQualityToolkit
==================
A Python toolkit for evaluating and visualizing the data quality of Excel spreadsheets, CSV files, or other tabular data
Purpose of the project
======================

DataQualityToolkit is a Python-powered library for evaluating and visualizing the quality of data provided in Excel spreadsheets, CSV files, or other tabular data fetched from the web.

General Info
============

Author: Open Risk, http://www.openriskmanagement.com
License: Apache 2.0
Documentation: Open Risk Manual, http://www.openriskmanual.org/wiki/Data_Quality
Training: Open Risk Academy, https://www.openriskacademy.com/login/index.php
Development website: https://github.com/open-risk/DataQualityToolkit
Discussion: https://www.openriskcommons.org/
Functionality
=============

NB: The 0.2 release is still very much a (pre-)alpha version.

You can use DataQualityToolkit to:
- Automatically produce validation reports and visualizations given an existing set of validation rules
- Add to the validation rules

Current assumptions:

- Spreadsheets are assumed to be formatted in standard columnar format, with all worksheets starting at the same header row
- Many assumptions are made about the structure of wikitables (the www source case)

File structure
==============

* datasets/ Contains datasets useful for getting started with the DataQualityToolkit
* examples/ Contains examples
* DQToolkit.py Main objects

Usage
=====

See the examples directory for how to produce the visuals included in this README file.

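
As a rough illustration of what rule-based validation of columnar data can look like, here is a minimal sketch. It does not use the DataQualityToolkit API: the rule functions, the `RULES` mapping, the `validation_report` helper, and the sample columns are all hypothetical, and the code relies only on the Python standard library. For the toolkit's actual objects, refer to DQToolkit.py and the examples directory.

```python
import csv
import io


# Hypothetical validation rules (not taken from DQToolkit.py).
# Each rule receives the raw cell text and returns True if the cell is valid.
def is_non_empty(value):
    return value.strip() != ""


def is_non_negative_number(value):
    try:
        return float(value) >= 0
    except ValueError:
        return False


# Hypothetical mapping of column name -> rule for that column.
RULES = {
    "country": is_non_empty,
    "exposure": is_non_negative_number,
}


def validation_report(rows, rules):
    """Return {column: fraction of rows that pass the column's rule}."""
    rows = list(rows)
    report = {}
    for column, rule in rules.items():
        # Missing cells are treated as empty strings.
        passed = sum(1 for row in rows if rule(row.get(column) or ""))
        report[column] = passed / len(rows) if rows else 0.0
    return report


# Small in-memory table in standard columnar format (header row first).
SAMPLE_CSV = """country,exposure
NL,100.5
,20
DE,-3
FR,n/a
"""

rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
report = validation_report(rows, RULES)
for column, score in report.items():
    print(f"{column}: {score:.0%} of rows pass")
```

Keeping the rules in a plain mapping means a new rule can be added by writing one function and registering it under a column name, which mirrors the idea of extending an existing rule set.
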
Dependencies
============

- DataQualityToolkit is written in Python and depends on the standard numerical and data processing Python libraries (NumPy, SciPy, Pandas)
- The Visualization API depends on Matplotlib