Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/open-risk/dataqualitytoolkit
Python toolkit for evaluating and visualizing the data quality of excel spreadsheets
- Host: GitHub
- URL: https://github.com/open-risk/dataqualitytoolkit
- Owner: open-risk
- License: apache-2.0
- Created: 2018-06-21T19:38:27.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-09-30T12:00:56.000Z (about 1 year ago)
- Last Synced: 2023-09-30T13:21:18.817Z (about 1 year ago)
- Topics: data-quality, data-quality-measurement, data-science, excel, spreadsheet
- Language: Python
- Homepage: https://www.openriskmanagement.com
- Size: 432 KB
- Stars: 7
- Watchers: 2
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.rst
- License: LICENSE.txt
README
DataQualityToolkit
==================
A Python toolkit for evaluating and visualizing the data quality of Excel spreadsheets, CSV files or other tabular data.

![Alt text](DQToolkit.png?raw=true "DQToolkit Visual")
Purpose of the project
======================

DataQualityToolkit is a Python library for evaluating and visualizing the quality of data provided in Excel spreadsheets, CSV files or other tabular data fetched from the web.

General Info
============

Author: Open Risk, http://www.openriskmanagement.com
License: Apache 2.0
Documentation: Open Risk Manual, http://www.openriskmanual.org/wiki/Data_Quality
Training: Open Risk Academy, https://www.openriskacademy.com/login/index.php
Development website: https://github.com/open-risk/DataQualityToolkit
Discussion: https://www.openriskcommons.org/
Functionality
=============

NB: The 0.2 release is still very much a pre-alpha version.
You can use DataQualityToolkit to:
- Automatically produce validation reports and visualizations given an existing set of validation rules
- Add to the validation rules

Current assumptions:
- Spreadsheets are expected to be in standard columnar format, with all worksheets starting at the same header row
- When data is fetched from the web, many assumptions are made about the structure of wikitables

File structure
==============

* datasets/ Contains datasets useful for getting started with the DataQualityToolkit
* examples/ Contains examples
* DQToolkit.py Main objects

Usage
=====

See the examples directory for how to produce the visuals included in this README file.
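
To make the idea of rule-based validation concrete, here is a minimal sketch of how a set of validation rules might be applied to tabular data and summarized as a report. It does not use the DataQualityToolkit API, which this README does not document. The sample data, the column names, the `RULES` mapping and the `validate` and `summarize` helpers are all hypothetical, and the sketch uses only the Python standard library.

```python
import csv
import io

# Hypothetical sample data; the column names are illustrative only.
SAMPLE_CSV = """id,country,exposure
1,NL,1000
2,,250
3,DE,-50
4,FR,abc
"""


def is_present(value):
    """Rule: the cell must not be empty."""
    return value.strip() != ""


def is_non_negative_number(value):
    """Rule: the cell must parse as a number greater than or equal to zero."""
    try:
        return float(value) >= 0
    except ValueError:
        return False


# Hypothetical rule set: column name -> list of (rule label, check function).
RULES = {
    "country": [("present", is_present)],
    "exposure": [("non_negative_number", is_non_negative_number)],
}


def validate(rows, rules):
    """Return (row_number, column, rule_label) for every failed check."""
    failures = []
    for row_number, row in enumerate(rows, start=1):
        for column, checks in rules.items():
            for label, check in checks:
                # Treat a missing cell as an empty string.
                if not check(row.get(column) or ""):
                    failures.append((row_number, column, label))
    return failures


def summarize(failures, rules, row_count):
    """Compute the pass rate for each (column, rule_label) pair."""
    report = {}
    for column, checks in rules.items():
        for label, _check in checks:
            failed = sum(
                1 for _row, col, lab in failures if col == column and lab == label
            )
            passed = row_count - failed
            report[(column, label)] = passed / row_count if row_count else 1.0
    return report


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
failures = validate(rows, RULES)
report = summarize(failures, RULES, len(rows))

for (column, label), pass_rate in report.items():
    print(f"{column} / {label}: {pass_rate:.0%} of rows pass")
```

Keeping the rules in a plain mapping means that adding a validation rule is a matter of adding one more `(label, function)` pair. The per-rule pass rates are the kind of figure a visualization layer could then plot.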
Dependencies
============

- DataQualityToolkit is written in Python and depends on the standard numerical and data processing Python libraries (NumPy, SciPy, pandas)
- The Visualization API depends on Matplotlib