Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/open-risk/dataqualitytoolkit

Python toolkit for evaluating and visualizing the data quality of excel spreadsheets
https://github.com/open-risk/dataqualitytoolkit

data-quality data-quality-measurement data-science excel spreadsheet

Last synced: 2 months ago
JSON representation

Python toolkit for evaluating and visualizing the data quality of excel spreadsheets

Awesome Lists containing this project

README

        

DataQualityToolkit
==================
A Python toolkit for evaluating and visualizing the data quality of excel spreadsheets, csv files or other tabular data

![Alt text](DQToolkit.png?raw=true "DQToolkit Visual")

Purpose of the project
======================

DataQualityToolkit is a Python powered library for the evaluation and visualization of the data
quality of data provided in excel spreadsheets, csv files or other tabular data fetched from the web

General Info
=========================

Author: Open Risk, http://www.openriskmanagement.com

License: Apache 2.0

Documentation: Open Risk Manual, http://www.openriskmanual.org/wiki/Data_Quality

Training: Open Risk Academy, https://www.openriskacademy.com/login/index.php

Development website: https://github.com/open-risk/DataQualityToolkit

Discussion: https://www.openriskcommons.org/

Functionality
=============

NB: The 0.2 release is (still) a heavily (pre-)alpha version.

You can use DataQualityToolkit to:
- Automatically produce validation reports and visualizations given an existing set of validation rules
- Add to the validation rules
- There is an assumption that the spreadsheets are formatted in standard columnar format with all worksheets starting at the same header row
- There are many assumptions about the structure of wikitables (www source case)

File structure
==============

* datasets/ Contains datasets useful for getting started with the DataQualityToolkit
* examples/ Contains examples
* DQToolkit.py Main objects

Usage
=====

Look at the examples directory on how to produce the visuals include in this README file

Dependencies
============

- DataQualityToolkit is written in Python and depends on the standard numerical and data processing Python libraries (Numpy, Scipy, Pandas)
- The Visualization API depends on Matplotlib