Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/thee-unruly/data-validation-and-verification


https://github.com/thee-unruly/data-validation-and-verification

Last synced: 4 days ago
JSON representation

Awesome Lists containing this project

README

        

1. **Data Validation Techniques**

These techniques help ensure that the migrated data meets quality standards and requirements.

**Field-by-Field Validation**: Compare each data field in the *source and target systems* to ensure that *values match exactly*.

**Row Counts**: Check the number of rows in each table in the *source and destination systems* to verify that *no records are missing or duplicated*.

**Data Type Validation**: Ensure that the data types in the destination match those in the source, and verify that *data has not been truncated or altered* during migration.

**Range and Boundary Checks**: Confirm that numeric *values fall within the expected ranges* and that *dates are valid and correctly formatted*.

**Null/Blank Value Checks**: Verify that *fields that shouldn't contain null or blank values* are properly populated in the target system.

2. **Data Verification Techniques**

These techniques help confirm that the migrated data works correctly in its new environment.

**Business Rule Validation**: Check that the migrated *data meets the defined business rules and logic*. For example, ensure that calculated fields, such as totals or percentages, match between the source and target.

**Data Integrity Checks**: Verify *relationships and foreign key constraints to ensure referential integrity*. Ensure that all primary keys, foreign keys, and unique constraints are intact and functioning correctly in the target system.

**Data Quality Audits**: *Identify and fix issues* such as duplicates, inconsistencies, or missing data.

**Spot Checks**: *Randomly select a subset of records and manually compare* them between the *source and target systems for accuracy*.

3. **Automation of Data Validation and Verification**

Automating these processes can save time and increase accuracy:

**SQL Queries**: Use *SQL queries to validate data counts, data types, ranges, and referential integrity*.

**Python Scripts**: Write custom Python scripts to compare data between the source and destination systems, leveraging libraries like *pandas or PySpark*.