Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/thee-unruly/data-validation-and-verification
https://github.com/thee-unruly/data-validation-and-verification
Last synced: 4 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/thee-unruly/data-validation-and-verification
- Owner: Thee-Unruly
- Created: 2024-10-01T07:07:42.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2024-10-01T08:04:38.000Z (about 1 month ago)
- Last Synced: 2024-10-28T14:22:22.334Z (17 days ago)
- Language: Python
- Size: 3.91 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
1. **Data Validation Techniques**
These techniques help ensure that the migrated data meets quality standards and requirements.
**Field-by-Field Validation**: Compare each data field in the *source and target systems* to ensure that *values match exactly*.
**Row Counts**: Check the number of rows in each table in the *source and destination systems* to verify that *no records are missing or duplicated*.
**Data Type Validation**: Ensure that the data types in the destination match those in the source, and verify that *data has not been truncated or altered* during migration.
**Range and Boundary Checks**: Confirm that numeric *values fall within the expected ranges* and that *dates are valid and correctly formatted*.
**Null/Blank Value Checks**: Verify that *fields that shouldn't contain null or blank values* are properly populated in the target system.
2. **Data Verification Techniques**
These techniques help confirm that the migrated data works correctly in its new environment.
**Business Rule Validation**: Check that the migrated *data meets the defined business rules and logic*. For example, ensure that calculated fields, such as totals or percentages, match between the source and target.
**Data Integrity Checks**: Verify *relationships and foreign key constraints to ensure referential integrity*. Ensure that all primary keys, foreign keys, and unique constraints are intact and functioning correctly in the target system.
**Data Quality Audits**: *Identify and fix issues* such as duplicates, inconsistencies, or missing data.
**Spot Checks**: *Randomly select a subset of records and manually compare* them between the *source and target systems for accuracy*.
3. **Automation of Data Validation and Verification**
Automating these processes can save time and increase accuracy:
**SQL Queries**: Use *SQL queries to validate data counts, data types, ranges, and referential integrity*.
**Python Scripts**: Write custom Python scripts to compare data between the source and destination systems, leveraging libraries like *pandas or PySpark*.