https://github.com/firoz-ahmad-likhon/great-expectations-example
Sample project to demonstrate the use of Great Expectations
data-engineering data-quality data-validation great-expectations python
Last synced: 4 days ago
- Host: GitHub
- URL: https://github.com/firoz-ahmad-likhon/great-expectations-example
- Owner: firoz-ahmad-likhon
- Created: 2024-11-07T04:58:42.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-23T17:08:27.000Z (about 1 month ago)
- Last Synced: 2025-01-01T09:42:44.990Z (4 days ago)
- Topics: data-engineering, data-quality, data-validation, great-expectations, python
- Language: Python
- Homepage:
- Size: 43 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
## Introduction
This is a sample project to demonstrate the use of `Great Expectations` to validate and document data quality.
This example converts a sample transaction data set to a `Pandas` DataFrame and then validates it. It automatically generates data documentation in HTML format and stores the validation results in a Postgres database.

The official documentation for Great Expectations can be found at [Official website](https://docs.greatexpectations.io/docs/home/) and the glossary of terms can be found at [Glossary](https://docs.greatexpectations.io/docs/reference/learn/glossary).
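To make the idea of an expectation concrete, the sketch below checks one rule against a few transaction rows and returns a small result summary. It is plain Python written for illustration, not the Great Expectations API; the `amount` column, the sample rows, and the `expect_values_between` helper are all hypothetical.

```python
from typing import Any


def expect_values_between(
    rows: list[dict[str, Any]], column: str, min_value: float, max_value: float
) -> dict[str, Any]:
    """Check that every value in `column` lies within [min_value, max_value]."""
    # Collect values that are missing or fall outside the allowed range.
    unexpected = [
        row[column]
        for row in rows
        if row[column] is None or not (min_value <= row[column] <= max_value)
    ]
    return {
        "success": not unexpected,
        "element_count": len(rows),
        "unexpected_count": len(unexpected),
        "unexpected_values": unexpected,
    }


# Hypothetical sample rows standing in for the transaction data set.
transactions = [
    {"id": 1, "amount": 25.0},
    {"id": 2, "amount": 310.5},
    {"id": 3, "amount": -4.0},  # violates the rule below
]

result = expect_values_between(
    transactions, column="amount", min_value=0, max_value=1000
)
print(result)
```

Great Expectations packages rules of this kind as declarative expectations and produces the validation results and HTML documentation itself, so the project does not need a hand-written helper like this one.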
## Pre-requisites
1. A Postgres database to save the validation results.

## Installation
To install the project, follow the steps below:
1. Clone the repository
2. Create a virtual environment using `python -m venv venv`
3. Activate the virtual environment using `source venv/bin/activate` or `venv\Scripts\activate` on Windows
4. Install the required packages using `pip install -r requirements.txt`
5. Copy `.env-example` to `.env` and update the values as per your environment.

## Running the project
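The values in `.env` have to reach the scripts at run time, typically as environment variables. The sketch below shows one way to assemble a Postgres connection URL from such settings. The variable names (`POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_HOST`, `POSTGRES_PORT`, `POSTGRES_DB`) and the `postgres_url` helper are assumptions; check `.env-example` for the names this project actually uses.

```python
from collections.abc import Mapping
from urllib.parse import quote_plus


def postgres_url(env: Mapping[str, str]) -> str:
    """Build a PostgreSQL connection URL from environment-style settings."""
    # Percent-encode credentials so characters like "@" or "/" stay valid in a URL.
    user = quote_plus(env["POSTGRES_USER"])
    password = quote_plus(env["POSTGRES_PASSWORD"])
    # Host and port fall back to common local defaults (an assumption).
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")
    database = env["POSTGRES_DB"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# Hypothetical values; in practice pass `os.environ` once `.env` is loaded.
example_env = {
    "POSTGRES_USER": "gx_user",
    "POSTGRES_PASSWORD": "p@ss/word",
    "POSTGRES_DB": "quality",
}
url = postgres_url(example_env)
print(url)
```

A missing required variable raises `KeyError` immediately, which surfaces an incomplete `.env` before any validation work starts.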
To run the project, follow the steps below:
1. Initialize Great Expectations using `python init.py`
2. Run the validation using `python main.py`
3. To recreate the setup after modifying `init.py`, run: `python init.py --mode recreate`

## Understanding the project
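The `--mode recreate` flag in the run steps suggests that `init.py` parses a command-line option. How the repository implements this is not shown here; a minimal sketch with `argparse` follows, in which the `init` default mode and the `parse_mode` helper are assumptions.

```python
import argparse
from typing import Optional, Sequence


def parse_mode(argv: Optional[Sequence[str]] = None) -> str:
    """Return the requested mode; the available choices are illustrative."""
    parser = argparse.ArgumentParser(description="Set up the validation project.")
    parser.add_argument(
        "--mode",
        choices=["init", "recreate"],  # "init" is an assumed default name
        default="init",
        help="Use 'recreate' to rebuild the configuration from scratch.",
    )
    # Passing argv=None makes argparse read the real command line instead.
    return parser.parse_args(argv).mode


# Simulate `python init.py --mode recreate` with an explicit argument list.
mode = parse_mode(["--mode", "recreate"])
print(f"Running in {mode} mode")
```

Restricting the flag with `choices` makes a typo such as `--mode recreat` fail with a usage message instead of silently falling back to the default.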
The project consists of two files and one folder:
1. `init.py`: This file initializes Great Expectations and creates the data context along with various configurations and rules.
2. `main.py`: This file validates the sample data against the configured rules.
3. `data`: This folder contains the sample data to be validated.

### Type Checking and Linting
This repo uses `pre-commit` hooks to run type checking and linting before each commit. Install `pre-commit` by running `pip install pre-commit` and then run `pre-commit install` to install the hooks.
To run the checks manually, use the commands below:
1. Type checking: `mypy . --pdb`
2. Linting: `ruff check .`

### Testing
To run the tests, run `pytest` in the terminal.
The test suite contains the following:
1. Integration test on the context.