https://github.com/ngangawairimu/data-validation-using-python

An agricultural dataset validated with Python code before use. The project builds a data pipeline that ingests and cleans the data at the press of a button.

Topics: jupyter-notebook, numpy, pandas, pytest, python

## Data Validation Project

## Objective:
To validate the MD_agric_df dataset against weather station data, ensuring its accuracy and reliability for agricultural insights.

## Key Steps:

## Data Pipeline Development:
Built an automated data pipeline that ingests and cleans the MD_agric_df and weather station datasets in a single run, which keeps the code readable and maintainable.
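
A minimal sketch of what such an ingest-and-clean pipeline could look like with pandas. The column names (`Field_ID`, `Elevation`, `Crop_type`, `Rainfall`), the sample rows, the spelling-fix table, and the choice to take the absolute value of elevation are all assumptions made for illustration; the project's actual schema and cleaning rules may differ.

```python
import io

import pandas as pd

# Hypothetical raw extract; the real column names and values may differ.
RAW_CSV = """Field_ID,Elevation,Crop_type,Rainfall
1,-786.05,cassava ,1125.2
2,201.37,TEA,1450.7
3,634.20,wheta,830.1
"""

# Hypothetical spelling corrections for crop names.
CROP_FIXES = {"wheta": "wheat"}


def ingest(source) -> pd.DataFrame:
    """Read the raw CSV (a path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the data without modifying the input."""
    out = df.copy()
    # Assumption: negative elevations are sign errors, so keep the magnitude.
    out["Elevation"] = out["Elevation"].abs()
    # Trim whitespace, lower-case, then fix known misspellings.
    out["Crop_type"] = (
        out["Crop_type"].str.strip().str.lower().replace(CROP_FIXES)
    )
    return out


def run_pipeline(source) -> pd.DataFrame:
    """Single entry point: ingest, then clean."""
    return clean(ingest(source))


agric_df = run_pipeline(io.StringIO(RAW_CSV))
```

Keeping `ingest` and `clean` as separate functions behind one `run_pipeline` entry point is one way to get the "press of a button" behaviour while leaving each step easy to test on its own.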

## Hypothesis Testing:
Conducted hypothesis testing to evaluate how well the MD_agric_df dataset represents actual weather conditions, comparing both the means and the variances of the distributions. This involved:

- Creating a null hypothesis.
- Cleaning and importing the MD_agric_df dataset.
- Mapping and comparing it with nearby weather station data.
- Performing t-tests to interpret results and validate data reliability.
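
The t-test step above can be sketched as follows. This is an illustration only: it assumes SciPy is available (the README does not list it), uses Welch's two-sample t-test via `scipy.stats.ttest_ind`, and runs on made-up numbers rather than the project's data. The helper name `means_agree` and the 0.05 significance level are hypothetical. It covers the comparison of means only.

```python
import numpy as np
from scipy import stats


def means_agree(field_values, station_values, alpha=0.05):
    """Welch two-sample t-test on the means of two samples.

    H0: the field measurements and the station measurements share a mean.
    Returns (t_statistic, p_value, fail_to_reject_h0).
    """
    result = stats.ttest_ind(field_values, station_values, equal_var=False)
    return result.statistic, result.pvalue, bool(result.pvalue >= alpha)


# Made-up station readings and two made-up field samples.
station = np.linspace(10.0, 16.0, 50)
field_ok = station[::-1].copy()   # same values, so the same mean
field_biased = station + 5.0      # systematically 5 units higher

_, p_ok, ok_consistent = means_agree(field_ok, station)
_, p_biased, biased_consistent = means_agree(field_biased, station)
```

A large p-value means the test found no evidence against H0 (the field data are consistent with the station data); a small one suggests the field measurements differ systematically from the station.
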

## Data Quality Checks:
Implemented rigorous data validation tests using Python and pytest, checking for:

- Correct DataFrame shapes.
- Valid column names.
- Non-negative elevation values.
- Valid crop types and positive rainfall measurements.
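
The checks listed above might be written as pytest tests along these lines. This is a sketch, not the project's test suite: `load_clean_data` is a hypothetical stand-in for the pipeline output, and the expected columns, row count, and valid crop set are assumptions.

```python
import pandas as pd

# Assumed schema and crop vocabulary; the real ones may differ.
EXPECTED_COLUMNS = ["Field_ID", "Elevation", "Crop_type", "Rainfall"]
VALID_CROPS = {"cassava", "tea", "wheat"}


def load_clean_data() -> pd.DataFrame:
    """Hypothetical stand-in for the cleaned pipeline output."""
    return pd.DataFrame(
        {
            "Field_ID": [1, 2, 3],
            "Elevation": [786.05, 201.37, 634.20],
            "Crop_type": ["cassava", "tea", "wheat"],
            "Rainfall": [1125.2, 1450.7, 830.1],
        }
    )


def test_dataframe_shape():
    df = load_clean_data()
    assert df.shape == (3, len(EXPECTED_COLUMNS))


def test_column_names():
    assert list(load_clean_data().columns) == EXPECTED_COLUMNS


def test_non_negative_elevation():
    assert (load_clean_data()["Elevation"] >= 0).all()


def test_valid_crop_types():
    assert load_clean_data()["Crop_type"].isin(VALID_CROPS).all()


def test_positive_rainfall():
    assert (load_clean_data()["Rainfall"] > 0).all()
```

Saved in a file whose name starts with `test_`, these functions are collected and run by the `pytest` command; each failing assertion is reported as a separate test failure.
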
## Tools Used:
Python, Pandas, pytest, Jupyter Notebook for exploratory data analysis.