https://github.com/ngangawairimu/data-validation-using-python
Agricultural dataset validated with Python code before use. Builds a data pipeline that ingests and cleans data at the press of a button.
jupyter-notebook numpy pandas pytest python
Last synced: 24 days ago
- Host: GitHub
- URL: https://github.com/ngangawairimu/data-validation-using-python
- Owner: ngangawairimu
- Created: 2024-05-04T09:14:58.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-12-19T05:19:11.000Z (2 months ago)
- Last Synced: 2024-12-19T06:24:17.899Z (2 months ago)
- Topics: jupyter-notebook, numpy, pandas, pytest, python
- Language: Jupyter Notebook
- Homepage:
- Size: 627 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Data Validation Project
## Objective:
To validate the MD_agric_df dataset against weather station data, ensuring its accuracy and reliability for agricultural insights.
## Key Steps:
## Data Pipeline Development:
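A minimal pandas sketch of a one-call ingest-and-clean step follows. It is illustrative only, not the repository's code: the function names (`ingest_csv`, `clean`, `run_pipeline`), the column names, and the cleaning rules (strip header whitespace, coerce numeric columns, drop duplicate rows and rows that fail to parse) are all assumptions.

```python
import io

import pandas as pd


def ingest_csv(source) -> pd.DataFrame:
    """Read raw CSV data from a file path or file-like object."""
    return pd.read_csv(source)


def clean(df: pd.DataFrame, numeric_cols: list) -> pd.DataFrame:
    """Strip header whitespace, coerce numeric columns, then drop duplicate
    rows and rows where any numeric column could not be parsed."""
    out = df.copy()
    out.columns = out.columns.str.strip()
    for col in numeric_cols:
        # Unparseable strings such as "unknown" become NaN instead of raising.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    out = out.drop_duplicates().dropna(subset=numeric_cols)
    return out.reset_index(drop=True)


def run_pipeline(source, numeric_cols: list) -> pd.DataFrame:
    """The 'press of a button' entry point: ingest, then clean."""
    return clean(ingest_csv(source), numeric_cols)


# Hypothetical sample standing in for a raw MD_agric_df export.
raw = io.StringIO(
    "Field_ID, Elevation, Rainfall\n"
    "1,597.5,1125.2\n"
    "1,597.5,1125.2\n"
    "2,unknown,1382.3\n"
    "3,472.9,893.0\n"
)
agric_df = run_pipeline(raw, numeric_cols=["Elevation", "Rainfall"])
print(agric_df)
```

The same `run_pipeline` call could be pointed at a weather-station CSV with its own numeric columns. Keeping each stage a small function makes every step easy to unit-test on its own.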
Built an automated data pipeline for seamless ingestion and cleaning of the MD_agric_df and weather datasets, significantly enhancing code readability and maintainability.
## Hypothesis Testing:
Conducted hypothesis testing to evaluate how well the MD_agric_df dataset represents actual weather conditions, comparing both the means and the variances of the distributions. This involved:
- Creating a null hypothesis.
- Importing and cleaning the MD_agric_df dataset.
- Mapping and comparing it with nearby weather station data.
- Performing t-tests to interpret the results and validate data reliability.
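The mapping and testing steps can be sketched as follows, assuming each field row carries a hypothetical `Weather_station` column naming its nearest station and that both tables report `Rainfall` in the same units. The null hypothesis assumed here is that, for each station, the field-level values and the station's own measurements share the same mean. Means are compared with Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`); for variances the sketch swaps in Levene's test (`scipy.stats.levene`), which is an assumption rather than something stated for this project. All data values are made up for illustration.

```python
import pandas as pd
from scipy import stats

ALPHA = 0.05  # assumed significance level

# Hypothetical field rows standing in for MD_agric_df, each mapped to a station.
fields = pd.DataFrame({
    "Field_ID": [1, 2, 3, 4, 5, 6],
    "Weather_station": [0, 0, 0, 1, 1, 1],
    "Rainfall": [1120.0, 1135.5, 1098.2, 860.4, 845.1, 872.9],
})

# Hypothetical station measurements, same units as the field data.
stations = pd.DataFrame({
    "Weather_station": [0, 0, 0, 0, 1, 1, 1, 1],
    "Rainfall": [1110.3, 1142.8, 1105.6, 1127.9, 858.2, 880.5, 849.7, 866.0],
})


def compare_station(station_id, column: str = "Rainfall") -> dict:
    """Welch's t-test on means and Levene's test on variances for one station."""
    field_vals = fields.loc[fields["Weather_station"] == station_id, column]
    station_vals = stations.loc[stations["Weather_station"] == station_id, column]
    t_stat, t_p = stats.ttest_ind(field_vals, station_vals, equal_var=False)
    _, lev_p = stats.levene(field_vals, station_vals)
    return {
        "station": int(station_id),
        "t_stat": float(t_stat),
        "t_p": float(t_p),
        # True means we fail to reject H0 at ALPHA for the means.
        "means_consistent": bool(t_p >= ALPHA),
        "levene_p": float(lev_p),
        "variances_consistent": bool(lev_p >= ALPHA),
    }


station_ids = sorted(fields["Weather_station"].unique())
results = pd.DataFrame([compare_station(s) for s in station_ids])
print(results)
```

A p-value at or above the significance level only means the test failed to reject the null hypothesis; it does not prove the two sources agree, and with samples this small the tests have little power.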
## Data Quality Checks:
Implemented rigorous data validation tests using Python and pytest, checking for:
- Correct DataFrame shapes.
- Valid column names.
- Non-negative elevation values.
- Valid crop types and positive rainfall measurements.
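The checks listed above map naturally onto plain pytest test functions. The sketch below is hypothetical: the column names, the crop list, and the expected shape are placeholders rather than the project's real values, and `load_sample` stands in for whatever function loads the cleaned MD_agric_df.

```python
import pandas as pd

# Placeholder expectations; the project's actual values may differ.
EXPECTED_COLUMNS = ["Field_ID", "Elevation", "Rainfall", "Crop_type"]
EXPECTED_ROWS = 3
VALID_CROPS = {"cassava", "maize", "tea", "wheat"}


def load_sample() -> pd.DataFrame:
    """Hypothetical stand-in for loading the cleaned MD_agric_df."""
    return pd.DataFrame({
        "Field_ID": [1, 2, 3],
        "Elevation": [597.5, 0.0, 472.9],
        "Rainfall": [1125.2, 1382.3, 893.0],
        "Crop_type": ["cassava", "tea", "maize"],
    })


def test_shape():
    assert load_sample().shape == (EXPECTED_ROWS, len(EXPECTED_COLUMNS))


def test_column_names():
    assert list(load_sample().columns) == EXPECTED_COLUMNS


def test_non_negative_elevation():
    assert (load_sample()["Elevation"] >= 0).all()


def test_valid_crop_types():
    assert set(load_sample()["Crop_type"]).issubset(VALID_CROPS)


def test_positive_rainfall():
    assert (load_sample()["Rainfall"] > 0).all()
```

Saved as a file such as `test_validation.py`, these functions are collected and run by `pytest` without extra configuration, since pytest discovers functions whose names start with `test_`.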
## Tools Used:
Python, pandas, and pytest, with Jupyter Notebook for exploratory data analysis.