## Data Validation Project

Repository: https://github.com/ngangawairimu/Data-Validation-using-python

An agricultural dataset validated with Python, together with a data pipeline that ingests and cleans the data at the press of a button.

## Objective

To validate the MD_agric_df dataset against weather station data, ensuring that it is accurate and reliable enough to support agricultural insights.

## Key Steps

### Data Pipeline Development

Built an automated data pipeline that ingests and cleans both the MD_agric_df and weather datasets, improving code readability and maintainability.

### Hypothesis Testing

Conducted hypothesis testing to evaluate how well the MD_agric_df dataset represents actual weather conditions, comparing both the means and the variances of the distributions. This involved:

1. Formulating a null hypothesis.
2. Cleaning and importing the MD_agric_df dataset.
3. Mapping the dataset to nearby weather station data and comparing the two.
4. Performing t-tests and interpreting the results to assess data reliability.

### Data Quality Checks

Implemented data validation tests using Python and pytest, checking for:

- Correct DataFrame shapes.
- Valid column names.
- Non-negative elevation values.
- Valid crop types.
- Positive rainfall measurements.

## Tools Used

Python, Pandas, pytest, and Jupyter Notebook (for exploratory data analysis).
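The data pipeline step can be sketched as two small functions, one for ingestion and one for cleaning, chained so that a whole run is a single call. This is a minimal sketch under assumptions: the CSV source, the cleaning rules (normalising column names, stripping whitespace from text values, dropping empty and duplicate rows) and the function names `ingest`, `clean` and `run_pipeline` are hypothetical and are not taken from the project's notebook.

```python
import io

import pandas as pd


def ingest(source) -> pd.DataFrame:
    """Load raw data from a CSV path or file-like object (hypothetical source)."""
    return pd.read_csv(source)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply illustrative cleaning rules; the project's real rules may differ."""
    df = df.copy()
    # Normalise column names: trim whitespace, lower-case, spaces -> underscores.
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
    # Strip stray whitespace from text values, leaving numeric values untouched.
    for col in df.columns:
        df[col] = df[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    # Drop rows that are entirely empty, then exact duplicate rows.
    df = df.dropna(how="all").drop_duplicates()
    return df.reset_index(drop=True)


def run_pipeline(source) -> pd.DataFrame:
    """Ingest and clean in one call: the 'press of a button' step."""
    return clean(ingest(source))


if __name__ == "__main__":
    # Hypothetical raw input with messy headers, padded text and a duplicate row.
    raw = io.StringIO(
        "Field ID, Crop Type ,Elevation\n"
        "1, wheat ,120.5\n"
        "1, wheat ,120.5\n"
        "2,maize,98.0\n"
    )
    print(run_pipeline(raw))
```

Keeping ingestion and cleaning as separate functions lets each be tested on its own, while `run_pipeline` gives the single entry point.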
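The hypothesis-testing steps can be illustrated as a two-sample comparison between values recorded in MD_agric_df and readings from the mapped weather stations. This sketch assumes the two samples have already been mapped and aligned. The sample values, the 0.05 significance level, the choice of Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`) for the means and the addition of Levene's test (`scipy.stats.levene`) for the variances are illustrative choices, not necessarily what the project used.

```python
from scipy import stats

ALPHA = 0.05  # significance level (assumed)


def compare_samples(dataset_values, station_values, alpha=ALPHA):
    """Test H0: both samples share the same mean and, separately, the same variance."""
    # Welch's t-test: compares means without assuming equal variances.
    t_stat, t_p = stats.ttest_ind(dataset_values, station_values, equal_var=False)
    # Levene's test: compares variances and is robust to non-normal data.
    w_stat, w_p = stats.levene(dataset_values, station_values)
    return {
        "t_statistic": float(t_stat),
        "mean_p_value": float(t_p),
        "reject_equal_means": bool(t_p < alpha),
        "variance_p_value": float(w_p),
        "reject_equal_variances": bool(w_p < alpha),
    }


if __name__ == "__main__":
    # Hypothetical rainfall values (mm) for fields and their nearest station.
    dataset_rainfall = [1010.2, 995.4, 1020.8, 1003.1, 998.7, 1012.5]
    station_rainfall = [1005.9, 999.0, 1018.3, 1001.4, 1002.2, 1009.8]
    print(compare_samples(dataset_rainfall, station_rainfall))
```

Failing to reject the null hypothesis for both tests is consistent with the dataset representing the station data well; rejecting either flags a mismatch worth investigating.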
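The data quality checks map naturally onto small pytest test functions, one per check. The sketch below is illustrative: the expected column names, the set of valid crop types and the sample rows are assumptions standing in for the real MD_agric_df schema, and `make_sample_df` is a hypothetical helper.

```python
import pandas as pd
import pytest

# Hypothetical schema; replace with the real MD_agric_df columns and crops.
EXPECTED_COLUMNS = ["field_id", "elevation", "crop_type", "rainfall"]
VALID_CROP_TYPES = {"wheat", "maize", "rice", "tea"}


def make_sample_df() -> pd.DataFrame:
    """Small stand-in for the cleaned MD_agric_df (hypothetical values)."""
    return pd.DataFrame(
        {
            "field_id": [1, 2, 3],
            "elevation": [120.5, 0.0, 845.2],
            "crop_type": ["wheat", "maize", "tea"],
            "rainfall": [1010.2, 995.4, 1520.8],
        }
    )


@pytest.fixture
def agric_df() -> pd.DataFrame:
    # pytest injects this DataFrame into any test that takes `agric_df`.
    return make_sample_df()


def test_shape(agric_df):
    # One column per expected field, and at least one row.
    assert agric_df.shape[1] == len(EXPECTED_COLUMNS)
    assert agric_df.shape[0] > 0


def test_column_names(agric_df):
    assert list(agric_df.columns) == EXPECTED_COLUMNS


def test_elevation_non_negative(agric_df):
    assert (agric_df["elevation"] >= 0).all()


def test_crop_types_valid(agric_df):
    assert set(agric_df["crop_type"]).issubset(VALID_CROP_TYPES)


def test_rainfall_positive(agric_df):
    assert (agric_df["rainfall"] > 0).all()
```

Running `pytest` in the directory containing this file collects every `test_*` function and reports which checks fail.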