{"id":31132693,"url":"https://github.com/ronverse17/automate-data-cleaning","last_synced_at":"2026-04-28T12:03:22.530Z","repository":{"id":314944052,"uuid":"1057449433","full_name":"ronverse17/Automate-Data-Cleaning","owner":"ronverse17","description":"This project automates messy data cleaning tasks - like fixing column names, filling missing values, and spotting outliers, so analysts and data scientists can spend more time on insights, not preprocessing.","archived":false,"fork":false,"pushed_at":"2025-09-15T19:26:19.000Z","size":38,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-15T20:30:35.565Z","etag":null,"topics":["numpy","pandas","pipeline","python"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ronverse17.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-15T18:35:12.000Z","updated_at":"2025-09-15T19:30:03.000Z","dependencies_parsed_at":"2025-09-16T22:31:16.036Z","dependency_job_id":null,"html_url":"https://github.com/ronverse17/Automate-Data-Cleaning","commit_stats":null,"previous_names":["ronverse17/automate-data-cleaning"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/ronverse17/Automate-Data-Cleaning","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ronverse17%2FAutomate-Data-Cleaning","tags_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/ronverse17%2FAutomate-Data-Cleaning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ronverse17%2FAutomate-Data-Cleaning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ronverse17%2FAutomate-Data-Cleaning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ronverse17","download_url":"https://codeload.github.com/ronverse17/Automate-Data-Cleaning/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ronverse17%2FAutomate-Data-Cleaning/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32379629,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-28T11:25:28.583Z","status":"ssl_error","status_checked_at":"2026-04-28T11:25:05.435Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["numpy","pandas","pipeline","python"],"created_at":"2025-09-18T05:01:41.849Z","updated_at":"2026-04-28T12:03:22.512Z","avatar_url":"https://github.com/ronverse17.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Automated Data Cleaning Pipeline Project\n## 📌 Project Overview\nThis is an automated data preprocessing pipeline built using Python. 
\nThis class provides common data cleaning steps to prepare data for analysis or machine learning algorithms. It standardizes column names, handles missing values, detects outliers, optimizes categorical data types, and generates a cleaning report.\n\n## ✨ Features\n- Standardizes column names to lowercase (snake case)\n- Detects and imputes missing values (median for numeric, mode for categorical by default)\n- Standardizes string columns to lowercase and strips leading and trailing spaces\n- Identifies constant columns and high-cardinality features\n- Detects potential numeric outliers using the IQR rule\n- Converts low-cardinality object columns to category dtype\n- Generates a structured report summarizing cleaning actions\n\n## ⚙️ Requirements\n- Python\n- pandas\n- numpy\n\n## Clone the repository\n```bash\ngit clone https://github.com/ronverse17/Automate-Data-Cleaning.git\ncd Automate-Data-Cleaning\n```\n\n## 🚀 Usage\nFor usage, refer to test_file.ipynb.\n\n## 📂 Files in this Repo\n- data_cleaner.py → Contains the pipeline for cleaning the DataFrame\n- test_data.csv → Dataset used for testing the pipeline\n- test_file.ipynb → Jupyter Notebook containing a demo and test results for the pipeline\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fronverse17%2Fautomate-data-cleaning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fronverse17%2Fautomate-data-cleaning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fronverse17%2Fautomate-data-cleaning/lists"}