{"id":19571227,"url":"https://github.com/aneeshmurali-n/project-ml-data-preprocessing","last_synced_at":"2025-11-20T08:03:34.131Z","repository":{"id":254071707,"uuid":"845399749","full_name":"aneeshmurali-n/Project-ML-Data-Preprocessing","owner":"aneeshmurali-n","description":"The main objective of this project is to design and implement a robust data preprocessing system that addresses common challenges such as missing values, outliers, inconsistent formatting, and noise. By performing effective data preprocessing, the project aims to enhance the quality, reliability, and usefulness of the data for machine learning.","archived":false,"fork":false,"pushed_at":"2024-08-25T06:09:59.000Z","size":178,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-26T10:45:53.469Z","etag":null,"topics":["data-analysis","data-cleaning","data-encoding","data-exploration","feature-scaling","label-encoding","matplotlib","minmaxscaler","numpy","one-hot-encoding","outlier-detection","pandas","standardscaler"],"latest_commit_sha":null,"homepage":"","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aneeshmurali-n.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-21T07:12:21.000Z","updated_at":"2024-08-25T06:19:17.000Z","dependencies_parsed_at":"2024-08-21T08:53:54.587Z","dependency_job_id":null,"html_url":"https://github.com/aneeshmurali-n/Project-ML-Data-Preprocessing","commit_stats":null,"previous_names":["aneeshmurali-n/project-ml-data-preprocessing"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aneeshmurali-n/Project-ML-Data-Preprocessing","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aneeshmurali-n%2FProject-ML-Data-Preprocessing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aneeshmurali-n%2FProject-ML-Data-Preprocessing/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aneeshmurali-n%2FProject-ML-Data-Preprocessing/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aneeshmurali-n%2FProject-ML-Data-Preprocessing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aneeshmurali-n","download_url":"https://codeload.github.com/aneeshmurali-n/Project-ML-Data-Preprocessing/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aneeshmurali-n%2FProject-ML-Data-Preprocessing/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":285397096,"owners_count":27164670,"icon_url":"https://github.com/git
hub.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-20T02:00:05.334Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-cleaning","data-encoding","data-exploration","feature-scaling","label-encoding","matplotlib","minmaxscaler","numpy","one-hot-encoding","outlier-detection","pandas","standardscaler"],"created_at":"2024-11-11T06:17:59.654Z","updated_at":"2025-11-20T08:03:34.115Z","avatar_url":"https://github.com/aneeshmurali-n.png","language":"Jupyter Notebook","readme":"# Project-ML-Data-Preprocessing\nThe main objective of this project is to design and implement a robust data preprocessing system that addresses common challenges such as missing values, outliers, inconsistent formatting, and noise. 
By performing effective data preprocessing, the project aims to enhance the quality, reliability, and usefulness of the data for machine learning.\n\n## Fulfilled Key Components:\n\n### Data Exploration:\nExplore the data, list the unique values in each feature, and find how many there are.\nPerform statistical analysis and rename the columns.\n\n### Data Cleaning:\nFind missing and inappropriate values and treat them appropriately.\nRemove all duplicate rows.\nFind the outliers.\nReplace the value 0 in age with NaN.\nTreat the null values in all columns using any suitable measure (removing them or replacing them with the mean, median, or mode).\n\n### Data Analysis:\nFilter the data with age \u003e 40 and salary \u003c 5000.\nPlot a chart of age and salary.\nCount the number of people from each place and represent it visually.\n\n### Data Encoding:\nConvert categorical variables into numerical representations using techniques such as one-hot encoding and label encoding, making them suitable for analysis by machine learning algorithms.\n\n### Feature Scaling:\nAfter encoding, scale the features using StandardScaler and MinMaxScaler.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faneeshmurali-n%2Fproject-ml-data-preprocessing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faneeshmurali-n%2Fproject-ml-data-preprocessing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faneeshmurali-n%2Fproject-ml-data-preprocessing/lists"}