{"id":21840153,"url":"https://github.com/abdulrahmanaymann/data-mining","last_synced_at":"2025-08-21T14:05:02.335Z","repository":{"id":217495257,"uuid":"744015914","full_name":"abdulrahmanaymann/Data-Mining","owner":"abdulrahmanaymann","description":"data mining project involving two tasks: a regression problem and a classification problem.","archived":false,"fork":false,"pushed_at":"2024-01-16T13:37:11.000Z","size":897,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-21T15:42:57.184Z","etag":null,"topics":["classification","data-mining","imputation","jupyter-notebook","knn","linear-regression","outlier-detection","polynomial-regression","preprocessing","python","regression","scaling"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/abdulrahmanaymann.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2024-01-16T13:18:15.000Z","updated_at":"2024-10-01T11:20:18.000Z","dependencies_parsed_at":"2024-01-16T21:02:19.504Z","dependency_job_id":"55a8f645-fe78-48b7-83a2-35fb5701acca","html_url":"https://github.com/abdulrahmanaymann/Data-Mining","commit_stats":null,"previous_names":["abdulrahmanaymann/data-mining"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/abdulrahmanaymann/Data-Mining","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abdulrahmanaymann%2FData-Mining","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abdulrahmanaymann%2FData-Mining/tags","releases_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abdulrahmanaymann%2FData-Mining/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abdulrahmanaymann%2FData-Mining/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/abdulrahmanaymann","download_url":"https://codeload.github.com/abdulrahmanaymann/Data-Mining/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abdulrahmanaymann%2FData-Mining/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266476030,"owners_count":23935107,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-22T02:00:09.085Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","data-mining","imputation","jupyter-notebook","knn","linear-regression","outlier-detection","polynomial-regression","preprocessing","python","regression","scaling"],"created_at":"2024-11-27T21:24:51.521Z","updated_at":"2025-07-22T10:35:00.326Z","avatar_url":"https://github.com/abdulrahmanaymann.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Mining Project\n\n## Overview\n\nThis repository contains the code and documentation for a data mining project involving two tasks: a regression problem and a classification problem. 
The project covers preprocessing, model training, testing, visualization, and evaluation.\n\n## Tasks\n\n### 1. Regression Problem\n\n#### a. Impute a Categorical Missing Value\n\n- Used an imputation method to handle categorical missing values in the dataset.\n\n#### b. Impute a Numerical Missing Value\n\n- Employed [specific imputation method] to handle numerical missing values in the dataset.\n\n#### c. Identify a Scaling Problem Visually\n\n- Visualized scaling issues in the dataset using [visualization method].\n\n#### d. Apply 2 Methods of Scaling to Treat Outliers\n\n- Applied [scaling method 1] and [scaling method 2] to address outliers in the dataset.\n\n#### e. Convert a Categorical Variable to Number(s)\n\n- Transformed categorical variables into numerical format.\n\n#### f. Generate 2 Regression Models with MAE and R2\n\n- Developed two regression models using [model 1] and [model 2], assessing Mean Absolute Error (MAE) and R-squared for each.\n\n#### g. Compare Both Models to Identify Which is Better\n\n- Compared [model 1] and [model 2] on MAE and R-squared to identify the better-performing regression model.\n\n### 2. Classification Problem\n\n#### a. Impute a Categorical Missing Value\n\n- Applied an imputation method to handle categorical missing values in the dataset.\n\n#### b. Impute a Numerical Missing Value\n\n- Used an imputation method to handle numerical missing values in the dataset.\n\n#### c. Identify a Scaling Problem Visually\n\n- Visualized scaling issues in the dataset.\n\n#### d. Apply 2 Methods of Scaling to Treat Outliers\n\n- Applied two scaling methods to address outliers in the dataset.\n\n#### e. Convert a Categorical Variable to Number(s)\n\n- Transformed categorical variables into numerical format.\n\n#### f. Fit a Classification Model\n\n- Fitted a k-nearest neighbors (KNN) classification model.\n\n#### g. 
Evaluate Your Model\n\n- Assessed the classification model using a confusion matrix and accuracy.\n\n## Visualization\n\n- Visualized the dataset after fitting the models, showing predicted versus actual results.\n\n## Evaluation\n\n### Regression Problem\n\n- Used Mean Absolute Error (MAE) and R-squared to evaluate the regression models.\n\n### Classification Problem\n\n- Used a confusion matrix and accuracy to evaluate the classification model.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabdulrahmanaymann%2Fdata-mining","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabdulrahmanaymann%2Fdata-mining","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabdulrahmanaymann%2Fdata-mining/lists"}