# Machine Learning Project: Dimensionality Reduction and Ensemble Learning

Repository: [ZaZi2002/Machine-Learning-Project](https://github.com/ZaZi2002/Machine-Learning-Project) (Jupyter Notebook)

## Project Overview
This project was completed for an Introduction to Machine Learning course. It applies **dimensionality reduction**, specifically **Principal Component Analysis (PCA)**, together with **ensemble learning** methods built on decision trees, such as **Random Forests**. The goal is to improve classification performance on a dataset by tuning the number of features (retained principal components) and the number of weak learners. Performance is evaluated with accuracy, precision, recall, F1-score, and AUPRC (Area Under the Precision-Recall Curve).

### Key Features:
- **Dimensionality Reduction with PCA:** Reduces the number of features by keeping only the components that capture the most variance.
- **Ensemble Learning:** Combines multiple weak learners (Decision Trees) using both hard voting (majority vote over predicted labels) and soft voting (averaging predicted class probabilities) to improve prediction accuracy.
- **Performance Metrics:** Evaluates the model's predictions with accuracy, precision, recall, F1-score, and AUPRC.

### Steps in the Notebook:
1. **Data Preprocessing:**
   - Mean normalization and zero-centering of the features.
   - PCA to reduce dimensionality, with the number of components chosen by explained variance.

2. **Model Training:**
   - Training and testing a **Random Forest** classifier.
   - Building ensembles with different numbers of weak learners.

3. **Performance Evaluation:**
   - Computing accuracy, precision, recall, F1-score, and AUPRC both for the PCA-reduced data and for the ensemble learners.

4. **Optimization:**
   - Tuning the number of PCA components and the number of weak learners to balance performance against computational cost.

### Metrics Achieved:
- **Accuracy:** 97.7%
- **Precision:** 98.4%
- **Recall:** 98.7%
- **F1-Score:** 98.6%
- **AUPRC:** 98.1%

## Requirements
To run the notebook, you need the following Python libraries:
- `numpy`
- `scikit-learn`
- `matplotlib`
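The PCA preprocessing step (zero-center the features, then keep enough principal components to reach an explained-variance target) can be sketched in NumPy as follows. This is a minimal illustration, not the notebook's code. The 95% variance threshold, the helper names `fit_pca` and `transform_pca`, and the synthetic demo data are all assumptions.

```python
import numpy as np


def fit_pca(X, variance_threshold=0.95):
    """Zero-center X and keep the fewest principal axes whose cumulative
    explained-variance ratio reaches variance_threshold (assumed: 0.95)."""
    mean = X.mean(axis=0)
    Xc = X - mean  # zero-center every feature
    # Rows of Vt are the principal axes; S**2 is proportional to the variance along each axis
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = S**2 / np.sum(S**2)
    k = int(np.searchsorted(np.cumsum(ratio), variance_threshold)) + 1
    k = min(k, len(S))  # guard against floating-point round-off near 1.0
    return mean, Vt[:k], k


def transform_pca(X, mean, components):
    """Project X onto the retained principal axes."""
    return (X - mean) @ components.T


# Hypothetical demo data: 10 features that mostly vary within a 2-D subspace
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(200, 10))
mean, components, k = fit_pca(X, variance_threshold=0.95)
Z = transform_pca(X, mean, components)
print(k, Z.shape)
```

Because the demo data is essentially two-dimensional, the threshold keeps at most two components. Raising `variance_threshold` keeps more components, which trades speed for retained information.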
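The training, voting, evaluation, and tuning steps can be sketched with scikit-learn as shown below. This is a hedged illustration rather than the project's implementation, and several choices are assumptions:

- The dataset is synthetic (`make_classification`), since the project's dataset is not described here.
- The weak learners are depth-limited decision trees, each trained on a bootstrap sample.
- The helper names (`fit_trees`, `hard_vote`, `soft_vote_proba`, `evaluate`) are hypothetical.
- The grid of component and learner counts is hypothetical.
- AUPRC is estimated with `average_precision_score`, one common summary of the precision-recall curve.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import (accuracy_score, average_precision_score,
                             f1_score, precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_trees(X, y, n_learners, seed=0):
    """Train n_learners shallow decision trees, each on a bootstrap sample."""
    rng = np.random.default_rng(seed)
    trees = []
    for i in range(n_learners):
        idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        tree = DecisionTreeClassifier(max_depth=3, random_state=i)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def hard_vote(trees, X):
    """Majority vote over predicted labels (binary labels 0/1; ties go to 0)."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_learners, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)


def soft_vote_proba(trees, X):
    """Average the trees' predicted probability of the positive class."""
    return np.mean([t.predict_proba(X)[:, 1] for t in trees], axis=0)


def evaluate(y_true, y_pred, y_score):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        # Average precision: a step-wise estimate of the area under the PR curve
        "auprc": average_precision_score(y_true, y_score),
    }


# Synthetic stand-in for the project's dataset (assumption)
X, y = make_classification(n_samples=1000, n_features=30, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

results = {}
for n_components in (2, 5, 10):  # hypothetical grid
    pca = PCA(n_components=n_components).fit(X_train)  # PCA zero-centers internally
    Z_train, Z_test = pca.transform(X_train), pca.transform(X_test)
    for n_learners in (1, 11, 51):  # odd counts avoid hard-vote ties
        trees = fit_trees(Z_train, y_train, n_learners)
        proba = soft_vote_proba(trees, Z_test)
        results[(n_components, n_learners, "hard")] = evaluate(
            y_test, hard_vote(trees, Z_test), proba)
        results[(n_components, n_learners, "soft")] = evaluate(
            y_test, (proba >= 0.5).astype(int), proba)

best = max(results, key=lambda key: results[key]["f1"])
print("best (components, learners, voting):", best, results[best])
```

For comparison, scikit-learn's `RandomForestClassifier` combines its trees by taking the class with the highest mean predicted probability, which is a form of soft voting. For brevity, this sketch picks the best configuration on the test split. That leaks test information, so a separate validation split or cross-validation is the safer choice when tuning.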