Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/zazi2002/machine-learning-project
Introduction to Machine Learning project with the goal of improving the classification performance on a dataset by optimizing the number of features and weak learners.
- Host: GitHub
- URL: https://github.com/zazi2002/machine-learning-project
- Owner: ZaZi2002
- Created: 2024-09-06T15:33:31.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-09-06T15:57:23.000Z (4 months ago)
- Last Synced: 2024-11-07T13:18:46.193Z (about 2 months ago)
- Topics: dimentionality-reduction, ensemble-learning, numpy, pca, random-forest, scikit-learn
- Language: Jupyter Notebook
- Homepage:
- Size: 404 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Machine Learning Project: Dimensionality Reduction and Ensemble Learning
## Project Overview
This project was developed for the Introduction to Machine Learning course. It applies **dimensionality reduction**, specifically **Principal Component Analysis (PCA)**, together with **ensemble learning** methods such as **Random Forests**, which combine many **Decision Trees** as weak learners. The goal is to improve classification performance on a dataset by optimizing the number of features and the number of weak learners. The model is evaluated with accuracy, precision, recall, F1-score, and AUPRC (Area Under the Precision-Recall Curve).

### Key Features:
- **Dimensionality Reduction with PCA:** Reduces the number of features while retaining the components that capture the most variance.
- **Ensemble Learning:** Combines multiple weak learners (Decision Trees) using both hard and soft voting to improve prediction accuracy.
- **Performance Metrics:** The model's output is evaluated using accuracy, precision, recall, F1-score, and AUPRC.

### Steps in the Notebook:
1. **Data Preprocessing:**
   - Mean-normalize and zero-center the data.
   - Apply PCA to reduce the dataset's dimensionality based on explained variance.
2. **Model Training:**
   - Train and test the model using the **Random Forest** estimator.
   - Build ensembles with different numbers of weak learners.
3. **Performance Evaluation:**
   - Calculate accuracy, precision, recall, F1-score, and AUPRC for both the PCA-reduced data and the ensemble learners.
4. **Optimization:**
   - Tune the number of PCA components and the number of weak learners to balance performance against computational cost.

### Metrics Achieved:
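The metrics listed below can be computed with scikit-learn. This is a minimal sketch, not the notebook's code: it assumes a binary task with class `1` as the positive class, uses scikit-learn's bundled breast-cancer dataset and a default `RandomForestClassifier` as stand-ins for the project's data and model, and estimates AUPRC with `average_precision_score` (the notebook may instead integrate the curve from `precision_recall_curve` with `auc`). The helper name `evaluate` is hypothetical, and running the sketch will not reproduce the figures below, which come from the project's own dataset.

```python
# Hypothetical sketch: computing the README's five metrics with scikit-learn.
# Assumptions: binary labels (positive class = 1), a classifier exposing
# predict_proba, and the breast-cancer dataset as a stand-in.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test):
    """Return accuracy, precision, recall, F1 and AUPRC for a fitted binary classifier."""
    y_pred = model.predict(X_test)
    # AUPRC is computed from positive-class scores, not from hard labels.
    y_score = model.predict_proba(X_test)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        # Average precision summarizes the precision-recall curve.
        "auprc": average_precision_score(y_test, y_score),
    }


X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
metrics = evaluate(model, X_test, y_test)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```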
- **Accuracy:** 97.7%
- **Precision:** 98.4%
- **Recall:** 98.7%
- **F1-Score:** 98.6%
- **AUPRC:** 98.1%

## Requirements
To run the project, you need the following Python libraries:
- `numpy`
- `scikit-learn`
- `matplotlib`
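Using only the libraries listed above, step 1 of the notebook (zero-centering followed by PCA) can be sketched in NumPy. This is a hypothetical illustration, not the project's implementation: it assumes "mean normalization" means subtracting each feature's mean, keeps the fewest principal components whose cumulative explained-variance ratio reaches a 95% threshold (an assumed value), and uses the breast-cancer dataset as a stand-in. The function name `pca_fit_transform` is illustrative only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA


def pca_fit_transform(X, variance_to_keep=0.95):
    """Zero-center X, then project it onto the fewest principal components
    whose cumulative explained-variance ratio reaches `variance_to_keep`."""
    mean = X.mean(axis=0)
    X_centered = X - mean  # zero-centering: subtract each feature's mean
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    ratios = S**2 / np.sum(S**2)  # explained-variance ratio of each component
    k = int(np.searchsorted(np.cumsum(ratios), variance_to_keep)) + 1
    k = min(k, len(S))  # guard against floating-point shortfall near 1.0
    components = Vt[:k]
    return X_centered @ components.T, components, mean, ratios[:k]


X, _ = load_breast_cancer(return_X_y=True)
Z, components, mean, ratios = pca_fit_transform(X, variance_to_keep=0.95)
print(f"{X.shape[1]} features -> {Z.shape[1]} components, "
      f"{ratios.sum():.1%} of variance kept")

# Cross-check: scikit-learn's PCA with a float n_components also keeps just
# enough components to exceed that fraction of the variance.
reference = PCA(n_components=0.95, svd_solver="full").fit(X)
print("scikit-learn keeps", reference.n_components_, "components")
```

Because only centering is applied, features with large numeric ranges dominate the leading components; standardizing each feature first (for example with scikit-learn's `StandardScaler`) is a common alternative when features are on different scales.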
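The hard and soft voting described under Key Features can be illustrated with a small bagged ensemble of shallow decision trees. This is a hypothetical sketch rather than the project's code: the helper names, tree depth, ensemble size, bootstrap sampling, and the breast-cancer stand-in dataset are all assumptions. Hard voting takes the majority of the trees' predicted labels; soft voting averages their predicted class-1 probabilities before thresholding at 0.5.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_tree_ensemble(X, y, n_learners=25, max_depth=3, seed=0):
    """Fit `n_learners` shallow decision trees, each on a bootstrap sample."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_learners):
        idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def hard_vote(trees, X):
    """Majority vote over predicted labels (binary 0/1 labels assumed;
    an odd number of trees avoids ties)."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_learners, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)


def soft_vote(trees, X):
    """Average the trees' class-1 probabilities, then threshold at 0.5."""
    proba = np.mean([t.predict_proba(X)[:, 1] for t in trees], axis=0)
    return (proba > 0.5).astype(int), proba


X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
trees = fit_tree_ensemble(X_train, y_train, n_learners=25)
hard_pred = hard_vote(trees, X_test)
soft_pred, soft_proba = soft_vote(trees, X_test)
print("hard voting accuracy:", np.mean(hard_pred == y_test))
print("soft voting accuracy:", np.mean(soft_pred == y_test))
```

A Random Forest additionally considers only a random subset of features at each split. For arbitrary estimators, scikit-learn's `VotingClassifier` offers the same two strategies through `voting="hard"` and `voting="soft"`.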
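The optimization step in the notebook amounts to a joint search over the number of PCA components and the number of weak learners. One way to sketch it, not necessarily the notebook's approach, is scikit-learn's `GridSearchCV` over a `Pipeline`. The grid values, AUPRC-based scoring via `"average_precision"`, and the breast-cancer stand-in dataset are assumptions; the recorded mean fit time gives a rough view of the computational-cost side of the trade-off.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# PCA centers the data itself; the forest then trains on the reduced features.
pipeline = Pipeline([
    ("pca", PCA()),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Hypothetical search grid: a few feature counts and ensemble sizes.
param_grid = {
    "pca__n_components": [2, 5, 10],
    "forest__n_estimators": [10, 50, 100],
}
search = GridSearchCV(pipeline, param_grid, scoring="average_precision", cv=5)
search.fit(X_train, y_train)

# mean_fit_time shows the cost of each combination next to its AUPRC.
results = search.cv_results_
for params, score, seconds in zip(
    results["params"], results["mean_test_score"], results["mean_fit_time"]
):
    print(f"{params}: AUPRC={score:.3f}, mean fit time={seconds:.3f}s")

print("best:", search.best_params_)
print("held-out AUPRC:", search.score(X_test, y_test))
```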