https://github.com/demon-2-angel/handwritten-digit-classification
This document explores the use of Principal Component Analysis (PCA) in a machine learning context, specifically for image classification using a dataset of numerical representations of digits. The dataset is loaded using scikit-learn's load_digits function, and initial exploration is conducted to understand its structure.
- Host: GitHub
- URL: https://github.com/demon-2-angel/handwritten-digit-classification
- Owner: Demon-2-Angel
- Created: 2023-12-06T06:04:18.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-06T06:09:37.000Z (over 2 years ago)
- Last Synced: 2025-02-07T10:15:32.928Z (over 1 year ago)
- Topics: digits-recognition, feature-extraction, model-checking, pca-analysis
- Language: Jupyter Notebook
- Size: 15.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Handwritten-Digit-Classification
# Introduction to PCA
## Overview
This document explores the use of Principal Component Analysis (PCA) in a machine learning context, specifically for image classification using a dataset of numerical representations of digits. The dataset is loaded using scikit-learn's `load_digits` function, and initial exploration is conducted to understand its structure.
## Dataset Exploration
The dataset consists of 1797 images, each represented by 64 numerical features (8x8 pixel values). The target variable represents the digit each image corresponds to (ranging from 0 to 9). After loading the data, a data frame is created for easier analysis.
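The loading and exploration step might look roughly like the sketch below. The column names (`pixel_<row>_<col>`) and the `target` column are hypothetical choices made for illustration; the README does not say how the data frame is labeled.

```python
import pandas as pd
from sklearn.datasets import load_digits

# Load the digits dataset bundled with scikit-learn.
digits = load_digits()
X, y = digits.data, digits.target  # X: flattened 8x8 images, y: digit labels

# Hypothetical column names, one per pixel of the 8x8 grid.
columns = [f"pixel_{row}_{col}" for row in range(8) for col in range(8)]
df = pd.DataFrame(X, columns=columns)
df["target"] = y

print(df.shape)                                   # rows and columns of the frame
print(df["target"].value_counts().sort_index())   # samples per digit class
```

Printing the per-class counts is a quick way to confirm that all ten digits are represented before training.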
## Preparing the Model
To train a machine learning model, the features are scaled using `StandardScaler` and the data is split into training and testing sets. A Logistic Regression model is then trained on the original dataset, and its accuracy is evaluated.
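A minimal sketch of this baseline follows. The 80/20 split, `random_state=42`, stratification, and `max_iter=1000` are assumptions, since the README does not state them. The sketch also fits the scaler on the training split only, a common variant that may differ from the order used in the notebook.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

# Assumed split: 80% train, 20% test, stratified by digit class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Standardize features using statistics from the training split only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Baseline: Logistic Regression on all 64 scaled features.
model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)

baseline_accuracy = model.score(X_test_scaled, y_test)
print(f"Baseline accuracy: {baseline_accuracy:.3f}")
```

The exact accuracy depends on the split and seed, so it will not necessarily match the figures reported in the README.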
## Applying PCA
Principal Component Analysis (PCA) is employed to reduce the dimensionality of the dataset. The number of principal components is chosen based on the desired variance to be retained. The explained variance ratio and the number of components are examined. Another Logistic Regression model is trained on the reduced dataset, and its accuracy is evaluated.
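One way to select components by retained variance is to pass a fraction to scikit-learn's `PCA`, as sketched below. Applying PCA to the standardized features and using a 0.95 threshold are assumptions for illustration; the component count you get may differ from the README's table depending on preprocessing.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# A float in (0, 1) asks PCA to keep just enough components
# to explain at least that fraction of the total variance.
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X_scaled)

# Cumulative share of variance explained by the kept components.
cumulative = np.cumsum(pca.explained_variance_ratio_)

print(f"Components kept: {pca.n_components_}")
print(f"Variance retained: {cumulative[-1]:.3f}")
```

Inspecting `explained_variance_ratio_` shows how much each component contributes, which helps when deciding where to set the threshold.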
## Conclusion
The table below summarizes how the level of retained variance affects the model's accuracy after applying PCA. As less variance is retained and fewer components are kept, accuracy generally falls, from 96.9% with 27 components to 90.83% with 8. The decline is not strictly monotonic: retaining 70% of the variance (9 components, 93.88%) scores slightly higher than retaining 75% (11 components, 93.05%). The results illustrate the trade-off between dimensionality reduction and model performance.
| Variance Retained (%) | Components Retained | Accuracy (%) |
| --------------------- | ------------------- | ------------ |
| 95 | 27 | 96.9 |
| 90 | 21 | 96.3 |
| 85 | 17 | 95 |
| 80 | 13 | 94.4 |
| 75 | 11 | 93.05 |
| 70 | 9 | 93.88 |
| 65 | 8 | 90.83 |
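
A comparison of this kind could be produced with a loop over variance thresholds, as in the sketch below. The pipeline layout, the 80/20 split, and `random_state=42` are assumptions, so the component counts and accuracies it prints should not be expected to reproduce the table exactly.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

results = []
for variance in (0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65):
    # Scale, reduce with PCA at the given variance threshold, then classify.
    pipeline = make_pipeline(
        StandardScaler(),
        PCA(n_components=variance, svd_solver="full"),
        LogisticRegression(max_iter=1000),
    )
    pipeline.fit(X_train, y_train)
    n_components = pipeline.named_steps["pca"].n_components_
    accuracy = pipeline.score(X_test, y_test)
    results.append((variance, n_components, accuracy))

for variance, n_components, accuracy in results:
    print(f"{variance:.0%} variance -> {n_components} components, accuracy {accuracy:.3f}")
```

Wrapping the steps in a pipeline keeps the scaler and PCA fitted on the training split only, so the test accuracy is not influenced by test data.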