https://github.com/nishant2018/pca-feature-selection-scratch

Principal Component Analysis (PCA) is a powerful dimensionality reduction technique commonly used in machine learning and data analysis. It transforms a dataset into a set of linearly uncorrelated variables called principal components.

feature-selection linear-algebra machine-learning pca statistics


Host: GitHub
URL: https://github.com/nishant2018/pca-feature-selection-scratch
Owner: Nishant2018
Created: 2024-06-10T09:03:10.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2024-06-10T10:11:28.000Z (about 2 years ago)
Last Synced: 2025-02-26T15:17:17.467Z (over 1 year ago)
Topics: feature-selection, linear-algebra, machine-learning, pca, statistics
Language: Jupyter Notebook
Homepage: https://www.kaggle.com/code/endofnight17j03/pca-feature-selection-scratch
Size: 669 KB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

## Principal Component Analysis (PCA)

### Introduction

Principal Component Analysis (PCA) is a powerful dimensionality reduction technique commonly used in machine learning and data analysis. It transforms a dataset into a set of linearly uncorrelated variables called principal components, ordered by how much of the data's variance each one captures. The primary goal of PCA is to reduce the dimensionality of the data while retaining as much of that variance as possible.

### Why Use PCA?

- **Dimensionality Reduction**: Simplifies the dataset by reducing the number of features.

- **Noise Reduction**: Discarding low-variance components can filter out noise and redundant (highly correlated) features.

- **Visualization**: Makes it easier to visualize high-dimensional data in 2D or 3D space.

- **Improved Performance**: Fewer input features can speed up training and may reduce overfitting in downstream machine learning algorithms.

### How PCA Works

1. **Standardize the Data**: PCA is affected by the scale of the variables, so it's essential to standardize the dataset.

   \[
   z = \frac{x - \mu}{\sigma}
   \]

   Where \( z \) is the standardized value, \( x \) is the original value, \( \mu \) is the mean of that feature, and \( \sigma \) is its standard deviation.

2. **Compute the Covariance Matrix**: The covariance matrix holds the variance of each variable on its diagonal and the pairwise covariances between variables off the diagonal.

   \[
   \mathbf{C} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T
   \]

   Where \( \mathbf{C} \) is the covariance matrix, \( n \) is the number of samples, \( x_i \) is the \( i \)-th (standardized) sample written as a column vector, and \( \bar{x} \) is the mean vector.

3. **Calculate the Eigenvalues and Eigenvectors**: The eigenvectors of \( \mathbf{C} \) give the directions (axes) of the new feature space, and each eigenvalue gives the variance of the data along its eigenvector, which is what ranks the directions by importance.

   \[
   \mathbf{C} \mathbf{v} = \lambda \mathbf{v}
   \]

   Where \( \mathbf{v} \) is the eigenvector and \( \lambda \) is the eigenvalue.

4. **Sort Eigenvalues and Eigenvectors**: Rank the eigenvalues and their corresponding eigenvectors in descending order.

5. **Select Principal Components**: Choose the top \( k \) eigenvectors based on the largest eigenvalues to form a new matrix \( \mathbf{W} \).

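6. **Transform the Data**: Project the standardized dataset onto the new feature space.

   \[
   \mathbf{Y} = \mathbf{W}^T \mathbf{X}
   \]

   Where \( \mathbf{Y} \) is the transformed dataset, \( \mathbf{W} \) is the matrix whose columns are the selected eigenvectors, and \( \mathbf{X} \) is the standardized dataset with one sample per column. If the samples are stored as rows instead, the equivalent form is \( \mathbf{Y} = \mathbf{X} \mathbf{W} \).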
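
The six steps above can be sketched directly in NumPy. This is a minimal illustration, not the notebook's actual implementation: the function name `pca_from_scratch`, its return values, and the choice of `np.linalg.eigh` are assumptions made for this example. Samples are stored as rows here, so the projection is written as `Z @ W`, the row-wise counterpart of \( \mathbf{W}^T \mathbf{X} \).

```python
import numpy as np


def pca_from_scratch(X, k):
    """Hypothetical helper: project X (n_samples x n_features) onto its top-k components."""
    X = np.asarray(X, dtype=float)

    # Step 1: standardize each feature, z = (x - mu) / sigma.
    # Assumes no feature is constant (sigma > 0). np.std defaults to the
    # population standard deviation, as scikit-learn's StandardScaler does.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    Z = (X - mu) / sigma

    # Step 2: covariance matrix of the features (1 / (n - 1) normalization).
    C = np.atleast_2d(np.cov(Z, rowvar=False))

    # Step 3: eigen-decomposition. eigh is intended for symmetric matrices
    # and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(C)

    # Step 4: sort eigenvalues (and their eigenvectors) in descending order.
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]

    # Step 5: keep the top-k eigenvectors as the columns of W.
    W = eigvecs[:, :k]

    # Step 6: project the standardized data onto the selected directions.
    Y = Z @ W

    # Fraction of the total variance captured by each kept component.
    explained_ratio = eigvals[:k] / eigvals.sum()
    return Y, W, explained_ratio


# Small two-feature sample dataset.
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

Y, W, ratio = pca_from_scratch(X, k=1)
print(Y.shape)  # (10, 1)
print("Explained variance ratio:", ratio)
```

Eigenvectors are only defined up to sign, so the projected values may come out with flipped signs compared to another PCA implementation; the variance captured by each component is unaffected.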

### Example Code

Here is a simple example of how to perform PCA using Python's `scikit-learn` library:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Sample data: 10 samples, 2 features
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2, 1.6], [1, 1.1], [1.5, 1.6], [1.1, 0.9]])

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA (n_components=2 keeps both components, since the data has 2 features)
pca = PCA(n_components=2)
principal_components = pca.fit_transform(X_scaled)

print("Principal Components:\n", principal_components)
```
