{"id":19599131,"url":"https://github.com/nishant2018/pca-feature-selection-scratch","last_synced_at":"2026-06-12T08:32:35.572Z","repository":{"id":243635613,"uuid":"812972278","full_name":"Nishant2018/PCA-Feature-Selection-Scratch","owner":"Nishant2018","description":"Principal Component Analysis (PCA) is a powerful dimensionality reduction technique commonly used in machine learning and data analysis. It transforms a dataset into a set of linearly uncorrelated variables called principal components.","archived":false,"fork":false,"pushed_at":"2024-06-10T10:11:28.000Z","size":685,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-26T15:17:17.467Z","etag":null,"topics":["feature-selection","linear-algebra","machine-learning","pca","statistics"],"latest_commit_sha":null,"homepage":"https://www.kaggle.com/code/endofnight17j03/pca-feature-selection-scratch","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Nishant2018.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-10T09:03:10.000Z","updated_at":"2024-06-22T04:38:41.000Z","dependencies_parsed_at":null,"dependency_job_id":"5dd21109-7583-42f1-aff4-4d5d7b4d2e8b","html_url":"https://github.com/Nishant2018/PCA-Feature-Selection-Scratch","commit_stats":null,"previous_names":["nishant2018/pca-feature-selection-scratch"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Nishant2018/PCA-Feature-Selection-Scratch","repository_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/Nishant2018%2FPCA-Feature-Selection-Scratch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nishant2018%2FPCA-Feature-Selection-Scratch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nishant2018%2FPCA-Feature-Selection-Scratch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nishant2018%2FPCA-Feature-Selection-Scratch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Nishant2018","download_url":"https://codeload.github.com/Nishant2018/PCA-Feature-Selection-Scratch/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nishant2018%2FPCA-Feature-Selection-Scratch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34236550,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-12T02:00:06.859Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["feature-selection","linear-algebra","machine-learning","pca","statistics"],"created_at":"2024-11-11T09:09:03.515Z","updated_at":"2026-06-12T08:32:35.555Z","avatar_url":"https://github.com/Nishant2018.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Principal Component Analysis (PCA)\n\n### Introduction\n\nPrincipal 
Component Analysis (PCA) is a powerful dimensionality reduction technique commonly used in machine learning and data analysis. It transforms a dataset into a set of linearly uncorrelated variables called principal components. The primary goal of PCA is to reduce the dimensionality of the data while retaining as much variability as possible.\n\n### Why Use PCA?\n\n- **Dimensionality Reduction**: Simplifies the dataset by reducing the number of features.\n- **Noise Reduction**: Helps in removing noise and redundant features.\n- **Visualization**: Makes it easier to visualize high-dimensional data in 2D or 3D space.\n- **Improved Performance**: Enhances the performance of machine learning algorithms by reducing overfitting.\n\n### How PCA Works\n\n1. **Standardize the Data**: PCA is affected by the scale of the variables, so it's essential to standardize the dataset.\n   \\[\n   z = \\frac{x - \\mu}{\\sigma}\n   \\]\n   Where \\( z \\) is the standardized value, \\( x \\) is the original value, \\( \\mu \\) is the mean, and \\( \\sigma \\) is the standard deviation.\n\n2. **Compute the Covariance Matrix**: Measure the variance and the relationship between different variables.\n   \\[\n   \\mathbf{C} = \\frac{1}{n-1} \\sum_{i=1}^{n} (x_i - \\bar{x})(x_i - \\bar{x})^T\n   \\]\n   Where \\( \\mathbf{C} \\) is the covariance matrix, \\( n \\) is the number of samples, \\( x_i \\) is the \\( i \\)-th sample, and \\( \\bar{x} \\) is the mean vector.\n\n3. **Calculate the Eigenvalues and Eigenvectors**: Eigenvectors determine the direction of the new feature space, and eigenvalues determine their magnitude (importance).\n   \\[\n   \\mathbf{C} \\mathbf{v} = \\lambda \\mathbf{v}\n   \\]\n   Where \\( \\mathbf{v} \\) is the eigenvector and \\( \\lambda \\) is the eigenvalue.\n\n4. **Sort Eigenvalues and Eigenvectors**: Rank the eigenvalues and their corresponding eigenvectors in descending order.\n\n5. 
**Select Principal Components**: Choose the \\( k \\) eigenvectors with the largest eigenvalues and use them as the columns of a new matrix \\( \\mathbf{W} \\).\n\n6. **Transform the Data**: Project the standardized dataset onto the new feature space.\n   \\[\n   \\mathbf{Y} = \\mathbf{W}^T \\mathbf{X}\n   \\]\n   Where \\( \\mathbf{Y} \\) is the transformed dataset, \\( \\mathbf{W} \\) is the matrix of selected eigenvectors, and \\( \\mathbf{X} \\) is the standardized dataset with one sample per column.\n\n### Example Code\n\nHere is a simple example of how to perform PCA using Python's `scikit-learn` library:\n\n```python\nimport numpy as np\nfrom sklearn.decomposition import PCA\nfrom sklearn.preprocessing import StandardScaler\n\n# Sample data\nX = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0], [2.3, 2.7], [2, 1.6], [1, 1.1], [1.5, 1.6], [1.1, 0.9]])\n\n# Standardize the data\nscaler = StandardScaler()\nX_scaled = scaler.fit_transform(X)\n\n# Apply PCA\npca = PCA(n_components=2)\nprincipal_components = pca.fit_transform(X_scaled)\n\nprint(\"Principal Components:\\n\", principal_components)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnishant2018%2Fpca-feature-selection-scratch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnishant2018%2Fpca-feature-selection-scratch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnishant2018%2Fpca-feature-selection-scratch/lists"}