{"id":102821,"url":"https://github.com/mlpapers/feature-extraction","name":"feature-extraction","description":"Awesome papers on Feature Extraction (Dimensionality Reduction)","projects_count":56,"last_synced_at":"2026-06-18T18:00:29.491Z","repository":{"id":301067497,"uuid":"253096827","full_name":"mlpapers/feature-extraction","owner":"mlpapers","description":"Awesome papers on Feature Extraction (Dimensionality Reduction)","archived":false,"fork":false,"pushed_at":"2026-02-14T21:37:55.000Z","size":12,"stargazers_count":11,"open_issues_count":1,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-06-02T02:03:06.573Z","etag":null,"topics":["awesome","awesome-list","dimensionality-reduction","feature-extraction","features-extraction","pca","principal-component-analysis"],"latest_commit_sha":null,"homepage":"https://mlpapers.org/feature-extraction/","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mlpapers.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2020-04-04T20:54:16.000Z","updated_at":"2026-03-25T07:24:13.000Z","dependencies_parsed_at":"2026-03-30T08:00:24.469Z","dependency_job_id":null,"html_url":"https://github.com/mlpapers/feature-extraction","commit_stats":null,"previous_names":["mlpapers/feature-extraction"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mlpapers/feature-extraction","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/mlpapers%2Ffeature-extraction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlpapers%2Ffeature-extraction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlpapers%2Ffeature-extraction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlpapers%2Ffeature-extraction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mlpapers","download_url":"https://codeload.github.com/mlpapers/feature-extraction/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mlpapers%2Ffeature-extraction/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34501482,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-18T02:00:06.871Z","response_time":128,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"created_at":"2026-01-02T00:00:41.091Z","updated_at":"2026-06-18T18:00:29.492Z","primary_language":null,"list_of_lists":false,"displayable":true,"categories":["Uncategorized","Software","Related Topics"],"sub_categories":["Uncategorized"],"readme":"# Feature extraction\n\u003e In machine learning, pattern recognition and in image processing, feature extraction starts from an initial set of measured data and builds derived values (features) intended to be informative and non-redundant, 
facilitating the subsequent learning and generalization steps, and in some cases leading to better human interpretations. Feature extraction is related to dimensionality reduction. ([Wiki](https://en.wikipedia.org/wiki/Feature_extraction))\n\n- **Overview**\n  - [A survey of dimensionality reduction techniques](https://arxiv.org/pdf/1403.2877.pdf) *C.O.S.Sorzano, J.Vargas, A.Pascual‐Montano*\n  - [Feature Selection and Feature Extraction in Pattern Analysis: A Literature Review](https://arxiv.org/pdf/1905.02845.pdf) (2019) *Benyamin Ghojogh, Maria N. Samad, Sayema Asif Mashhadi,Tania Kapoor, Wahab Ali, Fakhri Karray, Mark Crowley*\n\n- **PCA** Principal Component Analysis ([Wiki](https://en.wikipedia.org/wiki/Principal_component_analysis))\n  - [On lines and planes of closest fit to systems of points in space](https://zenodo.org/record/1430636#.Xos47PFRVnx) (1901) *Karl Pearson*\n  - Supervised PCA: [Prediction by Supervised Principal Components](https://web.stanford.edu/~hastie/Papers/spca_JASA.pdf) (2006) *Eric Bair, Trevor Hastie, Debashis Paul, Robert Tibshirani*\n  - Sparse PCA ([sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.SparsePCA.html#sklearn.decomposition.SparsePCA))\n- **DPCA** Dual Principal Component Analysis\n- **KPCA** Kernel Principal Component Analysis ([sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.KernelPCA.html#sklearn.decomposition.KernelPCA), [Wiki](https://en.wikipedia.org/wiki/Kernel_principal_component_analysis))\n  - [Nonlinear Component Analysis as a Kernel Eigenvalue Problem](http://alex.smola.org/papers/1998/SchSmoMul98.pdf) (1998) *Bernhard Scholkopf, Alexander Smola, Klaus-Robert Muller*\n  - [Kernel PCA for Novelty Detection](http://www.heikohoffmann.de/documents/hoffmann_kpca_preprint.pdf) (2006) *Heiko Hoffmann*\n  - [Robust Kernel Principal Component Analysis](https://papers.nips.cc/paper/3566-robust-kernel-principal-component-analysis.pdf) *Minh Hoai Nguyen, 
Fernando De la Torre*\n- **IPCA** Incremental (online) PCA ([CRAN](https://cran.r-project.org/web/packages/onlinePCA/), [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.IncrementalPCA.html#sklearn.decomposition.IncrementalPCA))\n- **ICA** Independent Component Analysis ([Wiki](https://en.wikipedia.org/wiki/Independent_component_analysis))\n  - [Independent Component Analysis: Algorithms and Applications](http://mlsp.cs.cmu.edu/courses/fall2012/lectures/ICA_Hyvarinen.pdf) (2000) *Aapo Hyvärinen, Erkki Oja*\n  - [Independent Component Analysis](https://www.cs.helsinki.fi/u/ahyvarin/papers/bookfinal_ICA.pdf) (2001) - Free ebook *Aapo Hyvärinen, Juha Karhunen, Erkki Oja*\n  - FastICA ([sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html#sklearn.decomposition.FastICA))\n- **FLDA** Fisher's Linear Discriminant Analysis (Supervised) ([Wiki](https://en.wikipedia.org/wiki/Linear_discriminant_analysis))  \n  \u003e Similar to PCA, FLDA calculates the projection of data along a direction; however, rather than maximizing the variation of data, FLDA utilizes label information to get a projection maximizing the ratio of between-class variance to within-class variance. ([Source](https://arxiv.org/pdf/1905.02845.pdf))\n  - [The Use of Multiple Measurements in Taxonomic Problems](https://digital.library.adelaide.edu.au/dspace/bitstream/2440/15227/1/138.pdf) (1936) *R. A. Fisher*\n  - [The Utilization of Multiple Measurements in Problems of Biological Classification](https://www.jstor.org/stable/2983775?seq=1) (1948) - requires registration *C. Radhakrishna Rao*\n  - [PCA versus LDA](http://www2.ece.ohio-state.edu/~aleix/pami01.pdf) (2001) *Aleix M. Martinez, Avinash C. 
Kak*\n  - Package: MASS includes lda ([CRAN](https://cran.r-project.org/web/packages/MASS/))\n  - Package: sda ([CRAN](https://cran.r-project.org/web/packages/sda/index.html))\n- **KFLDA** Kernel Fisher Linear Discriminant Analysis\n- **MDS** Multidimensional Scaling ([Wiki](https://en.wikipedia.org/wiki/Multidimensional_scaling))\n  - [Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis](http://cda.psych.uiuc.edu/psychometrika_highly_cited_articles/kruskal_1964a.pdf) (1964) *J. B. Kruskal*\n  - [An Analysis of Classical Multidimensional Scaling](https://arxiv.org/pdf/1812.11954.pdf) (2019) *Anna Little, Yuying Xie, Qiang Sun*\n  - Packages:\n      [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html)\n- **Isomap** ([Homepage](https://web.mit.edu/cocosci/isomap/isomap.html), [Wiki](https://en.wikipedia.org/wiki/Isomap))\n  - [A Global Geometric Framework for Nonlinear Dimensionality Reduction](https://web.mit.edu/cocosci/Papers/sci_reprint.pdf) (2000) *Joshua B. Tenenbaum, Vin de Silva, John C. Langford*\n  - Packages:\n      [dimRed](https://cran.r-project.org/web/packages/dimRed/dimRed.pdf),\n      [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.Isomap.html)\n- **Latent Dirichlet Allocation**\n  - [Online Learning for Latent Dirichlet Allocation](https://www.di.ens.fr/~fbach/mdhnips2010.pdf) (2010) *Matthew D. Hoffman, David M. Blei, Francis Bach*\n- **Factor analysis** ([Wiki](https://en.wikipedia.org/wiki/Factor_analysis), [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FactorAnalysis.html#sklearn.decomposition.FactorAnalysis))  \n  \u003e This technique is used to reduce a large number of variables into fewer numbers of factors. The values of observed data are expressed as functions of a number of possible causes in order to find which are the most important. 
The observations are assumed to be caused by a linear transformation of lower-dimensional latent factors and added Gaussian noise. ([Source](https://towardsdatascience.com/dimensionality-reduction-101-for-dummies-like-me-abcfb2551794))\n- **t-SNE** ([Homepage](https://lvdmaaten.github.io/tsne/), [Wiki](https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding), [CRAN](https://cran.r-project.org/web/packages/tsne/), [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html))\n  - [Visualizing Data using t-SNE](https://lvdmaaten.github.io/publications/papers/JMLR_2008.pdf) (2008) *Laurens van der Maaten, Geoffrey Hinton*\n  - [Accelerating t-SNE using Tree-Based Algorithms](https://lvdmaaten.github.io/publications/papers/JMLR_2014.pdf) (2014) *Laurens van der Maaten*\n  - **Tree-SNE** - Hierarchical t-SNE ([Code](https://github.com/isaacrob/treesne))\n    - [Tree-SNE: Hierarchical Clustering and Visualization Using t-SNE](https://arxiv.org/pdf/2002.05687) (2020) *Isaac Robinson, Emma Pierce-Hoffman*\n  - **Let-SNE**\n    - [Let-SNE: A Hybrid Approach to Data Embedding and Visualization of Hyperspectral Imagery](https://arxiv.org/pdf/1910.08790.pdf) (2020) *Megh Shukla, Biplab Banerjee, Krishna Mohan Buddhiraju*\n- **LLE** Locally Linear Embedding  \n  \u003e Constructs a k-nearest neighbor graph similar to Isomap. Then it tries to locally represent every data sample x_i using a weighted summation of its k-nearest neighbors. ([Source](https://arxiv.org/pdf/1905.02845.pdf))\n- **HLLE** Hessian Eigenmapping  \n  \u003e Projects data to a lower dimension while preserving the local neighborhood like LLE but uses the Hessian operator to better achieve this result and hence the name. 
([Source](https://towardsdatascience.com/dimensionality-reduction-for-machine-learning-80a46c2ebb7e))\n- **Laplacian Eigenmap** Spectral Embedding\n- **Maximum Variance Unfolding**\n- **NMF** Non-negative matrix factorization\n- **UMAP** Uniform Manifold Approximation and Projection ([Code](https://github.com/lmcinnes/umap), [GPU version](https://docs.rapids.ai/api/cuml/stable/api.html#umap))\n  - [UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction](https://arxiv.org/pdf/1802.03426) (2018) *Leland McInnes, John Healy, James Melville*\n- **Trimap** ([Code](https://github.com/eamid/trimap), [PyPI](https://pypi.org/project/trimap/))\n  - [Trimap: Large-scale Dimensionality Reduction Using Triplets](https://arxiv.org/pdf/1910.00204.pdf) (2019) *Ehsan Amid, Manfred K. Warmuth*\n- **Autoencoders** ([Wiki](https://en.wikipedia.org/wiki/Autoencoder))\n- **SOM** Self-Organizing Maps or Kohonen Maps ([Wiki](https://en.wikipedia.org/wiki/Self-organizing_map))\n  - [Self-Organized Formation of Topologically Correct Feature Maps](http://www.cnbc.cmu.edu/~tai/nc19journalclubs/Kohonen1982_Article_Self-organizedFormationOfTopol.pdf) (1982) *Teuvo Kohonen*\n- **Sammon’s Mapping**\n- **SDE** Semi-definite embedding\n- **LargeVis**\n  - [Visualizing Large-scale and High-dimensional Data](https://arxiv.org/abs/1602.00370) (2016) *Jian Tang, Jingzhou Liu, Ming Zhang, Qiaozhu Mei*\n\n## Software\n- **R**\n  - dimRed ([CRAN](https://cran.r-project.org/web/packages/dimRed/))\n  - dyndimred ([CRAN](https://cran.r-project.org/web/packages/dyndimred/))\n  - intrinsicDimension ([CRAN](https://cran.r-project.org/web/packages/intrinsicDimension/))\n  - Rdimtools ([Paper](https://arxiv.org/pdf/2005.11107.pdf), [CRAN](https://cran.r-project.org/web/packages/Rdimtools/))\n- **Python**\n  - scikit-learn\n  - umap-learn ([Homepage](https://umap-learn.readthedocs.io), [PyPI](https://pypi.org/project/umap-learn/))\n- **JavaScript**\n  - tsne 
([NPM](https://www.npmjs.com/package/tsne))\n  - umap-js ([NPM](https://www.npmjs.com/package/umap-js))\n  - dimred ([NPM](https://www.npmjs.com/package/dimred))\n- **C++**\n  - tapkee ([Code](https://github.com/lisitsyn/tapkee))\n- **Web**\n  - StatSim ([Vis](https://statsim.com/vis/))\n\n## Related Topics\n- [Feature Selection](https://mlpapers.org/feature-selection/)\n- [Neural Networks](https://mlpapers.org/neural-nets/)\n- [Multiview Learning](https://mlpapers.org/multiview-learning/)\n- [Clustering](https://mlpapers.org/clustering/)\n","projects_url":"https://awesome.ecosyste.ms/api/v1/lists/mlpapers%2Ffeature-extraction/projects"}