https://github.com/khyatimahendru/dimensionality-reduction
This is an implementation of three dimensionality reduction techniques (PCA, SVD, and t-SNE) for visualizing high-dimensional data in 2D and 3D.
- Host: GitHub
- URL: https://github.com/khyatimahendru/dimensionality-reduction
- Owner: KhyatiMahendru
- Created: 2019-07-27T05:58:27.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-07-14T15:06:09.000Z (over 5 years ago)
- Last Synced: 2024-11-18T06:27:39.946Z (11 months ago)
- Topics: data-visualization, dimensionality-reduction, mnist, mnist-classification, pca, svd, tsne, tsne-algorithm
- Language: Jupyter Notebook
- Size: 5.79 MB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Dimensionality-Reduction
This is an implementation of three dimensionality reduction techniques (PCA, SVD, and t-SNE) for visualizing high-dimensional data in 2D and 3D.

In this notebook, I use three dimensionality reduction methods:
- Principal Component Analysis (PCA)
- Singular Value Decomposition (SVD)
- t-distributed Stochastic Neighbor Embedding (t-SNE)
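
The README does not show the notebook's code, so the following is only a minimal NumPy sketch, not the author's implementation, of how the first two methods relate. PCA is the SVD of the mean-centered data matrix, while a plain truncated SVD (the approach scikit-learn's `TruncatedSVD` takes) skips the centering step, which is one reason the two can produce different plots. The function names `pca_via_svd` and `truncated_svd` are hypothetical.

```python
import numpy as np


def pca_via_svd(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project X onto its top principal components.

    PCA is the SVD of the mean-centered data: X_c = U S V^T, and the
    projection onto the first k components is X_c V_k = U_k S_k.
    """
    X_centered = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def truncated_svd(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project X onto its top right singular vectors WITHOUT centering.

    On data with a non-zero mean (e.g. pixel intensities), the first
    component then largely captures that mean rather than the spread.
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for image data: 200 samples, 64 non-negative features
    X = rng.random((200, 64))
    Z_pca = pca_via_svd(X, 2)
    Z_svd = truncated_svd(X, 2)
    print(Z_pca.shape, Z_svd.shape)  # (200, 2) (200, 2)
    # PCA scores are centered at the origin; uncentered SVD scores are not
    print(np.allclose(Z_pca.mean(axis=0), 0.0))  # True
```
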

I apply these techniques to two datasets:
- The Digits dataset from scikit-learn
- The MNIST dataset, the "Hello World" of computer vision

t-SNE has O(n^2) time complexity in the number of samples, so the full MNIST dataset was too large to process in a reasonable time. Instead, I randomly sampled 10,000 images from MNIST and visualized them in both 2D and 3D.
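
The sampling step described above can be sketched as follows, assuming NumPy arrays `X` (one flattened image per row) and `y` (the digit labels). The helper names and the fixed seed are hypothetical, since the notebook's exact sampling code is not shown; the second helper only does the arithmetic behind the O(n^2) concern.

```python
import numpy as np


def sample_subset(X, y, n_samples=10_000, seed=0):
    """Draw n_samples rows uniformly at random, without replacement.

    The same indices are applied to X and y so each label stays aligned
    with its image (needed to color the scatter plots by digit).
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n_samples, replace=False)
    return X[idx], y[idx]


def dense_pairwise_bytes(n, bytes_per_value=8):
    """Memory needed for a dense n x n matrix of float64 values."""
    return n * n * bytes_per_value


if __name__ == "__main__":
    # Synthetic stand-in for MNIST: 12,000 flattened 28x28 images
    X = np.zeros((12_000, 784), dtype=np.float32)
    y = np.arange(12_000) % 10
    X_small, y_small = sample_subset(X, y)
    print(X_small.shape, y_small.shape)  # (10000, 784) (10000,)
    print(dense_pairwise_bytes(10_000) / 1e6, "MB")  # 800.0 MB
```

Drawing the indices once and indexing both arrays keeps every label attached to its image. For scale, a dense n × n float64 matrix, such as the full pairwise-affinity matrix used by exact t-SNE, already needs 10,000² × 8 bytes = 800 MB for the 10,000-image sample and grows quadratically with n.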
## Running Time
Time taken by each method to reduce each dataset to 2 or 3 components:

| Algorithm | Digits Dataset | MNIST Dataset (10,000 sampled images) |
| ------------- |:--------------|:-----|
| PCA - 2 Components | 21.7 ms | 428 ms |
| SVD - 2 Components | 30.2 ms | 392 ms |
| t-SNE - 2 Components | 10.4 s | 4 min 8 s |
| PCA - 3 Components | 29.9 ms | 457 ms |
| SVD - 3 Components | 26.8 ms | 336 ms |
| t-SNE - 3 Components | 43.4 s | 12 min 35 s |
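
The README does not say how the times in the table above were measured. A hypothetical harness like the one below, which reports the fastest of several runs using `time.perf_counter`, is one way to reproduce comparable numbers; `best_time` and `pca_2d` are illustrative names, and the random 1797 × 64 input only mimics the shape of the Digits data.

```python
import time

import numpy as np


def best_time(fn, *args, repeat=3, **kwargs):
    """Run fn(*args, **kwargs) `repeat` times; return (fastest seconds, last result).

    time.perf_counter is a monotonic, high-resolution clock suited to
    measuring elapsed wall-clock time.
    """
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best, result


def pca_2d(X):
    """Hypothetical workload: project centered X onto 2 principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T


if __name__ == "__main__":
    X = np.random.default_rng(0).random((1797, 64))  # Digits-shaped random data
    seconds, Z = best_time(pca_2d, X)
    print(f"{seconds * 1e3:.1f} ms", Z.shape)
```

Taking the minimum over repeats reduces noise from other processes; absolute times will still differ across hardware and library versions.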
# Visualizations
## The Digits Dataset
### In 2 Dimensions



### In 3 Dimensions



## The MNIST Dataset (10,000 sampled images)
### In 2 Dimensions



### In 3 Dimensions



# Conclusion
Although t-SNE was slow, it produced the most clearly clustered visualizations in both 2D and 3D.
To reduce t-SNE's running time, PCA or SVD can be applied first to bring the data down to a smaller number of dimensions, with t-SNE then run on the reduced data.
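
A minimal scikit-learn sketch of that two-step pipeline follows; it is not the notebook's code. `PCA` first reduces each sample to 50 dimensions (the example figure given in scikit-learn's `TSNE` documentation for this kind of preprocessing), and `TSNE` then embeds the reduced data. The function name `pca_then_tsne` and its default settings are assumptions; `TruncatedSVD` could take the place of `PCA`, for instance on sparse input.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def pca_then_tsne(X, n_pca=50, n_components=2, random_state=0):
    """Reduce X with PCA first, then embed the result with t-SNE.

    Shrinking each sample from X.shape[1] features to n_pca makes every
    pairwise distance t-SNE computes cheaper and suppresses some noise
    carried by low-variance directions.
    """
    # PCA cannot return more components than min(n_samples, n_features)
    n_pca = min(n_pca, X.shape[0], X.shape[1])
    X_reduced = PCA(n_components=n_pca, random_state=random_state).fit_transform(X)
    tsne = TSNE(n_components=n_components, init="pca", random_state=random_state)
    return tsne.fit_transform(X_reduced)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((300, 784))  # synthetic stand-in for flattened 28x28 images
    embedding = pca_then_tsne(X)
    print(embedding.shape)  # (300, 2)
```
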