https://github.com/thefloatingstring/additional-cpca-experiments
- Host: GitHub
- URL: https://github.com/thefloatingstring/additional-cpca-experiments
- Owner: TheFloatingString
- Created: 2023-11-07T21:35:32.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-03T20:40:55.000Z (almost 2 years ago)
- Last Synced: 2025-01-13T08:12:43.823Z (about 1 year ago)
- Language: Jupyter Notebook
- Size: 1.62 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
# Additional cPCA Experiments
## Claims We Aim to Demonstrate
1. cPCA-preprocessed data yields better downstream model performance than PCA-preprocessed data.
2. cPCA-preprocessed data yields better downstream model performance than data with no preprocessing.
3. We identify which types of background datasets are most effective for cPCA.
4. We identify the optimal cPCA parameters, for both alpha (the contrast strength) and the number of output dimensions.
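Claims 1 and 4 depend on how cPCA uses alpha and the number of dimensions. The following is a minimal NumPy sketch, not this repository's implementation. It follows the formulation of Abid et al. (2018): contrastive PCA keeps the leading eigenvectors of `C_target - alpha * C_background`, so with `alpha=0` it reduces to ordinary PCA. The function name `cpca` and the synthetic data are hypothetical.

```python
import numpy as np


def cpca(target, background, alpha=1.0, n_components=2):
    """Project `target` onto its top contrastive principal components.

    The components are the leading eigenvectors of
    C_target - alpha * C_background; alpha=0 gives ordinary PCA.
    """
    # Center the target so the projection is taken about its own mean.
    target_centered = target - target.mean(axis=0)
    # np.cov with rowvar=False treats rows as samples and centers internally.
    contrast = np.cov(target, rowvar=False) - alpha * np.cov(background, rowvar=False)
    # The contrast matrix is symmetric, so eigh returns real eigenpairs
    # sorted by ascending eigenvalue; keep the largest n_components.
    eigvals, eigvecs = np.linalg.eigh(contrast)
    top = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, top]
    return target_centered @ components, components


# Hypothetical synthetic data (not a dataset from this repo): feature 0 has
# large nuisance variance in both datasets, while only the target has a
# two-cluster signal on feature 2.
rng = np.random.default_rng(0)
n = 2000
scale = np.array([3.0, 1.0, 1.0])
background = rng.normal(size=(n, 3)) * scale
labels = rng.integers(0, 2, size=n)
target = rng.normal(size=(n, 3)) * scale
target[:, 2] += 4.0 * labels

projected, components = cpca(target, background, alpha=1.0, n_components=1)
_, pca_components = cpca(target, background, alpha=0.0, n_components=1)
print(projected.shape)  # (2000, 1)
print(np.round(np.abs(components[:, 0]), 2))
print(np.round(np.abs(pca_components[:, 0]), 2))
```

In this toy setup, plain PCA (`alpha=0`) locks onto the high-variance nuisance feature 0. With `alpha=1`, cPCA instead recovers feature 2, the only direction whose variance is enriched relative to the background.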
## Motivation
1. High-dimensional datasets are expensive to run models on.
2. cPCA provides better target-label separation than PCA or no preprocessing.
## Limitations
1. cPCA-compressed data is difficult to interpret.
## Set of Experiments
### Evaluating Numerical Datasets
Metrics: F1 score, precision, and recall
+ Mouse gene expression dataset
+ Beans dataset
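The README does not say how the three metrics are averaged across classes. The sketch below assumes macro averaging (an unweighted mean over labels) and has no dependencies. `macro_precision_recall_f1` is a hypothetical helper. In practice, scikit-learn's `precision_recall_fscore_support(y_true, y_pred, average="macro")` should give the same numbers on inputs like these.

```python
def macro_precision_recall_f1(y_true, y_pred):
    """Macro-averaged precision, recall and F1 over every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Undefined ratios (no predictions or no true samples) count as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(labels)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
scores = macro_precision_recall_f1(y_true, y_pred)
print(tuple(round(s, 3) for s in scores))  # (0.722, 0.667, 0.656)
```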
### Evaluating Natural Language Datasets
Metrics: F1 score, precision, and recall
+ Sentiment analysis (SST)
### Evaluating Image Datasets
Metrics: F1 score, precision, and recall
+ CIFAR-10 classification
### Evaluating Effective Backgrounds
+ Ablation study on including unlabelled Beans samples
+ Comparison of different backgrounds
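One way to compare candidate backgrounds and values of alpha is to score how well the leading cPCA direction separates the target labels. This sketch assumes that approach rather than taking it from the repository, and uses a Fisher-style ratio of between-class to within-class variance as the score. The helpers `cpca_direction` and `fisher_ratio`, the two synthetic backgrounds, and the alpha grid are all hypothetical.

```python
import numpy as np


def cpca_direction(target, background, alpha):
    """Leading eigenvector of C_target - alpha * C_background."""
    contrast = np.cov(target, rowvar=False) - alpha * np.cov(background, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(contrast)
    return eigvecs[:, np.argmax(eigvals)]


def fisher_ratio(z, labels):
    """Between-class over within-class variance of a 1-D projection z."""
    overall = z.mean()
    classes = np.unique(labels)
    between = sum((labels == c).mean() * (z[labels == c].mean() - overall) ** 2
                  for c in classes)
    within = sum((labels == c).mean() * z[labels == c].var() for c in classes)
    return between / within


# Hypothetical target: nuisance variance on feature 0, label signal on feature 2.
rng = np.random.default_rng(1)
n = 2000
scale = np.array([3.0, 1.0, 1.0])
labels = rng.integers(0, 2, size=n)
target = rng.normal(size=(n, 3)) * scale
target[:, 2] += 4.0 * labels
backgrounds = {
    # Hypothetical background that shares the target's nuisance variance.
    "matched": rng.normal(size=(n, 3)) * scale,
    # Hypothetical background of isotropic noise.
    "isotropic": rng.normal(size=(n, 3)),
}

scores = {}
for name, background in backgrounds.items():
    for alpha in (0.0, 1.0, 2.0):
        direction = cpca_direction(target, background, alpha)
        scores[(name, alpha)] = fisher_ratio(target @ direction, labels)

for key, score in sorted(scores.items()):
    print(key, round(score, 3))
```

A background that shares the target's nuisance variance lets cPCA cancel it, so the `matched` runs with `alpha >= 1` separate the labels. An isotropic background (covariance close to the identity) shifts every eigenvalue of `C_target` by roughly the same amount. The leading direction therefore stays on the nuisance feature that plain PCA would pick.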