https://github.com/thefloatingstring/additional-cpca-experiments


Host: GitHub
URL: https://github.com/thefloatingstring/additional-cpca-experiments
Owner: TheFloatingString
Created: 2023-11-07T21:35:32.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2024-04-03T20:40:55.000Z (almost 2 years ago)
Last Synced: 2025-01-13T08:12:43.823Z (about 1 year ago)
Language: Jupyter Notebook
Size: 1.62 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md

# Additional cPCA Experiments

## Claims that We're Trying to Make

1. cPCA-preprocessed data yields better downstream model performance than PCA-preprocessed data
2. cPCA-preprocessed data yields better downstream model performance than data with no preprocessing
3. We identify which types of backgrounds are most effective for cPCA
4. We identify the optimal cPCA parameters (both alpha and the number of dimensions)
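
For readers unfamiliar with the two parameters named in claim 4, the following is a minimal sketch of cPCA in its usual formulation: take the top eigenvectors of the target covariance minus `alpha` times the background covariance. It is not the code used in this repository. The function name `cpca`, its signature, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def cpca(target, background, alpha=1.0, n_components=2):
    """Hypothetical helper: project `target` onto its top contrastive directions.

    target:     array of shape (n_target, d), the data of interest
    background: array of shape (n_background, d), data containing only
                the variation we want to suppress
    alpha:      contrast strength; alpha=0 reduces to ordinary PCA on target
    """
    # Center each dataset on its own mean.
    target_c = target - target.mean(axis=0)
    background_c = background - background.mean(axis=0)

    # Sample covariance matrices, each of shape (d, d).
    cov_target = np.cov(target_c, rowvar=False)
    cov_background = np.cov(background_c, rowvar=False)

    # Contrastive covariance: variance that is high in the target
    # but low in the background scores highest.
    contrast = cov_target - alpha * cov_background

    # The matrix is symmetric, so eigh applies; it returns eigenvalues
    # in ascending order, so pick the largest n_components explicitly.
    eigvals, eigvecs = np.linalg.eigh(contrast)
    top = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, top]

    return target_c @ components, components


# Synthetic data only, to show the call shape; not a dataset from this repo.
rng = np.random.default_rng(0)
target = rng.normal(size=(200, 5))
background = rng.normal(size=(300, 5))
projected, components = cpca(target, background, alpha=2.0, n_components=2)
```

Sweeping `alpha` and `n_components` over a grid and scoring a downstream classifier on `projected` is one way the parameter search in claim 4 could be set up.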

## Motivation

1. High-dimensional datasets are expensive to run models on
2. cPCA provides better target label separation relative to PCA or no preprocessing

## Limitations

1. cPCA-compressed data is difficult to explain

## Set of Experiments

### Evaluating Numerical Datasets

Metrics: F1, precision, and recall

+ Mouse gene expression dataset
+ Beans dataset
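
The three metrics named above can be computed directly from prediction counts. The sketch below assumes a binary setting with a single positive class; the README does not say how the metrics are averaged for multi-class datasets, so the helper name `precision_recall_f1` and the `positive` parameter are illustrative assumptions rather than this repository's evaluation code.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Hypothetical helper: precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators by returning 0.0 in that case.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy labels only, to show the call shape.
precision, recall, f1 = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
```

For multi-class datasets, one option is to call the helper once per class and average the results (macro averaging).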

### Evaluating Natural Language Datasets

Metrics: F1, precision, and recall

+ Sentiment analysis (SST)

### Evaluating Image Datasets

Metrics: F1, precision, and recall

+ CIFAR-10 classification

### Evaluating Effective Backgrounds

+ Ablation study on using unlabelled beans as the background
+ Comparing different backgrounds
