https://github.com/ndgigliotti/torch-ipca

GPU-accelerated Incremental PCA for PyTorch

Host: GitHub
URL: https://github.com/ndgigliotti/torch-ipca
Owner: ndgigliotti
License: apache-2.0
Created: 2026-01-23T03:47:08.000Z (5 months ago)
Default Branch: main
Last Pushed: 2026-01-23T20:16:58.000Z (5 months ago)
Last Synced: 2026-01-23T22:21:40.496Z (5 months ago)
Topics: cuda, dimensionality-reduction, gpu, incremental-pca, machine-learning, pca, pytorch
Language: Python
Size: 19.5 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Notice: NOTICE


# torch-ipca

GPU-accelerated Incremental PCA for PyTorch.

A PyTorch implementation of Incremental PCA adapted from scikit-learn, with full GPU support for fitting and transforming large datasets that don't fit in memory.

## Features

- **GPU-accelerated**: All operations run on CUDA when available

- **Incremental fitting**: Process data in batches, so memory use depends on the batch size rather than on the total number of samples

- **scikit-learn compatible API**: Drop-in replacement for `sklearn.decomposition.IncrementalPCA`

- **Save/load support**: Persist fitted models with `save()` and `load()`
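To make the incremental fitting idea concrete, here is a minimal sketch of the batch update that scikit-learn's `IncrementalPCA` is built on, written in plain PyTorch. It is not torch-ipca's actual code: the function name `ipca_update` and the `state` tuple are hypothetical, and details such as sign flipping and noise variance are left out.

```python
import torch

def ipca_update(batch, state, n_components):
    """One incremental PCA step (illustrative sketch, not the library's code).

    `state` is None before the first batch, otherwise a tuple
    (mean, singular_values, components, n_seen).
    """
    n_batch = batch.shape[0]
    batch_mean = batch.mean(dim=0)
    if state is None:
        # First batch: plain centering, nothing to merge with yet.
        stacked = batch - batch_mean
        new_mean, n_total = batch_mean, n_batch
    else:
        mean, s, components, n_seen = state
        n_total = n_seen + n_batch
        new_mean = (n_seen * mean + n_batch * batch_mean) / n_total
        # Extra row that accounts for the shift between the old mean
        # and the mean of the new batch.
        correction = ((n_seen * n_batch / n_total) ** 0.5) * (mean - batch_mean)
        # Stack the previous summary (scaled components), the centered
        # batch, and the correction row, then re-decompose.
        stacked = torch.cat(
            [s[:, None] * components, batch - batch_mean, correction[None, :]]
        )
    _, s_new, vt = torch.linalg.svd(stacked, full_matrices=False)
    return new_mean, s_new[:n_components], vt[:n_components], n_total

torch.manual_seed(0)
X = torch.randn(200, 5, dtype=torch.float64)

state = None
for batch in X.split(50):
    state = ipca_update(batch, state, n_components=5)

mean, s, components, n_seen = state
explained_variance = s**2 / (n_seen - 1)
```

Only the running mean, the singular values, and the components are carried between batches, which is why memory use does not grow with the number of samples. When every component is kept, as here, the result matches a full PCA; with fewer components it is an approximation.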

## Installation

```bash
pip install git+https://github.com/ndgigliotti/torch-ipca.git
```

## Usage

```python
import torch
from torch_ipca import IncrementalPCA

# Create some data
X = torch.randn(10000, 768, device="cuda")

# Fit incrementally
ipca = IncrementalPCA(n_components=128, device="cuda")
for batch in X.split(1000):
    ipca.partial_fit(batch)

# Transform
X_reduced = ipca.transform(X)  # Shape: (10000, 128)
```
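The example above keeps the whole tensor on the GPU. When the data lives in host memory and is too large for the device, one option is to move one batch at a time. The sketch below uses plain PyTorch only; the helper name `iter_device_batches` is hypothetical, and the `partial_fit` call is shown as a comment so the snippet runs without torch-ipca installed.

```python
import torch

def iter_device_batches(X_cpu, batch_size, device):
    """Yield consecutive row batches of a CPU tensor, moved to `device`."""
    for batch in X_cpu.split(batch_size):
        yield batch.to(device)

device = "cuda" if torch.cuda.is_available() else "cpu"
X_cpu = torch.randn(2500, 32)  # stand-in for a dataset held in host memory

n_seen = 0
for batch in iter_device_batches(X_cpu, 1000, device):
    # ipca.partial_fit(batch) would go here
    n_seen += batch.shape[0]
```

Note that the last batch here has only 500 rows. scikit-learn's `partial_fit` requires each batch to contain at least `n_components` samples; torch-ipca may share that constraint, so it is worth choosing a batch size that leaves no undersized remainder.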

### Full fit

```python
# X is the (10000, 768) tensor from the Usage example above
ipca = IncrementalPCA(n_components=128, device="cuda")
X_reduced = ipca.fit_transform(X)
```

### Save and load

```python
# Save fitted model
ipca.save("pca_model.pt")

# Load later
ipca = IncrementalPCA.load("pca_model.pt", device="cuda")

# new_data: any tensor with the same number of features as the training data
X_reduced = ipca.transform(new_data)
```

## API

### `IncrementalPCA(n_components=None, whiten=False, device="cuda")`

**Parameters:**

- `n_components`: Number of components to keep. If `None`, keeps `min(n_samples, n_features)` components.

- `whiten`: If `True`, scales each transformed component to unit variance.

- `device`: PyTorch device ("cuda" or "cpu").

**Methods:**

- `fit(X)`: Fit the model with X using minibatches.

- `partial_fit(X)`: Incremental fit on a batch X.

- `transform(X)`: Apply dimensionality reduction to X.

- `inverse_transform(X)`: Transform reduced data back to original space.

- `fit_transform(X)`: Fit and transform in one call.

- `save(path)`: Save fitted model to file.

- `load(path, device)`: Load fitted model from file (classmethod).
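As a rough guide to what `transform` and `inverse_transform` compute, the sketch below follows the scikit-learn definitions in plain PyTorch. It assumes torch-ipca keeps the same conventions, which this README does not confirm. The fitted attributes are stand-ins computed with a full SVD, and the two functions are illustrative rather than the library's methods.

```python
import torch

torch.manual_seed(0)
X = torch.randn(500, 6, dtype=torch.float64)

# Stand-ins for fitted attributes, computed here with a full SVD.
mean_ = X.mean(dim=0)
_, s, vt = torch.linalg.svd(X - mean_, full_matrices=False)
components_ = vt                               # (n_components, n_features)
explained_variance_ = s**2 / (X.shape[0] - 1)  # per-component variance

def transform(X, whiten=False):
    # Center, then project onto the principal axes.
    Z = (X - mean_) @ components_.T
    # Whitening divides each component by its standard deviation.
    return Z / explained_variance_.sqrt() if whiten else Z

def inverse_transform(Z, whiten=False):
    # Undo whitening first, then map back to feature space.
    if whiten:
        Z = Z * explained_variance_.sqrt()
    return Z @ components_ + mean_

Z = transform(X, whiten=True)
X_back = inverse_transform(Z, whiten=True)
```

With all components kept, the round trip recovers `X` up to floating-point error. With fewer components, `inverse_transform` returns the closest reconstruction within the retained subspace.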

**Attributes (after fitting):**

- `components_`: Principal axes, shape `(n_components, n_features)`.

- `explained_variance_`: Variance explained by each component.

- `explained_variance_ratio_`: Fraction of the total variance explained by each component.

- `mean_`: Per-feature mean.

- `n_samples_seen_`: Number of samples processed.
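One common use of `explained_variance_ratio_` is choosing how many components to keep. The helper below is a hypothetical sketch, not part of torch-ipca. It assumes the ratios are sorted in decreasing order and expressed as fractions, as in scikit-learn.

```python
import torch

def components_for_variance(ratio, threshold=0.95):
    """Smallest number of leading components whose ratios sum to `threshold`."""
    cumulative = torch.cumsum(ratio, dim=0)
    # Count the prefixes that fall short of the threshold, then add one.
    k = int((cumulative < threshold).sum().item()) + 1
    # If the threshold is never reached, fall back to all components.
    return min(k, ratio.numel())

# Example ratios; in practice this would be ipca.explained_variance_ratio_
ratio = torch.tensor([0.5, 0.3, 0.15, 0.05])
k = components_for_variance(ratio, threshold=0.9)
```

This only works if the model was fitted with more components than are finally needed, since the ratios cover only the components that were kept.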

## When to Use

- **Large datasets**: When data doesn't fit in GPU memory, use `partial_fit()` to process in batches.

- **Streaming data**: Continuously update PCA as new data arrives.

- **Non-MRL models**: For embedding models without Matryoshka training, PCA provides dimension reduction.

For models trained with Matryoshka Representation Learning (nomic-embed, jina-v3, OpenAI v3), simple truncation is preferred over PCA.
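For comparison, truncating a Matryoshka-style embedding usually means keeping the leading dimensions and re-normalizing. The sketch below uses random vectors as a stand-in for real embeddings, and the helper name `truncate_embeddings` is hypothetical. Whether re-normalization is needed depends on the model and the similarity metric in use.

```python
import torch
import torch.nn.functional as F

def truncate_embeddings(emb, dim):
    """Keep the first `dim` coordinates and rescale each row to unit length."""
    return F.normalize(emb[:, :dim], p=2, dim=1)

# Random unit vectors standing in for MRL-trained embeddings.
emb = F.normalize(torch.randn(4, 768), p=2, dim=1)
small = truncate_embeddings(emb, 128)
```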

## License

Apache 2.0. Portions derived from scikit-learn (BSD 3-Clause License).
