https://github.com/viodotcom/ppca_rs
Python+Rust implementation of the Probabilistic Principal Component Analysis model
data-science dimensionality-reduction em-algorithm linear-algebra machine-learning machine-learning-algorithms maximum-likelihood maximum-likelihood-estimation missing-data missing-values pca pca-analysis python rust
- Host: GitHub
- URL: https://github.com/viodotcom/ppca_rs
- Owner: viodotcom
- License: mit
- Created: 2022-11-24T17:47:17.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2024-08-27T15:06:33.000Z (5 months ago)
- Last Synced: 2024-10-14T02:35:32.239Z (4 months ago)
- Topics: data-science, dimensionality-reduction, em-algorithm, linear-algebra, machine-learning, machine-learning-algorithms, maximum-likelihood, maximum-likelihood-estimation, missing-data, missing-values, pca, pca-analysis, python, rust
- Language: Rust
- Homepage:
- Size: 316 KB
- Stars: 34
- Watchers: 5
- Forks: 2
- Open Issues: 5
Metadata Files:
- Readme: readme.md
- License: license
# _Probabilistic_ Principal Component Analysis (PPCA) model
[![PyPI version](https://badge.fury.io/py/ppca-rs.svg)](https://badge.fury.io/py/ppca-rs)
[![Crates.io version](https://img.shields.io/crates/v/ppca)](https://crates.io/crates/ppca)
[![Docs.rs version](https://img.shields.io/docsrs/ppca)](https://docs.rs/ppca)

This project implements a PPCA model in Rust, with Python bindings built using `pyO3` and `maturin`.
## Installing
This package is available on PyPI!
```bash
pip install ppca-rs
```

And you can also use it natively in Rust:
```bash
cargo add ppca
```

## Why use PPCA?
Glad you asked!
* PPCA is a simple extension of PCA (principal component analysis), but it can be more robust to train overall.
* PPCA is a _proper statistical model_. It doesn't spit out only the mean. You get standard deviations, covariances, and all the goodies that come from the realm of probability and statistics.
* The PPCA model can handle _missing values_. If there is data missing from your dataset, it can extrapolate it with reasonable values and even give you a confidence interval.
* The training converges quickly and will always tend to a global maximum. No metaparameters to dabble with and no local maxima.

## Why use `ppca-rs`?
That's an easy one!
* It's written in Rust, with only a bit of Python glue on top. You can expect performance in the same league as C code.
* It uses `rayon` to parallelize computations evenly across as many CPUs as you have.
* It also uses fancy Linear Algebra Trickery Technology to reduce computational complexity in key bottlenecks.
* Battle-tested at Vio.com with some ridiculously huge datasets.

## Quick example
```python
import numpy as np
from ppca_rs import Dataset, PPCATrainer, PPCAModel

# Your data: assumed to be defined elsewhere as a rank 2 array.
samples: np.ndarray

# Create your dataset from a rank 2 np.ndarray, where each line is a sample.
# Use non-finite values (`inf`s and `nan`) to signal masked values
dataset = Dataset(samples)

# Train the model (convenient edition!):
model: PPCAModel = PPCATrainer(dataset).train(state_size=10, n_iters=10)

# And now, here is a free sample of what you can do:

# Extrapolates the missing values with the most probable values:
extrapolated: Dataset = model.extrapolate(dataset)

# Smooths (removes noise from) samples and fills in missing values:
extrapolated: Dataset = model.filter_extrapolate(dataset)

# ... go back to numpy:
extrapolated_np = extrapolated.numpy()
```
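To build intuition for what "extrapolating with the most probable values" means, here is a minimal NumPy-only sketch of the textbook computation. Under a PPCA model, a sample is Gaussian with mean `mean` and covariance `W @ W.T + sigma2 * I`, so the most probable values for the missing entries are their conditional mean given the observed ones. This is an illustration only, not the `ppca_rs` implementation; the helper `conditional_fill` and the parameter names `mean`, `W` and `sigma2` are assumptions made for this sketch.

```python
import numpy as np


def conditional_fill(x, mean, W, sigma2):
    """Fill non-finite entries of `x` with their conditional mean.

    Assumes x ~ N(mean, W @ W.T + sigma2 * I), the standard PPCA marginal.
    Hypothetical helper for illustration; not part of `ppca_rs`.
    """
    x = np.asarray(x, dtype=float)
    missing = ~np.isfinite(x)  # `nan` and `inf` both count as masked
    observed = ~missing
    if not missing.any():
        return x.copy()

    cov = W @ W.T + sigma2 * np.eye(x.size)
    cov_oo = cov[np.ix_(observed, observed)]  # observed vs. observed
    cov_mo = cov[np.ix_(missing, observed)]   # missing vs. observed

    # Gaussian conditioning: E[x_m | x_o] = mean_m + C_mo C_oo^{-1} (x_o - mean_o)
    residual = x[observed] - mean[observed]
    filled = x.copy()
    filled[missing] = mean[missing] + cov_mo @ np.linalg.solve(cov_oo, residual)
    return filled


# Toy model with 6 output dimensions and a 2-dimensional hidden state.
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 2))
mean = rng.normal(size=6)
sigma2 = 1e-6  # almost no isotropic noise

x_true = mean + W @ rng.normal(size=2)
x_masked = x_true.copy()
x_masked[[1, 4]] = np.nan  # pretend two entries were never measured

x_filled = conditional_fill(x_masked, mean, W, sigma2)
print(np.abs(x_filled - x_true).max())  # small, since the noise is tiny
```

A confidence interval for the filled entries would come from the matching conditional covariance, `C_mm - C_mo C_oo^{-1} C_om`, which this sketch leaves out.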
## Juicy extras!
* Tired of the linear? We have support for PPCA mixture models. Make the most of your data with clustering and dimensionality reduction in a single tool!
* Support for adaptation of DataFrames using either `pandas` or `polars`. Never juggle those `df`s in your code again.

## Building from source
### Prerequisites
You will need [Rust](https://rust-lang.org/), which can be installed locally (i.e., without `sudo`). You will also need `maturin`, which can be installed with
```bash
pip install maturin
```
`pipenv` is also a good idea if you are going to mess around with it locally. At the very least, you need a `venv` set up; otherwise, `maturin` will complain.

### Installing it locally
Check the `Makefile` for the available commands (or just type `make`). To install it locally, do
```bash
make install # optional: i=python.version (e.g., `i=3.9`)
```

### Messing around and testing
To mess around, _inside a virtual environment_ (a `Pipfile` is provided for the `pipenv` lovers), do
```bash
maturin develop # use the flag --release to unlock superspeed!
```
This will install the package locally _as is_ from source.

## How do I use this stuff?
See the examples in the `examples` folder. Also, all functions are type hinted and commented. If you are using `pylance` or `mypy`, it should be easy to navigate.
## Is it faster than the pure Python implementation you made?
You bet!