{"id":20324843,"url":"https://github.com/viodotcom/ppca_rs","last_synced_at":"2025-04-11T19:42:19.643Z","repository":{"id":64516233,"uuid":"570253611","full_name":"viodotcom/ppca_rs","owner":"viodotcom","description":"Python+Rust implementation of the Probabilistic Principal Component Analysis model","archived":false,"fork":false,"pushed_at":"2024-08-27T15:06:33.000Z","size":324,"stargazers_count":35,"open_issues_count":5,"forks_count":2,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-10T12:07:24.574Z","etag":null,"topics":["data-science","dimensionality-reduction","em-algorithm","linear-algebra","machine-learning","machine-learning-algorithms","maximum-likelihood","maximum-likelihood-estimation","missing-data","missing-values","pca","pca-analysis","python","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/viodotcom.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"license","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-24T17:47:17.000Z","updated_at":"2024-12-30T22:27:31.000Z","dependencies_parsed_at":"2024-06-12T16:08:18.613Z","dependency_job_id":"f5a6d3a9-4f27-44e1-9537-06fa9c2b9076","html_url":"https://github.com/viodotcom/ppca_rs","commit_stats":{"total_commits":89,"total_committers":1,"mean_commits":89.0,"dds":0.0,"last_synced_commit":"3b0977bf6c0e3eccd9b3da09ecbefd377ca6a78f"},"previous_names":["findhotel/ppca_rs"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viodotcom%2Fppca_rs","tags_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viodotcom%2Fppca_rs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viodotcom%2Fppca_rs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viodotcom%2Fppca_rs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/viodotcom","download_url":"https://codeload.github.com/viodotcom/ppca_rs/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248468516,"owners_count":21108833,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","dimensionality-reduction","em-algorithm","linear-algebra","machine-learning","machine-learning-algorithms","maximum-likelihood","maximum-likelihood-estimation","missing-data","missing-values","pca","pca-analysis","python","rust"],"created_at":"2024-11-14T19:37:52.417Z","updated_at":"2025-04-11T19:42:19.621Z","avatar_url":"https://github.com/viodotcom.png","language":"Rust","readme":"# _Probabilistic_ Principal Component Analysis (PPCA) model\n\n[![PyPI version](https://badge.fury.io/py/ppca-rs.svg)](https://badge.fury.io/py/ppca-rs)\n[![Crates.io version](https://img.shields.io/crates/v/ppca)](https://crates.io/crates/ppca)\n[![Docs.rs version](https://img.shields.io/docsrs/ppca)](https://docs.rs/ppca)\n\nThis project provides a PPCA model implemented in Rust, with Python bindings built using `pyO3` and `maturin`.\n\n## Installing\n\nThis package is available on PyPI!\n```bash\npip install ppca-rs\n```\n\nAnd you can also use it natively in 
Rust:\n```bash\ncargo add ppca\n```\n\n## Why use PPCA?\n\nGlad you asked!\n\n* The PPCA is a simple extension of the PCA (principal component analysis), but can be overall more robust to train.\n* The PPCA is a _proper statistical model_. It doesn't spit out only the mean. You get standard deviations, covariances, and all the goodies that come from the realm of probability and statistics.\n* The PPCA model can handle _missing values_. If there is data missing from your dataset, it can extrapolate it with reasonable values and even give you a confidence interval.\n* The training converges quickly and will always tend to a global maximum. No metaparameters to dabble with and no local maxima.\n\n## Why use `ppca-rs`?\n\nThat's an easy one!\n\n* It's written in Rust, with only a bit of Python glue on top. You can expect performance in the same league as C code.\n* It uses `rayon` to parallelize computations evenly across as many CPUs as you have.\n* It also uses fancy Linear Algebra Trickery Technology to reduce computational complexity in key bottlenecks.\n* Battle-tested at Vio.com with some ridiculously huge datasets.\n\n\n## Quick example\n\n```python\nimport numpy as np\nfrom ppca_rs import Dataset, PPCATrainer, PPCAModel\n\n# Placeholder: bring your own data here.\nsamples: np.ndarray\n\n# Create your dataset from a rank 2 np.ndarray, where each row is a sample.\n# Use non-finite values (`inf`s and `nan`) to signal masked values\ndataset = Dataset(samples)\n\n# Train the model (convenient edition!):\nmodel: PPCAModel = PPCATrainer(dataset).train(state_size=10, n_iters=10)\n\n\n# And now, here is a free sample of what you can do:\n\n# Extrapolates the missing values with the most probable values:\nextrapolated: Dataset = model.extrapolate(dataset)\n\n# Smooths (removes noise from) samples and fills in missing values:\nsmoothed: Dataset = model.filter_extrapolate(dataset)\n\n# ... go back to numpy:\nextrapolated_np = extrapolated.numpy()\n\n```\n\n## Juicy extras!\n\n* Tired of the linear? 
We have support for PPCA mixture models. Make the most of your data with clustering and dimensionality reduction in a single tool!\n* Support for adaptation of DataFrames using either `pandas` or `polars`. Never juggle those `df`s in your code again.\n\n\n## Building from source\n\n### Prerequisites\n\nYou will need [Rust](https://rust-lang.org/), which can be installed locally (i.e., without `sudo`), and you will also need `maturin`, which can be installed with\n```bash\npip install maturin\n```\n`pipenv` is also a good idea if you are going to mess around with it locally. At the very least, you need a `venv` set up; otherwise, `maturin` will complain.\n\n### Installing it locally\n\nCheck the `Makefile` for the available commands (or just type `make`). To install it locally, do\n```bash\nmake install    # optional: i=python.version (e.g., `i=3.9`)\n```\n\n### Messing around and testing\n\nTo mess around, _inside a virtual environment_ (a `Pipfile` is provided for the `pipenv` lovers), do\n```bash\nmaturin develop  # use the flag --release to unlock superspeed!\n```\nThis will install the package locally _as is_ from source.\n\n## How do I use this stuff?\n\nSee the examples in the `examples` folder. Also, all functions are type hinted and commented. If you are using `pylance` or `mypy`, it should be easy to navigate.\n\n## Is it faster than the pure Python implementation you made?\n\nYou bet!\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fviodotcom%2Fppca_rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fviodotcom%2Fppca_rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fviodotcom%2Fppca_rs/lists"}