{"id":13689179,"url":"https://github.com/borchero/pycave","last_synced_at":"2025-04-12T19:50:46.014Z","repository":{"id":37080778,"uuid":"249200416","full_name":"borchero/pycave","owner":"borchero","description":"Traditional Machine Learning Models for Large-Scale Datasets in PyTorch.","archived":false,"fork":false,"pushed_at":"2024-10-21T20:07:36.000Z","size":699,"stargazers_count":126,"open_issues_count":20,"forks_count":13,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-10-22T14:12:00.696Z","etag":null,"topics":["gaussian-mixture-models","kmeans","machine-learning","markov-model","pytorch","pytorch-lightning"],"latest_commit_sha":null,"homepage":"https://pycave.borchero.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/borchero.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-22T14:26:33.000Z","updated_at":"2024-10-09T12:43:51.000Z","dependencies_parsed_at":"2023-12-25T20:32:04.805Z","dependency_job_id":"8b9182c0-bc8b-4cdf-8036-8556b23dc659","html_url":"https://github.com/borchero/pycave","commit_stats":{"total_commits":136,"total_committers":8,"mean_commits":17.0,"dds":0.6176470588235294,"last_synced_commit":"3d25c64c13f939eb453d613fb150c37ab6d17539"},"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/borchero%2Fpycave","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/borchero%2Fpycave/tags","releases_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/borchero%2Fpycave/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/borchero%2Fpycave/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/borchero","download_url":"https://codeload.github.com/borchero/pycave/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248625501,"owners_count":21135513,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gaussian-mixture-models","kmeans","machine-learning","markov-model","pytorch","pytorch-lightning"],"created_at":"2024-08-02T15:01:36.953Z","updated_at":"2025-04-12T19:50:45.994Z","avatar_url":"https://github.com/borchero.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# PyCave\n\n![PyPi](https://img.shields.io/pypi/v/pycave?label=version)\n![License](https://img.shields.io/pypi/l/pycave)\n\nPyCave allows you to run traditional machine learning models on CPU, GPU, and even on multiple\nnodes. All models are implemented in [PyTorch](https://pytorch.org/) and provide an `Estimator` API\nthat is fully compatible with [scikit-learn](https://scikit-learn.org/stable/).\n\nFor Gaussian mixture models, PyCave allows for 100x speedups when using a GPU and enables training\non markedly larger datasets via mini-batch training. 
The full suite of benchmarks run to compare\nPyCave models against scikit-learn models is available on the\n[documentation website](https://pycave.borchero.com/sites/benchmark.html).\n\n_PyCave version 3 is a complete rewrite of PyCave which is tested much more rigorously, depends on\nwell-maintained libraries and is tuned for better performance. While you are, thus, highly\nencouraged to upgrade, refer to [pycave-v2.borchero.com](https://pycave-v2.borchero.com) for\ndocumentation on PyCave 2._\n\n## Features\n\n- Support for GPU and multi-node training by implementing models in PyTorch and relying on\n  [PyTorch Lightning](https://www.pytorchlightning.ai/)\n- Mini-batch training for all models such that they can be used on huge datasets\n- Well-structured implementation of models\n\n  - High-level `Estimator` API allows for easy usage such that models feel and behave like in\n    scikit-learn\n  - Medium-level `LightningModule` implements the training algorithm\n  - Low-level PyTorch `Module` manages the model parameters\n\n## Installation\n\nPyCave is available via `pip`:\n\n```bash\npip install pycave\n```\n\nIf you are using [Poetry](https://python-poetry.org/):\n\n```bash\npoetry add pycave\n```\n\n## Usage\n\nIf you've ever used scikit-learn, you'll feel right at home when using PyCave. First, let's create\nsome artificial data to work with:\n\n```python\nimport torch\n\nX = torch.cat([\n    torch.randn(10000, 8) - 5,\n    torch.randn(10000, 8),\n    torch.randn(10000, 8) + 5,\n])\n```\n\nThis dataset consists of three clusters with 8-dimensional datapoints. If you want to fit a K-Means\nmodel to find the clusters' centroids, it's as easy as:\n\n```python\nfrom pycave.clustering import KMeans\n\nestimator = KMeans(3)\nestimator.fit(X)\n\n# Once the estimator is fitted, it provides various properties. 
One of them is\n# the `model_` property which yields the PyTorch module with the fitted parameters.\nprint(\"Centroids are:\")\nprint(estimator.model_.centroids)\n```\n\nDue to the high-level estimator API, the usage for all machine learning models is similar. The API\ndocumentation provides more detailed information about parameters that can be passed to estimators\nand which methods are available.\n\n### GPU and Multi-Node Training\n\nFor GPU- and multi-node training, PyCave leverages PyTorch Lightning. The hardware that training\nruns on is determined by the\n[Trainer](https://pytorch-lightning.readthedocs.io/en/latest/api/pytorch_lightning.trainer.trainer.html#pytorch_lightning.trainer.trainer.Trainer)\nclass. Its\n[`__init__`](https://pytorch-lightning.readthedocs.io/en/latest/api/pytorch_lightning.trainer.trainer.html#pytorch_lightning.trainer.trainer.Trainer.__init__)\nmethod provides various configuration options.\n\nIf you want to run K-Means with a GPU, you can pass the options `accelerator='gpu'` and `devices=1`\nto the estimator's initializer:\n\n```python\nestimator = KMeans(3, trainer_params=dict(accelerator='gpu', devices=1))\n```\n\nSimilarly, if you want to train on 4 nodes simultaneously where each node has one GPU available,\nyou can specify this as follows:\n\n```python\nestimator = KMeans(3, trainer_params=dict(num_nodes=4, accelerator='gpu', devices=1))\n```\n\nIn fact, **you do not need to change anything else in your code**.\n\n### Implemented Models\n\nCurrently, PyCave implements three different models:\n\n- [GaussianMixture](https://pycave.borchero.com/sites/generated/bayes/gmm/pycave.bayes.GaussianMixture.html)\n- [MarkovChain](https://pycave.borchero.com/sites/generated/bayes/markov_chain/pycave.bayes.MarkovChain.html)\n- [K-Means](https://pycave.borchero.com/sites/generated/clustering/kmeans/pycave.clustering.KMeans.html)\n\n## License\n\nPyCave is licensed under the [MIT 
License](https://github.com/borchero/pycave/blob/main/LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fborchero%2Fpycave","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fborchero%2Fpycave","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fborchero%2Fpycave/lists"}