{"id":18305060,"url":"https://github.com/mdbloice/pyrea","last_synced_at":"2026-03-04T17:31:20.768Z","repository":{"id":65359771,"uuid":"523679097","full_name":"mdbloice/Pyrea","owner":"mdbloice","description":"Multi-view clustering with flexible ensemble structures.","archived":false,"fork":false,"pushed_at":"2025-01-29T12:50:43.000Z","size":236,"stargazers_count":24,"open_issues_count":1,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-16T07:15:38.765Z","etag":null,"topics":["clustering","data-fusion","enembles","multi-view"],"latest_commit_sha":null,"homepage":"https://pyrea.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mdbloice.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-08-11T10:25:42.000Z","updated_at":"2025-02-03T09:24:08.000Z","dependencies_parsed_at":"2024-11-05T15:51:16.621Z","dependency_job_id":null,"html_url":"https://github.com/mdbloice/Pyrea","commit_stats":{"total_commits":178,"total_committers":2,"mean_commits":89.0,"dds":0.0449438202247191,"last_synced_commit":"b7897e5338f64fc3ca9c782d6d13e228b8282abb"},"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdbloice%2FPyrea","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdbloice%2FPyrea/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdbloice%2FPyrea/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/mdbloice%2FPyrea/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mdbloice","download_url":"https://codeload.github.com/mdbloice/Pyrea/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247366689,"owners_count":20927565,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering","data-fusion","enembles","multi-view"],"created_at":"2024-11-05T15:32:34.104Z","updated_at":"2026-03-04T17:31:20.728Z","avatar_url":"https://github.com/mdbloice.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pyrea\n\n\u003cp align=\"center\"\u003e\n     \u003cimg src=\"https://raw.githubusercontent.com/mdbloice/AugmentorFiles/master/Pyrea/Pyrea-logos_transparent.png\" width=\"400\"\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n     Multi-view clustering with flexible ensemble structures.\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n     \u003ca href=\"https://pypi.org/project/pyrea/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/Pyrea\"\u003e\u003c/a\u003e\n     \u003ca href=\"https://github.com/mdbloice/Pyrea/blob/master/LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/pypi/l/Pyrea\"\u003e\u003c/a\u003e\n     \u003ca href=\"https://github.com/mdbloice/Pyrea/actions/workflows/main.yml\"\u003e\u003cimg src=\"https://img.shields.io/pypi/pyversions/Pyrea\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n     \u003cem\u003eThe name Pyrea is derived from the Greek word Parea, meaning a group of friends 
who gather to share experiences, values, and ideas.\u003c/em\u003e\n\u003c/p\u003e\n\n---\n\n## Installation\n\nInstall Pyrea using `pip`:\n\n```bash\npip install pyrea\n```\n\nThis will install the latest version of Pyrea from PyPI.\n\n## Demonstration Notebooks\nA demonstration of Pyrea's usage on the Nutrimouse dataset can be found in the following Jupyter notebook:\n\n- [Nutrimouse.ipynb](https://github.com/mdbloice/Pyrea/blob/master/notebooks/Nutrimouse.ipynb)\n\nIn this notebook, hierarchical and spectral clustering are performed on the Nutrimouse multi-view dataset, tuned using Pyrea's genetic algorithm functionality.\n\nMore notebooks will be added in due course.\n\n## Usage\nThe Pyrea software package is the accompanying software for our paper:\n\nPfeifer, B., Bloice, M.D., \u0026 Schimek, M.G. (2023). *Parea: multi-view ensemble clustering for cancer subtype discovery*. **Journal of Biomedical Informatics**. \u003chttps://doi.org/10.1016/j.jbi.2023.104406\u003e\n\nWhile Pyrea allows for flexible and custom architectures to be built, two structures are discussed specifically in the paper cited above, namely Parea 1 and Parea 2.\n\nBoth the structures, which are described in detail below as well as in the paper mentioned above, can be quickly generated and applied to your data using two helper functions, `parea_1()` and `parea_2()`, and can be quickly run as follows:\n\n```python\nimport pyrea\nimport numpy as np\n\n# Create sample data:\nd1 = np.random.rand(100,10)\nd2 = np.random.rand(100,10)\nd3 = np.random.rand(100,10)\n\ndata = [d1,d2, d3]\n\nlabels = pyrea.parea_2(data)\n```\n\nwhich executes Parea 2.\n\nDefault parameters are used which match those used in our experiments discussed in the paper referenced above. These default parameters can of course be overridden. 
As there are many combinations of parameters that could be used, a genetic algorithm can be utilised to find the optimum parameters, as shown in the next section.\n\n### Genetic Algorithm\n\nThe Parea 1 and Parea 2 structures can be optimised using a genetic algorithm in order to find the best combinations of clustering methods, fusion methods, and number of clusters.\n\nFor example, to find optimal parameters for Parea 2:\n\n```python\nimport pyrea\nfrom sklearn import datasets\n\nd1 = datasets.load_iris().data\nd2 = datasets.load_iris().data\nd3 = datasets.load_iris().data\n\ndata = [d1,d2,d3]\n\nparams = pyrea.parea_2_genetic(data, k_min=2, k_max=5)\n```\n\nwhere `k_min` and `k_max` refer to the minimum and maximum number of clusters to attempt for each layer, respectively.\n\nNote that `params` contains the optimal parameters found by the genetic algorithm. To get the labels, run `parea_2()` passing your data and these optimal parameters:\n\n```python\npyrea.parea_2(data, *params)\n```\n\nwhich will return the cluster labels for your data.\n\nAlso, you may choose to define the **final** number of clusters returned by the algorithm (while still allowing it to optimise intermediate numbers of clusters) by defining `k_final`, e.g.:\n\n```python\nparams = pyrea.parea_2_genetic(data, k_min=2, k_max=5, k_final=3)\n```\n\nand calling `parea_2()` as follows:\n\n```python\npyrea.parea_2(data, params, k_final=3)\n```\n\n#### Genetic Algorithm Update\n\nThe genetic algorithm functions now support arbitrary numbers of views, and the population and number of generations can now be adjusted. See [this notebook](https://github.com/mdbloice/Pyrea/blob/master/notebooks/Nutrimouse.ipynb) for a demonstration of this usage on the Nutrimouse dataset.\n\n### API\n\n**Please note that Pyrea is a work in progress. The API may change from version\nto version and introduce breaking changes.**\n\nIn Pyrea, your data are organised into views. 
A view consists of the data in\nthe form of a 2D matrix, and an associated clustering algorithm (a *clusterer*).\n\nTo create a view you must have some data, and a clusterer:\n\n```python\nimport pyrea\n\n# Create your data, which must be a 2-dimensional array/matrix.\nd = [[1,2,3],\n     [4,5,6],\n     [7,8,9]]\n\n# Create a clusterer\nc = pyrea.clusterer(\"hierarchical\", n_clusters=2, method='ward')\n\nv = pyrea.view(d, c)\n```\n\nYou now have a view `v`, containing the data `d` using the clustering algorithm\n`c`. Note that many views can share the same clusterer, or each view may have a\nunique clusterer.\n\nTo obtain the cluster solution, execute the view:\n\n```python\nv.execute()\n```\n\nThe clustering algorithm can be either 'spectral', 'hierarchical', 'dbscan', or 'optics'. See the documentation for a complete list of parameters that can be passed when creating a clusterer.\n\nAs this is a library for multi-view ensemble learning, you will normally have\nmultiple views.\n\nA fusion algorithm is therefore used to fuse the clusterings created from\nmultiple views. Our next step is thus to create a *fuser* object:\n\n```python\nf = pyrea.fuser('disagreement')\n```\n\nWith your fusion algorithm `f`, you can execute an *ensemble*. 
The ensemble is created with a set of views and a fusion algorithm,\nand its returned object (distance or affinity matrix) can again be specified as a view:\n\n```python\n# Create a new clusterer with precomputed=True\nc_pre = pyrea.clusterer(\"hierarchical\", n_clusters=2, method='ward', precomputed=True)\nv_res = pyrea.view(pyrea.execute_ensemble([v1, v2, v3], f), c_pre)\n```\n\nThis newly created view, `v_res` can subsequently be fed into another ensemble,\nallowing you to create stacked ensemble architectures, with high flexibility.\n\nA full example is shown below, using random data:\n\n```python\nimport pyrea\nimport numpy as np\n\n# Create two datasets with random values of 1000 samples and 100 features per sample.\nd1 = np.random.rand(1000,100)\nd2 = np.random.rand(1000,100)\n\n# Define the clustering algorithm(s) you want to use. In this case we used the same\n# algorithm for both views. By default n_clusters=2.\nc = pyrea.clusterer('hierarchical', n_clusters=2, method='ward')\n\n# Create the views using the data and the same clusterer\nv1 = pyrea.view(d1, c)\nv2 = pyrea.view(d2, c)\n\n# Create a fusion object\nf = pyrea.fuser('disagreement')\n\n# Specify a clustering algorithm (precomputed = True)\nc_pre = pyrea.clusterer(\"hierarchical\", n_clusters=2, method='ward', precomputed=True)\n# Execute an ensemble based on your views and a fusion algorithm\nv_res = pyrea.view(pyrea.execute_ensemble([v1, v2], f), c_pre)\n\n# The cluster solution can be obtained as follows\nv_res.execute()\n```\n\n## Ensemble Structures\nComplex structures can be built using Pyrea.\n\nFor example, examine the two structures below:\n\n![Ensemble Structures](https://raw.githubusercontent.com/mdbloice/AugmentorFiles/master/Pyrea/parea.png)\n\nWe will demonstrate how to create deep and flexible ensemble structures using\nthe examples a) and b) from the image above.\n\n### Example A\nThis ensemble consists of two sets of three views, which are clustered, fused,\nand then once again 
combined in a second layer.\n\nWe create two ensembles, which represent the first layer of structure a) in\nthe image above:\n\n```python\nimport pyrea\nimport numpy as np\n\n# Clusterers:\nhc1 = pyrea.clusterer('hierarchical', method='ward', n_clusters=2)\nhc2 = pyrea.clusterer('hierarchical', method='complete', n_clusters=2)\n\n# Fusion algorithm:\nf = pyrea.fuser('disagreement')\n\n# Create three random datasets\nd1 = np.random.rand(100,10)\nd2 = np.random.rand(100,10)\nd3 = np.random.rand(100,10)\n\n# Views for ensemble 1\nv1 = pyrea.view(d1, hc1)\nv2 = pyrea.view(d2, hc1)\nv3 = pyrea.view(d3, hc1)\n\n# Execute ensemble 1 and retrieve a new view, which is used later.\nhc1_pre = pyrea.clusterer('hierarchical', method='ward', n_clusters=2, precomputed=True)\nv_ensemble_1 = pyrea.view(pyrea.execute_ensemble([v1, v2, v3], f), hc1_pre)\n\n# Views for ensemble 2\nv4 = pyrea.view(d1, hc2)\nv5 = pyrea.view(d2, hc2)\nv6 = pyrea.view(d3, hc2)\n\n# Execute our second ensemble, and retrieve a new view:\nhc2_pre = pyrea.clusterer('hierarchical', method='complete', n_clusters=2, precomputed=True)\nv_ensemble_2 = pyrea.view(pyrea.execute_ensemble([v4, v5, v6], f), hc2_pre)\n\n# Now we can execute a further ensemble, using the views generated from the\n# two previous ensemble methods:\nd_fuse  = pyrea.execute_ensemble([v_ensemble_1, v_ensemble_2], f)\n\n# The returned distance matrix is now used as an input for the two clustering methods (hc1 and hc2)\nv1_fuse = pyrea.view(d_fuse, hc1_pre)\nv2_fuse = pyrea.view(d_fuse, hc2_pre)\n\n# and the cluster solutions are combined\npyrea.consensus([v1_fuse.execute(), v2_fuse.execute()])\n```\n\n#### Helper Function\nSee the `parea_1()` helper function for a pre-built version of the structure above.\n\n### Example B\nStructure b) in the image above can be implemented as follows:\n\n```python\nimport pyrea\nimport numpy as np\n\n# Clustering algorithms\nc1 = pyrea.clusterer('hierarchical', method='ward', n_clusters=2)\nc2 = 
pyrea.clusterer('hierarchical', method='complete', n_clusters=2)\nc3 = pyrea.clusterer('hierarchical', method='single', n_clusters=2)\n\n# Clustering algorithms (so it works with a precomputed distance matrix)\nc1_pre = pyrea.clusterer('hierarchical', method='ward', n_clusters=2, precomputed=True)\nc2_pre = pyrea.clusterer('hierarchical', method='complete', n_clusters=2, precomputed=True)\nc3_pre = pyrea.clusterer('hierarchical', method='single', n_clusters=2, precomputed=True)\n\n# Fusion algorithm\nf = pyrea.fuser('disagreement')\n\n# Create the views with the random data directly:\nv1 = pyrea.view(np.random.rand(100,10), c1)\nv2 = pyrea.view(np.random.rand(100,10), c2)\nv3 = pyrea.view(np.random.rand(100,10), c3)\n\n# Create the ensemble and define new views based on the returned disagreement matrix v_res\nv_res  = pyrea.execute_ensemble([v1, v2, v3], f)\nv1_res = pyrea.view(v_res, c1_pre)\nv2_res = pyrea.view(v_res, c2_pre)\nv3_res = pyrea.view(v_res, c3_pre)\n\n# Get the final cluster solution\npyrea.consensus([v1_res.execute(), v2_res.execute(), v3_res.execute()])\n```\n\n#### Helper Function\nSee the `parea_2()` helper function for a pre-built version of the structure above.\n\n## Extensible\nPyrea has been designed to be extensible. It allows you to use Pyrea's data fusion techniques with custom clustering algorithms that can be loaded into Pyrea at run-time.\n\nProviding a `View` with a `ClusterMethod` object makes it straightforward to supply custom clustering algorithms. 
See [`Extending Pyrea`](https://pyrea.readthedocs.io/en/latest/extending.html) for details.\n\n# Work In Progress and Future Work\nSeveral features are currently work in progress; future updates will include\nthe features described in the sections below.\n\n## HCfused Clustering Algorithm\nA novel fusion technique, developed by one of the authors of this software\npackage, named HCfused, will be included in a future update.\n\n## General Genetic Optimisation\nThe package will be extended to allow for any custom Pyrea structures to be optimised using a genetic algorithm.\n\n# Compilation of HC Fused C++ Code\nTo use the HC Fused method you may need to compile the source code yourself if binaries are not available for your operating system. HC Fused has been implemented in C++; see the `HC_fused_cpp_opt6.cpp` source file for more details.\n\nPre-compiled binaries are available for Linux and have been tested using Linux only. The instructions below pertain to Linux only. For Windows, please consult \u003chttps://docs.python.org/3.5/library/ctypes.html#loading-shared-libraries\u003e and use a compiler such as MSVC or MinGW.\n\nTo compile HC Fused (and then create a shared library/dynamic library) execute the following on the command line:\n\n```bash\n$ clang++ -c -fPIC HC_fused_cpp_opt6.cpp -o HC_fused_cpp_opt6.o\n```\n\nand then create the `.so` shared library file:\n\n```bash\n$ clang++ HC_fused_cpp_opt6.o -shared -o libhcfused.so\n```\n\nand finally place the `libhcfused.so` file in the root directory of the package's installation directory.\n\n# Tests\nInstallation is tested using Python versions 3.8, 3.9, 3.10, and 3.11 on Ubuntu 20.04 LTS only. See the project's Actions for details. 
The package should also work using Python 3.6 and 3.7 on other operating systems, however.\n\n# Miscellaneous\nLogo made by Adobe Express Logo Maker: \u003chttps://www.adobe.com/express/create/logo\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmdbloice%2Fpyrea","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmdbloice%2Fpyrea","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmdbloice%2Fpyrea/lists"}