{"id":15066238,"url":"https://github.com/gentaiscool/matrix_fact","last_synced_at":"2025-04-10T13:42:49.189Z","repository":{"id":48978128,"uuid":"382362992","full_name":"gentaiscool/matrix_fact","owner":"gentaiscool","description":"Matrix Factorization Library","archived":false,"fork":false,"pushed_at":"2023-12-18T20:12:29.000Z","size":178,"stargazers_count":9,"open_issues_count":1,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-24T12:21:45.047Z","etag":null,"topics":["convex","matrix","matrix-factorization","python","pytorch","svd"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gentaiscool.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-07-02T13:58:18.000Z","updated_at":"2024-10-30T01:14:55.000Z","dependencies_parsed_at":"2025-02-17T10:41:32.934Z","dependency_job_id":null,"html_url":"https://github.com/gentaiscool/matrix_fact","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gentaiscool%2Fmatrix_fact","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gentaiscool%2Fmatrix_fact/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gentaiscool%2Fmatrix_fact/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gentaiscool%2Fmatrix_fact/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gentais
cool","download_url":"https://codeload.github.com/gentaiscool/matrix_fact/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248226260,"owners_count":21068170,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["convex","matrix","matrix-factorization","python","pytorch","svd"],"created_at":"2024-09-25T01:04:05.994Z","updated_at":"2025-04-10T13:42:49.165Z","avatar_url":"https://github.com/gentaiscool.png","language":"Python","readme":"## MatrixFact\n\n### Install\n```\npip install matrix-fact\n```\n\n### What is matrix-fact?\nmatrix-fact contains modules for constrained/unconstrained matrix factorization (and related) methods for both sparse and dense matrices. The repository can be found at https://github.com/gentaiscool/matrix_fact. The code is based on https://github.com/ChrisSchinnerl/pymf3 and https://github.com/rikkhill/pymf. We updated the code to support the latest library. It requires cvxopt, numpy, scipy and torch. We just added the support for PyTorch-based SNMF.\n\n### Packages\n\nThe package includes:\n* Non-negative matrix factorization (NMF) [three different optimizations used]\n* Convex non-negative matrix factorization (CNMF)\n* Semi non-negative matrix factorization (SNMF)\n* Archetypal analysis (AA)\n* Simplex volume maximization (SiVM) [and SiVM for CUR, GSAT, ... 
]\n* Convex-hull non-negative matrix factorization (CHNMF)\n* Binary matrix factorization (BNMF)\n* Singular value decomposition (SVD)\n* Principal component analysis (PCA)\n* K-means clustering (Kmeans)\n* C-means clustering (Cmeans)\n* CUR decomposition (CUR)\n* Compaxt matrix decomposition (CMD)\n* PyTorch SNMF\n\n### Usage\nGiven a dataset, most factorization methods try to minimize the Frobenius norm \u003ccode\u003e|data - W*H|\u003c/code\u003e by finding a suitable set of basis vectors \u003ccode\u003eW\u003c/code\u003e and coefficients H. The syntax for calling the various methods is quite similar. Usually, one has to submit a desired number of basis vectors and the maximum number of iterations. For example, applying NMF to a dataset data aiming at 2 basis vectors within 10 iterations works as follows:\n\n```python\n\u003e\u003e\u003e import matrix_fact\n\u003e\u003e\u003e import numpy as np\n\u003e\u003e\u003e data = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])\n\u003e\u003e\u003e nmf_mdl = matrix_fact.NMF(data, num_bases=2, niter=10)\n\u003e\u003e\u003e nmf_mdl.initialization()\n\u003e\u003e\u003e nmf_mdl.factorize()\n```\n\nThe basis vectors are now stored in \u003ccode\u003enmf_mdl.W\u003c/code\u003e, the coefficients in \u003ccode\u003enmf_mdl.H\u003c/code\u003e. To compute coefficients for an existing set of basis vectors simply copy W to nmf_mdl.W, and set compW to False:\n\n```python\n\u003e\u003e\u003e data = np.array([[1.5], [1.2]])\n\u003e\u003e\u003e W = np.array([[1.0, 0.0], [0.0, 1.0]])\n\u003e\u003e\u003e nmf_mdl = matrix_fact.NMF(data, num_bases=2, niter=1, compW=False)\n\u003e\u003e\u003e nmf_mdl.initialization()\n\u003e\u003e\u003e nmf_mdl.W = W\n\u003e\u003e\u003e nmf_mdl.factorize()\n```\n\nBy changing py_fact.NMF to e.g. py_fact.AA or py_fact.CNMF Archetypal Analysis or Convex-NMF can be applied. 
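Independent of the particular method, the Frobenius objective above can be checked directly once <code>W</code> and <code>H</code> are available. This is a minimal sketch using only plain NumPy, not matrix_fact itself:

```python
import numpy as np

# Measure how well a factorization W*H approximates the data
# (the Frobenius-norm objective |data - W*H| mentioned above).
data = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
W = np.array([[1.0, 0.0], [0.0, 1.0]])  # here: identity basis
H = data.copy()                         # so W @ H reproduces data exactly
err = np.linalg.norm(data - W @ H)      # Frobenius norm by default
print(err)  # 0.0 for this exact factorization
```

For the identity basis used here the reconstruction is exact, so the residual norm is 0; after a truncated factorization the same expression measures the approximation error.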
Some methods may accept additional parameters; have a look at the corresponding documentation, e.g. <code>help(matrix_fact.AA)</code>. For example, CUR, CMD, and SVD are handled slightly differently, as they factorize into three submatrices, which requires appropriate arguments for row and column sampling.

For PyTorch-SNMF:

```python
>>> import torch
>>> data = torch.FloatTensor([[1.5], [1.2]])
>>> nmf_mdl = matrix_fact.NMF(data, num_bases=2)
>>> nmf_mdl.factorize(niter=1000)
```

### Very large datasets
For handling larger datasets, matrix-fact supports hdf5 via h5py. Usage is straightforward, as h5py allows mapping large numpy matrices to disk. Thus, instead of passing data as an np.array, you can simply pass the corresponding hdf5 table. The following example shows how to apply matrix-fact to a random matrix that is stored entirely on disk. Here the dataset does not have to fit into memory; only the resulting low-rank factors <code>W,H</code> do.

```python
>>> import h5py
>>> import numpy as np
>>> import matrix_fact
>>>
>>> file = h5py.File('myfile.hdf5', 'w')
>>> file['dataset'] = np.random.random((100,1000))
>>> sivm_mdl = matrix_fact.SIVM(file['dataset'], num_bases=10)
>>> sivm_mdl.factorize()
```

If the low-rank matrices <code>W,H</code> also do not fit into memory, they can be initialized as h5py matrices:

```python
>>> import h5py
>>> import numpy as np
>>> import matrix_fact
>>>
>>> file = h5py.File('myfile.hdf5', 'w')
>>> file['dataset'] = np.random.random((100,1000))
>>> file['W'] = np.random.random((100,10))
>>> file['H'] = np.random.random((10,1000))
>>> sivm_mdl = matrix_fact.SIVM(file['dataset'], num_bases=10)
>>> sivm_mdl.W = file['W']
>>> sivm_mdl.H = file['H']
>>> sivm_mdl.factorize()
```

Please note that currently not all methods work well with hdf5. While they all accept hdf5 input matrices, some lead to very high memory consumption during intermediate computation steps. This is difficult to avoid unless the implementation switches to completely disk-based storage.
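One way to keep intermediate memory low when the data lives on disk is to process it in row blocks. The sketch below uses plain NumPy (no h5py required) purely to illustrate the access pattern, computing the Frobenius error block by block; the same slicing works unchanged on an h5py dataset, since slicing one returns an in-memory numpy array:

```python
import numpy as np

# Block-wise Frobenius error |data - W*H|: the full reconstruction
# W @ H is never materialized at once, only one row block at a time.
rng = np.random.default_rng(0)
data = rng.random((100, 1000))   # stand-in for an hdf5-backed dataset
W = rng.random((100, 10))
H = rng.random((10, 1000))

sq = 0.0
for start in range(0, data.shape[0], 25):       # 25-row blocks
    block = data[start:start + 25]              # h5py slicing works the same
    sq += np.sum((block - W[start:start + 25] @ H) ** 2)
err = np.sqrt(sq)   # equal to np.linalg.norm(data - W @ H)
```

The block size trades memory for the number of disk reads; the result is identical to the all-at-once computation.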