{"id":13697152,"url":"https://github.com/gregversteeg/LinearCorex","last_synced_at":"2025-05-03T19:32:59.056Z","repository":{"id":62576165,"uuid":"51806307","full_name":"gregversteeg/LinearCorex","owner":"gregversteeg","description":"Fast, linear version of CorEx for covariance estimation, dimensionality reduction, and subspace clustering with very under-sampled, high-dimensional data","archived":false,"fork":false,"pushed_at":"2020-11-30T17:49:33.000Z","size":26383,"stargazers_count":42,"open_issues_count":2,"forks_count":13,"subscribers_count":25,"default_branch":"master","last_synced_at":"2024-10-13T12:51:58.468Z","etag":null,"topics":["clustering","covariance-matrix","factor-analysis","information-theory","machine-learning","python","unsupervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gregversteeg.png","metadata":{"files":{"readme":"README.md","changelog":"Changelog.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-02-16T03:52:24.000Z","updated_at":"2024-08-12T19:21:25.000Z","dependencies_parsed_at":"2022-11-03T20:06:49.749Z","dependency_job_id":null,"html_url":"https://github.com/gregversteeg/LinearCorex","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gregversteeg%2FLinearCorex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gregversteeg%2FLinearCorex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gregversteeg%2FLinearCorex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/gregversteeg%2FLinearCorex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gregversteeg","download_url":"https://codeload.github.com/gregversteeg/LinearCorex/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224373629,"owners_count":17300532,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering","covariance-matrix","factor-analysis","information-theory","machine-learning","python","unsupervised-learning"],"created_at":"2024-08-02T18:00:53.156Z","updated_at":"2025-05-03T19:32:59.047Z","avatar_url":"https://github.com/gregversteeg.png","language":"Python","funding_links":[],"categories":["Models"],"sub_categories":["Embedding based Topic Models"],"readme":"# Latent Factor Models Based on Linear Total Correlation Explanation (CorEx)\n\nLinear CorEx finds latent factors that are as informative as possible about relationships in the data. \nThe approach is described in this paper:\n[Low Complexity Gaussian Latent Factor Models and a Blessing of Dimensionality](https://arxiv.org/abs/1706.03353).\nThis is useful for covariance estimation, clustering related variables, and dimensionality reduction, especially \nin the high-dimensional, under-sampled regime. \n\nTo install:\n```\npip install linearcorex\n```\n\nMathematically, the objective is to find factors, y, where y = W x and  \nx in R^n is the data and W is an m by n weight matrix. \nWe are minimizing TC(X|Y) + TC(Y) where TC is the \"total correlation\" or multivariate mutual information. 
This objective\nis optimized when the X's are independent after conditioning on the Y's, and the Y's themselves are independent. \nInstead of heuristically upper bounding this objective as we do for discrete CorEx, \nwe are able to optimize it exactly in the linear case. \nWhile this extension required assumptions of linearity, the \nadvantage is that the code is fast because it relies only on matrix algebra. In principle it could be \nfurther accelerated using GPUs. \n\n\nWithout further constraints, the optima of this objective \nmay have an undesirable property: information about the X_i's can be stored \"synergistically\" in the latent factors. \nIn other words, to predict a single variable you need to combine information from all the latent factors. Therefore, we \nadd a constraint that the solutions should be non-synergistic (latent factors are individually informative about each variable X_i). \nThis also recovers the property of the original lower bound formulation from AISTATS that each latent factor\nhas a non-negative added contribution towards TC.\nNote that by default, we constrain solutions to eliminate synergy. \nYou can turn this constraint off by setting eliminate_synergy=False in the Python API or by passing -a on the command line. \nFor making nice trees, it should be left on (e.g. for personality data or ADNI data). \n\nTo test the command line interface, try:\n```\ncd $INSTALL_DIRECTORY/linearcorex/\npython vis_corex.py ../tests/data/test_big5.csv --layers=5,1 --verbose=1 --no_row_names -o big5\npython vis_corex.py ../tests/data/adni_blood.csv --layers=30,5,1 --missing=-1e6 --verbose=1 -o adni\npython vis_corex.py ../tests/data/matrix.tcga_ov.geneset1.log2.varnorm.RPKM.txt --layers=30,5,1 --delimiter=' ' --verbose=1 --gaussianize=\"outliers\" -o gene\n```\nEach of these examples generates pairwise plots of relationships and a graph. \n\nThe Python API uses the sklearn conventions of fit/transform.  
\n```python\nimport linearcorex as lc\nimport numpy as np\n\nout = lc.Corex(n_hidden=5, verbose=True)  # A Corex model with 5 factors\nX = np.random.random((100, 50))  # Random data with 100 samples and 50 variables\nout.fit(X)  # Fit the model on data\ny = out.transform(X)  # Transform data into latent factors\nprint(out.clusters)  # See the clusters\ncov = out.get_covariance()  # The covariance matrix\n```\n\n\nMissing values can be specified, but they are just imputed in a naive way. \n\n## Papers\n\nSee [Sifting Common Info...](https://arxiv.org/abs/1606.02307) and \n[Maximally informative representations...](https://arxiv.org/abs/1410.7404) for work building up to this method. \nThe main paper describing the method is \n[Low Complexity Gaussian Latent Factor Models and a Blessing of Dimensionality](https://arxiv.org/abs/1706.03353).\nThe connections with the idea of \"synergy\" will be described in future work. \n\n\n### Troubleshooting visualization\nFor Mac users: \n\nGetting the visualization of the hierarchy to look nice sometimes takes a little effort. To get graphs to compile correctly, do the following. \nIf you install with \"brew\", run \"brew install gts\" followed by \"brew install --with-gts graphviz\". \nThe (hacky) way that the visualizations are produced is the following. The code, vis_corex.py, produces a text file called \"graphs/graph.dot\". This just encodes the edges between nodes in dot format. Then, the code calls a command line utility called sfdp that is part of graphviz: \n\n```\nsfdp graph.dot -Tpdf -Earrowhead=none -Nfontsize=12  -GK=2 -Gmaxiter=1000 -Goverlap=False -Gpack=True -Gpackmode=clust -Gsep=0.01 -Gsplines=False -o graph_sfdp.pdf\n```\n\nThese dot files can also be opened with OmniGraffle if you would like to manipulate them by hand. \nIf you want, you can try to recompile graphs yourself with different options to make them look nicer. 
Or you can edit the dot files to get effects like colored nodes, etc.\n\nFor Ubuntu users:\n\nCredits: https://gitlab.com/graphviz/graphviz/issues/1237\n\n1. Remove any existing installation with `conda uninstall graphviz`. (If you did not install with Conda, you might need to do `sudo apt purge graphviz` and/or `pip uninstall graphviz`).\n    \n2. Run `sudo apt install libgts-dev`\n\n3. Run `sudo pkg-config --libs gts`\n    \n4. Run `sudo pkg-config --cflags gts`\n\n5. Download `graphviz-2.40.1.tar.gz` from [here](https://graphviz.gitlab.io/pub/graphviz/stable/SOURCES/graphviz.tar.gz)\n\n6. Navigate to the directory containing the download, and extract it with `tar -xvf graphviz-2.40.1.tar.gz` (or whatever the download is named, if it is a newer version).\n\n7. `cd` into the extracted folder (i.e. `cd graphviz-2.40.1`) and run `sudo ./configure --with-gts`\n\n8. Run `sudo make` in the folder\n\n9. Run `sudo make install` in the folder\n\n10. Reinstall the library using `pip install graphviz`\n    \n \n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgregversteeg%2FLinearCorex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgregversteeg%2FLinearCorex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgregversteeg%2FLinearCorex/lists"}