{"id":31578020,"url":"https://github.com/jolespin/compositional","last_synced_at":"2025-10-05T19:21:26.344Z","repository":{"id":62564206,"uuid":"263873558","full_name":"jolespin/compositional","owner":"jolespin","description":"Python package for compositional data analysis including CLR/ILR, proportionality, partial correlation with basis shrinkage, and visualizations.","archived":false,"fork":false,"pushed_at":"2023-08-28T19:05:19.000Z","size":384,"stargazers_count":21,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-09-08T13:54:44.768Z","etag":null,"topics":["compositional-data-analysis","genomics-data","microbiome"],"latest_commit_sha":null,"homepage":"https://github.com/jolespin/compositional/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jolespin.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2020-05-14T09:37:46.000Z","updated_at":"2025-08-20T18:57:42.000Z","dependencies_parsed_at":"2025-09-08T13:40:16.558Z","dependency_job_id":"93511558-e321-418f-8227-a5de093b4e2a","html_url":"https://github.com/jolespin/compositional","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/jolespin/compositional","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jolespin%2Fcompositional","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/joles
pin%2Fcompositional/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jolespin%2Fcompositional/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jolespin%2Fcompositional/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jolespin","download_url":"https://codeload.github.com/jolespin/compositional/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jolespin%2Fcompositional/sbom","scorecard":{"id":529623,"data":{"date":"2025-08-11","repo":{"name":"github.com/jolespin/compositional","commit":"f98c8c156e458c9a16dda50c6e24664129743740"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.7,"checks":[{"name":"Code-Review","score":0,"reason":"Found 0/27 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev 
branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file 
detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"License","score":9,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Warn: project license file does not contain an FSF or OSI license."],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":0,"reason":"Project has not signed or included provenance with any releases.","details":["Warn: release artifact v2023.7.20 not signed: https://api.github.com/repos/jolespin/compositional/releases/113041096","Warn: release artifact v2020.12.16 not signed: https://api.github.com/repos/jolespin/compositional/releases/36789764","Warn: release artifact v2023.7.20 does not have provenance: https://api.github.com/repos/jolespin/compositional/releases/113041096","Warn: release artifact v2020.12.16 does not have provenance: https://api.github.com/repos/jolespin/compositional/releases/36789764"],"documentation":{"short":"Determines if the project 
cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}}]},"last_synced_at":"2025-08-20T05:23:43.222Z","repository_id":62564206,"created_at":"2025-08-20T05:23:43.222Z","updated_at":"2025-08-20T05:23:43.222Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278505934,"owners_count":25998210,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compositional-data-analysis","genomics-data","microbiome"],"created_at":"2025-10-05T19:21:22.536Z","updated_at":"2025-10-05T19:21:26.335Z","avatar_url":"https://github.com/jolespin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"### compositional\nCompositional data analysis in Python.\n\nThis package is meant to extend the methods of 
[scikit-bio](http://scikit-bio.org/docs/latest/generated/skbio.stats.composition.html#module-skbio.stats.composition) and serve as a pythonic alternative (not replacement) to *some* functionalities within [propr](https://github.com/tpq/propr).  \n\n#### Dependencies:\nCompatible with Python 3.\n\n\n**Required:**\n\n* pandas\n* numpy\n* scipy\n\n**Optional:**\n\n* scikit-bio\n* gneiss\n* ete[2/3]\n* matplotlib\n* seaborn\n* scikit-learn\n\n\n#### Install:\n```\n# Stable release (Preferred)\npip install compositional\n\n# Development release\npip install git+https://github.com/jolespin/compositional\n```\n\n#### Proportionality and partial correlation methods adapted from the following source:\n* [propr: An R package to calculate proportionality between vectors of compositional data\n (Thomas Quinn)](https://github.com/tpq/propr)\n \n#### Isometric log-ratio methods use the following sources:\n* [scikit-bio: A package providing data structures, algorithms and educational resources for bioinformatics](https://github.com/biocore/scikit-bio)\n* [gneiss: a compositional data analysis toolbox designed for analyzing high dimensional proportions (Jamie Morton)](https://github.com/biocore/gneiss)\n\n \n#### Citations (Code):\n\n* Jin, S., Notredame, C. and Erb, I., 2022. Compositional Covariance Shrinkage and Regularised Partial Correlations. arXiv preprint arXiv:2212.00496.\n   \n* Quinn T, Richardson MF, Lovell D, Crowley T (2017) propr: An\n   R-package for Identifying Proportionally Abundant Features Using\n   Compositional Data Analysis. Scientific Reports 7(16252):\n   doi:10.1038/s41598-017-16520-0\n\n* Espinoza JL. compositional: Compositional data analysis in Python (2020). \n   https://github.com/jolespin/compositional\n   \n#### Citations (Theory):\n\n* Jin, S., Notredame, C. and Erb, I., 2022. Compositional \n\tCovariance Shrinkage and Regularised Partial Correlations. \n\tarXiv preprint arXiv:2212.00496.\n\t\n* Erb, I., 2020. 
Partial correlations in compositional data analysis. \n\tApplied Computing and Geosciences, 6, p.100026.\n\t\n* Quinn TP, Erb I, Gloor G, Notredame C, Richardson MF, Crowley TM\n   (2019) A field guide for the compositional analysis of any-omics\n   data. GigaScience 8(9). doi:10.1093/gigascience/giz107\n \n* Quinn T, Erb I, Richardson MF, Crowley T (2018) Understanding\n   sequencing data as compositions: an outlook and review.\n   Bioinformatics 34(16): doi:10.1093/bioinformatics/bty175\n \n* Erb I, Quinn T, Lovell D, Notredame C (2017) Differential\n   Proportionality - A Normalization-Free Approach To Differential\n   Gene Expression. Proceedings of CoDaWork 2017, The 7th\n   Compositional Data Analysis Workshop; available under bioRxiv\n   134536: doi:10.1101/134536\n \n* Erb I, Notredame C (2016) How should we measure proportionality\n   on relative gene expression data? Theory in Biosciences 135(1):\n   doi:10.1007/s12064-015-0220-8\n \n* Lovell D, Pawlowsky-Glahn V, Egozcue JJ, Marguerat S, Bahler J\n   (2015) Proportionality: A Valid Alternative to Correlation for\n   Relative Data. PLoS Computational Biology 11(3):\n   doi:10.1371/journal.pcbi.1004075\n   \n* Morton, J.T., Sanders, J., Quinn, R.A., McDonald, D., Gonzalez, A., Vázquez‐Baeza, Y., et al. (2017) Balance trees reveal microbial niche differentiation. mSystems: e00162‐16. doi: 10.1128/mSystems.00162-16\n\n\n#### Citations (Debut):\n   \n   * Espinoza JL., Shah N, Singh S, Nelson KE., Dupont CL. Applications of weighted association networks applied to compositional data in biology. https://doi.org/10.1111/1462-2920.15091\n\n\n_________________________\n### Usage:\n\nEach function operates on either 2D `pd.DataFrame` or `np.array` objects and outputs either `pandas` or `numpy` objects, respectively.  \n\nTransformation functions (e.g., `transform_clr`) output the equivalent object with the same shape.\n\nPairwise functions output either a redundant form or a non-redundant form.  
If a `numpy` object is input, then either a 2D redundant form or 1D non-redundant form `np.array` object will be output.  If a `pd.DataFrame` is input then there are 2 types of output that can be returned.  If `redundant_form=True`, then a square `pd.DataFrame` will be returned.  If `redundant_form=False`, then a `pd.Series` will be returned and the index will contain `frozenset` objects that have the combinations. \n\nFor the operations in logspace, a pseudocount of 1 is added to avoid -inf values for log(0). \n\n#### Loading package and obtaining data\nFor usage, we are going to load data from oral microbiome 16S amplicon data from [Gomez and Espinoza et al. 2017](https://pubmed.ncbi.nlm.nih.gov/28910633/).\n\n```python\nimport compositional as coda\nimport pandas as pd\n\n# Load abundances (Gomez and Espinoza et al. 2017)\nX = pd.read_csv(\"https://github.com/jolespin/projects/raw/main/supragingival_plaque_microbiome/16S_amplicons/Data/X.tsv.gz\", \n                sep=\"\\t\",\n                index_col=0,\n                compression=\"gzip\",\n)\n\n# Load metadata\nY = pd.read_csv(\"https://github.com/jolespin/projects/raw/main/supragingival_plaque_microbiome/16S_amplicons/Data/Y.tsv.gz\", \n                sep=\"\\t\",\n                index_col=0,\n                compression=\"gzip\",\n).loc[X.index]\n\n\n# print(\"X.shape: (n={} samples, m={} OTUs)\")\n# X.shape: (n=473 samples, m=481 OTUs)\n\n# Classes\nclasses = pd.Series(((Y[\"Caries_enamel\"] == \"YES\").astype(int) + (Y[\"Caries_dentine\"] == \"YES\").astype(int)).map(lambda x: {True:\"Caries\", False:\"Caries-free\"}[x \u003e 0]), name=\"Diagnosis\")\n\nclass_colors = {\"Caries-free\":\"black\", \"Caries\":\"red\"}\n```\n\n#### (Highpass) Filtering of compositional data\nFiltering functions to preprocess data.  
Example use case: (1) remove all samples with fewer than 10,000 total counts; (2) then remove all features that aren't present in at least 50% of the samples; and (3) finally remove samples that don't have at least 50 detected components.\n\n```\nX_filtered = coda.filter_data_highpass(\n    X=X, \n    minimum_total_counts=10000,\n    minimum_prevalence=0.5,\n    minimum_components=50,\n)\n\nX.shape, X_filtered.shape\n# ((473, 481), (401, 93))\n```\n\n#### Summary metrics\nSummary metrics for compositional data. \n\n```\n# Sparsity\ns = coda.sparsity(X)\nprint(\"Ratio of zeros in dataset: {:.3f}\".format(s))\n# Ratio of zeros in dataset: 0.776\n\n# Total number of components per composition (i.e., richness)\ncoda.number_of_components(X).head()\n# S-1409-45.B_RD1     111\n# 1104.2_RD1           84\n# S-1409-42.B_RD1     142\n# 1073.1_RD1          101\n# A-1504-100.B_RD1     95\n\n# Prevalence of components across compositions\ncoda.prevalence_of_components(X).head()\n# Otu000514    470\n# Otu000001    473\n# Otu000038    472\n# Otu000003    473\n# Otu000326    432\n```\n\n#### Pairwise operations\n\nAll pairwise operations support either a redundant form or a non-redundant form using the `redundant_form` argument. 
\n\n##### Pairwise sample operations:\n\n```\n# Pairwise Aitchison distance (redundant form)\naitchison_distances = coda.pairwise_aitchison_distance(X + 1, redundant_form=True)\n# print(aitchison_distances.iloc[:4,:4])\n#                  S-1409-45.B_RD1  1104.2_RD1  S-1409-42.B_RD1  1073.1_RD1\n# S-1409-45.B_RD1         0.000000   25.384218        21.573635   23.455055\n# 1104.2_RD1             25.384218    0.000000        27.811292   21.942080\n# S-1409-42.B_RD1        21.573635   27.811292         0.000000   26.734435\n# 1073.1_RD1             23.455055   21.942080        26.734435    0.000000\n\n# Pairwise Aitchison distance (non-redundant form)\naitchison_distances = coda.pairwise_aitchison_distance(X + 1, redundant_form=False)\n# print(aitchison_distances)\n# aitchison_distance\n# (S-1409-45.B_RD1, 1104.2_RD1)          25.384218\n# (S-1409-45.B_RD1, S-1409-42.B_RD1)     21.573635\n# (S-1409-45.B_RD1, 1073.1_RD1)          23.455055\n# (S-1409-45.B_RD1, A-1504-100.B_RD1)    21.330042\n# (S-1409-45.B_RD1, 2053.2_RD1)          22.531754\n#                                          ...    \n# (S-1410-40.B_RD1, M-1507-132.A_RD1)    23.247654\n# (M-1507-132.A_RD1, C-1504-92.B_RD1)    20.422768\n# (S-1410-40.B_RD1, 2005.1_RD1)          22.294198\n# (2005.1_RD1, C-1504-92.B_RD1)          21.323598\n# (S-1410-40.B_RD1, C-1504-92.B_RD1)     21.073093\n# Length: 111628, dtype: float64\n```\n\n##### Pairwise component operations:\n\n```\n# Pairwise variance log-ratio\nvlr = coda.pairwise_vlr(X + 1)\n# print(vlr.iloc[:4,:4])\n#            Otu000514  Otu000001  Otu000038  Otu000003\n# Otu000514   0.000000   0.764679   1.844322   1.869921\n# Otu000001   0.764679   0.000000   1.299599   1.230553\n# Otu000038   1.844322   1.299599   0.000000   2.207001\n# Otu000003   1.869921   1.230553   2.207001   0.000000\n\n# Pairwise rho from Erb et al. 
2016\nrhos = coda.pairwise_rho(X + 1)\n# print(rhos.iloc[:4,:4])\n#            Otu000514  Otu000001  Otu000038  Otu000003\n# Otu000514   1.000000   0.708325   0.304007   0.298552\n# Otu000001   0.708325   1.000000   0.355895   0.394880\n# Otu000038   0.304007   0.355895   1.000000  -0.070423\n# Otu000003   0.298552   0.394880  -0.070423   1.000000\n\n# Pairwise phi from Erb et al. 2016\nphis = coda.pairwise_phi(X + 1)\n# print(phis.iloc[:4,:4])\n#            Otu000514  Otu000001  Otu000038  Otu000003\n# Otu000514   0.000000   0.470005   1.133602   1.149336\n# Otu000001   0.470005   0.000000   1.306492   1.237079\n# Otu000038   1.133602   1.306492   0.000000   2.157470\n# Otu000003   1.149336   1.237079   2.157470   0.000000\n\n```\n\n##### Partial correlation with basis shrinkage (requires scikit-learn)\n```\n# Pairwise partial correlation with basis shrinkage from Erb et al. 2020 and Jin et al. 2022\npcorr = coda.pairwise_partial_correlation_with_basis_shrinkage(X + 1)\n# print(pcorr.iloc[:4,:4])\n#            Otu000514  Otu000001  Otu000038  Otu000003\n# Otu000514   1.000000   0.256310  -0.022194  -0.005131\n# Otu000001   0.256310   1.000000   0.105960   0.222187\n# Otu000038  -0.022194   0.105960   1.000000  -0.042785\n# Otu000003  -0.005131   0.222187  -0.042785   1.000000\n```\n\n#### Isometric log-ratio transform *without* tree (requires scikit-bio)\n```\n# Isometric log-ratio\nX_ilr_without_tree = coda.transform_ilr(X + 1)\n# print(X_ilr_without_tree.iloc[:4,:4])\n#                         0         1         2         3\n# S-1409-45.B_RD1 -2.663112 -0.139161 -1.098112  6.023297\n# 1104.2_RD1      -2.094331  3.804032 -4.579665  2.357939\n# S-1409-42.B_RD1 -1.909313 -0.023536 -0.018245  5.614873\n# 1073.1_RD1      -1.879929  2.322184 -2.717553  2.426881\n```\n\n#### Isometric log-ratio transform *with* tree (requires scikit-bio, gneiss, and [Optional: ete3])\n```\nimport requests\nfrom io import StringIO\nfrom skbio import TreeNode\n\n# Get newick tree\nurl = 
\"https://github.com/jolespin/projects/raw/main/supragingival_plaque_microbiome/16S_amplicons/Data/otus.alignment.fasttree.nw\"\nnewick = requests.get(url).text\ntree = TreeNode.read(StringIO(newick), convert_underscores=False)\ntree.bifurcate()\n\n# Name internal nodes\nintermediate_node_index = 1\nfor node in tree.traverse():\n    if not node.is_tip():\n        node.name = \"y{}\".format(intermediate_node_index)\n        intermediate_node_index += 1\n\n# Isometric log-ratio transform\nX_ilr_with_tree = coda.transform_ilr(X + 1, tree)\n# print(X_ilr_with_tree.iloc[:4,:4])\n#                        y1        y2          y480            y3\n# S-1409-45.B_RD1 -1.039407  1.655538 -2.464164e-17  6.189481e-16\n# 1104.2_RD1      -0.673964  1.073470  4.192522e-18  3.923163e-16\n# S-1409-42.B_RD1 -1.326432  2.112703  3.851113e-17  8.306736e-16\n# 1073.1_RD1      -0.979605  1.560287  5.023995e-18  5.907717e-16\n```\n\n#### Plotting compositions (requires matplotlib and seaborn)\n\nLet's color the samples by a continuous variable (e.g., age in months).\n\n```\nsample_labels = pd.Index(X.sum(axis=1).sort_values().index[:4].tolist())\n\nfig, g, df = coda.plot_compositions(X, colors=Y.loc[X.index,\"age (months)\"],  sample_labels=sample_labels, title=\"Caries\", figsize=(8,5))\n```\n![](images/kde2d_continuous.png)\n\nNow color the samples by each class (e.g., phenotype).\n\n```\nfig, g, df = coda.plot_compositions(X, classes=classes, class_colors=class_colors, log_scale=True, title=\"Caries\", style=\"ggplot\", vertical_lines=[1, 1000,5000])\n```\n\n![](images/kde2d_classes.png)\n\n\n#### Plotting prevalence (requires matplotlib)\nTo identify a threshold to remove low prevalence components/features let's plot a prevalence curve where the x-axis shows the prevalence and y-axis shows the number of components are prevalent in x samples. \n\nFirst, let's look at the prevalence globally. 
We want to see the number of OTUs that are prevalent in at least 1 sample, 2 samples, half the samples, and all the samples. \n\nThere are 462 OTUs that are in at least 1 sample, 392 OTUs that are in at least 2 samples (i.e., 462 - 392 = 70 singleton OTUs), and 11 OTUs that are in all the samples.\n\n```\nfig, ax, prevalence_distribution = coda.plot_prevalence(X, component_type=\"OTUs\", show_prevalence=[1,2,0.5,1.0])\n\n```  \n\n![](images/prevalence_global.png)\n\nNow, let's look at the prevalence for each class separately.\n\n```\nfig, ax, prevalence_distribution = coda.plot_prevalence(X, classes=classes,  class_colors=class_colors, component_type=\"OTUs\", show_prevalence=[1,2,0.5,1.0])\n```\n\n![](images/prevalence_classes.png)\n\n#### Notes:\n* Versions prior to v2020.12.16 used `ddof=0` for all variance except during the `vlr` calculation.  This was because `pandas._libs.algos.nancorr` uses `ddof=1` and not `ddof=0`.  This caused specific `rho` values not to be bounded by [-1,1].  To retain the performance of `nancorr`, I've set all `ddof=1` to match `nancorr`. \n* The partial correlation with basis shrinkage is implemented exactly the same as `propr` as the backend algorithm in the `corpcor` package uses an updated Ledoit-Wolf shrinkage approach from [Opgen-Rhein, R., and K. Strimmer. 2007](https://doi.org/10.2202/1544-6115.1252) and [Schafer, J., and K. Strimmer. 2005](https://doi.org/10.2202/1544-6115.1175).\n\n#### Acknowledgements:\n  * [Thomas Quinn](https://scholar.google.com/citations?user=h4nh0VoAAAAJ\u0026hl=en\u0026oi=sra) for [insightful explanations of compositional data analysis](https://github.com/tpq/propr/issues/11)  and [Jamie Morton](https://scholar.google.com/citations?user=gwzQvp4AAAAJ\u0026hl=en\u0026oi=sra) for [help in understanding isometric log-ratio transformations](https://github.com/biocore/gneiss/issues/262).  
\n  * [Ionas Erb](https://scholar.google.com/citations?user=4DeNxosAAAAJ\u0026hl=en) and [Suzanne Jin](https://scholar.google.com/citations?user=7hSkrvoAAAAJ\u0026hl=en) for their help in understanding partial correlation with basis shrinkage.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjolespin%2Fcompositional","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjolespin%2Fcompositional","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjolespin%2Fcompositional/lists"}