{"id":23505690,"url":"https://github.com/gamcil/fungiphy","last_synced_at":"2025-07-29T05:04:24.324Z","repository":{"id":113168315,"uuid":"245089769","full_name":"gamcil/fungiphy","owner":"gamcil","description":"Fungal marker-based phylogenetics toolkit","archived":false,"fork":false,"pushed_at":"2025-04-24T17:03:07.000Z","size":2559,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-08T23:49:27.598Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gamcil.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-03-05T06:49:24.000Z","updated_at":"2023-08-27T13:49:52.000Z","dependencies_parsed_at":"2025-05-07T20:25:13.448Z","dependency_job_id":null,"html_url":"https://github.com/gamcil/fungiphy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gamcil/fungiphy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gamcil%2Ffungiphy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gamcil%2Ffungiphy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gamcil%2Ffungiphy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gamcil%2Ffungiphy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gamcil","download_url":"https://codeload.github.com/gamcil/fungiphy/tar.gz/ref
s/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gamcil%2Ffungiphy/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267632858,"owners_count":24118748,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-25T09:39:05.302Z","updated_at":"2025-07-29T05:04:24.309Z","avatar_url":"https://github.com/gamcil.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# fungphy\nFungal marker-based phylogenetics toolkit\n\n## Usage\nClone, set up virtualenv, install fungphy\n```sh\ngit clone https://github.com/gamcil/fungphy.git\ncd fungphy\npython3 -m virtualenv venv\nsource venv/bin/activate\npip3 install -e .\n```\n\n### Setup\nInitialise database\n```python3\n\u003e\u003e\u003e from fungphy.database import db\n\u003e\u003e\u003e db.create_all()\n```\n\nScrape aspergilluspenicillium.org for Aspergillus, Penicillium \u0026 Talaromyces markers\n```python3\n\u003e\u003e\u003e from fungphy import scraper\n\u003e\u003e\u003e good, bad = scraper.scrape()\n\u003e\u003e\u003e with open(\"table.csv\", \"w\") as fp:\n...    for sp in good:\n...        columns = \",\".join(sp)\n...        
fp.write(columns) \n```\n\nImport .csv\n```python3\n# Genus|Species|Reference|MycoBank ID|Type|Ex-types|Subgenus|Section|*markers\n\n\u003e\u003e\u003e from fungphy import importer\n\u003e\u003e\u003e with open(\"table.csv\") as fp:\n...    importer.parse_csv(fp)\n```\n\n### Express usage\n```python3\n\u003e\u003e\u003e from fungphy import plot\n\u003e\u003e\u003e table, msa, tree = plot._run(\n...     genera=[\"Aspergillus\"],\n...     species=[\"avenaceus\", \"alliaceus\", \"albertensis\", \"tamarii\", \"caelatus\",\n...              \"oryzae\", \"flavus\", \"nomius\", \"bombycis\", \"coremiiformis\", \"togoensis\"],\n...     markers=[\"ITS\", \"BenA\", \"CaM\", \"RPB2\"],\n...     trim_msa=True,\n...     run_fasttree=True,  # Generate tree using FastTree\n...     show_tree=True,\n...     outgroup=\"avenaceus\",  # Root this tree on A. avenaceus\n...     bold=[\"nomius\", \"bombycis\"],  # Bolden leaf labels for A. nomius and A. bombycis\n...     types=[\"albertensis\", \"tamarii\"]  # Use holotype strain name\n... )    \n```\n\nGenerates:\n\n\u003cimg src=\"img/example_tree.png\" width=\"600\" /\u003e\n\nMaximum likelihood tree inferred from combined ITS, BenA, CaM and RPB2 sequences of taxa\nwithin subg. *Circumdati* sect. *Flavi* using FastTree.\n\n### Alignments \u0026 phylogeny\nCurrently, procedures are in the `plot` module, so import that\n```python3\nfrom fungphy import plot\n```\n\nQuery DB for species\n```python3\n\u003e\u003e\u003e species = plot.get_species(\n...    genera=[\"Aspergillus\"],\n...    species=[\"avenaceus\", \"alliaceus\", \"albertensis\", \"tamarii\", \"caelatus\",\n...             \"oryzae\", \"flavus\", \"nomius\", \"bombycis\", \"coremiiformis\", \"togoensis\"],\n... 
)\n\u003e\u003e\u003e species\n[\u003cStrain 464\u003e, \u003cStrain 465\u003e, \u003cStrain 469\u003e, \u003cStrain 471\u003e, \u003cStrain 472\u003e, \u003cStrain 474\u003e, \u003cStrain 475\u003e, \u003cStrain 482\u003e, \u003cStrain 483\u003e, \u003cStrain 493\u003e, \u003cStrain 494\u003e]\n```\n\nNote that the returned objects are instances of the `Strain` class. The model hierarchy\nis as follows:\n```\nSubgenus\n  Section\n    Genus\n      Species\n        Strain\n          Marker\n```\n\nSince the species descriptions on aspergilluspenicillium.org describe just the type\nand ex-types formally attached to the species name, there will only be one `Strain` per\n`Species`. If more `Strain` objects are added, strain names can be specified using the\n`strains` argument:\n\n```python3\n\u003e\u003e\u003e species = plot.get_species(genera=[\"Aspergillus\"], strains=[\"CBS 1234\", ... ])\n```\n\nRetrieve marker sequences\n```python3\n\u003e\u003e\u003e markers = plot.get_marker_sequences(species, marker=\"ITS\")\n\u003e\u003e\u003e markers\n[\u003cfungphy.phylogeny.Sequence object at 0x7f90d6e4dc40\u003e, ... ]\n```\n\nNote this returns instances of the `Sequence` class, which have headers and sequences\n\n```python3\n\u003e\u003e\u003e sequence = markers[0]\n\u003e\u003e\u003e sequence.header, sequence.sequence[:20]\n(464, 'AAGGATCATTACCGAGTGTA')\n\u003e\u003e\u003e sequence.fasta()\n'\u003e464\\nAAGGATCATTACCGAGTGTAGGGTTCCCTCGTGGAGCCCAACC ... 
'\n```\n\nPhylogenetic procedures are stored in the `phylogeny` module\n```python3\nfrom fungphy import phylogeny as phy\n```\n\nAlign markers\n```python3\n\u003e\u003e\u003e msa = phy.align_sequences(markers, name=\"ITS\", tool=\"mafft\")\n\u003e\u003e\u003e msa\n\u003cfungphy.phylogeny.MSA object at 0x7f90d6e1fc10\u003e\n```\n\nThis returns an instance of `MSA`, which is a list-like class used to store aligned\n`Sequence` objects:\n\n```python3\n\u003e\u003e\u003e msa.records\n[\u003cfungphy.phylogeny.Sequence object at 0x7f90d6e16d00\u003e, ... ]\n```\n\nTrim MSAs to first and last non gap-containing columns\n```python3\n\u003e\u003e\u003e msa = phy.trim(msa)\n```\n\nOptionally, trim as above by a minimum number of non gap-containing columns\n```python3\n# Trim to first/last columns where at least 90% of sequences contain no gaps\n\u003e\u003e\u003e msa = phy.trim(msa, threshold=0.9)\n```\n\nWrite to file in FASTA format\n```python3\n\u003e\u003e\u003e with open(\"ITS.msa\", \"w\") as fp:\n...     fp.write(msa.fasta())\n```\n\nBack to the `plot` module, we can automate the above steps for multiple markers\n```python3\n\u003e\u003e\u003e mmsa = plot.align_organisms(\n...     species,\n...     markers=[\"ITS\", \"BenA\", \"CaM\", \"RPB2\"],\n...     trim_msa=True\n... )\nAligning ITS\nAligning BenA\nAligning CaM\nAligning RPB2\n\u003e\u003e\u003e mmsa\n\u003cfungphy.phylogeny.MultiMSA object at 0x7f90d6e4db20\u003e\n```\n\nNote this returns an instance of `MultiMSA`, which is another list-like class, this time\nused to store multiple instances of `MSA`\n```python3\n\u003e\u003e\u003e mmsa.msas\n[\u003cfungphy.phylogeny.MSA object at 0x7f90d6e067c0\u003e, ... ]\n```\n\nWrite to file in FASTA format, as above\n```python3\n\u003e\u003e\u003e with open(\"multi.msa\", \"w\") as fp:\n...     
fp.write(mmsa.fasta())\n```\n\nCan also report partition intervals\n```python3\n\u003e\u003e\u003e mmsa.partitions\n[(1, 617), (618, 1098), (1099, 1606), (1607, 2562)]\n```\n\nWrite to file in RAxML format\n```python3\n\u003e\u003e\u003e with open(\"partitions.msa\", \"w\") as fp:\n...     fp.write(mmsa.raxml_partitions())\n```\n\nReady to be analysed by e.g. modeltest-ng / raxml-ng.\n\nMultiMSAs can also be re-read back into Python if given an accompanying RAxML format\npartition file generated as above\n```python3\n\u003e\u003e\u003e mmsa = phy.MultiMSA.from_msa_file(\"msa.fasta\", \"partitions.txt\")\n```\n\n### Summary table\nWe can use the `Summary` class to generate a table of marker accessions for use in\npublications.\n\n```python3\n\u003e\u003e\u003e table = plot.Summary.make(species, markers=[\"ITS\", \"BenA\", \"CaM\", \"RPB2\"])\n\u003e\u003e\u003e table.headers\n['Organism', 'ITS', 'BenA', 'CaM', 'RPB2']\n\u003e\u003e\u003e table.rows\n[['Aspergillus albertensis', 'EF661548', 'EF661464', 'EF661537', 'EU021628'],\n ['Aspergillus alliaceus', 'EF661551', 'EF661465', 'EF661534', 'MG517825'],\n ...]\n```\n\nFormat as e.g. tab delimited file with headers\n```python3\n\u003e\u003e\u003e formatted = table.format(delimiter=\"\\t\", show_headers=True)\n\u003e\u003e\u003e print(formatted)\nOrganism        ITS     BenA    CaM     RPB2\nAspergillus albertensis EF661548        EF661464        EF661537        EU021628\nAspergillus alliaceus   EF661551        EF661465        EF661534        MG517825\nAspergillus avenaceus   AF104446        FJ491481        FJ491496        JN121424\n...\n```\n\nWrite to file\n```python3\n\u003e\u003e\u003e with open(\"markers.tsv\", \"w\") as fp:\n...     fp.write(formatted)\n```\n\n### Visualisation\nNote above aligned `Sequence` objects use their respective strain's database row ID as a\nheader. 
This is to allow separate `MSA` instances to be linked in multi-locus analyses.\nThey are also used to look up species information from the database when reading in a\ntree file for visualisation.\n\nGenerate a tree from MSA using FastTree\n```python3\n\u003e\u003e\u003e newick = phy.fasttree(mmsa)\n```\n\nThis generates a tree in Newick format, so convert to an ETE3 toolkit `Tree` object.\nNote we set our outgroup as *Aspergillus avenaceus*.\n```python3\n\u003e\u003e\u003e tree = plot.read_tree(newick, outgroup=\"avenaceus\")\n```\n\nVisualise the tree\n```python3\n\u003e\u003e\u003e plot.show(tree)\n```\n\nThis will open the ETE3 interactive tree browser for further manipulation. Magically,\nall our species information (including superscript Ts to indicate ex-type strains) has\nbeen filled in via the database (see top of page). Also note that support values of 100\nor 1.0 are shortened to asterisks (\\*). Note that due to the use of HTML in the tree\nleaf labels, resulting trees should be saved and edited as PDF files. Saving as SVG and\nimporting into e.g. Inkscape will result in warped font kerning, spacing, etc.\n\nWe can also load in trees generated using e.g. raxml-ng\n```python3\n\u003e\u003e\u003e tree = plot.read_tree_from_path(\"path/to/file.nw\")\n```\n\nOr multiple trees for the same species generated through different methods\n```python3\n\u003e\u003e\u003e tree = plot.read_trees_from_paths([\"tree1.nw\", \"tree2.nw\"], merge=True)\n```\n\nWith `merge=True`, support values for identical clades are merged; this allows\ne.g. placing posterior probabilities from Bayesian inference on a maximum likelihood\ntree, and vice versa. Unfortunately, since ETE3 reads in support values numerically,\nit does not allow combined values generated by e.g. `IQTree -s msa --alrt 1000 -B\n1000`. Thus `fungphy` does not support these either.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgamcil%2Ffungiphy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgamcil%2Ffungiphy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgamcil%2Ffungiphy/lists"}