{"id":22188428,"url":"https://github.com/vmikk/phylonext","last_synced_at":"2025-07-17T16:40:52.348Z","repository":{"id":43680130,"uuid":"457327826","full_name":"vmikk/PhyloNext","owner":"vmikk","description":"A pipeline for phylogenetic diversity analysis of GBIF-mediated data","archived":false,"fork":false,"pushed_at":"2025-05-30T09:52:34.000Z","size":18998,"stargazers_count":13,"open_issues_count":12,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-05-30T12:40:39.424Z","etag":null,"topics":["beta-diversity","biodiverse","docker","endemism","gbif","nextflow","phylodiversity","phylogenetic-diversity","r","randomisations","singularity"],"latest_commit_sha":null,"homepage":"https://phylonext.github.io","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vmikk.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-02-09T11:21:45.000Z","updated_at":"2025-05-30T09:52:39.000Z","dependencies_parsed_at":"2023-02-18T05:45:38.864Z","dependency_job_id":"b2bbcae5-e522-40d0-9243-1b5c6783cc9d","html_url":"https://github.com/vmikk/PhyloNext","commit_stats":{"total_commits":791,"total_committers":3,"mean_commits":263.6666666666667,"dds":0.005056890012642201,"last_synced_commit":"8ab4da5a239b3527aa787ae146c830ab8bcbb9db"},"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"purl":"pkg:github/vmikk/PhyloNext","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vmikk%2FPhyloNext","tags_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/vmikk%2FPhyloNext/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vmikk%2FPhyloNext/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vmikk%2FPhyloNext/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vmikk","download_url":"https://codeload.github.com/vmikk/PhyloNext/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vmikk%2FPhyloNext/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265631453,"owners_count":23801828,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beta-diversity","biodiverse","docker","endemism","gbif","nextflow","phylodiversity","phylogenetic-diversity","r","randomisations","singularity"],"created_at":"2024-12-02T11:10:29.157Z","updated_at":"2025-07-17T16:40:52.311Z","avatar_url":"https://github.com/vmikk.png","language":"R","readme":"# PhyloNext - PD (Phylogenetic Diversity) in the cloud \u003cimg src='images/PhyloNext_logo.png' align=\"right\" height=\"100\" /\u003e\n\n![GitHub (latest release)](https://img.shields.io/github/v/release/vmikk/PhyloNext?label=GitHub%20release)\n[![Nextflow](https://img.shields.io/badge/Nextflow%20DSL2-%E2%89%A522.10.0-23aa62.svg?labelColor=000000)](https://www.nextflow.io/)\n[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000\u0026logo=docker)](https://www.docker.com/)\n[![run with 
singularity](https://img.shields.io/badge/run%20with-singularity-blue?style=flat\u0026logo=singularity)](https://sylabs.io/docs/)\n[![GitHub license](https://img.shields.io/github/license/vmikk/PhyloNext)](https://github.com/vmikk/PhyloNext/blob/main/LICENSE)  \nCI/CD status:\n[![Nextflow (full pipeline)](https://github.com/vmikk/PhyloNext/actions/workflows/Nextflow_test.yml/badge.svg)](https://github.com/vmikk/PhyloNext/actions/workflows/Nextflow_test.yml)\n[![OToL](https://github.com/vmikk/PhyloNext/actions/workflows/OToL_test.yml/badge.svg)](https://github.com/vmikk/PhyloNext/actions/workflows/OToL_test.yml)\n[![Biodiverse](https://github.com/vmikk/PhyloNext/actions/workflows/Biodiverse_test.yml/badge.svg)](https://github.com/vmikk/PhyloNext/actions/workflows/Biodiverse_test.yml)  \n[![DOI - 10.1186/s12862-024-02256-9](https://img.shields.io/badge/DOI-10.1186%2Fs12862--024--02256--9-24B064)](https://bmcecolevol.biomedcentral.com/articles/10.1186/s12862-024-02256-9)\n[![DOI](https://zenodo.org/badge/457327826.svg)](https://zenodo.org/badge/latestdoi/457327826)\n\n\nPhyloNext is an automated pipeline for the analysis of phylogenetic diversity using [GBIF occurrence data](https://www.gbif.org/occurrence/search?occurrence_status=present), species phylogenies from the [Open Tree of Life](https://tree.opentreeoflife.org), and the [Biodiverse software](https://shawnlaffan.github.io/biodiverse/).\n\n## Introduction\n\nThe pipeline brings together two critical research data infrastructures, the Global\nBiodiversity Information Facility [(GBIF)](https://www.gbif.org/) and Open Tree of Life [(OToL)](https://tree.opentreeoflife.org), to make them more accessible to non-experts.\n\nThe pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses [Docker](https://www.docker.com/) containers, making installation trivial and results highly reproducible. 
The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies.\n\nThe pipeline can be launched in a cloud environment (e.g., the [Microsoft Azure Cloud Computing Services](https://azure.microsoft.com/en-us/), [Amazon Web Services](https://aws.amazon.com/), and [Google Cloud Computing Services](https://cloud.google.com/)).\n\n## Pipeline summary\n\n1. Filtering of GBIF species occurrences for various taxonomic clades and geographic areas\n2. Removal of non-terrestrial records and spatial outliers (using density-based clustering)\n3. Preparation of a phylogenetic tree (currently, only pre-constructed phylogenetic trees are available; with the update of OToL, phylogenetic trees will be downloaded automatically using the API) and name-matching with GBIF species keys\n4. Spatial binning of species occurrences using Uber’s H3 system (hexagonal hierarchical spatial index)\n5. Estimation of phylogenetic diversity and endemism indices using the [Biodiverse program](https://shawnlaffan.github.io/biodiverse/)\n6. 
Visualization of the obtained results\n\n## Quick Start\n\nAn example command to run the pipeline:\n\n```bash\nnextflow run vmikk/phylonext -r main \\\n  --input \"/mnt/GBIF/Parquet/2022-01-01/occurrence.parquet/\" \\\n  --classis \"Mammalia\" --family  \"Felidae,Canidae\" \\\n  --country \"DE,PL,CZ\"  \\\n  --minyear 2000  \\\n  --dbscan true  \\\n  --phytree $(realpath \"${HOME}/.nextflow/assets/vmikk/phylonext/test_data/phy_trees/Mammals.nwk\") \\\n  --iterations 100  \\\n  -resume\n```\n\n## Web GUI\n\nTo facilitate easy and efficient navigation for exploring the PhyloNext pipeline, a user-friendly, web-based graphical user interface (GUI) has been developed by [Thomas Stjernegaard Jeppesen](https://github.com/thomasstjerne).\n\nThe GUI is available at [https://phylonext.gbif.org/](https://phylonext.gbif.org/).\n\n**NB!** To access the GUI, users must have a GBIF user account. To register an account, please visit https://www.gbif.org/.\n\n\n## Documentation\n\nThe PhyloNext pipeline comes with documentation about the pipeline usage\nat [https://phylonext.github.io/](https://phylonext.github.io/).\n\nMain pipeline parameters and output are described here:\n- [parameters](https://phylonext.github.io/parameters/)\n- [output](https://phylonext.github.io/outputs/)\n\nTo show a help message, run `nextflow run vmikk/phylonext -r main --help`.\n```\n=====================================================================\nPhyloNext: GBIF phylogenetic diversity pipeline :  Version 1.4.0\n=====================================================================\n\nPipeline Usage:\nTo run the pipeline, enter the following in the command line:\n    nextflow run vmikk/phylonext -r main --input ... 
--outdir ...\n\nOptions:\nREQUIRED:\n    --input               Path to the directory with parquet files (GBIF occurrcence dump)\n    --outdir              The output directory where the results will be saved\nOPTIONAL:\n    --phylum              Phylum to analyze (multiple comma-separated values allowed); e.g., \"Chordata\"\n    --classis             Class to analyze (multiple comma-separated values allowed); e.g., \"Mammalia\"\n    --order               Order to analyze (multiple comma-separated values allowed); e.g., \"Carnivora\"\n    --family              Family to analyze (multiple comma-separated values allowed); e.g., \"Felidae,Canidae\"\n    --genus               Genus to analyze (multiple comma-separated values allowed); e.g., \"Felis,Canis,Lynx\"\n    --specieskeys         Custom list of GBIF specieskeys (file with a single column, with header)\n\n    --phytree             Custom phylogenetic tree\n    --taxgroup            Specific taxonomy group in Open Tree of Life (default, \"All_life\")\n    --phylabels           Type of tip labels on a phylogenetic tree (\"OTT\" or \"Latin\")\n    --maxage              Manually assign root age for a tree obtained from Open Tree of Life; e.g., 127\n    --phyloonly           Prune Open Tree tips for which there are no phylogenetic inputs; logical, default, false\n\n    --country             Country code, ISO 3166 (multiple comma-separated values allowed); e.g., \"DE,PL,CZ\"\n    --latmin              Minimum latitude of species occurrences (decimal degrees); e.g., 5.1\n    --latmax              Maximum latitude of species occurrences (decimal degrees); e.g., 15.5\n    --lonmin              Minimum longitude of species occurrences (decimal degrees); e.g., 47.0\n    --lonmax              Maximum longitude of species occurrences (decimal degrees); e.g., 55.5\n    --minyear             Minimum year of record's occurrences; default, 1945\n    --maxyear             Maximum year of record's occurrences; default, none\n    
--coordprecision      Coordinate precision threshold (less than maximum allowed value; default, 0.1)\n    --coorduncertainty    Maximum allowed coordinate uncertainty, meters (default, 10000)\n    --coorduncertaintyexclude Black list of coordinate uncertainty values (default, \"301,3036,999,9999\")\n    --basisofrecordinclude Basis of record to include from the data; e.g., \"PRESERVED_SPECIMEN\"\n    --basisofrecordexclude Basis of record to exclude from the data; e.g., \"FOSSIL_SPECIMEN,LIVING_SPECIMEN\"\n    --polygon             Custom area of interest (a file with polygons in GeoPackage format)\n    --wgsrpd              Polygons of World Geographical Regions; e.g., \"pipeline_data/WGSRPD.RData\"\n    --regions             Names of World Geographical Regions; e.g., \"L1_EUROPE,L1_ASIA_TEMPERATE\"\n    --noextinct           File with extinct species specieskeys for their removal (file with a single column, with header)\n    --excludehuman        Logical, exclude genus \"Homo\" from occurrence data (default, true)\n    --roundcoords         Numeric, round spatial coordinates to N decimal places, to reduce the dataset size (default, 2; set to negative to disable rounding)\n    --h3resolution        Spatial resolution of the H3 geospatial indexing system; e.g., 4\n\n    --dbscan              Logical, remove spatial outliers with density-based clustering; e.g., \"false\"\n    --dbscannoccurrences  Minimum species occurrence to perform DBSCAN; e.g., 30\n    --dbscanepsilon       DBSCAN parameter epsilon, km; e.g., \"700\"\n    --dbscanminpts        DBSCAN min number of points; e.g., \"3\"\n\n    --terrestrial         Land polygon for removal of non-terrestrial occurrences; e.g., \"pipeline_data/Land_Buffered_025_dgr.RData\"\n    --rmcountrycentroids  Polygons with country and province centroids; e.g., \"pipeline_data/CC_CountryCentroids_buf_1000m.RData\"\n    --rmcountrycapitals   Polygons with country capitals; e.g., \"pipeline_data/CC_Capitals_buf_10000m.RData\"\n  
  --rminstitutions      Polygons with biological institutuions and museums; e.g., \"pipeline_data/CC_Institutions_buf_100m.RData\"\n    --rmurban             Polygons with urban areas; e.g., \"pipeline_data/CC_Urban.RData\"\n\n    --deriveddataset      Prepare a list of DOIs for the datasets used (default, true)\n\n    --indices             Comma-seprated list of diversity and endemism indices; e.g., \"calc_richness,calc_pd,calc_pe\"\n    --randname            Randomisation scheme type; e.g., \"rand_structured\"\n    --iterations          Number of randomisation iterations; e.g., 1000\n    --biodiversethreads   Number of Biodiverse threads; e.g., 10\n    --randconstrain       Polygons to perform spatially constrained randomization (GeoPackage format)\n\nLeaflet interactive visualization:\n    --leaflet_var         Variables to plot; e.g., \"RICHNESS_ALL,PD,SES_PD,PD_P,ENDW_WE,SES_ENDW_WE,PE_WE,SES_PE_WE,CANAPE,Redundancy\"\n    --leaflet_canapesuper Include the `superendemism` class in CANAPE results (default, false)\n    --leaflet_color       Color scheme for continuous variables (default, \"RdYlBu\")\n    --leaflet_palette     Color palette for continuous variables (default, \"quantile\")\n    --leaflet_bins        Number of color bins for continuous variables (default, 5)\n    --leaflet_sescolor    Color scheme for standardized effect sizes, SES (default, \"threat\"; alternative - \"hotspots)\n    --leaflet_redundancy  Redundancy threshold for hiding the grid cells with low number of records (default, 0 = display all grid cells)\n\nStatic visualization:\n    --plotvar             Variables to plot (multiple comma-separated values allowed); e.g., \"RICHNESS_ALL,PD,PD_P\"\n    --plottype            Plot type\n    --plotformat          Plot format (jpg,pdf,png)\n    --plotwidth           Plot width (default, 18 inches)\n    --plotheight          Plot height (default, 18 inches)\n    --plotunits           Plot size units (in,cm)\n    --world               World 
basemap\n\nNEXTFLOW-SPECIFIC:\n    -qs                   Queue size (max number of processes that can be executed in parallel); e.g., 8\n    -w                    Path to the working directory to store intermediate results (default, \"./work\")\n    -resume               Execute the pipeline using the cached results.\u003cbr\u003eUseful to continue executions that was stopped by an error\n    -profile              Configuration profile; e.g., \"docker\"\n    -params-file          Parameter file in YAML or JSON format (e.g., \"Mammals.yaml\")\n    -c / -C               Configuration file (`-C` ignores all default values) (default, \"nextflow.config\")\n```\n\nSource code for the documentation can be found at [https://github.com/PhyloNext/phylonext.github.io](https://github.com/PhyloNext/phylonext.github.io).\n\n\n## Credits\n\nThe PhyloNext pipeline was developed by [Vladimir Mikryukov](https://github.com/vmikk) and [Kessy Abarenkov](https://github.com/kessya).\n\nThe [Biodiverse program](https://shawnlaffan.github.io/biodiverse/) and Perl scripts accompanying PhyloNext were written by [Shawn Laffan](https://github.com/shawnlaffan) (Laffan et al., 2010).\n\nScripts for getting an induced subtree from the Open Tree of Life were developed by [Emily Jane McTavish](https://github.com/snacktavish).\n\nWe thank the following people for their extensive assistance in the development of this pipeline: Joe Miller, Shawn Laffan, Tim Robertson, Emily Jane McTavish, John Waller, Thomas Stjernegaard Jeppesen, and Matthew Blissett.\n\nWe are also very grateful to [Manuele Simi](https://github.com/manuelesimi) and the [nf-core](https://nf-co.re/) community for helpful advice on the development of this pipeline.\n\nFor more details, please see the [Acknowledgments section](https://phylonext.github.io/acknowledgements/) in the docs.\n\n## Funding\n\nThe work is supported by a grant “PD (Phylogenetic Diversity) in the Cloud” to GBIF Supplemental funds from the GEO-Microsoft Planetary Computer 
Programme.\n\n## Contributions and Support\n\nIf you would like to contribute to this pipeline, please see the [contributing guidelines](CONTRIBUTING.md).\n\nFor further information or help, don't hesitate to file an [issue on GitHub](https://github.com/vmikk/PhyloNext/issues).\n\n## Future plans\n\n- Add support for [`Shifter`](https://nersc.gitlab.io/development/shifter/how-to-use/) or [`Charliecloud`](https://hpc.github.io/charliecloud/) containers.\n\n## Citations\n\nIf you use the PhyloNext pipeline for your analysis, please cite it as:\n\nMikryukov V, Abarenkov K, Laffan S, Robertson T, McTavish EJ, Jeppesen TS, Waller J, Blissett M, Kõljalg U, Miller JT (2024). PhyloNext: A pipeline for phylogenetic diversity analysis of GBIF-mediated data. BMC Ecology and Evolution, 24(1), 76. [DOI:10.1186/s12862-024-02256-9](https://bmcecolevol.biomedcentral.com/articles/10.1186/s12862-024-02256-9)\n\nLaffan SW, Lubarsky E, Rosauer DF (2010) Biodiverse, a tool for the spatial analysis of biological and related diversity. Ecography, 33: 643-647. [DOI: 10.1111/j.1600-0587.2010.06237.x](https://onlinelibrary.wiley.com/doi/10.1111/j.1600-0587.2010.06237.x)\n\nAn extensive list of references for the tools used by the pipeline can be found in the [Citations](https://phylonext.github.io/citations/) section in the documentation.\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvmikk%2Fphylonext","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvmikk%2Fphylonext","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvmikk%2Fphylonext/lists"}