https://github.com/vmikk/phylotwin-preprocessor
Species occurrence pre-processing pipeline
- Host: GitHub
- URL: https://github.com/vmikk/phylotwin-preprocessor
- Owner: vmikk
- License: MIT
- Created: 2024-10-01T07:53:13.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-02-12T09:30:57.000Z (3 months ago)
- Last Synced: 2025-02-12T10:33:40.026Z (3 months ago)
- Language: Shell
- Homepage:
- Size: 701 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
README
# PhyloTwin-Preprocessor - GBIF Occurrence Data Processing Pipeline
A Nextflow pipeline for processing GBIF species occurrence data with spatial outlier detection and H3 grid binning.
The pipeline offers two processing modes:
- "Atomic" mode: Processes each species independently
- "Batched" mode: Processes multiple species in batches (optimized for HPC environments)

# Dependencies
## Primary dependencies
These tools must be installed and configured on the host system to run the pipeline:
- [Nextflow](https://www.nextflow.io/) >= 24.10 (requires [Java](https://www.java.com/en/) >= 17 & <= 23)
- [Singularity](https://sylabs.io/)/[Apptainer](https://apptainer.org/) or [Docker](https://www.docker.com/)

## Secondary dependencies
These tools and packages are required by the pipeline but ship inside its containers, so they do not need to be installed manually:
- [DuckDB](https://duckdb.org/)
- [ELKI](https://elki-project.github.io/)
- [R](https://www.r-project.org/)
- [Python](https://www.python.org/)
- [GNU Parallel](https://www.gnu.org/software/parallel/)
- [aria2](https://aria2.github.io/)
- [jq](https://github.com/jqlang/jq)

R packages:
- [data.table](https://rdatatable.gitlab.io/data.table/)
- [ape](https://github.com/emmanuelparadis/ape)
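
Before launching the pipeline, it can help to confirm that the primary dependencies are in place. The sketch below is illustrative and not part of the pipeline: the function names (`java_major`, `java_supported`, `check_primary_deps`) are hypothetical. It assumes Bash, and that `java -version` prints the version as the first double-quoted token, which holds for common OpenJDK builds but is not guaranteed for every vendor. It checks the Java range stated above (17 to 23) and only tests for the presence of Nextflow and a container engine; the Nextflow version itself is left for manual confirmation.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the primary dependencies (assumption:
# this script is NOT shipped with the pipeline; names are illustrative).

# Print the major Java version for strings such as "17.0.9", "21",
# or the legacy "1.8.0_292" scheme (which denotes Java 8).
java_major() {
  local v="$1"
  v="${v#1.}"          # drop the legacy "1." prefix, if present
  v="${v%%[!0-9]*}"    # keep only the leading digits
  echo "$v"
}

# Succeed only if the Java major version lies within 17..23.
java_supported() {
  local major
  major="$(java_major "$1")"
  [ -n "$major" ] && [ "$major" -ge 17 ] && [ "$major" -le 23 ]
}

# Report on Java, Nextflow, and a container engine; return 1 on any failure.
check_primary_deps() {
  local status=0 version engine found=""

  if command -v java >/dev/null 2>&1; then
    # `java -version` writes to stderr; take the first quoted token.
    version="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
    if java_supported "$version"; then
      echo "OK: Java $version"
    else
      echo "FAIL: Java '$version' is outside the supported 17-23 range"
      status=1
    fi
  else
    echo "FAIL: java not found on PATH"
    status=1
  fi

  if command -v nextflow >/dev/null 2>&1; then
    echo "OK: nextflow found (confirm >= 24.10 with: nextflow -version)"
  else
    echo "FAIL: nextflow not found on PATH"
    status=1
  fi

  for engine in singularity apptainer docker; do
    if command -v "$engine" >/dev/null 2>&1; then
      found="$engine"
      break
    fi
  done
  if [ -n "$found" ]; then
    echo "OK: container engine found ($found)"
  else
    echo "FAIL: none of singularity, apptainer, or docker found on PATH"
    status=1
  fi

  return "$status"
}
```

To use it, source the file and run `check_primary_deps`; a non-zero return status means at least one dependency is missing or unsupported.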