https://github.com/seandavi/curatedmetagenomicsnextflow

Curated Metagenomics Data Nextflow workflows

bioinformatics metagenomics nextflow r01ca230551

Host: GitHub
URL: https://github.com/seandavi/curatedmetagenomicsnextflow
Owner: seandavi
Created: 2020-06-23T16:28:03.000Z (over 5 years ago)
Default Branch: main
Last Pushed: 2025-03-21T21:33:07.000Z (7 months ago)
Last Synced: 2025-03-21T22:27:47.741Z (7 months ago)
Topics: bioinformatics, metagenomics, nextflow, r01ca230551
Language: Nextflow
Homepage:
Size: 93.8 KB
Stars: 7
Watchers: 6
Forks: 6
Open Issues: 9
Metadata Files:
- Readme: README.md

README

# Curated Metagenomics Nextflow Pipeline

A Nextflow pipeline for processing metagenomics data, implementing the curatedMetagenomics workflow.

## Overview

This pipeline processes raw sequencing data in four steps:
1. FASTQ extraction with `fasterq-dump`
2. Quality control with `KneadData`
3. Taxonomic profiling with `MetaPhlAn`
4. Functional profiling with `HUMAnN` (optional)

## Usage

Basic usage:

```bash
nextflow run main.nf --metadata_tsv samples.tsv
```

To skip HUMAnN functional profiling and set the output directory explicitly:

```bash
nextflow run main.nf --metadata_tsv samples.tsv --skip_humann --publish_dir results
```

## Parameters

### General Pipeline Parameters

| Parameter | Description | Default |
| -------------- | -------------------------------------- | ------------- |
| `metadata_tsv` | Path to TSV file with sample metadata | `samples.tsv` |
| `publish_dir` | Directory to publish results | `results` |
| `store_dir` | Directory to store reference databases | `databases` |
| `cmgd_version` | Curated Metagenomic Data version | `4` |

### Process Control Parameters

| Parameter | Description | Default |
| ------------- | -------------------------------- | ------- |
| `skip_humann` | Skip HUMAnN functional profiling | `false` |

### MetaPhlAn Parameters

| Parameter | Description | Default |
| ----------------- | ---------------------- | -------- |
| `metaphlan_index` | MetaPhlAn index to use | `latest` |

### HUMAnN Parameters

| Parameter | Description | Default |
| ------------ | --------------------------- | ------------------ |
| `chocophlan` | ChocoPhlAn database version | `full` |
| `uniref` | UniRef database version | `uniref90_diamond` |
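
Nextflow can also read pipeline parameters from a YAML or JSON file passed with `-params-file`, which keeps long invocations reproducible. The sketch below writes such a file using parameter names from the tables above. The file name `params.yaml` and the chosen values are illustrative assumptions, and the launch command is only printed so the snippet runs without Nextflow installed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file name; any path can be given to -params-file.
params_file="params.yaml"

# Keys mirror the parameter tables above; values are examples only.
cat > "$params_file" <<'EOF'
metadata_tsv: samples.tsv
publish_dir: results
store_dir: databases
skip_humann: true
EOF

# Print the launch command instead of running it.
echo "nextflow run main.nf -params-file $params_file"
```

Parameters given on the command line (for example `--publish_dir other_dir`) should still take precedence over values in the file, so the file can hold shared defaults.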

## Input Format

The `metadata_tsv` file must be a tab-separated values (TSV) file with at least the following columns:
- `sample_id`: Unique sample identifier
- `NCBI_accession`: One or more SRA accession numbers; separate multiple accessions with semicolons

Example (columns are separated by a single tab character):
```
sample_id NCBI_accession
sample1 SRR1234567
sample2 SRR2345678;SRR2345679
```
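
Because the columns must be separated by literal tab characters, it is easy to produce a file that looks right but does not parse. The following is a minimal sketch, assuming standard `printf` and `awk`, that writes the example above with real tabs, checks the header for the two required columns, and expands semicolon-separated accessions into one line per accession. The helper name `expand_accessions` and its output layout are hypothetical and not part of the pipeline.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write the example metadata with real tab separators.
printf 'sample_id\tNCBI_accession\n' > samples.tsv
printf 'sample1\tSRR1234567\n' >> samples.tsv
printf 'sample2\tSRR2345678;SRR2345679\n' >> samples.tsv

# Hypothetical helper: validate the header, then print "<sample_id>\t<accession>"
# once per accession. Columns are located by name, so extra columns are allowed.
expand_accessions() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("sample_id" in col) || !("NCBI_accession" in col)) {
        print "missing required column" > "/dev/stderr"
        exit 1
      }
      next
    }
    {
      n = split($(col["NCBI_accession"]), runs, ";")
      for (j = 1; j <= n; j++) print $(col["sample_id"]) "\t" runs[j]
    }
  ' "$1"
}

expand_accessions samples.tsv
```

Running a check like this before launching the pipeline catches a space-separated or mislabeled header early, rather than partway through a run.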

## Output

Results are organized by sample under the `publish_dir` directory (`results` by default):
```
results/
├── sample1/
│   ├── fasterq_dump/
│   ├── kneaddata/
│   ├── metaphlan_lists/
│   ├── metaphlan_markers/
│   ├── strainphlan_markers/
│   └── humann/
├── sample2/
│   └── ...
```
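
To spot samples whose processing stopped partway, each sample directory can be compared against the layout above. This is a minimal sketch, assuming the subdirectory names shown in the tree; the helper name `report_missing_outputs` is hypothetical. `humann/` will presumably be reported as missing for runs that used `--skip_humann`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Subdirectories expected per sample, taken from the tree above.
expected=(fasterq_dump kneaddata metaphlan_lists metaphlan_markers strainphlan_markers humann)

# Hypothetical helper: print "<sample>\t<missing subdirectory>" for every gap
# found under the given publish directory.
report_missing_outputs() {
  local publish_dir="$1" sample_dir sub
  for sample_dir in "$publish_dir"/*/; do
    [ -d "$sample_dir" ] || continue
    for sub in "${expected[@]}"; do
      [ -d "${sample_dir}${sub}" ] || printf '%s\t%s\n' "$(basename "$sample_dir")" "$sub"
    done
  done
}

# Check the default publish_dir when it exists in the current directory.
if [ -d results ]; then report_missing_outputs results; fi
```

An empty report means every sample directory contains all six subdirectories; it does not verify that the files inside them are complete.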

## Profiles

The pipeline ships with five execution profiles:
- `local`: For local execution
- `google`: For execution on Google Cloud Batch
- `anvil`: For execution on AnVIL
- `alpine`: For execution on Alpine HPC
- `unitn`: For execution on UNITN PBS Pro

Example:
```bash
nextflow run main.nf -profile google --metadata_tsv samples.tsv
```

## Dependencies

This pipeline requires:
- Nextflow 22.10.0 or later
- Container support (Docker, Singularity, etc.)
- AWS CLI (for data retrieval from SRA)
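
A quick preflight check can confirm these requirements before a long run. The sketch below assumes GNU `sort -V` for version comparison and checks only `docker` and `singularity` as container runtimes, since those are the two named above; the helper names `version_ge` and `have` are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# True when version $1 is greater than or equal to version $2 (needs GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a command is on PATH; never aborts the script.
have() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found:   $1"
  else
    echo "missing: $1"
  fi
}

have nextflow
have aws
# Any one container runtime is enough.
have docker
have singularity

# Compare the installed Nextflow version, if any, against the documented minimum.
if command -v nextflow >/dev/null 2>&1; then
  installed="$(nextflow -version 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1 || true)"
  if [ -n "$installed" ] && version_ge "$installed" 22.10.0; then
    echo "nextflow $installed meets the 22.10.0 minimum"
  else
    echo "nextflow version could not be confirmed as 22.10.0 or later"
  fi
fi
```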
