https://github.com/ssi-dk/bps_fbi_scripts_clostyper

Tool to find mobile genetic elements (MGEs: phages, plasmids, and transposons) in C. difficile. Scripts that allow ClosTyper to be run on Slurm.

cdiff clostridioides-difficile clostyper mge phages plasmids transposons

Host: GitHub
URL: https://github.com/ssi-dk/bps_fbi_scripts_clostyper
Owner: ssi-dk
License: mit
Created: 2023-07-05T13:50:54.000Z (almost 2 years ago)
Default Branch: master
Last Pushed: 2023-07-31T09:43:01.000Z (almost 2 years ago)
Last Synced: 2025-01-11T09:44:35.585Z (4 months ago)
Topics: cdiff, clostridioides-difficile, clostyper, mge, phages, plasmids, transposons
Language: Python
Homepage:
Size: 15.9 MB
Stars: 0
Watchers: 4
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# ClosTyper

:warning: __*Please note that ClosTyper is still at an early stage of development. Expect substantial changes, as many features are still being added.*__

ClosTyper is a pipeline tool designed to automate the characterization and genotyping of selected _Clostridia_ species using whole-genome sequencing (WGS) data.
ClosTyper is written in Snakemake, which makes the integrated workflow reproducible and scalable.

✋ **This tool is under active development** ✋
You may also consider WGSBAC (https://gitlab.com/FLI_Bioinfo/WGSBAC), which combines different generalized modules for bacterial characterization based on WGS data.

## Requirements
#### Software requirements
Before you start, make sure that the following software is installed on your system and available in your `$PATH`:
1. `any2fasta`: Convert various sequence formats to FASTA (https://github.com/tseemann/any2fasta)
- Installation instructions: https://github.com/tseemann/any2fasta#github
- Required version: 0.4.2 or later

2. `snakemake`: a workflow management system that creates reproducible and scalable data analyses. Snakemake workflows can entail a description of required software, which will be automatically deployed to any execution environment (https://snakemake.github.io/)
- Installation instructions: https://snakemake.readthedocs.io/en/stable/getting_started/installation.html
- Required version: 6.15.5 or later

3. `pigz`: A parallel implementation of gzip for modern multi-processor, multi-core machines (https://zlib.net/pigz/)
- Installation instructions: https://zlib.net/pigz/
- Required version: 2.3.4 or later

4. `conda`: package, dependency and environment management system (https://docs.conda.io/en/latest/)
- Installation instructions: https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html
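
If you want to confirm these prerequisites before running `clostyper`, the following Python sketch looks each tool up on `$PATH` and compares its reported version with the minimum listed above. It is an illustration only, not part of ClosTyper (the tool's own `--check_dep` flag is the supported check). The version flags (`-v` for `any2fasta`, `--version` for the others) and the rule of taking the first dotted number in the output are assumptions; verify them against each tool's help.

```python
"""Minimal sketch: check that the prerequisites listed above are on $PATH.

Assumptions (not taken from the ClosTyper docs): the version flag for each tool,
and that the first dotted number in the tool's output is its version.
"""
import re
import shutil
import subprocess

# tool -> (minimum version from the list above, or None; assumed version flag)
REQUIRED = {
    "any2fasta": ("0.4.2", "-v"),
    "snakemake": ("6.15.5", "--version"),
    "pigz": ("2.3.4", "--version"),
    "conda": (None, "--version"),  # no minimum version is stated for conda
}


def parse_version(text):
    """Return the first dotted version number in text as a tuple of ints, or None."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """Compare two version tuples, padding the shorter one with zeros."""
    width = max(len(found), len(minimum))
    return found + (0,) * (width - len(found)) >= minimum + (0,) * (width - len(minimum))


def check_tool(name, minimum, version_flag):
    """Return a one-line status report for a single tool."""
    path = shutil.which(name)
    if path is None:
        return f"{name}: NOT FOUND in $PATH"
    if minimum is None:
        return f"{name}: found at {path}"
    # Some tools print their version on stderr, so search both streams.
    result = subprocess.run([path, version_flag], capture_output=True, text=True)
    found = parse_version(result.stdout + result.stderr)
    if found is None:
        return f"{name}: found at {path}, but its version could not be parsed"
    ok = meets_minimum(found, parse_version(minimum))
    status = "ok" if ok else f"too old, need {minimum} or later"
    return f"{name}: {'.'.join(map(str, found))} ({status})"


if __name__ == "__main__":
    for tool, (minimum, flag) in REQUIRED.items():
        print(check_tool(tool, minimum, flag))
```

Padding with zeros makes `6.16` compare correctly against `6.15.5`, which a plain string comparison would not guarantee.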

#### Databases requirements
1. (Mini)Kraken2 database: taxonomic profiling of sequencing data (https://ccb.jhu.edu/software/kraken2/)
- Download page: https://ccb.jhu.edu/software/kraken2/downloads.shtml
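
Because ClosTyper needs the full path to a Kraken2 database, it can help to sanity-check that path first. This Python sketch assumes a standard Kraken2 database directory, which holds the index files `hash.k2d`, `opts.k2d` and `taxo.k2d`; ClosTyper's own validation may differ.

```python
"""Minimal sketch: sanity-check a (Mini)Kraken2 database path for ClosTyper.

Assumption: the path points at a standard Kraken2 database directory, which
holds the index files hash.k2d, opts.k2d and taxo.k2d.
"""
from pathlib import Path

KRAKEN2_INDEX_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def kraken2_db_problems(db_path):
    """Return a list of problems with db_path; an empty list means it looks usable."""
    path = Path(db_path).expanduser()
    problems = []
    if not path.is_absolute():
        # ClosTyper asks for the full path to the database.
        problems.append("not a full (absolute) path")
    if not path.is_dir():
        problems.append("not an existing directory")
        return problems
    missing = [name for name in KRAKEN2_INDEX_FILES if not (path / name).is_file()]
    if missing:
        problems.append("missing " + ", ".join(missing))
    return problems
```

For example, `kraken2_db_problems("/path/to/kraken2_db")` (a placeholder path) returns an empty list when the directory exists and contains all three index files.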

## Installation
#### Install clostyper from source
These instructions will install the latest version of `ClosTyper`:

```
git clone https://gitlab.com/FLI_Bioinfo/clostyper.git
ln -s "$(pwd)/clostyper/bin/clostyper" /usr/local/bin/ # choose a folder in your $PATH
```
This should be the directory structure of clostyper

## Usage
### First execution

On first execution of clostyper with `clostyper -h`, a config file will be automatically created under the folder .
This file should include the paths to databases and schemes to be used with `clostyper`.
✍ Important: at least the full path to the Kraken2 database must be provided.

## Basic usage [with test data]

#### Download test data
```
wget -O test_data_clostyper.tar.gz "https://zenodo.org/record/6656045/files/test_data_clostyper.tar.gz?download=1"
tar xzvf test_data_clostyper.tar.gz
```
### Run clostyper
✍ Note: the full path to the Kraken2 database should already be specified in the config file . If not, pass it with `--kraken2db`.
* **To check raw data quality**
This only runs fastp and Kraken2.
Basic call: `clostyper --check_quality -d [FASTQ_DIRECTORY] -r [REFERENCE] [-o WORKING_DIRECTORY]`
- With the test data
```
clostyper --check_quality -d test_data/input_data/ -r test_data/input_data/ref.gbk -o results_dir -A -T
```

* **To configure and execute the full pipeline**
✍ WARNING: Many features of clostyper are still experimental. Please report issues in the issue tracker (https://gitlab.com/FLI_Bioinfo/clostyper/-/issues)
This will execute the full pipeline
Basic call: `clostyper -d FASTQ/FASTA_DIRECTORY -r REFERENCE [-o WORKING_DIRECTORY] [-s SPECIES]`
- With the test data
```
clostyper -d test_data/input_data/ -r test_data/input_data/ref.gbk -o results_dir -A -T
```
* **To search _C. difficile_ for mobile elements**
To search your genomes ONLY for _C. difficile_ mobile genetic elements, use the `--run_only_species_wf` flag together with `-s`.
- Example with the test dataset
```
clostyper -d test_data/input_data/ -r test_data/input_data/ref.gbk -o results_dir -s cdifficile --run_only_species_wf -A -T
```

### Full usage options
```
> clostyper -h
ClosTyper: Clostridia characterization and typing pipeline
(available at: https://gitlab.com/FLI_Bioinfo/ClosTyper)
Version: 0.1-beta
USAGE:
---------------------
clostyper --check_quality -d FASTQ_DIRECTORY [-o WORKING_DIRECTORY]
clostyper -d FASTQ/FASTA_DIRECTORY -r REFERENCE [-o WORKING_DIRECTORY] [-s SPECIES] [--run_cgmlst]
clostyper -t SAMPLE_TABLE -r REFERENCE [-o WORKING_DIRECTORY] [-s SPECIES] [--run_cgmlst]
---------------------
INPUT:
  -d, --fastx-directory   DIR, a directory where fastq reads or assembled genomes are present. Required unless -t flag was used
                          Format: [ID]_{1,2}.fastq{.gz} [ID]_S*_R{1,2}_001.fastq{.gz} [ID]_R{1,2}.fastq{.gz} OR [ID].{fasta,fna,fa}
  -t, --sample_table      FILE (tab delimited), a four-columns based table. See an example in the documentation!
                          Required unless the -d flag was used. If -t and -d flags were activated, -d will be ignored
  -r, --reference         FILE, Reference genome. Format: {ID}.{gbk,fasta,gff,embl} (required)
  -s, --species           Run species-specific workflow (default: False; run only the general workflow)
                          Currently supported Clostridia species are: cdifficile
OUTPUT:
  -o, --output-directory  DIR, output directory for the snakemake results (default: output_dir_[timestamp]/)
  -w, --overwrite         Overwrite an existing directory with the results. Useful to append results to previous runs
  -q, --quiet             Suppress clostyper messages. Report only warnings, errors and the snakemake call
WORKFLOW:
  -Q, --check_quality     Quickly perform quality assurance on Illumina data [recommended before doing analysis]
  --run_cgmlst            [EXPERIMENTAL] Do cgMLST analyis using the chewiesnake pipeline
  --run_only_species_wf   [EXPERIMENTAL] Execute ONLY the specified species-specific workflow. Require '-s'
  --select_reference      [EXPERIMENTAL] Select appropriate reference for the SNP anlysis of the dataset
  --snp_pipeline          [EXPERIMENTAL] Select which pipeline to call SNPs
                          [EXPERIMENTAL] Supported SNP pipelines are: snippy, reddog, nasp, cfsanpipeline
  --kraken2db             Path to (Mini)kraken2 DB
  --disable_pangenome     Disable the pangenome analyis [EXPERIMENTAL]
  --disable_report        Do not make the html report [EXPERIMENTAL]
OTHERS:
  -A, --autorun           Automatically run snakemake workflow after configuration (default: False)
  -T, --threads           Number of threads to use (default: 16)
  --check_dep             Check if dependencies are ok and then exit
  --no-color              Do not use a colored output (default: False)
HELP:
  -h, --help              Show this help and exit
  --help_all              Show extended help for all software settings options [EXPERIMENTAL]
  --version               Show clostyper's version number and exit
  --citation              Show clostyper's citation and exit
```
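
The `-d, --fastx-directory` option accepts the file-name formats listed in the help output. As a pre-flight check, the Python sketch that follows groups the files of an input directory by sample ID according to those formats and flags incomplete read pairs. It is a hypothetical helper that mirrors the documented patterns (treating the glob `S*` as "anything"), not ClosTyper's own parser, which may resolve ambiguous names differently.

```python
"""Minimal sketch: group the files in a ClosTyper `-d` input directory by sample ID.

The patterns mirror the formats listed under `-d, --fastx-directory`:
  [ID]_S*_R{1,2}_001.fastq{.gz}, [ID]_R{1,2}.fastq{.gz},
  [ID]_{1,2}.fastq{.gz}, [ID].{fasta,fna,fa}
This illustrates those naming rules; it is not ClosTyper's own parser.
"""
import re
from collections import defaultdict
from pathlib import Path

# Checked in order, most specific (Illumina-style) pattern first.
READ_PATTERNS = [
    re.compile(r"^(?P<id>.+)_S.*_R(?P<read>[12])_001\.fastq(?:\.gz)?$"),
    re.compile(r"^(?P<id>.+)_R(?P<read>[12])\.fastq(?:\.gz)?$"),
    re.compile(r"^(?P<id>.+)_(?P<read>[12])\.fastq(?:\.gz)?$"),
]
ASSEMBLY_PATTERN = re.compile(r"^(?P<id>.+)\.(?:fasta|fna|fa)$")


def classify(filename):
    """Return (sample_id, kind) with kind 'R1', 'R2' or 'assembly', or None if unrecognised."""
    for pattern in READ_PATTERNS:
        match = pattern.match(filename)
        if match:
            return match.group("id"), "R" + match.group("read")
    match = ASSEMBLY_PATTERN.match(filename)
    if match:
        return match.group("id"), "assembly"
    return None


def group_samples(filenames):
    """Return ({sample_id: {kind: filename}}, [unrecognised filenames])."""
    samples = defaultdict(dict)
    unrecognised = []
    for name in sorted(filenames):
        result = classify(name)
        if result is None:
            unrecognised.append(name)
        else:
            sample_id, kind = result
            samples[sample_id][kind] = name
    return dict(samples), unrecognised


def incomplete_pairs(samples):
    """Return the sample IDs that have only one of R1/R2."""
    return sorted(sid for sid, files in samples.items()
                  if ("R1" in files) != ("R2" in files))


def scan_directory(directory):
    """Apply group_samples to the regular files in directory."""
    return group_samples(p.name for p in Path(directory).iterdir() if p.is_file())
```

With the test data, `samples, skipped = scan_directory("test_data/input_data/")` followed by `incomplete_pairs(samples)` lists any sample whose mate file is missing; note that the reference file (`ref.gbk`) in that folder will show up in `skipped`, since `.gbk` is not an accepted input format for `-d`.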
