https://github.com/linsalrob/atavide
Atavistic processing of metagenomics data.
https://github.com/linsalrob/atavide
bioinformatics bioinformatics-pipeline metagenomics metagenomics-binning metagenomics-counts metagenomics-toolkit
Last synced: 2 months ago
JSON representation
Atavistic processing of metagenomics data.
- Host: GitHub
- URL: https://github.com/linsalrob/atavide
- Owner: linsalrob
- License: mit
- Created: 2021-09-07T09:37:40.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2024-11-22T03:15:11.000Z (11 months ago)
- Last Synced: 2025-04-08T21:32:21.214Z (6 months ago)
- Topics: bioinformatics, bioinformatics-pipeline, metagenomics, metagenomics-binning, metagenomics-counts, metagenomics-toolkit
- Language: Python
- Homepage:
- Size: 107 KB
- Stars: 2
- Watchers: 3
- Forks: 2
- Open Issues: 6
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGEME.md
- License: LICENSE
- Citation: CITATION.md
Awesome Lists containing this project
README
[](https://edwards.flinders.edu.au)
[](https://www.zenodo.org/badge/latestdoi/403921714)
[](https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.241.1)
[](https://opensource.org/licenses/MIT)
# atavide
`atavide` is a simple, yet complete workflow for metagenomics data analysis, including QC/QA, optional host removal, assembly and cross-assembly, and individual read based annotations. We have also built in some advanced analytics including tools to assign annotations from reads to contigs, and to generate metagenome-assembled genomes in several different ways, giving you the power to explore your data!
`atavide` is 100% snakemake and conda, so you only need to install the snakemake workflow, and then everything else will be installed with conda.
It is definitely a work in progress, but you can run it with the following command
```bash
snakemake --configfile config/atavide.yaml -s workflow/atavide.snakefile --profile slurm
```But you will need a [slurm profile](https://fame.flinders.edu.au/blog/2021/08/02/snakemake-profiles-updated) to make this work!
# Installation and getting going
1. Clone this repository from GitHub: `git clone https://github.com/linsalrob/atavide.git`
2. Set the location of the repository: `export ATAVIDE_DIR=$PWD/atavide/`
3. Install a few python dependencies. You probably already have most of these, but the one that trips up is `pysam`. We're working on getting `conda` configured properly to do this automatically. `pip install -r $ATAVIDE_DIR/requirements.txt`
4. Install the [appropriate super-focus database](https://github.com/metageni/SUPER-FOCUS/issues/66) [hint: probably version 2] and set the `SUPERFOCUS_DB` directory to [point to the location of those files](https://github.com/metageni/SUPER-FOCUS#database).
5. Copy the [NCBI taxonomy](https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/) (You really just need the [taxdump.tar.gz](https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz) file), and set the `NCBI_TAXONOMY` environment variable to point to the location of those files.
6. Have a directory of fastq files with both `_R1_` and `_R2_` files in a data directory: `$DATA_DIR/fastq`
7. Run atavide: `cd $DATA_DIR && snakemake --configfile $ATAVIDE_DIR/atavide.yaml -s $ATAVIDE_DIR/workflow/atavide.snakefile --profile slurm`# Current processing steps:
### Steps:
1. QC/QA with [prinseq++](https://github.com/Adrian-Cantu/PRINSEQ-plus-plus)
2. optional host removal using bowtie2 and samtools, [as described previously](https://edwards.flinders.edu.au/command-line-deconseq/). To enable this, you need to provide a path to the host db and a host db.### Metagenome assembly
1. pairwise assembly of each sample using [megahit](https://github.com/voutcn/megahit)
2. extraction of all reads that do not assemble using samtools flags
3. assembly of all unassembled reads using [megahit](https://github.com/voutcn/megahit)
4. compilation of _all_ contigs into a single unified set using [Flye](https://github.com/fenderglass/Flye)
5. comparison of reads -> contigs to generate coverage### MAG creation
1. [metabat](https://bitbucket.org/berkeleylab/metabat/src/master/)
2. [concoct](https://github.com/BinPro/CONCOCT)
3. Pairwise comparisons using [turbocor](https://github.com/dcjones/turbocor) followed by clustering### Read-based annotations
1. [Kraken2](https://ccb.jhu.edu/software/kraken2/)
2. [singlem](https://github.com/wwood/singlem)
3. [SUPER-focus](https://github.com/metageni/SUPER-FOCUS)
4. [FOCUS](https://github.com/metageni/FOCUS)Want something else added to the suite? File an issue on GitHub and we'll add it ASAP!