A simple 16S rRNA pipeline: https://github.com/dridk/mucobiome
# Mucobiome ![GitHub Logo](https://img.shields.io/badge/snakemake-≥3.5.2-brightgreen.svg?style=flat-square)

Mucobiome is a simple pipeline that analyses 16S rRNA genomic data from high-throughput sequencers.
Neither QIIME nor MOTHUR is required. A few dependencies, such as vsearch, are needed and are listed below.
Mucobiome uses [snakemake](https://bitbucket.org/johanneskoester/snakemake/wiki/Home) as its backbone. This tool makes it possible to run
the pipeline efficiently with multithreading.

## How does it work? **Paired fastq files** -> **biom**

The pipeline takes several paired-end fastq files as input (one per sample) and generates one biom file containing an OTU table with taxonomy and sample metadata.

- For each read pair:
  - Merge the fastq files using vsearch or flash
  - Clean the fastq files using sickle
  - Reverse reads with seqtk
  - Trim adaptors from reads using cutadapt
  - Dereplicate reads with vsearch
- Merge all dereplicated reads into one fasta file
- With the Greengenes database:
  - Extract the region of interest from the Greengenes 16S database using cutadapt and the user's primers
  - Assign taxonomy using vsearch --usearch_global, comparing sequences from the merged file against the Greengenes database
- Create a biom file and add taxonomy and sample metadata to it
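The per-sample steps above roughly correspond to commands like the ones below. This is only an illustrative sketch: the exact flags, primers, and file names are assumptions here, and the authoritative rules live in the project's Snakefile. The commands are printed, not executed.

```shell
# Illustrative sketch of the per-sample commands (flags and file names are
# assumptions; the authoritative rules are in the Snakefile).
SAMPLE=demo
CMDS=$(cat <<EOF
vsearch --fastq_mergepairs ${SAMPLE}_1.fastq.gz --reverse ${SAMPLE}_2.fastq.gz --fastqout ${SAMPLE}.merged.fastq
sickle se -f ${SAMPLE}.merged.fastq -t sanger -o ${SAMPLE}.cleaned.fastq -q 20
cutadapt -g FORWARD_PRIMER -o ${SAMPLE}.trimmed.fastq ${SAMPLE}.cleaned.fastq
vsearch --derep_fulllength ${SAMPLE}.trimmed.fastq --output ${SAMPLE}.derep.fasta
EOF
)
echo "$CMDS"
```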

## Installation
### Python dependencies
Mucobiome has been written with Python 3.4.

```
pip install -r requirements.txt
```

### Install vsearch
```
wget https://github.com/torognes/vsearch/archive/v2.3.4.tar.gz
tar xzf v2.3.4.tar.gz
cd vsearch-2.3.4
./autogen.sh
./configure
make
make install # as root or sudo make install
```

### Install seqtk
```
git clone https://github.com/lh3/seqtk.git
cd seqtk
make
sudo cp seqtk /usr/local/bin/
```

### Install sickle
```
git clone https://github.com/najoshi/sickle.git
cd sickle
make
sudo make install
```

## Download Database
Mucobiome works with Greengenes, but you can use another database as long as it follows the same format.
Run download_greengene.sh from the database folder to download the Greengenes data.

```
cd database
sh download_greengene.sh
```

## Usage
### Test your installation
The repository contains a simple test dataset. Try the following command, which does nothing but display all pipeline commands.

```
# you are in the main directory
snakemake -d working_directory -np --configfile config.yaml
```

| Option | Description |
|--------|-------------|
| `-d` | The working directory. All generated files will be written here |
| `-n` | Don't execute any commands (dry run) |
| `-p` | Print commands |
| `--configfile` | The config file to use |

### Input files
Input data are paired fastq.gz files. Put all your data into the data/raw folder. Both paired files must follow the naming scheme below, where *{SAMPLE}* is your sample name. Do not use the "." character in sample names; use alphanumeric characters only.

```
{SAMPLE}_1.fastq.gz
{SAMPLE}_2.fastq.gz
```
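A quick way to check that a sample name obeys this rule (alphanumeric only, no dots). `check_sample_name` is a hypothetical helper for illustration, not part of the pipeline:

```shell
# Hypothetical helper: validate a sample name (alphanumeric only, no dots).
check_sample_name() {
  case "$1" in
    ""|*[!A-Za-z0-9]*) echo "invalid" ;;
    *) echo "ok" ;;
  esac
}

check_sample_name "Sample42"    # prints: ok
check_sample_name "bad.name"    # prints: invalid
```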

### config.yaml
This file contains all parameters required to perform an analysis.

| Option | Description |
|--------|-------------|
| `raw_folder` | The directory containing the fastq input files |
| `primer_forward` | The forward primer. The default primers select the V3-V5 region |
| `reverse_primer` | The reverse primer. The default primers select the V3-V5 region |
| `database_fasta` | A fasta file containing complete 16S sequences. Greengenes by default |
| `database_taxonomy` | A two-column file: fasta header IDs from database_fasta and the corresponding taxonomy |
| `sample_data` | Sample metadata file |
| `threshold` | Taxonomy assignment threshold. 97% by default |
| `quality` | Remove all reads below this quality threshold. A value of 20 is recommended |
| `min_len` | Minimum accepted read length |
| `max_len` | Maximum accepted read length |
| `merge_tool` | Use flash or vsearch to perform merging. vsearch by default |
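For illustration, a config.yaml might look like the following. All values here are placeholders (assumptions), not project defaults; in particular, substitute your own primer sequences and length bounds.

```shell
# Write an illustrative config.yaml (all values are placeholders; adapt
# them to your own data and primers).
cat > config.yaml <<'EOF'
raw_folder: data/raw
primer_forward: ACGTACGTACGT    # placeholder -- use your forward primer
reverse_primer: TGCATGCATGCA    # placeholder -- use your reverse primer
database_fasta: database/greengene.fasta          # placeholder path
database_taxonomy: database/greengene_taxonomy.txt  # placeholder path
sample_data: sample_data.tsv
threshold: 0.97   # taxonomy assignment threshold
quality: 20       # minimum read quality
min_len: 200      # placeholder
max_len: 500      # placeholder
merge_tool: vsearch
EOF
```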

### sample_data.tsv
This file contains metadata for each sample. Fill it with your own data.
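As an illustration, a minimal sample_data.tsv could look like this. The column names beyond the sample name are placeholders (the exact required format is not specified here); the first column should match your {SAMPLE} names.

```shell
# Write an illustrative sample_data.tsv (tab-separated; columns beyond the
# sample name are placeholders -- use your own metadata).
printf 'sample\tgroup\tsite\n'   >  sample_data.tsv
printf 'SampleA\tcontrol\tgut\n' >> sample_data.tsv
printf 'SampleB\ttreated\tgut\n' >> sample_data.tsv
```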

### Run your experiment
When all requirements are met, you can run one of the following commands:

```
# Run the pipeline using 60 threads
snakemake -d working_directory -p --configfile config.yaml --cores 60

# Force the pipeline to rebuild the last step
snakemake -d working_directory -fp --configfile config.yaml --cores 60

# Force the pipeline to rebuild everything
snakemake -d working_directory -Fp --configfile config.yaml --cores 60
```