https://github.com/dridk/pacbio_rna_seq
- Host: GitHub
- URL: https://github.com/dridk/pacbio_rna_seq
- Owner: dridk
- Created: 2021-11-10T10:30:54.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-04-11T12:49:35.000Z (about 3 years ago)
- Last Synced: 2025-01-31T17:52:30.776Z (3 months ago)
- Language: Python
- Size: 132 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: Readme.md
This pipeline was created as part of the [GOLD project](https://aviesan.fr/fr/aviesan/accueil/menu-header/instituts-thematiques-multi-organismes/genetique-genomique-et-bioinformatique/programme-transversal-gold).
## Installation
#### Dependencies
* [python >= 3.9](https://www.python.org/downloads)
    - seaborn
    - pandas
    - matplotlib
* [seqkit](https://bioinf.shenwei.me/seqkit/)
* [lima](https://lima.how/)
* [fastqc](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
* [minimap2](https://lh3.github.io/minimap2/)
* [samtools](http://www.htslib.org/)
* [bedtools](https://bedtools.readthedocs.io/en/latest/)

#### Install environment from conda
```bash
conda env create -n gold -f env.yaml
```

## Usage
#### Clone the repository
```bash
git clone git@github.com:dridk/pacbio_rna_seq.git
```

#### Edit config.yaml
- ```FASTQ``` Path to the FASTQ file generated by PacBio sequencing
- ```BARCODE``` Path to the FASTA file describing the barcodes used by lima for demultiplexing (see the example in the repository)
- ```PRIMERS``` Path to the FASTA file describing the primers used for PacBio amplicon sequencing (see the example in the repository)
- ```REFERENCE``` Path to the FASTA reference file used by minimap2 for alignment (e.g. hg19.fa)

#### Run the pipeline
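Before launching, it can help to confirm that the four keys described above are set and point to existing files. The Python sketch below is only an illustration and is not part of the pipeline: `REQUIRED_KEYS` and `check_config` are hypothetical names, and the config is assumed to be already loaded into a dictionary (for example with PyYAML's `yaml.safe_load`).

```python
from pathlib import Path

# Keys documented in the "Edit config.yaml" section of this README.
REQUIRED_KEYS = ("FASTQ", "BARCODE", "PRIMERS", "REFERENCE")


def check_config(config: dict) -> list[str]:
    """Return a list of problems found; an empty list means no problem was detected."""
    problems = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not value:
            problems.append(f"{key} is missing or empty")
        elif not Path(value).is_file():
            problems.append(f"{key} points to a file that does not exist: {value}")
    return problems


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    example = {"FASTQ": "your_file.fastq", "REFERENCE": "hg19.fa"}
    for problem in check_config(example):
        print(problem)
```

An empty result only means the paths exist; it says nothing about whether the files are valid FASTQ or FASTA.
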
Put the ```your_file.fastq``` generated by PacBio in the same folder as *config.yaml* and run the following command.
You can set the number of threads with the ```--cores``` option.

```
snakemake -Fp --cores 10 --configfile config.yaml
```

## Output
The pipeline will generate one file per barcode and amplicon.
For instance, ```HBB.bc1022.bam``` contains the aligned reads for the HBB amplicon and the bc1022 barcode identifier.

- ```debarcoding.{barcode}--{barcode}.fastq``` : Demultiplexed reads
- ```{amplicon}.{barcode}.fastq``` : Transcript reads
- ```{amplicon}.{barcode}.bam``` : Aligned transcript reads
- ```{amplicon}.{barcode}.bed``` : Transcript structures as a BED file
- ```{amplicon}.{barcode}.hash.bed``` : Transcript structures as a BED file, with a unique ID identifying each transcript
- ```{amplicon}.{barcode}.hash.png``` : Distribution plot of transcripts
- ```cluster.{amplicon}.png``` : Transcript abundance heatmap

For instance, the ```cluster.{amplicon}.png``` heatmap shows transcript abundances for each barcode.
Each transcript is identified by a hash number generated from the transcript structure BED file.
This makes it possible to identify the same transcript across different samples.
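
To illustrate the idea of a structure-based identifier, here is a minimal Python sketch. It is not the pipeline's own code: it assumes BED12 input, and the function name `transcript_hash`, the use of SHA-1, and the 8-character truncation are all assumptions chosen for the example. The point is only that two reads with the same exon structure get the same ID, whatever their read name or score.

```python
import hashlib


def transcript_hash(bed12_line: str, length: int = 8) -> str:
    """Derive a short ID from the exon structure of one BED12 record (illustrative scheme)."""
    fields = bed12_line.rstrip("\n").split("\t")
    chrom, start, strand = fields[0], int(fields[1]), fields[5]
    # BED12 columns 11 and 12: comma-separated block sizes and block starts
    # (starts are relative to chromStart); a trailing comma is tolerated.
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]
    exons = [(start + off, start + off + size) for off, size in zip(offsets, sizes)]
    # Only chromosome, strand and exon coordinates enter the key, not the read name.
    key = f"{chrom}:{strand}:" + ",".join(f"{s}-{e}" for s, e in exons)
    return hashlib.sha1(key.encode()).hexdigest()[:length]


if __name__ == "__main__":
    # Made-up records: same structure, different read names and scores.
    read_a = "chr11\t100\t1000\tread1\t0\t-\t100\t1000\t0\t2\t200,300\t0,600"
    read_b = "chr11\t100\t1000\tread2\t60\t-\t100\t1000\t0\t2\t200,300\t0,600"
    print(transcript_hash(read_a) == transcript_hash(read_b))
```

Because the key is built from coordinates alone, reads whose exon boundaries differ by even one base receive different IDs under this scheme.
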