https://github.com/RolandFaure/QuickDeconvolution
A software providing reference-free barcode deconvolution for linked reads data
https://github.com/RolandFaure/QuickDeconvolution
bioinformatics linked-reads
Last synced: 9 months ago
JSON representation
A software providing reference-free barcode deconvolution for linked reads data
- Host: GitHub
- URL: https://github.com/RolandFaure/QuickDeconvolution
- Owner: RolandFaure
- License: gpl-3.0
- Created: 2021-04-08T08:37:27.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-07-02T14:33:47.000Z (over 1 year ago)
- Last Synced: 2024-07-31T20:30:45.431Z (over 1 year ago)
- Topics: bioinformatics, linked-reads
- Language: C++
- Homepage:
- Size: 7.93 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-linked-reads - QuickDeconvolution - reads experiments without a reference genome | (Tools)
README
# QuickDeconvolution
Quick and scalable software to deconvolve read clouds from linked-reads experiments without a reference genome. When several fragments of DNA have been sequenced with the same barcode, QuickDeconvolution provides the user with enhanced barcodes to distinguish the reads coming from the different fragments
## Installation
You can install QuickDeconvolution through Bioconda
```
conda install -c bioconda quickdeconvolution
```
Alternatively, QuickDeconvolution is quite straightforward to compile from source.
You will need make and cmake >= 2.8 to compile the sources. In the desired folder, run
```
git clone https://github.com/RolandFaure/QuickDeconvolution.git
cd QuickDeconvolution/
cmake ./
make
```
An executable named QuickDeconvolution should appear in the folder. A small test file "test.fastq", from a simulated sequencing experiment on a small synthetic genome, is provided in the folder "test_data" to test the program.
```
QuickDeconvolution -i test_data/test.fastq -o test_data/test_out.fastq
```
The program should run in less than a minute and output in test_out.fastq the reads, with barcode extensions (-1, -2,...). This is only intended as a test to see if QD is running: the deconvolution is expected to be very bad because the synthetic genome is very short (thus two long reads overlap with high probability).
## Usage
```bash
SYNOPSIS
./QuickDeconvolution -i [] -o [] [-k []] [-w
[]] [-d []] [-t []] [-a []]
OPTIONS
-k, --kmers-length
size of kmers [default:21]
-w, --window-size
size of window guaranteed to contain at least one minimizing kmer [default:40]
-d, --density
on average 1/2^d kmers are indexed [default:3]
-t, --threads
number of threads [default:1]
-a, --dropout
QD does not try to deconvolve clouds smaller than this value [default:0]
```
### Input
QuickDeconvolution takes as input `-i` a fasta or a fastq file containing barcoded reads with the tag `BX:Z` designating a barcode (this is the default output of [longranger basic](https://support.10xgenomics.com/genome-exome/software/pipelines/latest/advanced/other-pipelines)). For example
```
@read_456 cov:23.45 BX:Z:AAAACTGTAT
```
If the reads are paired, provide QuickDeconvolution with an interleaved file where the two ends of the pairs have the same name, it will recognize it. To interleave two files, you can use this command line:
```
paste -d '\n' <(awk '{if (NR%4==1){printf"\n";printf $0;} else{printf "((()))"$0;}}' reads_foward.fq) <(awk '{if (NR%4==1){printf"\n";printf $0;}else{ printf "((()))"$0;}}' reads_reverse.fq) | sed 's/((()))/\n/g' > sequencing_reads_interleaved.fastq
```
### Output
QuickDeconvolution outputs the fasta/q file given as input, with an additional tag (-0, -1, -2...) at the end of the line, so that the deconvolved reads look like
```
@read_456 cov:23.45 BX:Z:AAAACTGTAT-1
```
Within each barcode, reads having the same tag come from the same fragment. WARNING: the -0 tag is a special tag, indicating reads that could not be deconvolved by the program. If a tag is already present, QuickDeconvolution will nonetheless append a new tag at the end of the barcode:
```
@read_456 cov:23.45 BX:Z:AAAACTGTAT-1-3
```
### Options
Option -a is the dropout option: the program disregard all clouds containing fewer reads than this value. You may want to use the option if you know you'll need clouds of a certain size for your downstream analyses, in which case it might be a waste of time to deconvolve the smallest clouds.
Option -t is the number of threads to launch simultaneously on the program. Wall-clock time decreases and RAM usage increases with the number of threads.
Options k, w and d are parameters of the alignment within QuickDeconvolution. The deconvolution should not be very sensitive to these values.
k is the length of the k-mers. Avoid decreasing k below 15.
d is to monitor the density of sparse k-mers. On average 1/2^d k-mers will be sparse.
While choosing sparse k-mers, the program is ensured to choose at least 1 k-mer in a window of size w.
Decreasing w and/or d may in some cases increase precision at the expense of run-time. Keep w within the range [10,50] and d within range [1,5].
## License
QuickDeconvolution is distributed under the license GPL3
## Citation
QuickDeconvolution is published in *Bioinformatics advances*. You can cite using:
Faure, Roland, and Dominique Lavenier. “QuickDeconvolution: Fast and Scalable Deconvolution of Linked-Read Sequencing Data.” *Bioinformatics Advances*, September 26, 2022, vbac068. _https://doi.org/10.1093/bioadv/vbac068_.