# LRez

LRez provides a standalone tool for working with barcoded linked-reads, such as 10x Genomics data, as well as a library that makes it easy to use the same functionality in other projects.

Presently, it is directly compatible with the following linked-reads technologies, provided the barcodes are reported using the `BX:Z` tag (if this is not the case, pre-processing scripts are given in the [utils/](utils/) directory):
- 10x Genomics
- Haplotagging
- stLFR
- TELL-Seq
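
A quick way to see whether alignments already carry the `BX:Z` tag is to count tagged records in their SAM text. The sketch below is only an illustration: `count_bx_tagged` and `sample_sam` are hypothetical helper names, not part of LRez, and the sample records are made up.

```bash
# Hypothetical helper (not part of LRez): reads SAM records on stdin and
# prints "<tagged> <total>", where <tagged> counts records with a BX:Z tag.
count_bx_tagged() {
    awk -F'\t' '
        /^@/ { next }                      # skip SAM header lines
        {
            total++
            # optional fields start at column 12 in SAM
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^BX:Z:/) { tagged++; break }
            }
        }
        END { printf "%d %d\n", tagged, total }
    '
}

# Made-up SAM records: r1 and r3 carry a barcode, r2 does not.
sample_sam() {
    printf '@HD\tVN:1.6\n'
    printf 'r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tBX:Z:AAAC-1\n'
    printf 'r2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\n'
    printf 'r3\t16\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tBX:Z:AAAG-1\n'
}

sample_sam | count_bx_tagged   # prints: 2 3
```

On real data, the same function could be fed from `samtools view`, for example `samtools view alignments.bam | head -n 10000 | count_bx_tagged`, where `alignments.bam` is a placeholder file name. If few or no records are tagged, the pre-processing scripts in `utils/` are the place to look.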

LRez offers several functionalities: comparing pairs of regions or contig extremities to retrieve their common barcodes, extracting barcodes from given regions of a BAM file, and indexing and querying both BAM and FASTQ files to quickly retrieve reads or alignments sharing a given barcode or list of barcodes.
It can thus be used in different applications, such as variant calling or scaffolding.

Requirements
--------------

- A Unix-based operating system.
- g++, minimum version 5.5.0.
- CMake, minimum version 2.8.2.
- zlib, minimum version 1.2.11.

Installation from source
--------------

Clone the LRez repository, along with its submodules with:

```bash
git clone --recursive https://github.com/morispi/LRez
```

Then run the install.sh script:

```bash
./install.sh
```

The installation script builds the dependencies, the standalone binary in the `bin` folder, and the library in the `lib` folder, which allows LRez to be used in other projects.

Installation from conda
--------------

Alternatively, LRez is also distributed as a bioconda package, which can be installed with:

```bash
conda install -c bioconda lrez
```

Using the toolkit
--------------

### Usage

`LRez [SUBCOMMAND]`

where [SUBCOMMAND] can be one of the following:

- compare: Compute the number of common barcodes between pairs of regions, or between pairs of contigs' extremities
- extract: Extract the barcodes from a given region of a BAM file
- stats: Retrieve general stats from a BAM file
- index bam: Index the offsets or occurrence positions of the barcodes contained in a BAM file
- query bam: Query the barcodes index to retrieve alignments in a BAM file, given a barcode or list of barcodes
- index fastq: Index the offsets of the barcodes contained in a fastq file
- query fastq: Query the barcodes index to retrieve reads in a fastq file, given a barcode or list of barcodes

### Subcommands

A description of each subcommand as well as its options is given below.

#### Compare

`LRez compare` computes the number of common barcodes between all possible pairs of a given list of regions, or between a given contig's extremities and all other contigs' extremities.

- `--bam STRING`, `-b STRING`: BAM file containing the alignments
- `--index STRING`, `-i STRING`: Barcodes offsets index built with the index bam subcommand
- `--region STRING`, `-r STRING`: File containing regions of interest in format chromosome:startPosition-endPosition
- `--contig STRING`, `-c STRING`: Contig of interest
- `--contigs STRING`, `-c STRING`: File containing a list of contigs of interest
- `--size INT`, `-s INT`: Size of contigs' extremities to consider (optional, default: 1000)
- `--output STRING`, `-o STRING`: File where to output the results (optional, default: stdout)
- `--threads INT`, `-t INT`: Number of threads to use (optional, default: 1)
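
Since `--region` takes a file of `chromosome:startPosition-endPosition` entries, it can help to validate that file before launching a comparison. The following is a minimal sketch, assuming one region per line; the file names, coordinates, and the `check_regions` helper are hypothetical and not part of LRez.

```bash
# Hypothetical example regions (chromosome names and coordinates are made up).
cat > regions.txt <<'EOF'
chr1:10000-20000
chr1:50000-60000
chr2:1000-5000
EOF

# Hypothetical helper: succeeds only if every line looks like chrom:start-end.
# Note: this simple pattern rejects chromosome names that contain ':'.
check_regions() {
    ! grep -Ev '^[^:[:space:]]+:[0-9]+-[0-9]+$' "$1" >/dev/null
}

if check_regions regions.txt; then
    echo "regions.txt looks well-formed"
else
    echo "regions.txt contains malformed lines" >&2
fi
```

Once the file passes the check, a comparison could be run as `LRez compare --bam alignments.bam --index barcodes.bci --region regions.txt --threads 4`, where `alignments.bam` and `barcodes.bci` are placeholder names for a BAM file and its offsets index.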

#### Extract

`LRez extract` extracts the list of barcodes in a given region of a BAM file.

- `--bam STRING`, `-b STRING`: BAM file to extract barcodes from
- `--region STRING`, `-r STRING`: Region of interest in format chromosome:startPosition-endPosition
- `--all`, `-a`: Extract all barcodes
- `--output STRING`, `-o STRING`: File where to output the extracted barcodes (optional, default: stdout)
- `--duplicates`, `-d`: Include duplicate barcodes (optional, default: false)
- `--threads INT`, `-t INT`: Number of threads to use (optional, default: 1)
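
When duplicates are kept, the extracted barcodes can be tallied to see which ones dominate a region. The sketch below assumes the output holds one barcode per line, which this README does not state explicitly; `barcode_counts` is a hypothetical helper and the barcodes are made up.

```bash
# Hypothetical helper (not part of LRez): reads one barcode per line on stdin
# and prints "count barcode" pairs, most frequent first.
barcode_counts() {
    sort | uniq -c | sort -k1,1nr -k2,2 | awk '{ print $1, $2 }'
}

# Made-up barcodes standing in for the output of an extraction.
printf '%s\n' AAAC-1 AAAG-1 AAAC-1 AAAT-1 AAAC-1 AAAG-1 | barcode_counts
# prints:
# 3 AAAC-1
# 2 AAAG-1
# 1 AAAT-1
```

With LRez installed, the input could come from `LRez extract --bam alignments.bam --region chr1:10000-20000 --duplicates | barcode_counts`, where the file name and region are placeholders.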

#### Stats

`LRez stats` retrieves general stats from a BAM file.

- `--bam STRING`, `-b STRING`: BAM file to extract barcodes from
- `--regions INT`, `-r INT`: Number of regions to consider to define stats (optional, default: 1000)
- `--size INT`, `-s INT`: Size of the regions to consider (optional, default: 1000)
- `--output STRING`, `-o STRING`: File where to output the extracted barcodes (optional, default: stdout)
- `--threads INT`, `-t INT`: Number of threads to use (optional, default: 1)

#### Index BAM

`LRez index bam` indexes the offsets or occurrence positions of the barcodes contained in a BAM file.

- `--bam STRING`, `-b STRING`: BAM file to index
- `--output STRING`, `-o STRING`: File where to store the index
- `--offsets`, `-f`: Index the offsets of the barcodes in the BAM file
- `--positions`, `-p`: Index the (chromosome, begPosition) occurrence positions of the barcodes
- `--primary`, `-r`: Only index barcodes that appear in a primary alignment (optional, default: false)
- `--quality INT`, `-q INT`: Only index barcodes that appear in an alignment of quality higher than this number (optional, default: 0)
- `--threads INT`, `-t INT`: Number of threads to use (optional, default: 1)

#### Query BAM

`LRez query bam` queries a barcodes index and a BAM file to retrieve alignments containing the query barcodes.

- `--bam STRING`, `-b STRING`: BAM file to search
- `--index STRING`, `-i STRING`: Barcodes offsets index, built with the index bam subcommand, using the -f option
- `--query STRING`, `-q STRING`: Query barcode to search in the BAM / index
- `--list STRING`, `-l STRING`: File containing a list of barcodes to search in the BAM / index
- `--output STRING`, `-o STRING`: File where to output the extracted alignments (optional, default: stdout)
- `--threads INT`, `-t INT`: Number of threads to use (optional, default: 1)
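
Indexing and querying a BAM file are meant to be chained: the offsets index is built once, then reused for any number of queries. The sketch below strings the two subcommands together using only the options listed above. All file names and barcodes are placeholders, the one-barcode-per-line list format is an assumption, and when LRez is not on the `PATH` the commands are printed rather than executed.

```bash
# Dry-run fallback: when LRez is not installed, print the commands instead.
command -v LRez >/dev/null 2>&1 || LRez() { echo "LRez $*"; }

# Hypothetical barcode list, one barcode per line (assumed format).
printf '%s\n' AAACCCAAGAAACACT-1 AAACCCAAGAAACCAT-1 > barcodes.txt

# 1. Build the offsets index (--offsets), which is what query bam expects.
LRez index bam --bam alignments.bam --output barcodes.bci --offsets --threads 4

# 2. Retrieve the alignments carrying any of the listed barcodes.
LRez query bam --bam alignments.bam --index barcodes.bci \
    --list barcodes.txt --output selected_alignments.txt --threads 4
```

A single barcode can be looked up the same way by replacing `--list barcodes.txt` with `--query` followed by the barcode.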

#### Index fastq

`LRez index fastq` indexes the offsets of the barcodes contained in a fastq file.

- `--fastq STRING`, `-f STRING`: Fastq file to index
- `--output STRING`, `-o STRING`: File where to store the index
- `--gzip`, `-g`: Fastq file is gzipped (optional, default: false)
- `--threads INT`, `-t INT`: Number of threads to use (optional, default: 1)

#### Query fastq

`LRez query fastq` queries a barcodes index and a fastq file to retrieve reads containing the query barcodes.

- `--fastq STRING`, `-f STRING`: Fastq file to search
- `--index STRING`, `-i STRING`: Barcodes index, built with the index fastq subcommand
- `--query STRING`, `-q STRING`: Query barcode to search in the fastq file and the index
- `--list STRING`, `-l STRING`: File containing a list of barcodes to search in the fastq file and the index
- `--collectionOfLists STRING`, `-c STRING`: File of files (FOF), i.e. a file containing the names of files that each hold a list of barcodes to search in the fastq file and the index
- `--output STRING`, `-o STRING`: File where to output the extracted reads (optional, default: stdout)
- `--gzip`, `-g`: Fastq file is gzipped (optional, default: false)
- `--threads INT`, `-t INT`: Number of threads to use (optional, default: 1)
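
The file of files accepted by `--collectionOfLists` is the least self-explanatory input, so here is a minimal sketch of how one might be assembled and used. It assumes one barcode per line in each list and one list file name per line in the FOF. All file names and barcodes are placeholders, and when LRez is not on the `PATH` the commands are printed rather than executed.

```bash
# Dry-run fallback: when LRez is not installed, print the commands instead.
command -v LRez >/dev/null 2>&1 || LRez() { echo "LRez $*"; }

# Two hypothetical barcode lists, one barcode per line (assumed format).
printf '%s\n' AAACCCAAGAAACACT-1 AAACCCAAGAAACCAT-1 > list_A.txt
printf '%s\n' AAACCCAAGAAACCCA-1 > list_B.txt

# File of files: one list file name per line (assumed format).
printf '%s\n' list_A.txt list_B.txt > lists.fof

# Sanity check: every list named in the FOF must exist and be non-empty.
while IFS= read -r list; do
    [ -s "$list" ] || { echo "missing or empty list: $list" >&2; exit 1; }
done < lists.fof

# Index the gzipped fastq file once, then query it with the whole collection.
LRez index fastq --fastq reads.fastq.gz --output reads.bci --gzip
LRez query fastq --fastq reads.fastq.gz --index reads.bci \
    --collectionOfLists lists.fof --gzip --threads 4
```

Note that `--gzip` is passed to both subcommands here, since each one reads the same gzipped fastq file.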

Using the API
--------------

Complete documentation of the different API functions is provided at https://morispi.github.io/LRez/files.html. Additional information and usage examples are provided on the Wiki page https://github.com/morispi/LRez/wiki.

Notes
--------------

LRez has been developed and tested on x86-64 GNU/Linux.
Support for any other platform has not been tested.

Authors
--------------

Pierre Morisse, Claire Lemaitre and Fabrice Legeai.

Reference
--------------

Pierre Morisse, Claire Lemaitre, Fabrice Legeai. LRez: C++ API and toolkit for analyzing and managing Linked-Reads data. Bioinformatics Advances, vbab022, https://doi.org/10.1093/bioadv/vbab022

Contact
--------------

You can report problems and bugs as issues on this repository: https://github.com/morispi/LRez/issues