Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kehrlab/bcctools
Correcting barcodes in 10X linked-read sequencing data.
https://github.com/kehrlab/bcctools
Last synced: 3 months ago
JSON representation
Correcting barcodes in 10X linked-read sequencing data.
- Host: GitHub
- URL: https://github.com/kehrlab/bcctools
- Owner: kehrlab
- License: gpl-3.0
- Created: 2018-09-06T16:31:02.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2024-04-12T10:50:51.000Z (10 months ago)
- Last Synced: 2024-07-31T20:28:38.837Z (6 months ago)
- Language: C++
- Size: 55.7 KB
- Stars: 4
- Watchers: 3
- Forks: 3
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-linked-reads - bcctools - read sequencing data|![GitHub last commit](https://img.shields.io/github/last-commit/kehrlab/bcctools?label=%20) (Tools)
README
bcctools
=======A toolbox for correcting barcodes in 10X linked-read sequencing data.
Prerequisites
-------------* GCC version >= 4.9 (supports C++14)
* SeqAn core library, version 2.3.1? (https://github.com/seqan/seqan)
* SDSL - Succinct Data Structure Lirbrary (https://github.com/simongog/sdsl-lite)
* kseq.h from HTSlib (https://github.com/samtools/htslib)Installation
------------1. Download the Seqan core library. You do not need to follow the SeqAn install instructions. You only need the directory .../include/seqan with all its content (the SeqAn core library).
2. Download and install the SDSL.
3. Download HTSlib or just put the kseq.h header file into a folder named htslib.
3. Edit lines 14-17 in the Makefile to point to the directories of SeqAn, SDSL and HTSlib.
4. Run 'make' in the bcctools directory.If everything is setup correctly, this will create the binary 'bcctools'.
Usage
-----The only input needed for barcode correction is a pair of barcoded FASTQ files generated on the 10X Chromium platform.
Optionally, you can specify a barcode whitelist file.The program consists of several commands, which are listed when running
./bcctools --help
For a short description of each command and an overview of arguments and options, you can run
./bcctools --help
If you need the output to be sorted and/or converted to SAM, BAM, or (gzipped) FASTQ format, you can run the provided bash script. For a short description of options and arguments of this script run
./scripts/run_bcctools -h
### The whitelist command
./bcctools whitelist [OPTIONS]
Creates a barcode whitelist based on barcode occurence in the data.
Creating a whitelist from your data is recommended (rather than using the 10X whitelist) to reduce the number of alternatives during correction and prevents false corrections.### The index command
./bcctools index [OPTIONS]
Creates a barcode index from the given barcode whitelist and writes it to disk. This command is optional as the index can be created on the fly in the 'correct' command.
### The correct command
./bcctools correct [OPTIONS]
Corrects barcodes of the given barcoded read pair data using the specified barcode whitelist. A barcode index is computed on the fly unless index files are present for the specified barcode whitelist. The output is a tab-separated file holding one read pair per line as decribed below.
### The stats command
./bcctools stats [OPTIONS]
./bcctools stats [OPTIONS]
./bcctools stats [OPTIONS]Computes the number of read pairs with whitelisted, corrected and unrecognized barcodes, a barcode occurrence histogram and counts quality values of corrected barcode positions.
Example
-------mkdir bcctools_example && cd bcctools_example/
ln -s /path/to/first.fq.gz
ln -s /path/to/second.fq.gz./bcctools whitelist -o whitelist.txt first.fq.gz
./bcctools correct whitelist.txt first.fq.gz second.fq.gz > corrected.tsvUsing the bash script to create a BAM file sorted by the corrected barcode sequence:
./script/run_bcctools -f bam first.fq.gz second.fq.gz
Output format
-------------The output format of the correct command is a simple tab-separated format, where each read pair and its barcode information is given on a single line.
The fields are as follows:Field | Description
--- | ---
READ NAME | The read or query name taken from the FASTQ file and cropped at the first whitespace.
CORRECTED BARCODE | A comma separated list of possible barcode corrections. If the raw barcode is whitelisted, the value of this field is identical to the RAW BARCODE field. An asterisk '*' indicates that the barcode is not whitelisted and correction was unsuccessful.
RAW BARCODE | The first 16 base pairs of the first read in the read pair.
7-MER SPACER | The seven base pairs following the first 16 base pairs of the first read in the read pair.
TRIMMED FIRST READ | The remaining base pairs of the first read in the read pair after trimming the barcode and 7-mer spacer sequence.
SECOND READ | The second read sequence.
BARCODE QUALITY STRING | The first 16 values of the quality string of the first read in the read pair.
7-MER SPACER QUALITY STRING | The seven values following the first 16 values of the quality string of the first read in the read pair.
TRIMMED FIRST READ QUALITY STRING | The remaining quality string after trimming the barcode and 7-mer spacer quality values.
SECOND READ QUALITY STRING | The quality string of the second read in the read pair.Contact
-------For questions and comments contact birte.kehr [at] ukr.de or create an issue.