Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/BUStools/bustools
Tools for working with BUS files
https://github.com/BUStools/bustools
Last synced: 2 months ago
JSON representation
Tools for working with BUS files
- Host: GitHub
- URL: https://github.com/BUStools/bustools
- Owner: BUStools
- License: bsd-2-clause
- Created: 2018-11-12T17:27:22.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2024-10-21T05:30:54.000Z (3 months ago)
- Last Synced: 2024-10-21T08:07:21.690Z (3 months ago)
- Language: C
- Homepage: https://bustools.github.io/
- Size: 3.44 MB
- Stars: 91
- Watchers: 13
- Forks: 23
- Open Issues: 30
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-single-cell - bustools - [C++] - A suite of tools for manipulating BUS files for single cell RNA-Seq pre-processing. bustools can be used to error correct barcodes, collapse UMIs, produce gene count or transcript compatibility count matrices, and is useful for many other tasks. (Software packages / RNA-seq)
README
# bustools
__bustools__ is a program for manipulating [__BUS__](https://github.com/BUStools/BUS) files for single cell
RNA-Seq datasets. It can be used to error correct barcodes, collapse UMIs, produce gene count or transcript compatibility count matrices, and is useful for many other tasks. See the [__kallisto | bustools website__](https://www.kallistobus.tools/) for examples and instructions on how to use __bustools__ as part of a single-cell RNA-seq workflow.If you use __bustools__ please cite
Melsted, Páll, Booeshaghi, A. Sina et al. [Modular, efficient and constant-memory single-cell RNA-seq preprocessing.](https://doi.org/10.1038/s41587-021-00870-2) Nature Biotechnology, 2021.
For some background on the design and motivation for the __BUS__ format and __bustools__ see
Melsted, Páll, Ntranos, Vasilis and Pachter, Lior [The Barcode, UMI, Set format and BUStools](https://doi.org/10.1093/bioinformatics/btz279), Bioinformatics, 2019.
## BUS format
__bustools__ works with __BUS__ files which can be generated efficiently from raw sequencing data, e.g. using [__kallisto__](http://pachterlab.github.io/kallisto).
## Installation
Binaries for Mac, Linux, Windows, and Rock64 can be downloaded from the [__bustools__ website](https://bustools.github.io/download). Binary installation time is less than two minutes.
To compile bustools download the source code with
`git clone https://github.com/BUStools/bustools.git`
Navigate to the bustools directory
`cd bustools`
Make a build directory and move there:
`mkdir build`
`cd build`
Run cmake:
`cmake ..`
Build the code:
`make`
The bustools executable will be located in build/src. To install bustools into the cmake install prefix path type:
`make install`
## Usage
To see a list of available commands, type `bustools` in the terminal
~~~
> bustools
Usage: bustools [arguments] ..Where can be one of:
sort Sort a BUS file by barcodes and UMIs
correct Error correct a BUS file
count Generate count matrices from a sorted BUS file
inspect Produce a report summarizing a sorted BUS file
allowlist Generate an on-list from a sorted BUS file
capture Capture records from a BUS file
text Convert a binary BUS file to a tab-delimited text file
fromtext Convert tab-delimited text file to a binary BUS file
extract Extracts reads from input FASTQ files based on BUS file
umicorrect Use a UMI correction algorithm
compress Compress a sorted BUS file
decompress Decompress a compressed BUS fileRunning bustools without arguments prints usage information for
~~~### sort
Raw BUS output from pseudoalignment programs may be unsorted. To simply and accelerate downstream processing BUS files can be sorted using `bustools sort`
~~~
> bustools sort
Usage: bustools sort [options] bus-filesOptions:
-t, --threads Number of threads to use
-m, --memory Maximum memory used
-T, --temp Location and prefix for temporary files
required if using -p, otherwise defaults to output
-o, --output File for sorted output
-p, --pipe Write to standard output
~~~This will create a new BUS file where the BUS records are sorted by barcode first, UMI second, and equivalence class third.
### correct
BUS files can be barcode error corrected with respect to a technology-specific "on list" of barcodes using `bustools correct`.~~~
> bustools correct
Usage: bustools correct [options] bus-filesOptions:
-o, --output File for corrected bus output
-w, --onlist File of on-listed barcodes to correct to
-p, --pipe Write to standard output
~~~### count
BUS files can be converted into a barcode-feature matrix, where the feature can be TCCs (Transcript Compatibility Counts) or genes using `bustools count`.~~~
> bustools count
Usage: bustools count [options] bus-filesOptions:
-o, --output File for corrected bus output
-g, --genemap File for mapping transcripts to genes
-e, --ecmap File for mapping equivalence classes to transcripts
-t, --txnames File with names of transcripts
--genecounts Aggregate counts to genes only
--cm Count multiplicites instead of UMIs
~~~### inspect
A report summarizing the contents of a sorted BUS file can be output either to standard out or to a JSON file for further analysis using `bustools inspect`.~~~
> bustools inspect
Usage: bustools inspect [options] sorted-bus-fileOptions:
-o, --output File for JSON output (optional)
-e, --ecmap File for mapping equivalence classes to transcripts
-w, --onlist File of on-listed barcodes to correct to
-p, --pipe Write to standard output
~~~`--ecmap` and `--onlist` are optional parameters; `bustools inspect` is much faster without them, especially without the former.
Sample output (to stdout):
~~~
Read in 3148815 BUS records
Total number of reads: 3431849Number of distinct barcodes: 162360
Median number of reads per barcode: 1.000000
Mean number of reads per barcode: 21.137281Number of distinct UMIs: 966593
Number of distinct barcode-UMI pairs: 3062719
Median number of UMIs per barcode: 1.000000
Mean number of UMIs per barcode: 18.863753Estimated number of new records at 2x sequencing depth: 2719327
Number of distinct targets detected: 70492
Median number of targets per set: 2.000000
Mean number of targets per set: 3.091267Number of reads with singleton target: 1233940
Estimated number of new targets at 2x seuqencing depth: 6168
Number of barcodes in agreement with on-list: 92889 (57.211752%)
Number of reads with barcode in agreement with on-list: 3281671 (95.623992%)
~~~### allowlist
`bustools allowlist` generates an on-list based on the barcodes in a sorted BUS file.~~~
Usage: bustools allowlist [options] sorted-bus-fileOptions:
-o, --output File for the on-list
-f, --threshold Minimum number of times a barcode must appear to be included in on-list
~~~`--threshold` is a (highly) optional parameter. If not provided, `bustools allowlist` will determine a threshold based on the first 200 to 100,200 records.
### capture
`bustools capture` can separate BUS files into multiple files according to the capture criteria.~~~
Usage: bustools capture [options] bus-filesOptions:
-o, --output Directory for output
-c, --capture List of transcripts to capture
-e, --ecmap File for mapping equivalence classes to transcripts
-t, --txnames File with names of transcripts
~~~### text
BUS files can be converted to a tab-separated format for easy inspection and processing using shell scripts or high level languages with `bustools text`.
~~~
> bustools text
Usage: bustools text [options] bus-filesOptions:
-o, --output File for text output
~~~### fromtext
Converts a plaintext tab-separated representation of a BUS file to a binary BUS file.
~~~
> bustools text
Usage: bustools text [options] bus-filesOptions:
-o, --output File for text output
~~~### compress
Compresses a sorted BUS file.
~~~
> bustools compress
Usage: bustools compress [options] sorted-bus-file
Note: BUS file should be sorted by barcode-umi-ecOptions:
-N, --chunk-size Number of rows to compress as a single block
-o, --output File to write compressed output
-p, --pipe Write to standard output.
~~~Reference for bustools compression:
Einarsson, P and Melsted, Páll [BUSZ: compressed BUS files](https://doi.org/10.1093/bioinformatics/btad295), Bioinformatics, 2023.
### decompress
Decompresses (inflates) a compressed BUS file.
~~~
> bustools compress
Usage: bustools compress [options] sorted-bus-file
Note: BUS file should be sorted by barcode-umi-ecOptions:
-o, --output File to write decompressed output
-p, --pipe Write to standard output.
~~~