https://github.com/ebedthan/sabreur
fast, reliable and handy demultiplexing tool for fastx files
https://github.com/ebedthan/sabreur
bioinformatics demultiplexing rust sequence-analysis
Last synced: 5 months ago
JSON representation
fast, reliable and handy demultiplexing tool for fastx files
- Host: GitHub
- URL: https://github.com/ebedthan/sabreur
- Owner: Ebedthan
- License: mit
- Created: 2021-04-13T19:55:32.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2024-04-02T15:57:49.000Z (almost 2 years ago)
- Last Synced: 2025-03-18T05:51:20.426Z (11 months ago)
- Topics: bioinformatics, demultiplexing, rust, sequence-analysis
- Language: Rust
- Homepage:
- Size: 60.7 MB
- Stars: 14
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
# sabreur
## 🔎 About
**sabreur** is a command-line tool designed to **demultiplex barcoded sequencing reads** into separate files. It supports:
- **FASTA** and **FASTQ** formats
- **Compressed inputs and outputs**: `gzip`, `bzip2`, `xz`, and `zstd`
- **Paired-end** and **Single-end** reads
It uses a barcode file to match reads and dispatches each to the corresponding output. Reads with unknown barcodes go into a separate file.
Powered by [niffler](https://github.com/luizirber/niffler) for seamless compression support.
## 🚀 Usage
### ▶️ Paired-end mode
```bash
sabreur barcode.txt input_R1.fq.gz input_R2.fq.gz
```
### ▶️ Single-end mode
```bash
sabreur barcode.txt input.fq
```
sabreur automatically detects the format and compression. Just provide the inputs!
## ⚙️ Command-Line Options
```
USAGE:
sabreur [options] []
ARGS:
input barcode file
input forward fastx file
input reverse fastx file
OPTIONS:
-m, --mismatch maximum number of mismatches [default: 0]
-o, --out ouput directory [default: sabreur_out]
-f, --format output files compression format
-l, --level compression level [default: 1]
--force force reuse of output directory
-q, --quiet decrease program verbosity
-h, --help Print help information
-V, --version Print version information
```
## 📦 Installation
### Requirements
- [Rust](https://rust-lang.org) in stable channel
- libgz for gz file support
- liblzma for xz file support
- libbzip2 for bzip2 file support
- zstd for zstd file support
### 🛠️ From Source (via Cargo)
```bash
git clone https://github.com/Ebedthan/sabreur.git
cd sabreur
cargo install --path . --root ~/.cargo
sabreur --help
```
### 📁 Prebuilt Binaries
Download binaries for your platform from the [releases page](https://github.com/Ebedthan/sabreur/releases):
* macOS (Apple Silicon): [Download](https://github.com/Ebedthan/sabreur/releases/download/v0.7.0/sabreur-v0.7.0-aarch64-apple-darwin.tar.xz) • [Checksum](https://github.com/Ebedthan/sabreur/releases/download/v0.7.0/sabreur-v0.7.0-aarch64-apple-darwin.tar.xz.sh256)
* macOS (Intel): [Download](https://github.com/Ebedthan/sabreur/releases/download/v0.7.0/sabreur-v0.7.0-x86_64-apple-darwin.tar.xz) • [Checksum](https://github.com/Ebedthan/sabreur/releases/download/v0.7.0/sabreur-v0.7.0-x86_64-apple-darwin.tar.xz.sha256)
* Linux (x86\_64): [Download](https://github.com/Ebedthan/sabreur/releases/download/v0.7.0/sabreur-v0.7.0-x86_64-unknown-linux-gnu.tar.xz) • [Checksum](https://github.com/Ebedthan/sabreur/releases/download/v0.7.0/sabreur-v0.7.0-x86_64-unknown-linux-gnu.tar.xz.sha256)
* Windows (x86\_64): [Download](https://github.com/Ebedthan/sabreur/releases/download/v0.7.0/sabreur-v0.7.0-x86_64-pc-windows-msvc.zip) • [Checksum](https://github.com/Ebedthan/sabreur/releases/download/v0.7.0/sabreur-v0.7.0-x86_64-pc-windows-msvc.zip.sha256)
## Benchmark
Benchmarked with [hyperfine](https://github.com/sharkdp/hyperfine) [dataset](https://figshare.com/articles/dataset/Paired-end_fastq_files_for_demultiplexing/14701629).
| Tool | Single-end uncompressed output | Single-end compressed output | Paired-end uncompressed output | Paired-end compressed output |
| :--- | :----: | :----: | :----: | :----: |
| [idemp](https://github.com/yhwu/idemp) | - | 211.571 ± 3.718 | - | 366.247 ± 10.482 |
| [sabre](https://github.com/najoshi/sabre) | 32.911 ± 2.411 | - | 109.470 ± 49.909 | - |
| **sabreur** | 10.843 ± 0.531| 93.840 ± 0.446 | 40.878 ± 13.743 | 187.533 ± 0.572 |
### 🗜️ Compression format performance
A simple benchmark of the different compression format (`sabreur tests/bc_pe_fq.txt tests/input_R1.fastq.gz tests/input_R2.fastq.gz`), zst being the fastest.
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `--format zst` | 43.096 ± 1.547 | 41.179 | 46.878 | 1.00 |
| `--format bz2` | 94.049 ± 4.762 | 87.984 | 101.140 | 2.18 ± 0.14 |
| `--format gz` | 123.107 ± 1.748 | 120.529 | 125.166 | 2.86 ± 0.11 |
| `--format xz` | 285.692 ± 18.625 | 264.960 | 325.750 | 6.63 ± 0.49 |
## 📄 Barcode File Format
The barcode file must be tab-delimited in the format:
```
barcode1 barcode1_file1.fq barcode1_file2.fq
barcode2 barcode2_file1.fq barcode2_file2.fq
...
```
Output filenames must be unique. You can use .fq, .fastq, .fa, or .fasta as extensions.
### Minimum supported Rust version
`sabreur` minimum [Rust](https://www.rust-lang.org/) version is 1.78.0.
## 🤝 Contributing
- Contributions are welcome under the [Contributor Code of Conduct](https://github.com/Ebedthan/sabreur/blob/main/CODE_OF_CONDUCT.md).
- Please open an [issue or pull request on GitHub](https://github.com/Ebedthan/sabreur/issues).
## 🐛 Bugs & Support
Found a bug or have a feature request? → [Open an issue](https://github.com/Ebedthan/sabreur/issues).
## 📜 License
This project is licensed under the MIT License.