https://github.com/drbh/quemer

GPU accelerated k-mer counter

Host: GitHub
URL: https://github.com/drbh/quemer
Owner: drbh
Created: 2025-04-26T20:40:52.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-04-26T20:40:54.000Z (about 1 year ago)
Last Synced: 2025-05-04T15:50:58.811Z (about 1 year ago)
Topics: biology, cuda, gpu
Language: Cuda
Homepage:
Size: 5.86 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# quemer
### *pronounced "k-mer" [keɪ-mɜr]*

GPU accelerated k-mer counter.

Built on top of [bqtools](https://github.com/arcinstitute/bqtools) and the `.bq` file format.

## 📦 Installation

Build the binary and copy it to your local bin directory:

```bash
make && cp quemer ~/.local/bin/
export PATH="$HOME/.local/bin:$PATH"  # make sure ~/.local/bin is on your PATH
```

## 📋 Usage

Run `quemer` without arguments to see the help:
```bash
quemer
# ┌───────────────────────────────────────┐
# │ quemer - K-mer Counter │
# └───────────────────────────────────────┘
# Usage: quemer
```

## Performance

| Tool | Dataset | FASTQ Size | BQ Size | Processing Time |
| --------- | ------------------------ | ---------- | ------- | --------------- |
| quemer | E. coli ERR4245144 8-mer | 4.5GB | 579M | 1.34s |
| jellyfish | E. coli ERR4245144 8-mer | 4.5GB | - | 7.48s |

> [!NOTE]
> Approximately 5.6x faster than jellyfish on my dev box (7.48 s vs. 1.34 s). Please do your own benchmarking.
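
For anyone repeating the benchmark, a small timing harness can help. The sketch below is only an illustration: `time_command` and `speedup` are hypothetical helper names, and the command it times is a placeholder so the script runs on any machine.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run `cmd` (a list of argv strings) `runs` times; return wall-clock seconds per run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; swap in e.g.
    # ["quemer", "data/ecoli.bq", "8"] on a machine with the tool installed.
    samples = time_command([sys.executable, "-c", "pass"], runs=3)
    print(f"median wall time: {statistics.median(samples):.3f}s")
    # Figures taken from the table above: jellyfish 7.48 s, quemer 1.34 s.
    print(f"speedup from the table: {speedup(7.48, 1.34):.1f}x")
```

Taking the median over several runs reduces the effect of one-off costs such as a cold file cache on the first run.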

## Example

Run the following commands to download the E. coli dataset, convert it to bq format, and run k-mer analysis.

Counting 8-mers in the E. coli dataset (8.427M records) takes about 1.327 seconds on an RTX 4090.

```bash
# Download the E. coli dataset (create data/ first so curl can write into it)
mkdir -p data
curl -L -o data/ecoli.fastq.gz https://ftp.sra.ebi.ac.uk/vol1/fastq/ERR424/004/ERR4245144/ERR4245144_1.fastq.gz
gunzip data/ecoli.fastq.gz

# Convert to bq format
bqtools encode data/ecoli.fastq -o data/ecoli.bq --policy c -T 32
# 8426997 records written

# Run k-mer analysis
time quemer data/ecoli.bq 8
# ┌───────────────────────────────────────┐
# │ quemer - K-mer Counter │
# └───────────────────────────────────────┘
# ┌───────────────────────────────────────────────────┐
# │ GPU: NVIDIA GeForce RTX 4090 (8.9) RAM: 23.53 GB │
# └───────────────────────────────────────────────────┘
# Input: data/ecoli.bq (k=8) | 8426997 records × 251 bp (506.31 MB)
# Packing sequences... done (888.52 ms, 2380.55 Mbp/s)
# Transferring to GPU... done (36.82 ms, 21.60 GB/s)
# Counting 2115176240 k-mers... done (44.83 ms, 47180.43 Mbp/s)
# Retrieving results... done (65536/65536 non-zero)
# Writing to data/ecoli.bq.k8.fa... done

# ┌──────────────────────────────────────────────────────┐
# │ K-MER COUNTING PERFORMANCE │
# ├──────────────────────────────┬───────────┬───────────┤
# │ Operation │ Time (ms) │ % Total │
# ├──────────────────────────────┼───────────┼───────────┤
# │ Host memory allocation │ 210.81 │ 16.54% │
# │ Sequence packing │ 888.53 │ 69.73% │
# │ GPU setup & transfer │ 37.09 │ 2.91% │
# │ Kernel execution │ 44.83 │ 3.52% │
# │ Result retrieval │ 0.12 │ 0.01% │
# │ Output writing │ 3.88 │ 0.30% │
# ├──────────────────────────────┼───────────┼───────────┤
# │ TOTAL │ 1274.18 │ 100.00% │
# ├──────────────────────────────┴───────────┴───────────┤
# │ METRICS │
# ├──────────────────────────────┬───────────────────────┤
# │ Processing rate (Mbp/s) │ 1660.03 │
# │ Throughput (GB/s) │ 0.42 │
# │ k-mer size │ 8 │
# │ Records processed │ 8426997 │
# └──────────────────────────────┴───────────────────────┘
# quemer data/ecoli.bq 8 26.85s user 0.68s system 2075% cpu 1.327 total
```
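
The run above reports `65536/65536 non-zero` results, which matches the 4^8 = 65536 possible 8-mers over `A`, `C`, `G`, `T`. As a rough CPU-side illustration of how a dense count table of that size can be filled, here is a minimal Python sketch. It assumes a 2-bit code per base and a rolling index. Both are assumptions for illustration, not a description of quemer's actual kernel or memory layout, and `count_kmers` and `index_to_kmer` are hypothetical names.

```python
# 2-bit code per base; an assumption for illustration, not quemer's actual layout.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"


def count_kmers(sequences, k):
    """Count every k-mer in `sequences` into a dense table of 4**k slots."""
    counts = [0] * (4 ** k)
    mask = (4 ** k) - 1  # keeps only the lowest 2*k bits
    for seq in sequences:
        index = 0
        for i, base in enumerate(seq):
            # Shift in the next base and drop the one that left the window.
            index = ((index << 2) | ENCODE[base]) & mask
            if i >= k - 1:  # the window holds a full k-mer from here on
                counts[index] += 1
    return counts


def index_to_kmer(index, k):
    """Turn a table index back into its k-mer string."""
    return "".join(DECODE[(index >> (2 * (k - 1 - j))) & 3] for j in range(k))


if __name__ == "__main__":
    k = 3
    counts = count_kmers(["ACGTACGT"], k)
    # Print in the same ">count / k-mer" shape as the FASTA dumps in this README.
    for index, count in enumerate(counts):
        if count:
            print(f">{count}\n{index_to_kmer(index, k)}")
```

Because the table index is the 2-bit encoding of the k-mer, walking the table in index order yields k-mers in lexicographic order (`AAA`, `AAC`, `AAG`, ...).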

## Comparing to other tools

A very non-scientific comparison with `jellyfish` follows.

First we can run `jellyfish` on 16 threads to find all 8-mers in the E. coli dataset.

Then we can dump the `.jf` file into a human-readable FASTA file.

```bash
time jellyfish count -m 8 -s 100M -t 16 data/ecoli.fastq
# jellyfish count -m 8 -s 100M -t 16 data/ecoli.fastq 82.29s user 0.62s system 1108% cpu 7.476 total

time jellyfish dump mer_counts.jf > mer_counts_dumps.fa
# jellyfish dump mer_counts.jf > mer_counts_dumps.fa 0.01s user 0.01s system 96% cpu 0.017 total
```

The top of the file looks like:

```fasta
>785758
AAAAAAAA
>93506
AAAAAAAC
>81435
AAAAAAAG
>111666
AAAAAAAT
```

And the top of the `quemer` output (`data/ecoli.bq.k8.fa`):

```fasta
>785758
AAAAAAAA
>94615
AAAAAAAC
>81435
AAAAAAAG
>111666
AAAAAAAT
```

**Note:** the counts differ because of a structural difference: `bq` files do not allow `N` characters.

Above we set the `--policy c` flag, which replaces all `N` characters with `C` characters. That is why `quemer` reports a higher count for `AAAAAAAC` (94615 vs. 93506): the N's were replaced with C's.
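
The effect can be reproduced on toy data. The sketch below compares two strategies under stated assumptions: that the FASTQ-based counter skips any window containing `N`, and that the `bq` encoder replaces each `N` with `C` before counting. The function names and reads are made up for illustration.

```python
from collections import Counter


def count_skipping_n(sequences, k):
    """Count k-mers, skipping any window that contains an N."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def count_replacing_n(sequences, k, replacement="C"):
    """Count k-mers after replacing every N, as a replace-style policy would."""
    replaced = [seq.replace("N", replacement) for seq in sequences]
    return count_skipping_n(replaced, k)


if __name__ == "__main__":
    reads = ["AAAC", "AAAN"]  # toy reads, not real data
    print(count_skipping_n(reads, 4)["AAAC"])   # the AAAN window is dropped
    print(count_replacing_n(reads, 4)["AAAC"])  # AAAN is counted as AAAC
```

Only k-mers that can arise from an `N` turning into a `C` gain counts under the replace policy, which is consistent with `AAAAAAAC` changing while `AAAAAAAG` and `AAAAAAAT` stay the same in the outputs above.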

## Requirements
- NVIDIA GPU (RTX 4090 recommended)
- CUDA Toolkit

## References
- [K-mer (Wikipedia)](https://en.wikipedia.org/wiki/K-mer)
- [European Nucleotide Archive](https://www.ebi.ac.uk/ena/browser/home)

## TODO

- [ ] Review sequence packing and host allocation (together about 86% of the overall time)
- [ ] Add more tests
- [ ] Expand to large k-mers
- [ ] Improve overall performance
