https://github.com/suchapalaver/krust
Bioinformatics 101 tool for counting unique k-length substrings in DNA
- Host: GitHub
- URL: https://github.com/suchapalaver/krust
- Owner: suchapalaver
- License: mit
- Created: 2021-09-06T18:32:57.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-09-25T02:11:52.000Z (about 1 year ago)
- Last Synced: 2024-10-13T13:28:58.885Z (25 days ago)
- Topics: beginner-friendly, bioinformatics, bioinformatics-tool, bitpacking, bytes, clap, dashmap, genomics, insta, k-mer, k-mer-counting, k-mers, kmer, kmer-counting, kmers, needletail, parallelization, rayon, rust, rust-bio
- Language: Rust
- Homepage:
- Size: 23.1 MB
- Stars: 30
- Watchers: 0
- Forks: 5
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
# `krust`: counts k-mers, written in rust
`krust` is a [k-mer](https://en.wikipedia.org/wiki/K-mer) counter: a bioinformatics 101 tool for counting the frequency of substrings of length `k` within strings of DNA data. `krust` is written in Rust and runs from the command line. It takes a FASTA file of DNA sequences and outputs every canonical k-mer, together with its frequency across all records in the file. "Canonical" matters because DNA is double-stranded: each k-mer has a [reverse complement](https://en.wikipedia.org/wiki/Complementarity_(molecular_biology)#DNA_and_RNA_base_pair_complementarity) on the opposite strand, and the two are counted as one. `krust` is tested for accuracy against [jellyfish](https://github.com/gmarcais/Jellyfish).
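To make the idea of canonical counting concrete, here is a minimal, single-threaded sketch using only the standard library. It is not `krust`'s implementation. The function names are hypothetical, and it assumes the common convention that the canonical form is the lexicographically smaller of a k-mer and its reverse complement. Windows containing anything other than uppercase `A`, `C`, `G` or `T` are skipped, which is also an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Reverse complement of a DNA fragment.
/// Returns `None` if it contains a byte other than uppercase A, C, G or T.
fn reverse_complement(kmer: &[u8]) -> Option<Vec<u8>> {
    kmer.iter()
        .rev()
        .map(|&base| match base {
            b'A' => Some(b'T'),
            b'C' => Some(b'G'),
            b'G' => Some(b'C'),
            b'T' => Some(b'A'),
            _ => None,
        })
        .collect()
}

/// Assumed convention (not confirmed for krust): the canonical form is the
/// lexicographically smaller of the k-mer and its reverse complement.
fn canonical(kmer: &[u8]) -> Option<Vec<u8>> {
    let rc = reverse_complement(kmer)?;
    Some(if rc.as_slice() < kmer { rc } else { kmer.to_vec() })
}

/// Counts canonical k-mers in one sequence by sliding a window of length `k`.
fn count_canonical_kmers(seq: &str, k: usize) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    if k == 0 {
        // `windows(0)` would panic, so treat k = 0 as "nothing to count".
        return counts;
    }
    for window in seq.as_bytes().windows(k) {
        if let Some(kmer) = canonical(window) {
            // Only A, C, G and T reach this point, so the bytes are valid UTF-8.
            let key = String::from_utf8(kmer).expect("ACGT is valid UTF-8");
            *counts.entry(key).or_insert(0) += 1;
        }
    }
    counts
}
```

Calling `count_canonical_kmers("ACGTT", 2)` visits the windows `AC`, `CG`, `GT` and `TT`. Since `GT` is the reverse complement of `AC`, both are counted under `AC`.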
```bash
krust: counts k-mers, written in rust

Usage: krust

Arguments:
  provides k length, e.g. 5
  path to a FASTA file, e.g. /home/lisa/bio/cerevisiae.pan.fa

Options:
  -h, --help     Print help information
  -V, --version  Print version information
```

`krust` supports either `rust-bio` or `needletail` to read FASTA records. Use the `--features` flag to select.
Run `krust` with `rust-bio`'s fasta reader to count *5*-mers like this:
```bash
cargo run --release --features rust-bio -- 5 your/local/path/to/fasta_data.fa
```

or, counting *21*-mers with `needletail` as the FASTA reader, like this:
```bash
cargo run --release --features needletail -- 21 your/local/path/to/fasta_data.fa
```

`krust` prints to `stdout`, writing, on alternate lines, a k-mer's count prefixed with `>` and then the k-mer itself:
```bash
>114928
ATGCC
>289495
AATCA
...
```
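For downstream processing, the alternating-line output shown above could be read back into `(k-mer, count)` pairs along the following lines. This is a sketch based only on the sample output. `parse_counts` is a hypothetical helper and not part of `krust`, and the `...` in the sample is treated as an elision, not as real output.

```rust
/// Parses krust-style output, where each `>count` line is followed by a
/// k-mer line, into `(k-mer, count)` pairs. Blank lines are ignored.
/// Hypothetical helper for illustration; not part of krust.
fn parse_counts(output: &str) -> Result<Vec<(String, u64)>, String> {
    let mut pairs = Vec::new();
    let mut lines = output.lines().filter(|line| !line.trim().is_empty());

    while let Some(header) = lines.next() {
        // The first line of each pair must look like `>114928`.
        let count = header
            .trim()
            .strip_prefix('>')
            .ok_or_else(|| format!("expected a '>' count line, got {header:?}"))?
            .parse::<u64>()
            .map_err(|e| format!("bad count in {header:?}: {e}"))?;

        // The second line of each pair is the k-mer itself.
        let kmer = lines
            .next()
            .ok_or_else(|| format!("missing k-mer after {header:?}"))?;

        pairs.push((kmer.trim().to_string(), count));
    }
    Ok(pairs)
}
```

Returning a `Result` means malformed input, such as a k-mer line with no preceding count, is reported to the caller and not silently dropped.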