https://github.com/moold/kseq-rs
A FASTA/FASTQ format parser library
fasta fastq
- Host: GitHub
- URL: https://github.com/moold/kseq-rs
- Owner: moold
- License: mit
- Created: 2021-07-28T14:10:19.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2024-03-01T01:32:02.000Z (over 1 year ago)
- Last Synced: 2024-04-29T13:42:42.475Z (over 1 year ago)
- Topics: fasta, fastq
- Language: Rust
- Homepage:
- Size: 77.1 KB
- Stars: 19
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
[Download](https://github.com/moold/kseq/archive/refs/heads/main.zip)
[crates.io](https://crates.io/crates/kseq)
[docs.rs](https://docs.rs/kseq/)
# kseq
`kseq` is a simple FASTA/FASTQ (**fastx**) format parser library for [Rust](https://www.rust-lang.org/). Its main function is to iterate over the records in fastx files, similar to [kseq](https://attractivechaos.github.io/klib/#Kseq%3A%20stream%20buffer%20and%20FASTA%2FQ%20parser) in `C`. It reads and stores records in a shared buffer, which keeps parsing fast by reusing memory across records. It supports a **plain** or **gz** fastx file or [`io::stdin`](https://doc.rust-lang.org/std/io/fn.stdin.html), as well as a **fofn** (file-of-file-names) file, which lists multiple plain or gz fastx files (one per line).

Using `kseq` is simple: call `parse_path` to parse a path or `parse_reader` to parse a reader, then call the `iter_record` method to get each record.
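To illustrate why a shared buffer pairs naturally with a "call a method in a loop" API, here is a minimal sketch using only the standard library. It is not `kseq`'s implementation: `RecordView`, `SharedBufReader`, `next_record` and `record_lengths` are hypothetical names, and the sketch handles plain FASTA only (no FASTQ, gz or fofn input). Each returned record borrows from buffers owned by the reader, so it must be dropped before the next record is read.

```rust
use std::io::{self, BufRead};

/// Hypothetical borrowed view of one record; not kseq's `Record` type.
pub struct RecordView<'a> {
    pub head: &'a str,
    pub seq: &'a str,
}

/// Hypothetical FASTA-only reader that reuses three buffers for every record.
pub struct SharedBufReader<R: BufRead> {
    reader: R,
    line: String, // most recently read line; holds the pending header between calls
    head: String, // header of the current record, without the leading '>'
    seq: String,  // sequence of the current record, line breaks removed
}

impl<R: BufRead> SharedBufReader<R> {
    pub fn new(reader: R) -> Self {
        Self { reader, line: String::new(), head: String::new(), seq: String::new() }
    }

    /// Returns the next record, or `Ok(None)` at end of input.
    /// The returned view borrows `self`, so it must be dropped before the next call.
    pub fn next_record(&mut self) -> io::Result<Option<RecordView<'_>>> {
        // Skip ahead to a header line unless the previous call already read one.
        while !self.line.starts_with('>') {
            self.line.clear();
            if self.reader.read_line(&mut self.line)? == 0 {
                return Ok(None);
            }
        }
        self.head.clear();
        self.head.push_str(self.line[1..].trim_end());

        // Append sequence lines until the next header or end of input.
        self.seq.clear();
        loop {
            self.line.clear();
            if self.reader.read_line(&mut self.line)? == 0 || self.line.starts_with('>') {
                break;
            }
            self.seq.push_str(self.line.trim_end());
        }
        Ok(Some(RecordView { head: &self.head, seq: &self.seq }))
    }
}

/// Collects `(header, sequence length)` pairs; copies only what outlives the loop.
pub fn record_lengths(input: &str) -> io::Result<Vec<(String, usize)>> {
    let mut reader = SharedBufReader::new(input.as_bytes());
    let mut out = Vec::new();
    while let Some(rec) = reader.next_record()? {
        out.push((rec.head.to_string(), rec.seq.len()));
    }
    Ok(out)
}
```

Because every record view borrows the reader mutably, this shape cannot implement the standard `Iterator` trait, which is one plausible reason for exposing a method that is called in a `while let` loop instead. Anything that must outlive an iteration, such as the header string in `record_lengths`, has to be copied out explicitly.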
- `parse_path`
  This function takes a path that implements [`AsRef<Path>`](https://doc.rust-lang.org/std/path/struct.Path.html) as input. The path can be a `fastx` file, `-` for [`io::stdin`](https://doc.rust-lang.org/std/io/fn.stdin.html), or a `fofn` file. It returns a `Result` type:
  - `Ok(T)`: A struct `T` with the `iter_record` method.
  - `Err(E)`: An error `E`, such as missing input, a file that cannot be opened or read, a wrong fastx format, or an invalid path or file.

- `parse_reader`
  This function takes a reader that implements [`std::io::Read`](https://doc.rust-lang.org/std/io/trait.Read.html) as input. It returns a `Result` type:
  - `Ok(T)`: A struct `T` with the `iter_record` method.
  - `Err(E)`: An error `E`, such as missing input, a file that cannot be opened or read, a wrong fastx format, or an invalid path or file.

- `iter_record`
  This function can be called in a loop. It returns a `Result<Option<Record>>` type:
  - `Ok(Some(Record))`: A struct `Record` with methods:
    - `head -> &str`: get sequence id/identifier
    - `seq -> &str`: get sequence
    - `des -> &str`: get sequence description/comment
    - `sep -> &str`: get separator
    - `qual -> &str`: get quality scores
    - `len -> usize`: get sequence length

    ***Note:*** calling `des`, `sep` or `qual` returns `""` if the `Record` does not have these attributes.
  - `Ok(None)`: The stream has reached `EOF`.
  - `Err(ParseError)`: An error [`ParseError`](https://docs.rs/kseq/0.3.0/kseq/record/enum.ParseError.html), one of `IO`, `TruncateFile`, `InvalidFasta` or `InvalidFastq`.

## Example
```no_run
use std::env::args;
use kseq::parse_path;

fn main() {
    // First command-line argument: a fastx file, `-` for stdin, or a fofn file.
    let path: String = args().nth(1).unwrap();
    let mut records = parse_path(path).unwrap();
    // To parse from a reader instead, import `std::fs::File` and `kseq::parse_reader`:
    // let mut records = parse_reader(File::open(path).unwrap()).unwrap();

    // `iter_record` yields `Some(record)` until the stream reaches EOF.
    while let Some(record) = records.iter_record().unwrap() {
        println!("head:{} des:{} seq:{} qual:{} len:{}",
            record.head(), record.des(), record.seq(),
            record.qual(), record.len());
    }
}
```

## Installation
```text
cargo add kseq
```

## Benchmarking
```text
cargo bench
```