https://github.com/feldroop/genedex
A small and fast FM-Index for Rust
bioinformatics data-structures fmindex search sequence-analysis
- Host: GitHub
- URL: https://github.com/feldroop/genedex
- Owner: feldroop
- Created: 2025-09-06T14:03:43.000Z (5 months ago)
- Default Branch: master
- Last Pushed: 2025-09-08T13:54:44.000Z (5 months ago)
- Last Synced: 2025-09-08T15:34:29.513Z (5 months ago)
- Topics: bioinformatics, data-structures, fmindex, search, sequence-analysis
- Language: Rust
- Homepage:
- Size: 59.6 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# genedex: A Small and Fast FM-Index for Rust
Coming soon!
## Possible future extensions:
- API/structure:
  - better API for construction
  - better API for alphabets
  - more alphabets and better test coverage for different alphabets
  - gate rayon/OpenMP usage behind a feature flag (enabled by default)
- Optimization ideas for existing features:
  - space optimization for rarely occurring symbols (such as the sentinel and N in the human genome)
  - improved build memory usage (maybe a configurable, slower low-memory mode):
    - add u32 SACA (suffix array construction algorithm)
    - BWT view optimization
  - suffix array and lookup table compression using unconventional int widths (e.g. 33 bit)
  - paired blocks for less memory usage when using larger alphabets (such as all possible u8 values except 0)
- Novel features (implementation + API):
  - bidirectional FM-Index
  - searches with errors and "degenerate" chars as in the IUPAC FASTA definition (using search schemes)
  - optimized version for single text without sentinel
  - text-sampled suffix array (maybe with text IDs and other annotations)
  - optimized construction directly from a (FASTA) file reader
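
To illustrate the "unconventional int widths" idea from the optimization list, here is a minimal sketch of a fixed-width packed integer vector. It is not part of genedex and does not reflect its actual or planned API; the `PackedInts` type and all of its methods are hypothetical names chosen for this example. The assumption is that suffix array entries for a text slightly longer than 2^32 symbols fit in 33 bits, so storing them in 33-bit slots instead of full `u64` values saves close to half the space.

```rust
/// Hypothetical fixed-width packed integer vector (illustration only,
/// not a genedex type). Entries are stored back to back in `u64` words.
pub struct PackedInts {
    words: Vec<u64>,
    width: u32, // bits per entry, between 1 and 64
    len: usize, // number of entries
}

impl PackedInts {
    /// Creates a vector of `len` zero entries, each `width` bits wide.
    pub fn new(width: u32, len: usize) -> Self {
        assert!((1..=64).contains(&width));
        let total_bits = width as usize * len;
        let words = vec![0u64; total_bits.div_ceil(64)];
        Self { words, width, len }
    }

    /// Bit mask with the lowest `width` bits set.
    fn mask(&self) -> u64 {
        if self.width == 64 { u64::MAX } else { (1u64 << self.width) - 1 }
    }

    /// Overwrites the entry at `index` with `value`.
    pub fn set(&mut self, index: usize, value: u64) {
        assert!(index < self.len);
        assert!(value <= self.mask());
        let bit = index * self.width as usize;
        let (word, offset) = (bit / 64, (bit % 64) as u32);
        // Clear the slot, then write the part that fits into the first word.
        self.words[word] &= !(self.mask() << offset);
        self.words[word] |= value << offset;
        if offset + self.width > 64 {
            // The entry straddles a word boundary: write the remaining
            // high bits into the start of the next word.
            let fits = 64 - offset;
            self.words[word + 1] &= !(self.mask() >> fits);
            self.words[word + 1] |= value >> fits;
        }
    }

    /// Reads the entry at `index`.
    pub fn get(&self, index: usize) -> u64 {
        assert!(index < self.len);
        let bit = index * self.width as usize;
        let (word, offset) = (bit / 64, (bit % 64) as u32);
        let mut value = self.words[word] >> offset;
        if offset + self.width > 64 {
            value |= self.words[word + 1] << (64 - offset);
        }
        value & self.mask()
    }

    /// Heap bytes used by the packed storage.
    pub fn bytes(&self) -> usize {
        self.words.len() * 8
    }
}
```

With this sketch, `PackedInts::new(33, 100)` occupies 52 words (416 bytes), compared to 800 bytes for a `Vec<u64>` of the same length. The trade-off is that every access needs a few extra shifts and sometimes a second word read, which may or may not be acceptable on a hot query path.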