https://github.com/edawson/tinyfa
A header-only C++ FASTA parser and indexer
- Host: GitHub
- URL: https://github.com/edawson/tinyfa
- Owner: edawson
- License: mit
- Created: 2018-05-12T18:33:02.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-21T10:08:24.000Z (about 2 years ago)
- Last Synced: 2023-04-04T14:14:24.452Z (almost 2 years ago)
- Topics: fasta, genomics, header-only
- Language: C++
- Homepage:
- Size: 28.3 KB
- Stars: 1
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
tinyFA
-----------------
Parse, index and get random access to FASTA files
with no extra dependencies.

### Overview
tinyFA provides a (relatively) fast, highly minimalist header-only library
for reading FASTA files, especially when you'd like random access
via their FAI indices. It requires only a modern C++ compiler (GCC 4.8 or newer).

### Build / install
Make sure both headers are in a path accessible to your code. You can either
copy them to your include directory or pass the directory to your compiler
with `-I/path/to/tinyFA`. Then include both headers in your C++ code and use the `TFA` namespace:
```
#include "tinyFA/tinyfa.hpp"
#include "tinyFA/pliib.hpp"

using namespace TFA;
```
### Usage
```
#include <cstdlib>
#include <iostream>
#include "tinyfa.hpp"
#include "pliib.hpp"

using namespace TFA;

int main (int argc, char** argv){
    // usage: ./getseq <reference.fa> <contig> <start> <end>
    // Parse a FASTA file and extract a subsequence.
    // If a FASTA index exists, use it; otherwise,
    // build one.
    if (argc < 5){
        std::cerr << "usage: " << argv[0]
                  << " <reference.fa> <contig> <start> <end>" << std::endl;
        return 1;
    }

    // The tinyFA faidx struct
    tiny_faidx_t tf;
    // Check if an index exists, and create one if not.
    if (!checkFAIndexFileExists(argv[1])){
        createFAIndex(argv[1], tf);
    }
    else{
        // Parses an FAI file when passed a FASTA file name.
        parseFAIndex(argv[1], tf);
    }

    char* contigName = argv[2];
    int start = std::atoi(argv[3]);
    int end = std::atoi(argv[4]);
    char* seq = nullptr;
    // Not passing start/end will return the whole contig.
    getSequence(tf, contigName, seq, start, end);
    std::cout << seq << std::endl;

    // seq gets allocated in getSequence, so you'll want to delete it.
    delete [] seq;
    return 0;
}
```

### Other tools
tinyFA takes a lot of code and inspiration from the
excellent [fastahack](https://github.com/ekg/fastahack).
Fastahack has been extensively used and we recommend it for production environments.
tinyFA differs from fastahack in that:

1. There's no default library setup for fastahack. This could easily be remedied
   with some small makefile tweaks.
2. tinyFA mostly uses structs and primitive types (rather than STL containers).

There's also [htslib](https://github.com/samtools/htslib), but if you only want to
parse a FASTA file you still have to build its utilities for SAM/BAM/CRAM/VCF, and it requires
zlib plus, in a default build, libbz2 and liblzma (which are often not up to date or not installed by default).

However, htslib can parse files compressed with BGZF, which may be an important
advantage when dealing with large files.

SeqLib and SeqAn are both excellent, more fully featured libraries (SeqLib builds on htslib),
with similar cons and many extra pros.