{"id":24436227,"url":"https://github.com/yhoogstrate/fastafs","last_synced_at":"2026-01-17T07:01:57.564Z","repository":{"id":27304998,"uuid":"110824253","full_name":"yhoogstrate/fastafs","owner":"yhoogstrate","description":"toolkit for file system virtualisation of random access compressed FASTA, FAI, DICT \u0026 TWOBIT files","archived":false,"fork":false,"pushed_at":"2024-08-13T14:06:20.000Z","size":2821,"stargazers_count":22,"open_issues_count":6,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-20T17:44:06.091Z","etag":null,"topics":["2bit","compression","dna-sequences","fasta","filesystem","fuse-filesystem"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yhoogstrate.png","metadata":{"files":{"readme":"README.md","changelog":"Changelog","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-11-15T11:22:18.000Z","updated_at":"2024-05-27T10:50:10.000Z","dependencies_parsed_at":"2024-01-20T18:37:52.334Z","dependency_job_id":"83cd3663-ac36-4822-8230-803b5a860967","html_url":"https://github.com/yhoogstrate/fastafs","commit_stats":null,"previous_names":[],"tags_count":22,"template":false,"template_full_name":null,"purl":"pkg:github/yhoogstrate/fastafs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yhoogstrate%2Ffastafs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yhoogstrate%2Ffastafs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yhoogstrate%2Ffastafs/releases","manifests_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/yhoogstrate%2Ffastafs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yhoogstrate","download_url":"https://codeload.github.com/yhoogstrate/fastafs/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yhoogstrate%2Ffastafs/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28503021,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-17T06:57:29.758Z","status":"ssl_error","status_checked_at":"2026-01-17T06:56:03.931Z","response_time":85,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["2bit","compression","dna-sequences","fasta","filesystem","fuse-filesystem"],"created_at":"2025-01-20T17:37:59.282Z","updated_at":"2026-01-17T07:01:57.327Z","avatar_url":"https://github.com/yhoogstrate.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FASTAFS: toolkit for file system virtualisation of random access compressed FASTA files\n----\n\n# Citing FASTAFS or its virtualisation philosophy\n\n[10.1186/s12859-021-04455-3](https://doi.org/10.1186/s12859-021-04455-3)\n\n```\nHoogstrate, Y., Jenster, G.W. 
\u0026 van de Werken, H.J.G.\nFASTAFS: file system virtualisation of random access compressed FASTA files.\nBMC Bioinformatics 22, 535 (2021).\nhttps://doi.org/10.1186/s12859-021-04455-3\n```\n\n----\n\nDirect link to the file format specification:\n\n[https://github.com/yhoogstrate/fastafs/blob/master/doc/FASTAFS-FORMAT-SPECIFICATION.md](https://github.com/yhoogstrate/fastafs/blob/master/doc/FASTAFS-FORMAT-SPECIFICATION.md)\n\n----\n\nLinks:\n[bio.tools](https://bio.tools/fastafs)\n\n---\n\n![](https://bioinf-galaxian.erasmusmc.nl/public/images/fastafs/fastafs-example.gif)\n\n## Elegant integration of sequence data archives, backwards compatible with FASTA and no APIs needed\n\nRNA, DNA and protein sequences are commonly stored in the FASTA format. Although widely used and easy to read, FASTA files come with additional metadata files and consume unnecessary disk space. These metadata files are necessary to achieve random access and certain interoperability features, and they require additional maintenance. Classical FASTA (de-)compressors only offer compression and decompression of the files, often requiring decompression to a new copy of the FASTA file, which makes them impractical, in particular for random access use cases. Although they typically produce very compact archives with quick algorithms, they are not widely adopted in bioinformatics software.\n\nHere we propose a solution: a virtual layer between (random access) FASTA archives and read-only access to FASTA files and their guaranteed in-sync FAI, DICT and 2BIT files, through the File System in Userspace (FUSE) file system layer. When the archive is mounted, fastafs virtualizes a folder containing the FASTA and necessary metadata files, only accessing the chunks of the archive needed to serve the file request. 
This elegant software solution offers several advantages:\n - virtual files and their system calls are identical to flat files and preserve backwards compatibility with tools only compatible with FASTA, also for random access use-cases,\n - there is no need to use additional disk space for temporary decompression or to put entire FASTA files into memory,\n - for random access requests, computational resources are only spent on decompressing the region of interest,\n - it does not need multiple implementations of software libraries for each distinct tool and for each programming language,\n - it does not require maintaining multiple files that together make up one data entity, as it is guaranteed to provide DICT and FAI files that are in sync with their FASTA of origin.\n\nIn addition, the corresponding toolkit offers an interface that allows ENA sequence identification, file integrity verification and management of the mounted files and process ids.\n\nFASTAFS is deliberately made backwards compatible with both TwoBit and FASTA. The package even allows mounting TwoBit files, instead of FASTAFS files, as FASTA files. For those who wonder whether FASTAFS is this famous 15th standard (\u003chttps://xkcd.com/927/\u003e):\nPartially. It is not designed to replace FASTA or TwoBit, as the mountpoints provide exactly the same file access as regular flat files, and it is thus backwards compatible. 
Instead, it offers the same old standard with an elegant toolkit that allows easier integration with workflow management systems.\n\n## Installation and compilation\n\nCurrently the package uses cmake for compilation.\nRequired dependencies are:\n\n -   libboost (only for unit testing, will become an optional dependency soon)\n -   libopenssl (for generating MD5 hashes)\n -   libfuse (for access to the fuse layer system and file virtualization)\n -   C++ compiler supporting C++14\n -   glibc\n -   libssl (for checking sequences with ENA)\n -   zlib (crc32 checksum)\n -   cmake or meson + ninja-build\n -   libcrypto for MD5sums\n \n\n\n\n```\n# debian + ubuntu like systems:\n\nsudo apt install git build-essential cmake libboost-dev libssl-dev libboost-test-dev libboost-system-dev libboost-filesystem-dev zlib1g-dev libzstd-dev libfuse-dev\ngit clone https://github.com/yhoogstrate/fastafs.git\ncd fastafs\n```\n\n```\n# RHEL + CentOS + Fedora like systems:\n\nsudo yum install git cmake gcc-c++ boost-devel openssl-devel libzstd-devel zlib-devel fuse-devel\ngit clone https://github.com/yhoogstrate/fastafs.git\ncd fastafs\n```\n\n\nCompile (release, recommended):\n```\ncmake -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX=/usr/local -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON .\nmake \"$@\" -j $(nproc)\nsudo make install\n```\n\nIf you do not have root permission, use the following instead:\n```\ncmake -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX=~/.local -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON .\nmake \"$@\" -j $(nproc)\nmake install\n```\n\nIf you would like to experiment with the code and help with development, you can create a debug binary as follows:\n```\ncmake -DCMAKE_BUILD_TYPE=debug -DCMAKE_INSTALL_PREFIX=/usr/local -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON .\nmake \"$@\" -j $(nproc)\nsudo make install\n```\n\n\nIf you have patches, changes or even cool new features you believe are worth contributing, please run `astyle` with the following command:\n\n```\ncmake -DCMAKE_BUILD_TYPE=debug 
-DCMAKE_INSTALL_PREFIX=/usr/local -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON .\nmake tidy\n```\n\nThis styles the code in a more or less compatible way with the rest of the code.\nThanks in advance(!)\n\n\nMake or overwrite docs:\n\n```\ncmake -DCMAKE_BUILD_TYPE=debug -DCMAKE_INSTALL_PREFIX=/usr/local -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON .\nmake doc\n```\n\n## usage\n### fastafs cache: adding files to fastafs\nWe can add files to the fastafs database by running:\n\n```\n$ fastafs cache test ./test.fa\n```\n\nOr, starting with 2bit:\n\n```\n$ fastafs cache test-from-2bit ./test.2bit\n```\n\nFASTAFS files will be saved in `~/.local/share/fastafs/\u003cuid\u003e.fastafs` and an entry will be added to the 'database' (`~/.local/share/fastafs/index`).\n\n### fastafs list: overview of fastafs db\n\nThe `list` command lists all FASTAFS files located in the 'database' (`~/.local/share/fastafs/index`):\n\n```\n$ fastafs list\n\nFASTAFS NAME    FASTAFS        SEQUENCES    BASES   DISK SIZE\ntest            v0-x32-2bit    7            88      214      \n```\n\n### fastafs info: stats of cached fasta file\n```\n$ fastafs info\n\n# FASTAFS NAME: /home/youri/.local/share/fastafs/test.fastafs\n# SEQUENCES:    7\nchr1                    16          2bit    75255c6d90778999ad3643a2e69d4344\nchr2                    16          2bit    8b5673724a9965c29a1d76fe7031ac8a\nchr3.1                  13          2bit    61deba32ec4c3576e3998fa2d4b87288\nchr3.2                  14          2bit    99b90560f23c1bda2871a6c93fd6a240\nchr3.3                  15          2bit    3625afdfbeb43765b85f612e0acb4739\nchr4                    8           2bit    bd8c080ed25ba8a454d9434cb8d14a68\nchr5                    6           2bit    980ef3a1cd80afec959dcf852d026246\n```\n\n### fastafs mount: mount fastafs archive to unlock fasta file(s)\n\n```\n$ fastafs mount hg19 /mnt/fastafs/hg19 \n$ ls /mnt/fastafs/hg19 \nhg19.2bit  hg19.dict  hg19.fa  hg19.fa.fai  seq\n\n$ ls -alsh /mnt/fastafs/hg19\ntotal 0\n-rw-r--r-- 1 
youri youri 779M Aug 19 15:26 hg19.2bit\n-rw-r--r-- 1 youri youri 7.9K Aug 19 15:26 hg19.dict\n-rw-r--r-- 1 youri youri 3.0G Aug 19 15:26 hg19.fa\n-rw-r--r-- 1 youri youri 3.5K Aug 19 15:26 hg19.fa.fai\ndrwxr-xr-x 1 youri youri    0 Aug 19 15:26 seq\n\n\n$ head -n 5 /mnt/bio/hg19/hg19.dict \n@HD\tVN:1.0\tSO:unsorted\n@SQ\tSN:chr1\tLN:249250621\tM5:1b22b98cdeb4a9304cb5d48026a85128\tUR:fastafs:///hg19\n@SQ\tSN:chr2\tLN:243199373\tM5:a0d9851da00400dec1098a9255ac712e\tUR:fastafs:///hg19\n@SQ\tSN:chr3\tLN:198022430\tM5:641e4338fa8d52a5b781bd2a2c08d3c3\tUR:fastafs:///hg19\n@SQ\tSN:chr4\tLN:191154276\tM5:23dccd106897542ad87d2765d28a19a1\tUR:fastafs:///hg19\n```\n\n### fastafs mount: use custom padding\n\n```\n$ fastafs mount test /mnt/fastafs/test        \n$ ls /mnt/fastafs/test \ntest.2bit  test.dict  test.fa  test.fa.fai\n\n$ cat /mnt/fastafs/test/test.fa\n\u003echr1\nttttccccaaaagggg\n\u003echr2\nACTGACTGnnnnACTG\n\u003echr3.1\nACTGACTGaaaac\n\u003echr3.2\nACTGACTGaaaacc\n\u003echr3.3\nACTGACTGaaaaccc\n\u003echr4\nACTGnnnn\n\u003echr5\nnnACTG\n\n$ umount /mnt/fastafs/test\n\n$ fastafs mount -p 4 test /mnt/fastafs/test        \n$ cat  /mnt/fastafs/test/test.fa | head -n 15\n\u003echr1\ntttt\ncccc\naaaa\ngggg\n\u003echr2\nACTG\nACTG\nnnnn\nACTG\n\u003echr3.1\nACTG\nACTG\naaaa\nc\n```\n\nTo find the file size of chrM (16571):\n```\n$ ls -l /mnt/bio/hg19/seq/chrM\n\n-rw-r--r-- 1 youri youri 16571 Feb  1 10:47 /mnt/bio/hg19/seq/chrM\n```\n\n### Find all running `fastafs mount` / `mount.fastafs` instances\n\nThe output format of `fastafs ps` is: `\u003cpid\u003e\\t\u003csource\u003e\\t\u003cdestination\u003e\\n`\n\n```\n$ fastafs ps\n16383\t/home/youri/.local/share/fastafs/test.fastafs\t/mnt/tmp\n```\n\n### Mounting via fstab (for instance on linux boot)\n\nYou can add the following line(s) to /etc/fstab to make fastafs mount during boot:\n\n```\nmount.fastafs#/home/youri/.local/share/fastafs/hg19.fastafs /mnt/fastafs/hg19 fuse auto,allow_other 0 0\n```\n\nHere 
`mount.fastafs` refers to the binary that only does mounting, without the rest of the toolkit.\nThis is followed by a hash sign (`#`) and the path of the desired fastafs file. The next value is the mount point, followed by the argument indicating that it runs fuse.\nThe `auto,allow_other` refers to the `-o` argument.\nHere `auto` ensures it is mounted automatically after boot.\nGiven that a system mounts as root user, the default permission of the mount point will be only for root. \nBy setting `allow_other`, other users of the system get permission to access the mountpoint.\n\n## Contributing\nFeel free to start a discussion or to contribute to the GPL licensed code.\nIf you are willing to make even the smallest contribution to this project in any way, feel free to open an issue or to send an e-mail.\n\n[![Codacy Badge](https://api.codacy.com/project/badge/Grade/c90c7d61651d4e18aa82a4b02f3599fa)](https://www.codacy.com/app/yhoogstrate/fastafs?utm_source=github.com\u0026amp;utm_medium=referral\u0026amp;utm_content=yhoogstrate/fastafs\u0026amp;utm_campaign=Badge_Grade)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyhoogstrate%2Ffastafs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyhoogstrate%2Ffastafs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyhoogstrate%2Ffastafs/lists"}