{"id":15467579,"url":"https://github.com/arkanosis/bamrescue","last_synced_at":"2025-04-13T09:33:29.227Z","repository":{"id":48315284,"uuid":"91139234","full_name":"Arkanosis/bamrescue","owner":"Arkanosis","description":"Utility to check Binary Sequence Alignment / Map (BAM) files for corruption and repair them","archived":false,"fork":false,"pushed_at":"2024-09-02T22:04:29.000Z","size":252,"stargazers_count":7,"open_issues_count":15,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-10T19:10:02.115Z","etag":null,"topics":["bam","bam-files","bioinformatics","corruption","repair","rescue"],"latest_commit_sha":null,"homepage":"https://bamrescue.arkanosis.net/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"isc","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Arkanosis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-05-13T00:37:04.000Z","updated_at":"2025-04-10T08:33:23.000Z","dependencies_parsed_at":"2024-06-08T17:57:07.423Z","dependency_job_id":"f9a82802-c87c-4ef3-affd-00c5e6d2ce0d","html_url":"https://github.com/Arkanosis/bamrescue","commit_stats":{"total_commits":101,"total_committers":1,"mean_commits":101.0,"dds":0.0,"last_synced_commit":"35ba175bf1716fbc5c9d37207d6e7d08736ffeea"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Arkanosis%2Fbamrescue","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Arkanosis%2Fbamrescue/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/Arkanosis%2Fbamrescue/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Arkanosis%2Fbamrescue/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Arkanosis","download_url":"https://codeload.github.com/Arkanosis/bamrescue/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248691010,"owners_count":21146250,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bam","bam-files","bioinformatics","corruption","repair","rescue"],"created_at":"2024-10-02T01:23:03.582Z","updated_at":"2025-04-13T09:33:29.208Z","avatar_url":"https://github.com/Arkanosis.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# bamrescue [![](https://img.shields.io/crates/v/bamrescue.svg)](https://crates.io/crates/bamrescue) [![AUR](https://img.shields.io/badge/AUR-v0.3.0-green.svg)](https://aur.archlinux.org/packages/bamrescue/) [![deb](https://img.shields.io/badge/deb-v0.3.0-green.svg)](https://apt.arkanosis.net/pool/main/b/bamrescue/bamrescue.deb) [![OCI](https://img.shields.io/badge/OCI-v0.3.0-green.svg)](https://hub.docker.com/repository/docker/arkanosis/bamrescue) [![License](https://img.shields.io/badge/license-ISC-blue.svg)](/LICENSE) [![Build status](https://travis-ci.org/Arkanosis/bamrescue.svg?branch=master)](https://travis-ci.org/Arkanosis/bamrescue)\n\n**bamrescue** is a command line utility to check Binary Sequence\nAlignment / Map (BAM) files for corruption and rescue as much data\nas possible from them in the event they happen to 
be corrupted.\n\n[![asciicast](https://arkanosis.com/images/bamrescue.png)](https://asciinema.org/a/187594)\n\n## Installation\n\n### On ArchLinux and derivatives (Manjaro…)\n\nA PKGBUILD is provided on AUR for ArchLinux and derivatives. It is only\ntested with an up-to-date ArchLinux.\n\n```bash\n# Get the PKGBUILD\ngit clone https://aur.archlinux.org/bamrescue.git\n\n# Add the author's PGP key\ngpg --recv-keys FA490B15D054C7E83F70B0408C145ABAC11FA702\n\n# Build and install bamrescue\ncd bamrescue\nmakepkg -si\n```\n\nAlternatively, you can install bamrescue using an AUR helper such as [yay](https://github.com/Jguer/yay):\n\n```bash\n# Install bamrescue\nyay -S bamrescue\n```\n\n### On Debian and derivatives (Ubuntu, Mint…)\n\nPre-built packages are provided for Debian and derivatives. They are only\ntested with Debian 12 (Bookworm) and Ubuntu 24.04 LTS (Noble).\n\n```bash\n# Install prerequisites\nsudo apt install curl gnupg\n\n# Add the author's PGP key\ncurl -s https://arkanosis.net/jroquet.pub.asc | sudo tee /usr/share/keyrings/arkanosis.asc\n\n# Add the author's apt stable channel to your apt sources\necho 'deb [arch=amd64 signed-by=/usr/share/keyrings/arkanosis.asc] https://apt.arkanosis.net/ stable main' | sudo tee /etc/apt/sources.list.d/arkanosis.list\n\n# Update and install bamrescue\nsudo apt update\nsudo apt install bamrescue\n```\n\n### In OCI containers (Docker, Podman…)\n\nA Dockerfile is provided for Docker and alternatives. 
It is only tested with Podman 5.\n\nTo create an OCI image from the Dockerfile, run this command:\n\n```bash\npodman build --tag bamrescue:0.3.0 -f ./Dockerfile\n```\n\nTo run an ephemeral container with the created image, run this command:\n\n```bash\npodman run --rm -it bamrescue:0.3.0 bamrescue --help\n```\n\nYou can of course replace `--help` with the command / option of your choice.\n\n## Usage\n\n```\nUsage: bamrescue check [--quiet] [--threads=\u003cthreads\u003e] \u003cbamfile\u003e\n       bamrescue rescue [--threads=\u003cthreads\u003e] \u003cbamfile\u003e \u003coutput\u003e\n       bamrescue -h | --help\n       bamrescue --version\n\nCommands:\n    check                Check BAM file for corruption.\n    rescue               Keep only non-corrupted blocks of BAM file.\n\nArguments:\n    bamfile              BAM file to check or rescue.\n    output               Rescued BAM file.\n\nOptions:\n    -h, --help           Show this screen.\n    -q, --quiet          Do not output statistics, stop at first error.\n    --threads=\u003cthreads\u003e  Number of threads to use, 0 for auto [default: 0].\n    --version            Show version.\n```\n\n## How it works\n\nA BAM file is a BGZF file ([specification](https://samtools.github.io/hts-specs/SAMv1.pdf)),\nand as such is composed of a series of concatenated RFC1952-compliant gzip\nblocks ([specification](https://tools.ietf.org/html/rfc1952)).\n\nEach gzip block contains at most 64 KiB of data, including a CRC32 checksum of\nthe uncompressed data which is used to check its integrity.\n\nAdditionally, since gzip blocks start with a gzip identifier (i.e. 0x1f8b), a\nfixed gzip method (i.e. 0x8) and fixed gzip flags (i.e. 0x4), and bgzf blocks\ninclude a bgzf identifier (i.e. 0x4243), a fixed extra subfield length\n(i.e. 
0x2) and their own compressed size, it is possible to skip over corrupted\nblocks (at most 64 KiB) to the next non-corrupted block with limited complexity\nand acceptable reliability.\n\nThis property is used to rescue data from corrupted BAM files by keeping only\ntheir non-corrupted blocks, hopefully rescuing most reads.\n\n## Examples\n\nA bam file of 40 MiB (which is very small by today's standards) has been\ncorrupted by two hard drive bad sectors. Most tools (including gzip) choke on\nthe file at the first corrupted byte, meaning that up to 100% of the bam\npayload is considered lost depending on the tool.\n\nLet's check the file using bamrescue:\n\n```shell\n$ bamrescue check samples/corrupted_payload.bam\nbam file statistics:\n   1870 bgzf blocks checked (117 MiB of bam payload)\n      2 corrupted blocks found (0% of total)\n     46 KiB of bam payload lost (0% of total)\n```\n\nIndeed, a whole hard drive bad sector typically amounts to 512 bytes lost,\nwhich is much smaller than an average bgzf block (which can be up to 64 KiB\nlarge).\n\nEven though most tools would give up on this file, it still contains almost\n100% of non-corrupted bam payload, and the user probably wouldn't mind much if\nthey could work only on that close-to-100% amount of data.\n\nLet's rescue the non-corrupted payload (beware: this takes as much additional\nspace on the disk as the original file):\n\n```shell\n$ bamrescue rescue samples/corrupted_payload.bam rescued_file.bam\nbam file statistics:\n   1870 bgzf blocks found (117 MiB of bam payload)\n      2 corrupted blocks removed (0% of total)\n     46 KiB of bam payload lost (0% of total)\n   1868 non-corrupted blocks rescued (100% of total)\n    111 MiB of bam payload rescued (100% of total)\n```\n\nThe resulting bam file can now be used as if it had never been corrupted.\nRescued data is validated using a CRC32 checksum, so it's not like ignoring\nerrors and working on corrupted data (typical use of gzip to get garbage 
data\nfrom a corrupted bam file): it's working on (marginally) less, but validated,\ndata.\n\n## Performance\n\nbamrescue is very fast. With two or more threads, it is even faster than gzip\nwhile doing more.\n\nHere are some numbers for a [40 MiB, non-corrupted bam file](http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwRepliSeq/wgEncodeUwRepliSeqK562G1AlnRep1.bam):\n\n| Command | Time | Corruption detected |\n| :------ | ---: | ------------------: |\n| gzip -t  | 695 ms | No |\n| bamrescue check -q --threads=1 | 1181 ms | No |\n| bamrescue check -q --threads=2 | 661 ms | No |\n| bamrescue check -q --threads=4 | 338 ms | No |\n| bamrescue check --threads=1 | 1181 ms | No |\n| bamrescue check --threads=2 | 661 ms | No |\n| bamrescue check --threads=4 | 338 ms | No |\n\n![Chart](docs/images/benchmarks_nc_2017-07-04.png)\n\nHere are some numbers for the same 40 MiB bam file, with two single-byte\ncorruptions (at ~7 MiB and ~18 MiB, respectively):\n\n| Command | Time | Corruption detected | Number of corrupted blocks reported | Amount of data rescuable¹ |\n| :------ | ---: | ------------------: | ----------------------------------: | ------------------------: |\n| gzip -t  | 93 ms | Yes | N/A | 21 MiB (18%) |\n| bamrescue check -q --threads=1 | 157 ms | Yes | N/A | 21 MiB (18%) |\n| bamrescue check -q --threads=2 | 91 ms | Yes | N/A | 21 MiB (18%) |\n| bamrescue check -q --threads=4 | 56 ms | Yes | N/A | 21 MiB (18%) |\n| bamrescue check --threads=1 | 1174 ms | Yes | 2 | 117 MiB (99.99%) |\n| bamrescue check --threads=2 | 659 ms | Yes | 2 | 117 MiB (99.99%) |\n| bamrescue check --threads=4 | 338 ms | Yes | 2 | 117 MiB (99.99%) |\n\n¹ uncompressed bam payload, rescued using `gzip -d` or `bamrescue rescue`\n\n![Chart](docs/images/benchmarks_c_2017-07-04.png)\n\nNote: these benchmarks have been run on an Intel Core i5-6500 CPU running\nKubuntu 16.04.2 and rustc 1.18.0.\n\n## Caveats\n\nbamrescue does not check whether the bam payload of the file is actually\ncompliant with 
the bam specification. It only checks if it has not been\ncorrupted after creation, using the error detection codes built in the gzip\nand bgzf formats. This means that as long as the tool used to create a bam\nfile was compliant with the specification, the output of bamrescue will be as\nwell, but bamrescue itself will do nothing to validate that compliance.\n\n## Compiling\n\nRun `cargo build --release` in your working copy.\n\n## Contributing and reporting bugs\n\nContributions are welcome through [GitHub pull requests](https://github.com/Arkanosis/bamrescue/pulls).\n\nPlease report bugs and feature requests on [GitHub issues](https://github.com/Arkanosis/bamrescue/issues).\n\n## License\n\nbamrescue is copyright (C) 2017-2024 Jérémie Roquet \u003cjroquet@arkanosis.net\u003e and\nlicensed under the ISC license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farkanosis%2Fbamrescue","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farkanosis%2Fbamrescue","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farkanosis%2Fbamrescue/lists"}