{"id":22341754,"url":"https://github.com/weaponsforge/fastractor","last_synced_at":"2026-02-05T04:32:46.401Z","repository":{"id":126827965,"uuid":"299400428","full_name":"weaponsforge/fastractor","owner":"weaponsforge","description":"Export .fasta data from a series of remote .bax.h5 files served via Http GET using DEXTRACTOR. ","archived":false,"fork":false,"pushed_at":"2020-11-03T20:19:29.000Z","size":33,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-07-08T07:53:19.834Z","etag":null,"topics":["dextractor","dextractor-sample","hdf5","hdf5-installation"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/weaponsforge.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-09-28T18:40:52.000Z","updated_at":"2020-11-03T20:19:31.000Z","dependencies_parsed_at":"2023-06-18T05:45:33.515Z","dependency_job_id":null,"html_url":"https://github.com/weaponsforge/fastractor","commit_stats":{"total_commits":23,"total_committers":2,"mean_commits":11.5,"dds":"0.34782608695652173","last_synced_commit":"411991f174b320224055c53def502d559d069b71"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/weaponsforge/fastractor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/weaponsforge%2Ffastractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/weaponsforge%2Ffastractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/weaponsforge%2Ffastractor/releases","manifests_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/weaponsforge%2Ffastractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/weaponsforge","download_url":"https://codeload.github.com/weaponsforge/fastractor/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/weaponsforge%2Ffastractor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29111843,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-05T03:44:17.043Z","status":"ssl_error","status_checked_at":"2026-02-05T03:44:12.077Z","response_time":65,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dextractor","dextractor-sample","hdf5","hdf5-installation"],"created_at":"2024-12-04T08:08:09.263Z","updated_at":"2026-02-05T04:32:46.386Z","avatar_url":"https://github.com/weaponsforge.png","language":null,"readme":"## fastractor\r\n\r\n\u003e Aims to extract `.fasta` data using [DEXTRACTOR](https://github.com/thegenemyers/DEXTRACTOR) from a local set of three `.bax.h5` files and one `.bas.h5` file.\r\n\r\n- This guide uses a pre-built [hdf5 binary distribution](https://www.hdfgroup.org/downloads/hdf5/) for Linux.  
\r\n- To use hdf5 **built from source code** instead, see the [previous installation](https://github.com/weaponsforge/fastractor/wiki/Dextractor:-Build-HDF5-from-Source-Code) guide.\r\n\r\n## Content\r\n\r\n- [**Requirements**](#requirements)\r\n- [**Install the DEXTRACTOR Dependencies**](#install-the-dextractor-dependencies)\r\n\t- [Download the hdf5 Binary Distribution](#download-the-hdf5-binary-distribution)\r\n\t- [Set the hdf5 Library Path](#set-the-hdf5-library-path)\r\n- [**Install DEXTRACTOR**](#install-dextractor)\r\n- [**Dextractor Sample Usage**](#dextractor-sample-usage)\r\n\t- [Download a Dataset](#download-a-dataset)\r\n\t- [Dextractor fasta Extractor Commands](#dextractor-fasta-extractor-commands)\r\n- [**fastractor Usage**](#fastractor-usage)\r\n- [**References**](#references)\r\n\r\n## Requirements\r\n\r\nThis project was developed and tested with the following system and software configuration. Other configurations have not been tested.\r\n\r\n1. **VirtualBox 6.1.14** (Windows host)\r\n2. **Windows 10 Pro** (host OS)\r\n\t- Version 1909 (OS Build 18363.1082)\r\n\t- Processor: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz\r\n\t- GPU: NVIDIA GeForce GTX 1060, 6 GB dedicated GPU memory\r\n\t- Memory: 16 GB\r\n\t- System type: 64-bit OS, x64-based processor\r\n3. **CentOS Linux release 8.2.2004 (Core)**: VM (guest OS) running on VirtualBox\r\n\t- Memory: 4 GB\r\n\t- Processors: 2\r\n\t- Hard disk: 95 GB\r\n\t- Kernel: 4.18.0-193.19.1.el8\\_2.x86\\_64\r\n\t- GCC: 8.3.1\r\n4. **Dextractor dependencies**\r\n\t- hdf5 v1.12.0\r\n\t\t- The pre-built binary distribution [[hdf5-1.12.0-linux-centos7-x86_64-shared-production.tar.gz]](https://www.hdfgroup.org/downloads/hdf5/) was downloaded and set up on the CentOS guest for this project.\r\n\t- Install the hdf5 dependency first if it is not yet installed. 
See [**Install the DEXTRACTOR Dependencies**](#install-the-dextractor-dependencies) for details.\r\n\r\n## Install the DEXTRACTOR Dependencies\r\n\r\nInstall and configure the following dependencies before building DEXTRACTOR.\r\n\r\n### Download the hdf5 Binary Distribution\r\n\r\n1. Download the hdf5 1.12.0 **pre-built binary distribution** for Linux from the official [hdf5 downloads](https://www.hdfgroup.org/downloads/hdf5/) page. This project used **hdf5-1.12.0-linux-centos7-x86_64-shared-production.tar.gz**.\r\n\t- **NOTE:** You may need to create an HDF Group account to view and access the download link below.\r\n\t- `wget https://hdf-wordpress-1.s3.amazonaws.com/wp-content/uploads/manual/HDF5/HDF5_1_12_0/binaries/unix/hdf5-1.12.0-linux-centos7-x86_64-shared-production.tar.gz`\r\n2. Extract the downloaded binary distribution.  \r\n`tar -zxvf hdf5-1.12.0-linux-centos7-x86_64-shared-production.tar.gz`\r\n\r\n### Set the hdf5 Library Path\r\n\r\n1. Note the **full path** of the directory extracted in step 2 of [Download the hdf5 Binary Distribution](#download-the-hdf5-binary-distribution), for example:  \r\n`/home/adminuser/hdf5-1.12.0-linux-centos7-x86_64-shared`  \r\nThe shared libraries are in its `lib` subdirectory.\r\n2. Prepend that `lib` directory to the `LD_LIBRARY_PATH` environment variable for the current shell session.\r\n   - Using the example path from step 1:  \r\n`export LD_LIBRARY_PATH=/home/adminuser/hdf5-1.12.0-linux-centos7-x86_64-shared/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}`\r\n   - **INFO:** The export only lasts for the current shell, so repeat this step in every new terminal and after every reboot. 
Without it, `dextract` fails with:\r\n   - \u003e `dextract: error while loading shared libraries: libhdf5.so.200: cannot open shared object file: No such file or directory`\r\n\r\n## Install DEXTRACTOR\r\n\r\n\u003e **WARNING:** Build DEXTRACTOR only after installing the dependencies listed in the [**Install the DEXTRACTOR Dependencies**](#install-the-dextractor-dependencies) section.\r\n\r\n1. Clone the DEXTRACTOR repository and enter the cloned directory.\r\n\t- This project used DEXTRACTOR's **master** branch at revision [63e0fdd](https://github.com/thegenemyers/DEXTRACTOR/commit/63e0fdd78f14d7240c951d885773d7e12a46350b).\r\n\t- `git clone https://github.com/thegenemyers/DEXTRACTOR.git dextractor`\r\n\t- `cd dextractor`\r\n2. Patch `dex2DB.c` as described in Issue [[#26]](https://github.com/thegenemyers/DEXTRACTOR/issues/26).\r\n\t- Replace `DB_CSS` with `DB_CCS` on lines 650 and 846.\r\n3. Set the `PATH_HDF5` variable in the `Makefile` to the hdf5 installation directory. In this example, hdf5 is installed in `/home/adminuser/hdf5-1.12.0-linux-centos7-x86_64-shared`.\r\n\t- Open the `Makefile`:  \r\n`nano Makefile`\r\n\t- Set **PATH_HDF5** on line 2 to your hdf5 installation directory, e.g.:  \r\n`PATH_HDF5 = /home/adminuser/hdf5-1.12.0-linux-centos7-x86_64-shared`\r\n4. From the `dextractor` directory, build the binaries.\r\n\t- Run `make`\r\n5. Optionally, add the directory containing the built binaries (for example, **/home/adminuser/dextractor**) to `PATH` for all users, so that `dextract` can be run from any directory. The quoted `'EOF'` delimiter keeps `$PATH` from being expanded while the file is written:\r\n\r\n   ```sh\r\n   cat \u003c\u003c'EOF' | sudo tee /etc/profile.d/dextractor.sh\r\n   export PATH=$PATH:/home/adminuser/dextractor\r\n   EOF\r\n   source /etc/profile.d/dextractor.sh\r\n   ```\r\n\r\n## DEXTRACTOR Sample Usage\r\n\r\n### Download a Dataset\r\n\r\nObtain a single dataset generated by a *PacBio RS II* run. 
It should contain three `.bax.h5` files (`G.1.bax.h5`, `G.2.bax.h5`, `G.3.bax.h5`) and one `.bas.h5` file.\r\n\r\n- Create a directory for the input `.h5` files and place the downloaded files in it:  \r\n`mkdir files`\r\n\r\n- Create a directory for the output `.fasta` file(s):  \r\n`mkdir processed`\r\n\r\n### Dextractor fasta Extractor Commands\r\n\r\n- **Extract `.fasta` data from a single `.bax.h5` file**  \r\n`dextract -vf -oprocessed/extracted.fasta files/m131019_072530_42175_c100583702550000001823087704281410_s1_p0.1.bax.h5`\r\n\r\n- **Extract `.fasta` data from all three `.bax.h5` files**  \r\n`dextract -vf -oprocessed/data.fasta files/m131019_072530_42175_c100583702550000001823087704281410_s1_p0.*.bax.h5`\r\n\r\n## fastractor Usage\r\n\r\n\u003e This section is a work in progress.\r\n\r\n## References\r\n\r\n[[1]](https://dazzlerblog.wordpress.com/command-guides/dextractor-command-guide/) - DEXTRACTOR command guide (Dazzler blog)  \r\n[[2]](https://dazzlerblog.wordpress.com/2014/03/22/the-dextractor-module-save-disk-space-for-your-pacbio-projects/) - Using DEXTRACTOR to save disk space for PacBio projects (Dazzler blog)  \r\n[[3]](https://trello.com/c/SaI1183f) - Installing hdf5 using CMake (Trello card)\r\n\r\n@weaponsforge  \r\n20201001\r\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fweaponsforge%2Ffastractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fweaponsforge%2Ffastractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fweaponsforge%2Ffastractor/lists"}