{"id":30777818,"url":"https://github.com/fulcrumgenomics/fqtk","last_synced_at":"2025-09-05T05:08:36.033Z","repository":{"id":65175977,"uuid":"562995195","full_name":"fulcrumgenomics/fqtk","owner":"fulcrumgenomics","description":"Fast FASTQ sample demultiplexing in Rust.","archived":false,"fork":false,"pushed_at":"2025-05-26T18:01:52.000Z","size":149,"stargazers_count":64,"open_issues_count":8,"forks_count":2,"subscribers_count":12,"default_branch":"main","last_synced_at":"2025-08-30T04:48:40.076Z","etag":null,"topics":["bioinformatics","genomics","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fulcrumgenomics.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-11-07T17:30:38.000Z","updated_at":"2025-08-22T01:41:41.000Z","dependencies_parsed_at":"2023-02-19T20:30:55.459Z","dependency_job_id":"b2349f64-9101-4a6e-b187-ad73d87576e6","html_url":"https://github.com/fulcrumgenomics/fqtk","commit_stats":{"total_commits":16,"total_committers":3,"mean_commits":5.333333333333333,"dds":0.25,"last_synced_commit":"dec00ab0390337ef24d5de7f8840ff39bb64d06a"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/fulcrumgenomics/fqtk","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fulcrumgenomics%2Ffqtk","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fulcrumgenomics%2Ffqtk/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fulcru
mgenomics%2Ffqtk/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fulcrumgenomics%2Ffqtk/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fulcrumgenomics","download_url":"https://codeload.github.com/fulcrumgenomics/fqtk/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fulcrumgenomics%2Ffqtk/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273713621,"owners_count":25154614,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-05T02:00:09.113Z","response_time":402,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","genomics","rust"],"created_at":"2025-09-05T05:08:33.624Z","updated_at":"2025-09-05T05:08:36.016Z","avatar_url":"https://github.com/fulcrumgenomics.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# fqtk\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/fulcrumgenomics/fqtk/actions?query=workflow%3ACheck\"\u003e\u003cimg src=\"https://github.com/fulcrumgenomics/fqtk/actions/workflows/build_and_test.yml/badge.svg\" alt=\"Build Status\"\u003e\u003c/a\u003e\n  \u003cimg src=\"https://img.shields.io/crates/l/fqtk.svg\" alt=\"license\"\u003e\n  \u003ca href=\"https://crates.io/crates/fqtk\"\u003e\u003cimg 
src=\"https://img.shields.io/crates/v/fqtk.svg?colorB=319e8c\" alt=\"Version info\"\u003e\u003c/a\u003e\n  \u003ca href=\"http://bioconda.github.io/recipes/fqtk/README.html\"\u003e\u003cimg src=\"https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat\" alt=\"Install with bioconda\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://doi.org/10.5281/zenodo.13345414\"\u003e\u003cimg src=\"https://zenodo.org/badge/DOI/10.5281/zenodo.13345414.svg\" alt=\"DOI\"\u003e\u003c/a\u003e\n  \u003cbr\u003e\n\u003c/p\u003e\n\nA toolkit for working with FASTQ files, written in Rust.\n\n\u003cp\u003e\n\u003ca href=\"https://fulcrumgenomics.com\"\u003e\u003cimg src=\".github/logos/fulcrumgenomics.svg\" alt=\"Fulcrum Genomics\" height=\"100\"/\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n[Visit us at Fulcrum Genomics](https://www.fulcrumgenomics.com) to learn more about how we can power your Bioinformatics with fqtk and beyond.\n\n\u003ca href=\"mailto:contact@fulcrumgenomics.com?subject=[GitHub inquiry]\"\u003e\u003cimg src=\"https://img.shields.io/badge/Email_us-brightgreen.svg?\u0026style=for-the-badge\u0026logo=gmail\u0026logoColor=white\"/\u003e\u003c/a\u003e\n\u003ca href=\"https://www.fulcrumgenomics.com\"\u003e\u003cimg src=\"https://img.shields.io/badge/Visit_Us-blue.svg?\u0026style=for-the-badge\u0026logo=wordpress\u0026logoColor=white\"/\u003e\u003c/a\u003e\n\nCurrently `fqtk` contains a single tool, `demux` for demultiplexing FASTQ files based on sample barcodes.\n`fqtk demux` can be used to demultiplex one or more FASTQ files (e.g. 
a set of R1, R2 and I1 FASTQ files) with any number of sample barcodes at fixed locations within the reads.\nIt is highly efficient and multi-threaded for high performance.\n\nUsage for `fqtk demux` follows:\n\n\u003c!-- start usage --\u003e\n````console\n\nPerforms sample demultiplexing on FASTQs.\n\nThe sample barcode for each sample in the metadata TSV will be compared against the sample\nbarcode bases extracted from the FASTQs, to assign each read to a sample.  Reads that do not\nmatch any sample within the given error tolerance will be placed in the ``unmatched_prefix``\nfile.\n\nFASTQs and associated read structures for each sub-read should be given:\n\n- a single fragment read (with inline index) should have one FASTQ and one read structure\n- paired end reads should have two FASTQs and two read structures\n- a dual-index sample with paired end reads should have four FASTQs and four read structures\n  given: two for the two index reads, and two for the template reads.\n\nIf multiple FASTQs are present for each sub-read, then the FASTQs for each sub-read should be\nconcatenated together prior to running this tool\n(e.g. `zcat s_R1_L001.fq.gz s_R1_L002.fq.gz | bgzip -c \u003e s_R1.fq.gz`).\n\n(Read structures)[\u003chttps://github.com/fulcrumgenomics/fgbio/wiki/Read-Structures\u003e] are made up of\n`\u003cnumber\u003e\u003coperator\u003e` pairs much like the `CIGAR` string in BAM files.\nFour kinds of operators are recognized:\n\n1. `T` identifies a template read\n2. `B` identifies a sample barcode read\n3. `M` identifies a unique molecular index read\n4. `S` identifies a set of bases that should be skipped or ignored\n\nThe last `\u003cnumber\u003e\u003coperator\u003e` pair may be specified using a `+` sign instead of number to\ndenote \"all remaining bases\". This is useful if, e.g., fastqs have been trimmed and contain\nreads of varying length. Both reads must have template bases.  
Any molecular identifiers will\nbe concatenated using the `-` delimiter and placed in the given SAM record tag (`RX` by\ndefault).  Similarly, the sample barcode bases from the given read will be placed in the `BC`\ntag.\n\nMetadata about the samples should be given as a headered metadata TSV file with at least the\nfollowing two columns present:\n\n1. `sample_id` - the id of the sample or library.\n2. `barcode` - the expected barcode sequence associated with the `sample_id`.\n\nFor reads containing multiple barcodes (such as dual-indexed reads), all barcodes should be\nconcatenated together in the order they are read and stored in the `barcode` field.\n\nIUPAC bases are supported in the (expected) `barcode` column.  An observed IUPAC base must be\nat least as specific as the corresponding base in the expected sample barcode.  E.g., if the\nobserved base is an N, it will only match expected sample barcodes with an N.  And if the\nobserved base is an R, it will match R, V, D, and N, since the latter IUPAC codes allow both\nA and G (R/V/D/N are a superset of the bases compared to R).\n\nThe read structures will be used to extract the observed sample barcode, template bases, and\nmolecular identifiers from each read.  The observed sample barcode will be matched to the\nsample barcodes extracted from the bases in the sample metadata and associated read structures.\n\nAn observed barcode matches an expected barcode if all the following are true:\n1. The number of mismatches (edits/substitutions) is less than or equal to the maximum\n   mismatches (see `--max-mismatches`).\n2. The difference between number of mismatches in the best and second best barcodes is greater\n   than or equal to the minimum mismatch delta (`--min-mismatch-delta`).\n\nThe expected barcode sequence may contain Ns, which are not counted as mismatches regardless\nof the observed base (e.g. 
the expected barcode `AAN` will have zero mismatches relative to\nboth the observed barcodes `AAA` and `AAN`).\n\n## Outputs\n\nAll outputs are generated in the provided `--output` directory.  For each sample plus the\nunmatched reads, FASTQ files are written for each read segment (specified in the read\nstructures) of one of the types supplied to `--output-types`.  FASTQ files have names\nof the format:\n\n```bash\n{sample_id}.{segment_type}{read_num}.fq.gz\n```\n\nwhere `segment_type` is one of `R`, `I`, and `U` (for template, barcode/index and molecular\nbarcode/UMI reads respectively) and `read_num` is a number starting at 1 for each segment\ntype.\n\nIn addition a `demux-metrics.txt` file is written that is a tab-delimited file with counts\nof how many reads were assigned to each sample and derived metrics.\n\n## Example Command Line\n\nAs an example, if the sequencing run was 2x100bp (paired end) with two 8bp index reads both\nreading a sample barcode, as well as an in-line 8bp sample barcode in read one, the command\nline would be:\n\n```bash\nfqtk demux \\\n    --inputs r1.fq.gz i1.fq.gz i2.fq.gz r2.fq.gz \\\n    --read-structures 8B92T 8B 8B 100T \\\n    --sample-metadata metadata.tsv \\\n    --output output_folder\n```\n\nUsage: fqtk demux [OPTIONS] --inputs \u003cINPUTS\u003e... --read-structures \u003cREAD_STRUCTURES\u003e... --sample-metadata \u003cSAMPLE_METADATA\u003e --output \u003cOUTPUT\u003e\n\nOptions:\n  -i, --inputs \u003cINPUTS\u003e...\n          One or more input FASTQ files each corresponding to a sequencing read (e.g. 
R1, I1)\n\n  -r, --read-structures \u003cREAD_STRUCTURES\u003e...\n          The read structures, one per input FASTQ in the same order\n\n  -b, --output-types \u003cOUTPUT_TYPES\u003e...\n          The read structure types to write to their own files (Must be one of T, B, or M for template reads, sample barcode reads, and molecular barcode reads).\n\n          Multiple output types may be specified as a space-delimited list.\n\n          [default: T]\n\n  -s, --sample-metadata \u003cSAMPLE_METADATA\u003e\n          A file containing the metadata about the samples\n\n  -o, --output \u003cOUTPUT\u003e\n          The output directory into which to write per-sample FASTQs\n\n  -u, --unmatched-prefix \u003cUNMATCHED_PREFIX\u003e\n          Output prefix for FASTQ file(s) for reads that cannot be matched to a sample\n\n          [default: unmatched]\n\n      --max-mismatches \u003cMAX_MISMATCHES\u003e\n          Maximum mismatches for a barcode to be considered a match\n\n          [default: 1]\n\n  -d, --min-mismatch-delta \u003cMIN_MISMATCH_DELTA\u003e\n          Minimum difference between number of mismatches in the best and second best barcodes for a barcode to be considered a match\n\n          [default: 2]\n\n  -t, --threads \u003cTHREADS\u003e\n          The number of threads to use. Cannot be less than 3\n\n          [default: 8]\n\n  -c, --compression-level \u003cCOMPRESSION_LEVEL\u003e\n          The level of compression to use to compress outputs\n\n          [default: 5]\n\n  -S, --skip-reasons \u003cSKIP_REASONS\u003e\n          Skip demultiplexing reads for any of the following reasons, otherwise panic.\n\n          1. `too-few-bases`: there are too few bases or qualities to extract given the read structures.  
For example, if a read is 8bp long but the read structure is `10B`, or if a read is empty and the read structure is `+T`.\n\n  -h, --help\n          Print help information (use `-h` for a summary)\n\n  -V, --version\n          Print version information\n````\n\u003c!-- end usage --\u003e\n\n## Installing\n\n### Installing with `conda`\nTo install with conda you must first [install conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html#installation).\nThen, in your command line (and with the environment you wish to install fqtk into active) run:\n\n```console\nconda install -c bioconda fqtk\n```\n\n### Installing with `cargo`\nTo install with cargo you must first [install Rust](https://doc.rust-lang.org/cargo/getting-started/installation.html).\nOn macOS and Linux, this can be done with the command:\n\n```console\ncurl https://sh.rustup.rs -sSf | sh\n```\n\nThen, to install `fqtk`, run:\n\n```console\ncargo install fqtk\n```\n\n### Building From Source\n\nFirst, clone the git repo:\n\n```console\ngit clone https://github.com/fulcrumgenomics/fqtk.git\n```\n\nSecond, if you do not already have Rust development tools installed, install them via [rustup](https://rustup.rs/):\n\n```console\ncurl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh\n```\n\nThen build the toolkit in release mode:\n\n```console\ncd fqtk\ncargo build --release\n./target/release/fqtk --help\n```\n\n## Developing\n\nfqtk is developed in Rust and follows the conventions of using `rustfmt` and `clippy` to ensure both code quality and standardized formatting.\nWhen working on fqtk, before pushing any commits, please first run `./ci/check.sh` and resolve any issues that are reported.\n\n## Releasing a New Version\n\n### Pre-requisites\n\nInstall [`cargo-release`][cargo-release-link]:\n\n```console\ncargo install cargo-release\n```\n\n### Prior to Any Release\n\nCreate a release that will not try to push to `crates.io` and verify the command:\n\n```console\ncargo release 
[major,minor,patch,release,rc...] --no-publish\n```\n\nNote: \"dry-run\" is the default for cargo release.\n\nSee the [`cargo-release` reference documentation][cargo-release-docs-link] for more information.\n\n### Semantic Versioning\n\nThis tool follows [Semantic Versioning](https://semver.org/).  In brief:\n\n* MAJOR version when you make incompatible API changes,\n* MINOR version when you add functionality in a backwards compatible manner, and\n* PATCH version when you make backwards compatible bug fixes.\n\n### Major Release\n\nTo create a major release:\n\n```console\ncargo release major --execute\n```\n\nThis will remove any pre-release extension, create a new tag and push it to GitHub, and push the release to crates.io.\n\nUpon success, move the version to the [next candidate release](#release-candidate).\n\nFinally, make sure to [create a new release][new-release-link] on GitHub.\n\n### Minor and Patch Release\n\nTo create a _minor_ (_patch_) release, follow the [Major Release](#major-release) instructions substituting `major` with `minor` (`patch`):\n\n```console\ncargo release minor --execute\n```\n\n### Release Candidate\n\nTo move to the next release candidate:\n\n```console\ncargo release rc --no-tag --no-publish --execute\n```\n\nThis will create or bump the pre-release version and push the changes to the main branch on GitHub.\nThis will not tag or publish the release candidate.\nIf you would like to tag the release candidate on GitHub, remove `--no-tag` to create a new tag and push it to GitHub.\n\n[cargo-release-link]:      https://github.com/crate-ci/cargo-release\n[cargo-release-docs-link]: https://github.com/crate-ci/cargo-release/blob/master/docs/reference.md\n[new-release-link]:        
https://github.com/fulcrumgenomics/fqtk/releases/new\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffulcrumgenomics%2Ffqtk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffulcrumgenomics%2Ffqtk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffulcrumgenomics%2Ffqtk/lists"}