{"id":32139127,"url":"https://github.com/stjude-rust-labs/fq","last_synced_at":"2025-10-21T05:22:00.706Z","repository":{"id":40669648,"uuid":"124939307","full_name":"stjude-rust-labs/fq","owner":"stjude-rust-labs","description":"Command line utility for manipulating Illumina-generated FASTQ files.","archived":false,"fork":false,"pushed_at":"2025-10-07T15:59:15.000Z","size":593,"stargazers_count":91,"open_issues_count":4,"forks_count":5,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-10-07T17:51:19.858Z","etag":null,"topics":["bioinformatics","fastq","fastq-files","genomics","illumina","next-generation-sequencing","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stjude-rust-labs.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2018-03-12T19:19:06.000Z","updated_at":"2025-10-07T15:59:19.000Z","dependencies_parsed_at":"2023-11-07T02:42:53.431Z","dependency_job_id":"ab25f430-a906-43ac-be2e-221e76aeefb6","html_url":"https://github.com/stjude-rust-labs/fq","commit_stats":null,"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"purl":"pkg:github/stjude-rust-labs/fq","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stjude-rust-labs%2Ffq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stjude-rust-labs%2Ffq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/stjude-rust-labs%2Ffq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stjude-rust-labs%2Ffq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stjude-rust-labs","download_url":"https://codeload.github.com/stjude-rust-labs/fq/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stjude-rust-labs%2Ffq/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280207278,"owners_count":26290628,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-21T02:00:06.614Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","fastq","fastq-files","genomics","illumina","next-generation-sequencing","rust"],"created_at":"2025-10-21T05:21:59.544Z","updated_at":"2025-10-21T05:22:00.701Z","avatar_url":"https://github.com/stjude-rust-labs.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# fq\n\n[![CI status](https://github.com/stjude-rust-labs/fq/workflows/CI/badge.svg)](https://github.com/stjude-rust-labs/fq/actions)\n\n**fq** filters, generates, subsamples, and validates [FASTQ] files.\n\n[FASTQ]: https://en.wikipedia.org/wiki/FASTQ_format\n\n## Install\n\nThere are different methods to install fq.\n\n### Releases\n\n[Precompiled binaries are built][releases] for modern 
Linux distributions\n(`x86_64-unknown-linux-gnu`), macOS (`x86_64-apple-darwin`), and Windows\n(`x86_64-pc-windows-msvc`). The Linux binaries require glibc 2.31+ (CentOS/RHEL\n9+, Debian 11+, Ubuntu 20.04+, etc.).\n\n[releases]: https://github.com/stjude-rust-labs/fq/releases\n\n### Conda\n\nfq is available via [Bioconda].\n\n```\n$ conda install fq=0.12.0\n```\n\n[Bioconda]: https://bioconda.github.io/recipes/fq/README.html\n\n### Manual\n\nClone the repository and use [Cargo] to install fq.\n\n```\n$ git clone --depth 1 --branch v0.12.0 https://github.com/stjude-rust-labs/fq.git\n$ cd fq\n$ cargo install --locked --path .\n```\n\n[Cargo]: https://doc.rust-lang.org/cargo/getting-started/installation.html\n\n### Container image\n\nContainer images are managed by Bioconda and available through [Quay.io], e.g.,\nusing [Docker]:\n\n```\n$ docker image pull quay.io/biocontainers/fq:\u003ctag\u003e\n```\n\nSee [the repository tags] for the available tags.\n\nAlternatively, build the development container image:\n\n```\n$ git clone --depth 1 --branch v0.12.0 https://github.com/stjude-rust-labs/fq.git\n$ cd fq\n$ docker image build --tag fq:0.12.0 .\n```\n\n[Quay.io]: https://quay.io/repository/biocontainers/fq\n[the repository tags]: https://quay.io/repository/biocontainers/fq?tab=tags\n[Docker]: https://www.docker.com/\n\n## Usage\n\nfq provides subcommands for filtering, generating, subsampling, and\nvalidating FASTQ files.\n\n### filter\n\n**fq filter** filters a given FASTQ file by a set of names or a sequence\npattern. The result includes only the records that match the given options.\n\n#### Usage\n\n```\nFilters a FASTQ file\n\nUsage: fq filter [OPTIONS] --dsts \u003cDSTS\u003e [SRCS]...\n\nArguments:\n  [SRCS]...  
FASTQ sources\n\nOptions:\n      --names \u003cNAMES\u003e\n          Allowlist of record names\n      --sequence-pattern \u003cSEQUENCE_PATTERN\u003e\n          Keep records that have sequences that match the given regular expression\n      --dsts \u003cDSTS\u003e\n          Filtered FASTQ destinations\n  -h, --help\n          Print help\n  -V, --version\n          Print version\n```\n\n#### Examples\n\n```sh\n# Filters an input FASTQ using the given allowlist.\n$ fq filter --names allowlist.txt --dsts /dev/stdout in.fastq\n\n# Filters FASTQ files by matching a sequence pattern in the first input's\n# records and applying the match to all inputs.\n$ fq filter --sequence-pattern ^TC --dsts out.1.fq --dsts out.2.fq in.1.fq in.2.fq\n```\n\n### generate\n\n**fq generate** is a FASTQ file pair generator. It creates two reads, formatting\nnames as [described by Illumina][1].\n\nWhile _generate_ creates \"valid\" FASTQ reads, the content of the files is\ncompletely random. The sequences do not align to any genome.\n\n[1]: https://help.basespace.illumina.com/articles/descriptive/fastq-files/\n\n#### Usage\n\n```\nGenerates a random FASTQ file pair\n\nUsage: fq generate [OPTIONS] \u003cR1_DST\u003e \u003cR2_DST\u003e\n\nArguments:\n  \u003cR1_DST\u003e  Read 1 destination. Output will be gzipped if ends in `.gz`\n  \u003cR2_DST\u003e  Read 2 destination. 
Output will be gzipped if ends in `.gz`\n\nOptions:\n  -s, --seed \u003cSEED\u003e                  Seed to use for the random number generator\n  -n, --record-count \u003cRECORD_COUNT\u003e  Number of records to generate [default: 10000]\n      --read-length \u003cREAD_LENGTH\u003e    Number of bases in the sequence [default: 101]\n  -h, --help                         Print help\n  -V, --version                      Print version\n```\n\n#### Examples\n\n```sh\n# Generates the default number of records, written to uncompressed files.\n$ fq generate /tmp/r1.fastq /tmp/r2.fastq\n\n# Generates FASTQ paired reads with 32 records, written to gzipped outputs.\n$ fq generate --record-count 32 /tmp/r1.fastq.gz /tmp/r2.fastq.gz\n```\n\n### lint\n\n**fq lint** is a FASTQ file pair validator.\n\n#### Usage\n\n```\nValidates a FASTQ file pair\n\nUsage: fq lint [OPTIONS] \u003cR1_SRC\u003e [R2_SRC]\n\nArguments:\n  \u003cR1_SRC\u003e  Read 1 source. Accepts both raw and gzipped FASTQ inputs\n  [R2_SRC]  Read 2 source. Accepts both raw and gzipped FASTQ inputs\n\nOptions:\n      --lint-mode \u003cLINT_MODE\u003e\n          Panic on first error or log all errors [default: panic] [possible values: panic, log]\n      --single-read-validation-level \u003cSINGLE_READ_VALIDATION_LEVEL\u003e\n          Only use single read validators up to a given level [default: high] [possible values: low, medium, high]\n      --paired-read-validation-level \u003cPAIRED_READ_VALIDATION_LEVEL\u003e\n          Only use paired read validators up to a given level [default: high] [possible values: low, medium, high]\n      --disable-validator \u003cDISABLE_VALIDATOR\u003e\n          Disable validators by code. 
Use multiple times to disable more than one\n  -h, --help\n          Print help\n  -V, --version\n          Print version\n```\n\n#### Validators\n\n_lint_ includes a set of validators that run on single or paired records.\nBy default, records are validated with all rules, but validators can be\ndisabled using `--disable-validator CODE`, where `CODE` is one of the validators\nlisted below.\n\n##### Single\n\n| Code | Level  | Name              | Validation\n|------|--------|-------------------|------------\n| S001 | low    | PlusLine          | Plus line starts with a \"+\".\n| S002 | medium | Alphabet          | All characters in sequence line are one of \"ACGTN\", case-insensitive.\n| S003 | high   | Name              | Name line starts with an \"@\".\n| S004 | low    | Complete          | All four record lines (name, sequence, plus line, and quality) are present.\n| S005 | high   | ConsistentSeqQual | Sequence and quality lengths are the same.\n| S006 | medium | QualityString     | All characters in quality line are between \"!\" and \"~\" (ordinal values).\n| S007 | high   | DuplicateName     | All record names are unique.\n\n##### Paired\n\n| Code | Level   | Name              | Validation\n|------|---------|-------------------|------------\n| P001 | medium  | Names             | Each paired read name is the same, excluding interleave.\n\n#### Examples\n\n```sh\n# Validate both reads using all validators. Exits cleanly (0) if no validation\n# errors occur.\n$ fq lint r1.fastq r2.fastq\n\n# Log errors instead of quitting on first error.\n$ fq lint --lint-mode log r1.fastq r2.fastq\n\n# Disable validators S004 and S007.\n$ fq lint --disable-validator S004 --disable-validator S007 r1.fastq r2.fastq\n```\n\n### subsample\n\n**fq subsample** outputs a subset of records from single or paired FASTQ files.\n\nWhen using a probability (`-p, --probability`), each file is read through once,\nand a subset of records is selected based on that chance. 
Given the randomness\nused when sampling a uniform distribution, the output record count will not be\nexact but (statistically) close.\n\nWhen using a record count (`-n, --record-count`), the first input is read\ntwice, but it provides an exact number of records to be selected.\n\nA seed (`-s, --seed`) can be provided to influence the results, e.g.,\nfor a deterministic subset of records.\n\nFor paired input, the sampling is applied to each pair.\n\n#### Usage\n\n```\nOutputs a subset of records\n\nUsage: fq subsample [OPTIONS] --r1-dst \u003cR1_DST\u003e \u003c--probability \u003cPROBABILITY\u003e|--record-count \u003cRECORD_COUNT\u003e\u003e \u003cR1_SRC\u003e [R2_SRC]\n\nArguments:\n  \u003cR1_SRC\u003e  Read 1 source. Accepts both raw and gzipped FASTQ inputs\n  [R2_SRC]  Read 2 source. Accepts both raw and gzipped FASTQ inputs\n\nOptions:\n  -p, --probability \u003cPROBABILITY\u003e    The probability a record is kept, as a percentage (0.0, 1.0). Cannot be used with `record-count`\n  -n, --record-count \u003cRECORD_COUNT\u003e  The exact number of records to keep. Cannot be used with `probability`\n  -s, --seed \u003cSEED\u003e                  Seed to use for the random number generator\n      --r1-dst \u003cR1_DST\u003e              Read 1 destination. Output will be gzipped if ends in `.gz`\n      --r2-dst \u003cR2_DST\u003e              Read 2 destination. 
Output will be gzipped if ends in `.gz`\n  -h, --help                         Print help\n  -V, --version                      Print version\n```\n\n#### Examples\n\n```sh\n# Sample ~50% of records from a single FASTQ file\n$ fq subsample --probability 0.5 --r1-dst r1.50pct.fastq r1.fastq\n\n# Sample ~50% of records from a single FASTQ file and seed the RNG\n$ fq subsample --probability 0.5 --seed 13 --r1-dst r1.50pct.fastq r1.fastq\n\n# Sample ~25% of records from paired FASTQ files\n$ fq subsample --probability 0.25 --r1-dst r1.25pct.fastq --r2-dst r2.25pct.fastq r1.fastq r2.fastq\n\n# Sample ~10% of records from a gzipped FASTQ file and compress output\n$ fq subsample --probability 0.1 --r1-dst r1.10pct.fastq.gz r1.fastq.gz\n\n# Sample exactly 10000 records from a single FASTQ file\n$ fq subsample --record-count 10000 --r1-dst r1.10k.fastq r1.fastq\n```\n\n## Legal\n\nPlease see [the disclaimer](https://github.com/stjude-rust-labs#disclaimer) that\napplies to all crates and command line tools made available by St. Jude Rust\nLabs.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstjude-rust-labs%2Ffq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstjude-rust-labs%2Ffq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstjude-rust-labs%2Ffq/lists"}