{"id":20694739,"url":"https://github.com/vmikk/phredsort","last_synced_at":"2026-04-29T09:01:45.374Z","repository":{"id":261475381,"uuid":"883305374","full_name":"vmikk/phredsort","owner":"vmikk","description":"`phredsort` is a cli tool for sorting sequences in a FASTQ file by their quality scores","archived":false,"fork":false,"pushed_at":"2025-12-11T17:19:33.000Z","size":655,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-02-22T20:49:36.442Z","etag":null,"topics":["bash","bioinformatics","cli","fastq","phred-quality-scores","sequence-quality"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vmikk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-11-04T18:24:16.000Z","updated_at":"2025-12-11T17:19:36.000Z","dependencies_parsed_at":"2024-12-19T20:25:21.437Z","dependency_job_id":"617977f1-55d0-48ad-a474-a2619add41c6","html_url":"https://github.com/vmikk/phredsort","commit_stats":null,"previous_names":["vmikk/phredsort"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/vmikk/phredsort","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vmikk%2Fphredsort","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vmikk%2Fphredsort/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vmikk%2Fphredsort/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vmikk%2Fphredsort/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vmikk","download_url":"https://codeload.github.com/vmikk/phredsort/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vmikk%2Fphredsort/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32418173,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T06:29:02.080Z","status":"ssl_error","status_checked_at":"2026-04-29T06:29:00.631Z","response_time":110,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bash","bioinformatics","cli","fastq","phred-quality-scores","sequence-quality"],"created_at":"2024-11-17T00:06:14.717Z","updated_at":"2026-04-29T09:01:45.369Z","avatar_url":"https://github.com/vmikk.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# phredsort\n\n[![DOI](https://zenodo.org/badge/883305374.svg)](https://doi.org/10.5281/zenodo.14395125)\n[![codecov](https://codecov.io/gh/vmikk/phredsort/graph/badge.svg?token=RPMFI9XT67)](https://codecov.io/gh/vmikk/phredsort)\n\n`phredsort` is a command-line tool for sorting sequences in FASTQ files by their quality scores.\n\n## Usage\n\nBasic usage:\n```bash\n# Read from `input.fastq.gz` and write to `output.fastq.gz`\nphredsort -i input.fastq.gz -o output.fastq.gz\n\n# Read from stdin and write to stdout (default when -i/-o not specified)\nzcat input.fastq.gz | phredsort | less -S\n\n# Explicit stdin/stdout (equivalent to above)\nzcat input.fastq.gz | phredsort -i - -o - | less -S\n```\n\n![phredsort help message](assets/phredsort.webp)\n\n\n### Sort sequences using pre-computed maxEE scores in headers\n```bash\nphredsort headersort -i input.fasta -o output.fasta --metric maxee\n```\n\n### Sort by avgphred scores with quality filtering\n```bash\nphredsort headersort -i input.fastq -o output.fastq --metric avgphred --minqual 20 --maxqual 40\n```\n\n### Sort in ascending order (lower quality first)\n```bash\nphredsort headersort -i input.fa -o output.fa --metric meep --ascending\n```\n\nExamples of supported header formats:\n- Space-separated: \"\u003eseq1 maxee=2.5 size=100\"\n- Semicolon-separated: \"\u003eseq1;maxee=2.5;size=100\"\n\n\n\n## Installation\n\n### Download compiled binary (for Linux)\n\n```bash\nwget https://github.com/vmikk/phredsort/releases/download/1.4.0/phredsort\nchmod +x phredsort\n./phredsort --help\n```\n\n### Build from source\n\n```bash\ngit clone --depth 1 https://github.com/vmikk/phredsort\ncd phredsort\ngo build -ldflags=\"-s -w\" phredsort.go\n./phredsort --help\n```\n\n\n## Quality metrics\n\n`phredsort` supports several metrics (`--metric` parameter) to assess sequence quality:\n\n#### 1. (Back-transformed) average Phred score (`avgphred`)\n- Properly calculated mean quality score that accounts for the logarithmic nature of Phred scores\n- Converts Phred scores to error probabilities, calculates their arithmetic mean, then converts back to Phred scale\n- Formula: `-10 * log10(mean(10^(-Q/10)))`\n- More accurate than simple arithmetic mean of Phred scores, which would overestimate quality\n\n#### 2. Maximum expected error (`maxee`) (as per Edgar \u0026 Flyvbjerg, 2014)\n- Sum of error probabilities for all bases in a sequence\n- Formula: `sum(10^(-Q/10))`\n- Higher values indicate lower quality\n- Depends on sequence length (longer sequences tend to have higher MaxEE)\n\n#### 3. Maximum expected error percentage (`meep`)\n- MaxEE standardized by sequence length\n- Represents expected number of errors per 100 bases\n- Formula: `(MaxEE * 100) / sequence_length`\n- Higher values indicate lower quality\n- Allows fair comparison between sequences of different lengths\n\n#### 4. Low quality base count (`lqcount`)\n- Number of bases below specified quality threshold\n- Useful for binned quality scores (e.g., data from Illumina NovaSeq platform)\n- Counts bases with Phred score \u003c threshold (default: 15)\n- Higher values indicate lower quality\n\n#### 5. Low quality base percentage (`lqpercent`)\n- Percentage of bases below quality threshold\n- Formula: `(lqcount * 100) / sequence_length`\n- Higher values indicate lower quality\n- Normalizes low-quality base count by sequence length\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvmikk%2Fphredsort","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvmikk%2Fphredsort","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvmikk%2Fphredsort/lists"}