{"id":13703760,"url":"https://github.com/mdshw5/fastqp","last_synced_at":"2026-02-24T02:11:08.893Z","repository":{"id":10779058,"uuid":"13045699","full_name":"mdshw5/fastqp","owner":"mdshw5","description":"Simple FASTQ quality assessment using Python","archived":false,"fork":false,"pushed_at":"2021-05-22T02:00:12.000Z","size":2697,"stargazers_count":108,"open_issues_count":14,"forks_count":15,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-05-13T17:38:37.317Z","etag":null,"topics":["bioinformatics","fastq","kmer-distribution","nucleotide-plot","python","sam"],"latest_commit_sha":null,"homepage":"https://pypi.python.org/pypi/fastqp","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"halkeye/codacy-maven-plugin","license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mdshw5.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-09-23T20:01:47.000Z","updated_at":"2025-03-27T01:16:33.000Z","dependencies_parsed_at":"2022-09-23T00:53:03.933Z","dependency_job_id":null,"html_url":"https://github.com/mdshw5/fastqp","commit_stats":null,"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"purl":"pkg:github/mdshw5/fastqp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdshw5%2Ffastqp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdshw5%2Ffastqp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdshw5%2Ffastqp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdshw5%2Ffastqp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mdshw5","download_url":"https://codeload.git
hub.com/mdshw5/fastqp/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdshw5%2Ffastqp/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260901647,"owners_count":23079802,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","fastq","kmer-distribution","nucleotide-plot","python","sam"],"created_at":"2024-08-02T21:00:59.797Z","updated_at":"2026-02-24T02:11:08.886Z","avatar_url":"https://github.com/mdshw5.png","language":"Python","funding_links":[],"categories":["Next Generation Sequencing"],"sub_categories":["Sequence Processing"],"readme":"fastqp\n======\n[![CI](https://github.com/mdshw5/fastqp/actions/workflows/ci.yml/badge.svg)](https://github.com/mdshw5/fastqp/actions/workflows/ci.yml)\n[![PyPI](https://img.shields.io/pypi/v/fastqp.svg?)](https://pypi.python.org/pypi/fastqp)\n\nSimple FASTQ, SAM and BAM read quality assessment and plotting using Python.\n\nFeatures\n--------\n\n- Requires only Python with Numpy, Scipy, and Matplotlib libraries\n- Works with (gzipped) FASTQ, SAM, and BAM formatted reads\n- Tabular, tidy, output statistics so you can create your own graphs\n- A useful set of default graphics rivaling comparable QC packages\n- Counts *all* IUPAC ambiguous nucleotide codes (NMWSKRYVHDB) if present in sequences\n- Downsamples input files to around 2,000,000 reads (user adjustable)\n- Allows a 5′ and 3′ (left and right) cycle limit for graphics generation\n- Tracks kmers and sequence duplication for the *entire* input file\n- Plots base call reference 
mismatches for aligned reads\n- Optional sequence duplication calculation using Bloom filters (beta)\n\nRequirements\n------------\n\nTested on Python 2.7, and 3.4\n\nTested on Mac OS 10.10 and Linux 2.6.18\n\nInstallation\n------------\n\n    pip install [--user] fastqp\n\nNote: BAM file support requires [samtools](https://github.com/samtools/samtools)\n\nUsage\n-----\n\n```\nusage: fastqp [-h] [-q] [-s BINSIZE] [-a NAME] [-n NREADS] [-p BASE_PROBS] [-k {2,3,4,5,6,7}] [-o OUTPUT]\n              [-ll LEFTLIMIT] [-rl RIGHTLIMIT] [-mq MEDIAN_QUAL] [--aligned-only | --unaligned-only] [-d]\n              input\n\nsimple NGS read quality assessment using Python\n\npositional arguments:\n  input                 input file (one of .sam, .bam, .fq, or .fastq(.gz) or stdin (-))\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -q, --quiet           do not print any messages (default: False)\n  -s BINSIZE, --binsize BINSIZE\n                        number of reads to bin for sampling (default: auto)\n  -a NAME, --name NAME  sample name identifier for text and graphics output (default: input file name)\n  -n NREADS, --nreads NREADS\n                        number of reads sample from input (default: 2000000)\n  -p BASE_PROBS, --base-probs BASE_PROBS\n                        probabilites for observing A,T,C,G,N in reads (default: 0.25,0.25,0.25,0.25,0.1)\n  -k {2,3,4,5,6,7}, --kmer {2,3,4,5,6,7}\n                        length of kmer for over-repesented kmer counts (default: 5)\n  -o OUTPUT, --output OUTPUT\n                        base name for output files (default: fastqp_figures)\n  -ll LEFTLIMIT, --leftlimit LEFTLIMIT\n                        leftmost cycle limit (default: 1)\n  -rl RIGHTLIMIT, --rightlimit RIGHTLIMIT\n                        rightmost cycle limit (-1 for none) (default: -1)\n  -mq MEDIAN_QUAL, --median-qual MEDIAN_QUAL\n                        median quality threshold for failing QC (default: 30)\n  --aligned-only        
only aligned reads (default: False)\n  --unaligned-only      only unaligned reads (default: False)\n  -d, --count-duplicates\n                        calculate sequence duplication rate (default: False)\n```\n\nChanges\n-------\n\nSee [releases page](https://github.com/mdshw5/fastqp/releases) for details.\n\nExamples\n--------\n\n![quality heatmap](https://raw.github.com/mdshw5/fastqp/master/examples/example_qualmap.png)\n\n![gc plot](https://raw.github.com/mdshw5/fastqp/master/examples/example_gc.png)\n\n![gc distribution](https://raw.github.com/mdshw5/fastqp/master/examples/example_gcdist.png)\n\n![nucleotide plot](https://raw.github.com/mdshw5/fastqp/master/examples/example_nucs.png)\n\n![nucleotide mismatch plot](https://raw.github.com/mdshw5/fastqp/master/examples/example_mismatch.png)\n\n![kmer distribution](https://raw.github.com/mdshw5/fastqp/master/examples/example_kmers.png)\n\n![depth plot](https://raw.github.com/mdshw5/fastqp/master/examples/example_depth.png)\n\n![quality percentiles](https://raw.github.com/mdshw5/fastqp/master/examples/example_quals.png)\n\n![quality distribution](https://raw.github.com/mdshw5/fastqp/master/examples/example_qualdist.png)\n\n![adapter kmer distribution](https://raw.github.com/mdshw5/fastqp/master/examples/example_adapters.png)\n\n\nAcknowledgements\n----------------\nThis project is freely licensed by the author, [Matthew Shirley](http://mattshirley.com), and\nwas completed under the mentorship and financial support of Drs. [Sarah Wheelan](http://sjwheelan.som.jhmi.edu)\nand [Vasan Yegnasubramanian](http://yegnalab.onc.jhmi.edu) at the Sidney Kimmel Comprehensive\nCancer Center in the Department of Oncology.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmdshw5%2Ffastqp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmdshw5%2Ffastqp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmdshw5%2Ffastqp/lists"}