{"id":20640695,"url":"https://github.com/rhpvorderman/sequali","last_synced_at":"2025-06-14T02:35:57.535Z","repository":{"id":142375911,"uuid":"613270250","full_name":"rhpvorderman/sequali","owner":"rhpvorderman","description":"Fast sequencing data quality metrics","archived":false,"fork":false,"pushed_at":"2025-05-26T16:39:21.000Z","size":7028,"stargazers_count":26,"open_issues_count":9,"forks_count":0,"subscribers_count":1,"default_branch":"develop","last_synced_at":"2025-05-26T17:47:22.959Z","etag":null,"topics":["bam","fastq","illumina","nanopore","qc","quality-control"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rhpvorderman.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGELOG.rst","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-03-13T08:45:16.000Z","updated_at":"2025-05-26T16:39:25.000Z","dependencies_parsed_at":"2023-09-22T09:29:58.828Z","dependency_job_id":"d99fc674-032b-469a-b895-1e6753df32a7","html_url":"https://github.com/rhpvorderman/sequali","commit_stats":null,"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"purl":"pkg:github/rhpvorderman/sequali","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rhpvorderman%2Fsequali","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rhpvorderman%2Fsequali/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rhpvorderman%2Fsequali/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rhpvorderman%2Fs
equali/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rhpvorderman","download_url":"https://codeload.github.com/rhpvorderman/sequali/tar.gz/refs/heads/develop","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rhpvorderman%2Fsequali/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259749584,"owners_count":22905734,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bam","fastq","illumina","nanopore","qc","quality-control"],"created_at":"2024-11-16T15:30:43.567Z","updated_at":"2025-06-14T02:35:57.524Z","avatar_url":"https://github.com/rhpvorderman.png","language":"C","readme":".. |python-version-shield| image:: https://img.shields.io/pypi/v/sequali.svg\n  :target: https://pypi.org/project/sequali/\n  :alt:\n\n.. |conda-version-shield| image:: https://img.shields.io/conda/v/bioconda/sequali.svg\n  :target: https://bioconda.github.io/recipes/sequali/README.html\n  :alt:\n\n.. |python-install-version-shield| image:: https://img.shields.io/pypi/pyversions/sequali.svg\n  :target: https://pypi.org/project/sequali/\n  :alt:\n\n.. |license-shield| image:: https://img.shields.io/pypi/l/sequali.svg\n  :target: https://github.com/rhpvorderman/sequali/blob/main/LICENSE\n  :alt:\n\n.. |docs-shield| image:: https://readthedocs.org/projects/sequali/badge/?version=latest\n  :target: https://sequali.readthedocs.io/en/latest/?badge=latest\n  :alt:\n\n.. 
|coverage-shield| image:: https://codecov.io/gh/rhpvorderman/sequali/graph/badge.svg?token=MSR1A6BEGC\n  :target: https://codecov.io/gh/rhpvorderman/sequali\n  :alt:\n\n.. |zenodo-shield| image:: ./docs/_static/images/doi_image.svg\n  :target: https://doi.org/10.1093/bioadv/vbaf010\n  :alt:\n\n|python-version-shield| |conda-version-shield| |python-install-version-shield|\n|license-shield| |docs-shield| |coverage-shield| |zenodo-shield|\n\n========\nSequali\n========\n\n.. introduction start\n\nSequence quality metrics for FASTQ and uBAM files.\n\nFeatures:\n\n+ `MultiQC \u003chttps://multiqc.info\u003e`_ support since MultiQC version 1.22.\n+ Low memory footprint, small install size and fast execution times.\n\n  + Sequali typically needs less than 2 GB of memory and 3-30 minutes of runtime\n    when run on 2 cores (the default).\n+ Informative graphs that allow judging the quality of a sequence at a\n  glance.\n+ Overrepresentation analysis using 21 bp sequence fragments. Overrepresented\n  sequences are checked against the NCBI UniVec database.\n+ Duplication rate estimation using a `fingerprint subsampling technique that\n  is also used for filesystem duplication estimation\n  \u003chttps://www.usenix.org/system/files/conference/atc13/atc13-xie.pdf\u003e`_.\n+ Checks for 6 Illumina adapter sequences and 17 nanopore adapter sequences\n  in single-read data.\n+ Adapter detection by overlap analysis for paired-read data.\n+ Insert size metrics for paired-read data.\n+ Per-tile quality plots for Illumina reads.\n+ Channel and other plots for nanopore reads.\n+ FASTQ and unaligned BAM are supported. See \"Supported formats\".\n+ Reproducible reports without timestamps.\n\nExample reports:\n\n+ `GM24385_1.fastq.gz \u003chttps://sequali.readthedocs.io/en/latest/GM24385_1.fastq.gz.html\u003e`_;\n  HG002 (Genome in a Bottle) on ultra-long nanopore sequencing. 
ENA accession:\n  `ERR3988483 \u003chttps://www.ebi.ac.uk/ena/browser/view/ERR3988483\u003e`_.\n+ `GM24385_1_cut.fastq.gz \u003chttps://sequali.readthedocs.io/en/latest/GM24385_1_cut.fastq.gz.html\u003e`_;\n  ``GM24385_1.fastq.gz`` processed with cutadapt:\n  ``cutadapt -o GM24385_1_cut.fastq.gz --cut -64 --cut 64 --minimum-length 500 -Z --max-aer 0.1 GM24385_1.fastq.gz``.\n  In the resulting file, 64 bp are cut off from both ends of each read, after\n  which reads are filtered for a minimum length of 500 bp and a maximum average\n  error rate of 0.1.\n+ `21C125_R1.fastq.gz \u003chttps://sequali.readthedocs.io/en/latest/21C125_R1.fastq.gz.html\u003e`_;\n  Illumina NovaSeq X paired-end sequencing of *Campylobacter jejuni*. ENA accession:\n  `ERR11204024 \u003chttps://www.ebi.ac.uk/ena/browser/view/ERR11204024\u003e`_.\n\n.. introduction end\n\nFor more information, check `the documentation \u003chttps://sequali.readthedocs.io\u003e`_.\n\nSupported formats\n=================\n\n.. formats start\n\n- FASTQ. Only the Sanger variant, with a Phred offset of 33 and an error rate\n  of 10^(-phred/10), is supported. All current sequencers use this format.\n\n  - Paired-end sequencing data is supported.\n  - For sequences called by Illumina base callers, an additional per-tile\n    quality plot is provided.\n  - For sequences called by Guppy, additional plots for nanopore-specific\n    data are provided.\n- (Unaligned) BAM with single reads. Read-pair information is currently ignored.\n\n  - For BAM data as delivered by Dorado, additional nanopore plots are\n    provided.\n  - For aligned BAM files, secondary and supplementary reads are ignored,\n    similar to how ``samtools fastq`` handles them.\n\n.. formats end\n\nInstallation\n============\n\n.. installation start\n\nInstallation via pip is available with::\n\n    pip install sequali\n\nSequali is also distributed via bioconda. 
It can be installed with::\n\n    conda install -c conda-forge -c bioconda sequali\n\n.. installation end\n\nQuickstart\n==========\n\n.. quickstart start\n\n.. code-block::\n\n    sequali path/to/my.fastq.gz\n\nThis creates an HTML report ``my.fastq.gz.html`` and a JSON file\n``my.fastq.gz.json`` in the current working directory.\n\nThe ``--outdir`` flag sets the directory where the reports are created. This is\nuseful when using `MultiQC \u003chttps://github.com/multiqc/multiqc\u003e`_.\n\n.. code-block::\n\n    sequali --outdir /my/dir/all_sequali_reports my.fastq.gz\n\nThe HTML and JSON filenames can be set separately.\n\n.. code-block::\n\n    sequali --html before_qc.html --json before_qc.json my.fastq.gz\n    sequali --html after_qc.html --json after_qc.json my.cutadapt.fastq.gz\n\nSequali can handle paired-end data.\n\n.. code-block::\n\n    sequali /sequencing_data/sample100_R1.fastq.gz /sequencing_data/sample100_R2.fastq.gz\n\nSequali can also handle BAM data. Proper pair handling is not yet supported for\nBAM data, so this is primarily useful for ONT datasets.\n\n.. code-block::\n\n    sequali /sequencing_data/sample100_dorado_called_hac_v4.30.bam\n\nBy default, Sequali uses one thread per compressed input file and one thread for\nread processing, typically keeping two cores busy. Sequali can also run on a\nsingle core, which is slower but typically more efficient in HPC scenarios where\nmultiple files are processed simultaneously. A SLURM example:\n\n.. code-block::\n\n    sbatch -c 1 --time 59 --partition short \\\n    --wrap 'sequali --threads 1 /cluster-scratch/myusername/my.fastq.gz'\n\nUsing a thread count higher than ``2`` has no effect. Because decompression is\nthe bottleneck, fully multithreading Sequali would have limited utility whilst\nadding disproportionately high code complexity.\n\n.. 
quickstart end\n\nFor all command-line options, check out the\n`usage documentation \u003chttps://sequali.readthedocs.io/#usage\u003e`_.\n\nFor more detailed information about the module options, check the\n`documentation on the module options\n\u003chttps://sequali.readthedocs.io/#module-option-explanations\u003e`_.\n\nAcknowledgements\n================\n\n.. acknowledgements start\n\n+ `FastQC \u003chttps://www.bioinformatics.babraham.ac.uk/projects/fastqc/\u003e`_ for\n  its excellent selection of relevant metrics. For this reason, Sequali\n  gathers these metrics as well.\n+ The matplotlib team for their excellent work on colormaps. Their work\n  inspired how Sequali presents its data, and their RdBu colormap is used\n  to represent quality score data. Check their `writings on colormaps\n  \u003chttps://matplotlib.org/stable/users/explain/colors/colormaps.html\u003e`_ for\n  a good introduction.\n+ Wouter de Coster for his `excellent post on how to correctly average Phred\n  scores \u003chttps://gigabaseorgigabyte.wordpress.com/2017/06/26/averaging-basecall-quality-scores-the-right-way/\u003e`_\n  as well as the idea of using end-anchored plots from `NanoQC\n  \u003chttps://github.com/wdecoster/nanoQC\u003e`_.\n+ Marcel Martin for providing very extensive feedback.\n+ Agnès Barnabé for creating a Galaxy wrapper.\n\n.. acknowledgements end\n\nCitation\n========\n\n.. citation start\n\nIf you wish to credit Sequali, please cite `the Sequali article\n\u003chttps://doi.org/10.1093/bioadv/vbaf010\u003e`_.\n\n.. citation end\n\nLicense\n=======\n\n.. license start\n\nThis project is licensed under the GNU Affero General Public License v3, mainly\nto prevent commercial parties from using it without informing their users that\nthey can run it themselves. If you want to include code from Sequali in your\nopen-source project but your project's license is not compatible with the AGPL,\nplease contact me to discuss a separate license.\n\n.. 
license end","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frhpvorderman%2Fsequali","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frhpvorderman%2Fsequali","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frhpvorderman%2Fsequali/lists"}