{"id":23768585,"url":"https://github.com/nci-gdc/bio_qcmetrics_tool","last_synced_at":"2026-05-13T22:35:19.345Z","repository":{"id":187417317,"uuid":"134471016","full_name":"NCI-GDC/bio_qcmetrics_tool","owner":"NCI-GDC","description":"Framework for serializing QC metrics into different formats for bioinformatics workflows","archived":false,"fork":false,"pushed_at":"2025-11-18T01:44:23.000Z","size":567,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-11-18T03:15:04.333Z","etag":null,"topics":["bioinformatics","docker","workflow-tool"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NCI-GDC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2018-05-22T20:17:44.000Z","updated_at":"2025-11-18T01:44:27.000Z","dependencies_parsed_at":"2023-08-10T09:41:34.109Z","dependency_job_id":"7f2dd5ad-ddd3-4bdf-a039-54cb5aded47f","html_url":"https://github.com/NCI-GDC/bio_qcmetrics_tool","commit_stats":null,"previous_names":["nci-gdc/bio_qcmetrics_tool"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/NCI-GDC/bio_qcmetrics_tool","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCI-GDC%2Fbio_qcmetrics_tool","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCI-GDC%2Fbio_qcmetrics_tool/tags","releases_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCI-GDC%2Fbio_qcmetrics_tool/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCI-GDC%2Fbio_qcmetrics_tool/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NCI-GDC","download_url":"https://codeload.github.com/NCI-GDC/bio_qcmetrics_tool/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCI-GDC%2Fbio_qcmetrics_tool/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33002813,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-13T13:14:54.681Z","status":"ssl_error","status_checked_at":"2026-05-13T13:14:51.610Z","response_time":115,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","docker","workflow-tool"],"created_at":"2025-01-01T01:37:40.929Z","updated_at":"2026-05-13T22:35:19.325Z","avatar_url":"https://github.com/NCI-GDC.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# bio_qcmetrics_tool\n\n[![Docker Repository on Quay](https://quay.io/repository/ncigdc/bio-qcmetrics-tool/status?token=6f93e569-076e-45cc-b52f-ac0aba79b5c5 \"Docker Repository on Quay\")](https://quay.io/repository/ncigdc/bio-qcmetrics-tool)\n\n[![Build 
Status](https://travis-ci.com/NCI-GDC/bio_qcmetrics_tool.svg?token=p66Cyx1mwd8vvwEuBvRM\u0026branch=master)](https://travis-ci.com/NCI-GDC/bio_qcmetrics_tool)\n\nFramework for serializing QC metrics into different formats for bioinformatics workflows. Currently,\nthe only supported export converts metrics files to sqlite. Adding a new\nmodule is simple: inherit from the `ExportQcModule` class and the tool is automatically\nadded to the CLI.\n\nSome of the log/metrics file parsing logic was adapted from:\n\n```\n    MultiQC: Summarize analysis results for multiple tools and samples in a single report\n    Philip Ewels, Mans Magnusson, Sverker Lundin and Max Kaller\n    Bioinformatics (2016)\n    doi: 10.1093/bioinformatics/btw354\n    PMID: 27312411 \n    https://github.com/ewels/MultiQC\n```\n\n## Install\n\nIn a python3.5 virtual environment, run `make init`.\n\nThis will install `pre-commit` and `pip install` the dependencies in the `requirements.txt` file.\n\n## Export\n\nExtract raw metrics files into a sqlite db.\n\n* Fastqc\n```\nusage: bio-qcmetrics-tool export fastqc [-h] -i INPUTS -j JOB_UUID\n                                        --export_format {sqlite} -o OUTPUT\n\nExtract FastQC report from zip archive(s).\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -i INPUTS, --inputs INPUTS\n                        Input fastqc zip file. 
May be used one or more times\n  -j JOB_UUID, --job_uuid JOB_UUID\n                        The job uuid associated with the inputs.\n  --export_format {sqlite}\n                        The available formats to export\n  -o OUTPUT, --output OUTPUT\n                        The path to the output file\n```\n\n* Picard\n```\nusage: bio-qcmetrics-tool export picardmetrics [-h] -i INPUTS -j JOB_UUID\n                                               [--derived_from_file DERIVED_FROM_FILE]\n                                               --export_format {sqlite} -o\n                                               OUTPUT\n\nExtract Picard metrics.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -i INPUTS, --inputs INPUTS\n                        Input picard metrics file. May be used one or more\n                        times\n  -j JOB_UUID, --job_uuid JOB_UUID\n                        The job uuid associated with the inputs.\n  --derived_from_file DERIVED_FROM_FILE\n                        The file that the metrics were drived from (e.g., bam\n                        file).\n  --export_format {sqlite}\n                        The available formats to export\n  -o OUTPUT, --output OUTPUT\n                        The path to the output file\n                        \n ```\n \n * Readgroup metadata\n ```\n usage: bio-qcmetrics-tool export readgroup [-h] -i INPUTS -j JOB_UUID -b BAM\n                                           --export_format {sqlite} -o OUTPUT\n\nExtract readgroup metadata\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -i INPUTS, --inputs INPUTS\n                        Input readgroup JSON files\n  -j JOB_UUID, --job_uuid JOB_UUID\n                        The job uuid associated with the inputs.\n  -b BAM, --bam BAM     The bam associated with the inputs.\n  --export_format {sqlite}\n                        The available formats to export\n  -o OUTPUT, --output OUTPUT\n                        
The path to the output file\n ```\n \n * samtools\n \n **flagstats**\n \n ```\n usage: bio-qcmetrics-tool export samtoolsflagstats [-h] -i INPUTS -j JOB_UUID\n                                                   -b BAM --export_format\n                                                   {sqlite} -o OUTPUT\n\nExtract samtools flagstats metrics.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -i INPUTS, --inputs INPUTS\n                        Input flagstats file. May be used one or more times\n  -j JOB_UUID, --job_uuid JOB_UUID\n                        The job uuid associated with the inputs.\n  -b BAM, --bam BAM     The bam that the metrics were derived from.\n  --export_format {sqlite}\n                        The available formats to export\n  -o OUTPUT, --output OUTPUT\n                        The path to the output file\n\n ```\n \n **idxstats**\n \n ```\n usage: bio-qcmetrics-tool export samtoolsidxstats [-h] -i INPUTS -j JOB_UUID\n                                                  -b BAM --export_format\n                                                  {sqlite} -o OUTPUT\n\nExtract samtools idxstats metrics.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -i INPUTS, --inputs INPUTS\n                        Input idxstats file. 
May be used one or more times\n  -j JOB_UUID, --job_uuid JOB_UUID\n                        The job uuid associated with the inputs.\n  -b BAM, --bam BAM     The bam that the metrics were derived from.\n  --export_format {sqlite}\n                        The available formats to export\n  -o OUTPUT, --output OUTPUT\n                        The path to the output file\n ```\n \n **stats**\n \n ```\n usage: bio-qcmetrics-tool export samtoolsstats [-h] -i INPUT -j JOB_UUID -b\n                                               BAM --export_format {sqlite} -o\n                                               OUTPUT\n\nExtract samtools stats metrics.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -i INPUT, --input INPUT\n                        Input stats file\n  -j JOB_UUID, --job_uuid JOB_UUID\n                        The job uuid associated with the inputs.\n  -b BAM, --bam BAM     The bam that the metrics were derived from.\n  --export_format {sqlite}\n                        The available formats to export\n  -o OUTPUT, --output OUTPUT\n                        The path to the output file\n ```\n \n * STAR\n ```\n usage: bio-qcmetrics-tool export starstats [-h]\n                                           [--final_log_inputs FINAL_LOG_INPUTS]\n                                           [--gene_counts_inputs GENE_COUNTS_INPUTS]\n                                           -j JOB_UUID [--bam BAM]\n                                           --export_format {sqlite} -o OUTPUT\n\nExtract STAR logs/gene counts metrics.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --final_log_inputs FINAL_LOG_INPUTS\n                        Input star log file.\n  --gene_counts_inputs GENE_COUNTS_INPUTS\n                        Input star gene counts file.\n  -j JOB_UUID, --job_uuid JOB_UUID\n                        The job uuid associated with the inputs.\n  --bam BAM             The bam file associated with the inputs in 
the same\n                        order.\n  --export_format {sqlite}\n                        The available formats to export\n  -o OUTPUT, --output OUTPUT\n                        The path to the output file\n ```\n\n## Adding new exporters\n\nAll new exporter tools should inherit from `bio_qcmetrics_tool.modules.base.ExportQcModule`. All exporters will\nautomatically have `--export_format` and `--output` command-line parameters added.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnci-gdc%2Fbio_qcmetrics_tool","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnci-gdc%2Fbio_qcmetrics_tool","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnci-gdc%2Fbio_qcmetrics_tool/lists"}