{"id":18373532,"url":"https://github.com/shujiahuang/cmdbtools","last_synced_at":"2025-04-06T19:32:20.143Z","repository":{"id":50276817,"uuid":"158210103","full_name":"ShujiaHuang/cmdbtools","owner":"ShujiaHuang","description":"Command line tools for CMDB varaints browser","archived":false,"fork":false,"pushed_at":"2024-05-14T06:06:15.000Z","size":1550,"stargazers_count":23,"open_issues_count":3,"forks_count":7,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-22T06:03:29.747Z","etag":null,"topics":["bioinformatics","chinese-genome-database","chinese-millionome-database","cmdb","database","genomics-api","variants","vcf"],"latest_commit_sha":null,"homepage":"http://cmdb.bgi.com/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ShujiaHuang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-11-19T11:25:35.000Z","updated_at":"2024-05-14T06:06:19.000Z","dependencies_parsed_at":"2024-11-06T00:10:55.195Z","dependency_job_id":"06452599-8f06-4d77-a4aa-5b26dc56b352","html_url":"https://github.com/ShujiaHuang/cmdbtools","commit_stats":{"total_commits":86,"total_committers":6,"mean_commits":"14.333333333333334","dds":0.4767441860465116,"last_synced_commit":"d3a419781e8345864c574554e2d07813669b7082"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShujiaHuang%2Fcmdbtools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShujiaHuang%2Fcmdbtools/tags","releases_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShujiaHuang%2Fcmdbtools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ShujiaHuang%2Fcmdbtools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ShujiaHuang","download_url":"https://codeload.github.com/ShujiaHuang/cmdbtools/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247539301,"owners_count":20955288,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","chinese-genome-database","chinese-millionome-database","cmdb","database","genomics-api","variants","vcf"],"created_at":"2024-11-06T00:10:44.795Z","updated_at":"2025-04-06T19:32:17.726Z","avatar_url":"https://github.com/ShujiaHuang.png","language":"Python","readme":"cmdbtools: A command line tool for the CMDB variants browser\n============================================================\n\n[![PyPI Version](https://img.shields.io/pypi/v/cmdbtools.svg)](https://pypi.org/project/cmdbtools/)\n[![License](https://img.shields.io/pypi/l/cmdbtools.svg)](https://github.com/ShujiaHuang/cmdbtools/blob/master/LICENSE)\n\nIntroduction\n------------\n\nChina is the most populous country and the second largest economy in the\nworld. However, construction of a Chinese genome database has progressed\nslowly. 
At present, the world's large-scale international and\nnational genome sequencing projects, such as 1KGP, Genomics England,\nGenome of the Netherlands, and ExAC, are mostly biased towards the\nconstruction of a genomic baseline for European populations. In those\nprojects, the sample size reaches hundreds of thousands for\nsamples with European ancestry, whereas the number of sequenced\nChinese samples is no more than a thousand.\n\nSince a high-quality genomic baseline database serves as an important\ncontrol for medical research and population-oriented clinical and drug\napplications, the Chinese Millionome Database (CMDB) was developed to\nfill the gap.\n\nThe [Chinese Millionome Database (CMDB)](https://db.cngb.org/cmdb/) is a\nunique large-scale Chinese genomics database produced by BGI and hosted\nin the National GeneBank. The CMDB delivers periodic and useful\nvariation information and scientific insights derived from the analysis\nof millions of Chinese sequencing data. The results aim to promote\ngenetic research and precision medicine efforts in China.\n\nThe delivered information includes the detected variants and their\ncorresponding allele frequencies, annotations, frequency comparisons with\nglobal populations from existing databases, etc.\n\nBenchmarking details and methods are described in our *Cell* paper:\n\nLiu, S. et al. (2018) Genomic Analyses from Non-invasive Prenatal Testing\nReveal Genetic Associations, Patterns of Viral Infections, and Chinese\nPopulation History. 
*Cell*, 175(2), 347-359.\n[DOI:https://doi.org/10.1016/j.cell.2018.08.016](https://doi.org/10.1016/j.cell.2018.08.016)\n\n**cmdbtools** is a command line tool for this CMDB variants browser.\n\nQuick start\n-----------\n\nThe CMDB variant browser allows authorized access to its data through a\nGenomics API, and **cmdbtools** is a convenient command line tool for\nthis purpose.\n\nInstallation\n------------\n\nInstall the released version with `pip` (only Python 3 is supported since v1.1.0):\n\n```bash\npip install cmdbtools\n```\n\nSetup\n-----\n\nPlease enable your API access from Profile in the [CMDB\nbrowser](https://db.cngb.org/cmdb) before using **cmdbtools**.\n\nLogin\n-----\n\nLog in with `cmdbtools` by using your CMDB API access key, which can be\nfound under Profile -\u003e Genomics API once you have applied for it.\n\n[![cmdb_genomics_api](assets/figures/cmdb_genomics_api.png)](assets/figures/cmdb_genomics_api.png)\n\n```bash\ncmdbtools login -k your-genomics-api-key\n```\n\nIf everything goes smoothly, **you can now use CMDB as one of your\nvariant databases in command line mode**.\n\nLogout\n------\n\nLog out of `cmdbtools` by simply running the command below:\n\n```bash\ncmdbtools logout\n```\n\nQuery a single variant\n----------------------\n\nVariants can be retrieved from CMDB by using `query-variant`.\n\nRun `cmdbtools query-variant -h` to see all available options. 
There are\ntwo different ways to retrieve variants.\n\nOne is to use the `-c` and `-p` parameters for a single variant; the other\nuses `-l` for multiple positions.\n\nHere is an example of querying a single variant by chromosome name and\nposition.\n\n```bash\ncmdbtools query-variant -c chr17 -p 41234470\n```\n\nand you will get something that looks like this:\n\n```bash\n##fileformat=VCFv4.2\n##FILTER=\u003cID=LowQual,Description=\"Low quality\"\u003e\n##INFO=\u003cID=CMDB_AN,Number=1,Type=Integer,Description=\"Number of Alleles in Samples with Coverage from CMDB_hg19_v1.0\"\u003e\n##INFO=\u003cID=CMDB_AC,Number=A,Type=Integer,Description=\"Alternate Allele Counts in Samples with Coverage from CMDB_hg19_v1.0\"\u003e\n##INFO=\u003cID=CMDB_AF,Number=A,Type=Float,Description=\"Alternate Allele Frequencies from CMDB_hg19_v1.0\"\u003e\n##INFO=\u003cID=CMDB_FILTER,Number=A,Type=Float,Description=\"Filter from CMDB_hg19_v1.0\"\u003e\n#CHROM  POS ID  REF ALT QUAL    FILTER  INFO\n17  41234470    rs1060915\u0026CD086610\u0026COSM4416375  A   G   74.38   PASS    CMDB_AF=0.361763,CMDB_AC=4625,CMDB_AN=12757\n```\n\nQuery multiple variants\n-----------------------\n\nA list of variants can be retrieved from CMDB by passing the `-l`\nparameter to `query-variant`.\n\n```bash\ncmdbtools query-variant -l positions.list \u003e result.vcf\n```\n\nThe format of [positions.list](tests/positions.list) can be a mixture of\n`chrom   position` and `chrom    start   end`, with or without\n`chr` in the chromosome ID column:\n\n```\n#CHROM  POS\nchr22   17662378\nchr22   17662408\n22  17662442\n22  17662444\n22  17662699\n22  17662729\n22  17690496\n22  17662353    17663671\n22  17669209    17669357\n```\n\n`result.vcf` is in VCF format and looks like this:\n\n```\n##fileformat=VCFv4.2\n##FILTER=\u003cID=LowQual,Description=\"Low quality\"\u003e\n##INFO=\u003cID=CMDB_AN,Number=1,Type=Integer,Description=\"Number of Alleles in Samples with Coverage from 
CMDB_hg19_v1.0\"\u003e\n##INFO=\u003cID=CMDB_AC,Number=A,Type=Integer,Description=\"Alternate Allele Counts in Samples with Coverage from CMDB_hg19_v1.0\"\u003e\n##INFO=\u003cID=CMDB_AF,Number=A,Type=Float,Description=\"Alternate Allele Frequencies from CMDB_hg19_v1.0\"\u003e\n##INFO=\u003cID=CMDB_FILTER,Number=A,Type=Float,Description=\"Filter from CMDB_hg19_v1.0\"\u003e\n#CHROM  POS ID  REF ALT QUAL    FILTER  INFO\nchr22   17662699    rs58754958  A   G   59.86   PASS    CMDB_AF=0.031047,CMDB_AC=441,CMDB_AN=13553\nchr22   17662793    rs7289170   A   G   64.23   PASS    CMDB_AF=0.050419,CMDB_AC=842,CMDB_AN=16135\nchr22   17669245    rs116020027 G   T   30.3    PASS    CMDB_AF=0.003453,CMDB_AC=43,CMDB_AN=11280\nchr22   17690409    rs362129    G   A   32.3    PASS    CMDB_AF=0.065438,CMDB_AC=686,CMDB_AN=10236\n```\n\nYou can also use `-c`, `-p`, and `-l` together if you like.\n\n```bash\ncmdbtools query-variant -c 22 -p 46616520 -l positions.list \u003e result.vcf\n```\n\nAnnotate your VCF files\n-----------------------\n\nAnnotate your VCF file with CMDB by using the `cmdbtools annotate` command.\n\nDownload a list of example variants in VCF format from\n[multiple_samples.vcf.gz](tests/multiple_samples.vcf.gz). To annotate\nthis list of variants with allele frequencies from CMDB, you can just run\nthe following command on Linux or macOS.\n\n```bash\ncmdbtools annotate -i multiple_samples.vcf.gz \u003e multiple_samples_CMDB.vcf\n```\n\nIt takes about 2 to 3 minutes to annotate 3,000+\nvariants. 
Then you will get 4 new fields with CMDB information\nin the VCF INFO column:\n\n-   `CMDB_AF`: Allele frequency in CMDB;\n-   `CMDB_AN`: Coverage in CMDB at the population level;\n-   `CMDB_AC`: Allele count at the population level in CMDB;\n-   `CMDB_FILTER`: Filter status in CMDB.\n\n```\n##fileformat=VCFv4.2\n##ALT=\u003cID=NON_REF,Description=\"Represents any possible alternative allele at this location\"\u003e\n##FILTER=\u003cID=LowQual,Description=\"Low quality\"\u003e\n##INFO=\u003cID=AC,Number=A,Type=Integer,Description=\"Allele count in genotypes, for each ALT allele, in the same order as listed\"\u003e\n##INFO=\u003cID=AF,Number=A,Type=Float,Description=\"Allele Frequency, for each ALT allele, in the same order as listed\"\u003e\n##INFO=\u003cID=AN,Number=1,Type=Integer,Description=\"Total number of alleles in called genotypes\"\u003e\n##INFO=\u003cID=BaseQRankSum,Number=1,Type=Float,Description=\"Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities\"\u003e\n##reference=file:///home/tools/hg19_reference/ucsc.hg19.fasta\n##INFO=\u003cID=CMDB_AN,Number=1,Type=Integer,Description=\"Number of Alleles in Samples with Coverage from CMDB_hg19_v1.0\"\u003e\n##INFO=\u003cID=CMDB_AC,Number=A,Type=Integer,Description=\"Alternate Allele Counts in Samples with Coverage from CMDB_hg19_v1.0\"\u003e\n##INFO=\u003cID=CMDB_AF,Number=A,Type=Float,Description=\"Alternate Allele Frequencies from CMDB_hg19_v1.0\"\u003e\n##INFO=\u003cID=CMDB_FILTER,Number=A,Type=Float,Description=\"Filter from CMDB_hg19_v1.0\"\u003e\n#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO\nchr21   9413612 .       C       T       6906.62 .       AC=25;AF=0.313;AN=80;BaseQRankSum=0.425;CMDB_AC=2459;CMDB_AF=0.207525;CMDB_AN=11834;CMDB_FILTER=PASS\nchr21   9413629 .       C       T       8028.88 .       AC=30;AF=0.375;AN=80;BaseQRankSum=-1.200e+00;CMDB_AC=6906;CMDB_AF=0.305445;CMDB_AN=22406;CMDB_FILTER=PASS\nchr21   9413700 .       G       A       7723.82 .       
AC=30;AF=0.375;AN=80;BaseQRankSum=-9.000e-02\nchr21   9413735 .       C       A       10121.72        .       AC=35;AF=0.438;AN=80;BaseQRankSum=0.977;CMDB_AC=2385;CMDB_AF=0.283965;CMDB_AN=8382;CMDB_FILTER=PASS\nchr21   9413839 .       C       T       8192.08 .       AC=28;AF=0.350;AN=80;BaseQRankSum=-5.200e-02\nchr21   9413840 .       C       A       11514.35        .       AC=38;AF=0.475;AN=80;BaseQRankSum=0.253\nchr21   9413870 .       T       C       7390.60 .       AC=26;AF=0.325;AN=80;BaseQRankSum=-4.270e-01\nchr21   9413880 .       T       A       146.96  .       AC=1;AF=0.013;AN=80;BaseQRankSum=2.12;ClippingRankSum=0.00\nchr21   9413909 .       G       A       1131.78 .       AC=10;AF=0.125;AN=80;BaseQRankSum=0.549;CMDB_AC=209;CMDB_AF=0.01507;CMDB_AN=13683;CMDB_FILTER=PASS\nchr21   9413913 .       C       T       8120.65 .       AC=28;AF=0.350;AN=80;BaseQRankSum=-4.390e-01;CMDB_AC=2870;CMDB_AF=0.205597;CMDB_AN=13955;CMDB_FILTER=PASS\nchr21   9413945 .       T       C       43787.68        .       AC=71;AF=0.888;AN=80;BaseQRankSum=0.089\nchr21   9413995 .       C       T       9632.44 .       AC=29;AF=0.363;AN=80;BaseQRankSum=0.747\nchr21   9413996 .       A       G       41996.48        .       AC=71;AF=0.888;AN=80;BaseQRankSum=-1.242e+00;CMDB_AC=3308;CMDB_AF=0.688533;CMDB_AN=4790;CMDB_FILTER=PASS\nchr21   9414003 .       T       C       4256.54 .       AC=19;AF=0.238;AN=80;BaseQRankSum=-6.030e-01\n```\n\nCitation\n--------\n\n**If you use CMDB in your scientific publication, we would appreciate it if you cite these papers:**\n\n\n- Siyang Liu, **Shujia Huang**, et al. “Genomic Analyses from Non-invasive Prenatal Testing Reveal Genetic Associations, Patterns of Viral Infections, and Chinese Population History.” *Cell*, vol. 175,2 (2018): 347-359.e14. 
[doi:10.1016/j.cell.2018.08.016](https://doi.org/10.1016/j.cell.2018.08.016)\n\n\n- Zhichao Li, Xiaosen Jiang, Mingyan Fang, Yong Bai, Siyang Liu, **Shujia Huang**, Xin Jin. “CMDB: the comprehensive population genome variation database of China.” *Nucleic Acids Research* (2022): gkac638. [doi:10.1093/nar/gkac638](https://doi.org/10.1093/nar/gkac638)\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshujiahuang%2Fcmdbtools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshujiahuang%2Fcmdbtools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshujiahuang%2Fcmdbtools/lists"}