{"id":13704069,"url":"https://github.com/brentp/mosdepth","last_synced_at":"2025-03-11T05:44:33.012Z","repository":{"id":41390296,"uuid":"101316174","full_name":"brentp/mosdepth","owner":"brentp","description":"fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing","archived":false,"fork":false,"pushed_at":"2025-01-14T22:42:20.000Z","size":1281,"stargazers_count":708,"open_issues_count":42,"forks_count":101,"subscribers_count":22,"default_branch":"master","last_synced_at":"2025-01-14T23:27:07.285Z","etag":null,"topics":["coverage","depth","exome","genome","hacktoberfest","nim","nim-lang","sequencing","wgs"],"latest_commit_sha":null,"homepage":"","language":"Nim","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/brentp.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["brentp"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":null}},"created_at":"2017-08-24T16:32:11.000Z","updated_at":"2025-01-14T22:42:24.000Z","dependencies_parsed_at":"2022-07-15T11:18:42.591Z","dependency_job_id":"c6056a2b-aa59-4923-a380-7e15d03f3bcb","html_url":"https://github.com/brentp/mosdepth","commit_stats":null,"previous_names":[],"tags_count":29,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brentp%2Fmosdepth","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brentp%2Fmosdepth/tags","releases_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brentp%2Fmosdepth/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brentp%2Fmosdepth/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/brentp","download_url":"https://codeload.github.com/brentp/mosdepth/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242980781,"owners_count":20216285,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["coverage","depth","exome","genome","hacktoberfest","nim","nim-lang","sequencing","wgs"],"created_at":"2024-08-02T21:01:03.799Z","updated_at":"2025-03-11T05:44:32.988Z","avatar_url":"https://github.com/brentp.png","language":"Nim","readme":"fast BAM/CRAM depth calculation for **WGS**, **exome**, or **targeted sequencing**.\n[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat-square)](http://bioconda.github.io/recipes/mosdepth/README.html)\n\n\n![logo](https://user-images.githubusercontent.com/1739/29678184-da1f384c-88ba-11e7-9d98-df4fe3a59924.png \"logo\")\n\n[![Build](https://github.com/brentp/mosdepth/actions/workflows/build.yml/badge.svg)](https://github.com/brentp/mosdepth/actions/workflows/build.yml) [![citation](https://img.shields.io/badge/cite-open%20access-orange.svg)](https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btx699/4583630?guestAccessKey=35b55064-4566-4ab3-a769-32916fa1c6e6)\n\n`mosdepth` can output:\n\n+ per-base depth about 2x as fast as `samtools depth`--about 25 
minutes of CPU time for a 30X genome.\n+ mean per-window depth given a window size--as would be used for CNV calling.\n+ the mean per-region given a BED file of regions.\n* the mean or median per-region cumulative coverage histogram given a window size\n+ a distribution of proportion of bases covered at or above a given threshold for each chromosome and genome-wide.\n+ quantized output that merges adjacent bases as long as they fall in the same coverage bins e.g. (10-20)\n+ threshold output to indicate how many bases in each region are covered at the given thresholds.\n+ A summary of mean depths per chromosome and within specified regions per chromosome.\n+ a [d4](https://github.com/38/d4-format) file (better than bigwig).\n\nwhen appropriate, the output files are bgzipped and indexed for ease of use.\n\n## usage\n\n```\nmosdepth 0.3.11\n\n  Usage: mosdepth [options] \u003cprefix\u003e \u003cBAM-or-CRAM\u003e\n\nArguments:\n\n  \u003cprefix\u003e       outputs: `{prefix}.mosdepth.global.dist.txt`\n                          `{prefix}.mosdepth.summary.txt`\n                          `{prefix}.per-base.bed.gz` (unless -n/--no-per-base is specified)\n                          `{prefix}.regions.bed.gz` (if --by is specified)\n                          `{prefix}.quantized.bed.gz` (if --quantize is specified)\n                          `{prefix}.thresholds.bed.gz` (if --thresholds is specified)\n\n  \u003cBAM-or-CRAM\u003e  the alignment file for which to calculate depth.\n\nCommon Options:\n\n  -t --threads \u003cthreads\u003e     number of BAM decompression threads [default: 0]\n  -c --chrom \u003cchrom\u003e         chromosome to restrict depth calculation.\n  -b --by \u003cbed|window\u003e       optional BED file or (integer) window-sizes.\n  -n --no-per-base           dont output per-base depth. skipping this output will speed execution\n                             substantially. 
prefer quantized or thresholded values if possible.\n  -f --fasta \u003cfasta\u003e         fasta file for use with CRAM files [default: ].\n\nOther options:\n\n  -F --flag \u003cFLAG\u003e                  exclude reads with any of the bits in FLAG set [default: 1796]\n  -i --include-flag \u003cFLAG\u003e          only include reads with any of the bits in FLAG set. default is unset. [default: 0]\n  -x --fast-mode                    dont look at internal cigar operations or correct mate overlaps (recommended for most use-cases).\n  -a --fragment-mode                count the coverage of the full fragment including the full insert (proper pairs only).\n  -q --quantize \u003csegments\u003e          write quantized output see docs for description.\n  -Q --mapq \u003cmapq\u003e                  mapping quality threshold. reads with a quality less than this value are ignored [default: 0]\n  -l --min-frag-len \u003cmin-frag-len\u003e  minimum insert size. reads with a smaller insert size than this are ignored [default: -1]\n  -u --max-frag-len \u003cmax-frag-len\u003e  maximum insert size. reads with a larger insert size than this are ignored. [default: -1]\n  -T --thresholds \u003cthresholds\u003e      for each interval in --by, write number of bases covered by at\n                                    least threshold bases. 
Specify multiple integer values separated\n                                    by ','.\n  -m --use-median                   output median of each region (in --by) instead of mean.\n  -R --read-groups \u003cstring\u003e         only calculate depth for these comma-separated read groups IDs.\n  -h --help                         show help\n```\nIf you use mosdepth please cite [the publication in bioinformatics](https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btx699/4583630?guestAccessKey=35b55064-4566-4ab3-a769-32916fa1c6e6)\n\n\nSee the section below for more info on distribution.\n\nIf `--by` is a BED file with 4 or more columns, it is assumed that the 4th column is the name.\nThat name will be propagated to the `mosdepth` output in the 4th column with the depth in the 5th column.\nIf you don't want this behavior, simply send a bed file with 3 columns.\n\n### exome example\n\nTo calculate the coverage in each exome capture region:\n```\nmosdepth --by capture.bed sample-output sample.exome.bam\n```\nFor a 5.5GB exome BAM and all 1,195,764 ensembl exons as the regions,\nthis completes in 1 minute 38 seconds with a single CPU.\n\nPer-base output will go to `sample-output.per-base.bed.gz`,\nthe mean for each region will go to `sample-output.regions.bed.gz`;\neach of those will be written along with a CSI index that can be\nused for tabix queries.\nThe distribution of depths will go to `sample-output.mosdepth.global.dist.txt`\n\n### WGS example\n\nFor 500-base windows:\n\n```\nmosdepth -n --fast-mode --by 500 sample.wgs $sample.wgs.cram\n```\n\n`-n` means don't output per-base data; this will make `mosdepth`\na bit faster as there is some cost to outputting that much text.\n\n`--fast-mode` avoids the extra calculations of mate pair overlap and cigar operations,\nand also allows htslib to extract less data from CRAM, providing a substantial speed\nimprovement.\n\n### Callable regions example\n\nTo create a set of \"callable\" regions as in [GATK's callable 
loci tool](https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_coverage_CallableLoci.php):\n\n```\n# by setting these ENV vars, we can control the output labels (4th column)\nexport MOSDEPTH_Q0=NO_COVERAGE   # 0 -- defined by the arguments to --quantize\nexport MOSDEPTH_Q1=LOW_COVERAGE  # 1..4\nexport MOSDEPTH_Q2=CALLABLE      # 5..149\nexport MOSDEPTH_Q3=HIGH_COVERAGE # 150 ...\nmosdepth -n --quantize 0:1:5:150: $sample.quantized $sample.wgs.bam\n```\n\nIn this case, regions with a depth of 0 are labelled as \"NO_COVERAGE\", those with\ncoverage of 1, 2, 3, or 4 are labelled as \"LOW_COVERAGE\", and so on.\n\nThe result is a BED file where adjacent bases with depths that fall into the same\nbin are merged into a single region with the 4th column indicating the label.\n\n\n### Distribution only with modified precision\n\nTo get only the distribution, without the per-base depth file, using 3 threads:\n\n```\nMOSDEPTH_PRECISION=5 mosdepth -n -t 3 $sample $bam\n```\n\nOutput will go to `$sample.mosdepth.global.dist.txt`\n\nThis also forces the output to have 5 decimals of precision rather than the default of 2.\n\n## D4\n\nD4 is a format created by [Hao Hou](https://github.com/38) in the Quinlan lab. It is\nincorporated into `mosdepth` as of version 0.3.0 for per-base output with the `--d4` flag.\nIt improves write speed dramatically; for one test-case it takes **24.8s** to write a\nper-base.bed.gz with mosdepth compared to **7.7s** to write a d4 file. 
For the same case,\nrunning `mosdepth` without writing per-base takes 5.9 seconds so D4 greatly mitigates\nthe cost of outputting per-base depth **and** the output is more useful.\n\n## Installation\n\n\nThe simplest option is to download the [binary from the releases](https://github.com/brentp/mosdepth/releases).\n\nAnother quick way is to [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat-square)](http://bioconda.github.io/recipes/mosdepth/README.html)\n\nIt can also be installed with `brew` as `brew install brewsci/bio/mosdepth` or used via docker with quay:\n```\ndocker pull quay.io/biocontainers/mosdepth:0.3.3--h37c5b7d_2\ndocker run -v /hostpath/:/opt/mount quay.io/biocontainers/mosdepth:0.3.3--h37c5b7d_2 mosdepth -n --fast-mode -t 4 --by 1000 /opt/mount/sample /opt/mount/$bam\n```\n\nThe binary from releases is static, with no dependencies. If you build it yourself,\n`mosdepth` requires htslib version 1.4 or later. If you get an error\nabout \"`libhts.so` not found\", set `LD_LIBRARY_PATH` to the directory that\ncontains `libhts.so`, e.g.:\n\n`LD_LIBRARY_PATH=~/src/htslib/ mosdepth -h`\n\nIf you get the error `could not import: hts_check_EOF` you may need to\ninstall a more recent version of htslib.\n\nIf you do want to install from source, see the [install.sh](https://github.com/brentp/mosdepth/blob/master/scripts/install.sh).\n\nIf you use archlinux, you can [install as a package](https://aur.archlinux.org/packages/mosdepth/)\n\n## distribution output\n\nThis is **useful for QC**.\n\nThe `$prefix.mosdepth.global.dist.txt` file contains a cumulative distribution indicating the\nproportion of total bases (or the proportion of the `--by` regions for `$prefix.mosdepth.region.dist.txt`) that were covered\nat or above a given coverage value. 
It does this for each chromosome, and for the\nwhole genome.\n\nEach row will indicate:\n + chromosome (or \"total\")\n + coverage level\n + proportion of bases covered at or above that level\n\nThe last row for each chromosome will have a coverage level of 0 and a\nproportion of 1.0, since every base is covered at a depth of 0 or more.\n\nA python plotting script is provided in `scripts/plot-dist.py` that will make\nplots like those below. Usage is `python scripts/plot-dist.py \\*global.dist.txt` and the output\nis `dist.html` with a plot for the full set along with one for each chromosome.\n\nUsing something like that, we can plot the distribution from the entire genome.\nBelow we show this for samples with ~60X coverage:\n\n![WGS Example](https://user-images.githubusercontent.com/1739/29646192-2a2a6126-883f-11e7-91ab-049295eb3531.png \"WGS Example\")\n\nWe can also view the Y chromosome to verify that males and females\ntrack separately. Below, we see that female samples cluster along the axes while male samples have\nclose to 30X coverage for almost 40% of the genome.\n\n![Y Example](https://user-images.githubusercontent.com/1739/29646191-2a246564-883f-11e7-951a-aa68d7a1a6ed.png \"Y Example\")\n\nSee [this blog post](https://web.archive.org/web/20181018084459/http://www.gettinggeneticsdone.com/2014/03/visualize-coverage-exome-targeted-ngs-bedtools.html) for\nmore details.\n\n## thresholds\n\ngiven a set of regions to the `--by` argument, `mosdepth` can report the number of bases in each region that\nare covered at or above each threshold value given to `--thresholds`. 
e.g.:\n```\nmosdepth --by exons.bed --thresholds 1,10,20,30 $prefix $bam\n```\n\nwill create a file `$prefix.thresholds.bed.gz` with an extra column for each requested threshold.\nAn example output for the above command (assuming exons.bed had a 4th column with region names) would look like (including the header):\n\n```\n#chrom  start   end     region           1X   10X  20X  30X\n1       11869   12227   ENSE00002234944  358  157  110  0\n1       11874   12227   ENSE00002269724  353  127  10   0\n1       12010   12057   ENSE00001948541  47   8    0    0\n1       12613   12721   ENSE00003582793  108  0    0    0\n```\n\nIf there is no name (4th) column in the bed file sent to `--by` then that column will contain \"unknown\"\nin the output.\n\nThis is extremely efficient. In our tests, excluding per-base output (`-n`) and using this argument with\n111K exons and 12 values to `--thresholds` increases the run-time by \u003c 5%.\n\n## quantize\n\nquantize allows splitting coverage into bins and merging adjacent regions that fall into the same bin even if they have\ndifferent exact coverage values. This can dramatically reduce the size of the output compared to the per-base output.\n\nIt also allows outputting regions of low, high, and \"callable\" coverage as in [GATK's callable loci tool](https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_coverage_CallableLoci.php).\n\nAn example of quantize arguments:\n```\n--quantize 0:1:4:100:200: # ... arbitrary number of quantize bins.\n```\n\nindicates bins of: 0:1, 1:4, 4:100, 100:200, 200:infinity\nwhere the upper endpoint is non-inclusive.\n\nThe default for `mosdepth` is to output labels as above (0:1, 1:4, 4:100... 
etc.)\n\nTo change what is reported as the bin label, a user can set environment variables e.g.:\n\n```\nexport MOSDEPTH_Q0=NO_COVERAGE\nexport MOSDEPTH_Q1=LOW_COVERAGE\nexport MOSDEPTH_Q2=CALLABLE\nexport MOSDEPTH_Q3=HIGH_COVERAGE\n```\n\nIn this case, the bin label is replaced by the text in the appropriate environment variable.\n\nThis is very efficient. In our tests, excluding per-base output (`-n`) and using this argument with\n9 bins to `--quantize` increases the run-time by ~ 20%. In contrast, the difference in time with\nand without `-n` can be 2-fold.\n\n## how it works\n\nAs it encounters each chromosome, `mosdepth` creates an array the length of the chromosome.\nFor every start it encounters, it increments the value in that position of the array. For every\nstop, it decrements that position. From this, the depth at a particular position is the\ncumulative sum of all array positions up to and including it (a similar algorithm is used in BEDTools\nwhere starts and stops are tracked separately). `mosdepth` **avoids double-counting\noverlapping mate-pairs** and it **tracks every aligned part of every read using the CIGAR\noperations**. Because of this data structure, the coverage `distribution` calculation\ncan be done without a noticeable increase in run-time. The image below conveys the concept:\n\n![alg](https://user-images.githubusercontent.com/1739/29647913-d79ab028-8848-11e7-86cf-60d4b087bc3b.png \"algorithm\")\n\nThis array accounting is very fast. There are no extra allocations or objects to track and\nit is also conceptually simple. For these reasons, it is faster than `samtools depth` which\nworks by using the [pileup](http://samtools.sourceforge.net/pileup.shtml) machinery that\ntracks each read at each base.\n\nThe `mosdepth` method has some limitations. 
Because a large array is allocated and it is\nrequired (in general) to take the cumulative sum of all preceding positions to know the depth\nat any position, it is slower for small, one-off regional queries. It is, however, fast for\nwindow-based or BED-based regions, because it first calculates the full chromosome coverage\nand then reports the coverage for each region in that chromosome. Another downside is that it uses\nmore memory than samtools. The amount of memory is approximately equal to 32 bits * the longest chromosome\nlength, so for the 249MB chromosome 1, it will require about 1GB of memory.\n\n`mosdepth` is written in [nim](https://nim-lang.org/) and it uses [htslib](https://github.com/samtools/htslib)\nvia our nim wrapper [hts-nim](https://github.com/brentp/hts-nim/)\n\n## speed and memory comparison\n\n`mosdepth`, `samtools`, `bedtools`, and `sambamba` were run on a 30X genome.\nRelative times are relative to mosdepth per-base mode with a single thread.\n\n`mosdepth` can report the mean depth in 500-base windows genome-wide in\nunder 9 minutes of user time with 3 threads.\n\n| format |    tool    | threads  | mode   | relative time | run-time | memory |\n| ------ | ---------- | -------- | ------ | ------------- | -------  | -------|\n|  BAM   |  mosdepth  |    1     | base   |     1         |  25:23   |  1196  |\n|  BAM   |  mosdepth  |    3     | base   |    0.57       |  14:27   |  1197  |\n|  CRAM  |  mosdepth  |    1     | base   |    1.17       |  29:47   |  1205  |\n|  CRAM  |  mosdepth  |    3     | base   |    0.56       |  14:08   |  1225  |\n|  BAM   |  mosdepth  |    3     | window |    0.34       |  8:44    |  1277  |\n|  BAM   |  mosdepth  |    1     | window |    0.80       |  20:26   |  1212  |\n|  CRAM  |  mosdepth  |    3     | window |    0.35       |  8:47    |  1233  |\n|  CRAM  |  mosdepth  |    1     | window |    0.88       |  22:23   |  1209  |\n|  BAM   |  sambamba  |    1     | base   |    5.71       | 2:24:53  |  166   |\n|  BAM   |  
samtools  |    1     | base   |    1.98       | 50:12    |  27    |\n|  CRAM  |  samtools  |    1     | base   |    1.79       | 45:21    |  451   |\n|  BAM   |  bedtools  |    1     | base   |    5.31       | 2:14:44  |  1908  |\n\n\nNote that the threads to `mosdepth` (and samtools) are decompression threads. After\nabout 4 threads, there is no benefit for additional threads:\n\n![mosdepth-scaling](https://user-images.githubusercontent.com/1739/31246294-256d1b7c-a9ca-11e7-8e28-6c4d07cba3f5.png)\n\n\n### Accuracy\n\nWe compared `samtools depth` with default arguments to `mosdepth` without overlap detection and discovered **no\ndifferences across the entire chromosome**.\n","funding_links":["https://github.com/sponsors/brentp"],"categories":["Next Generation Sequencing"],"sub_categories":["BAM File Utilities"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrentp%2Fmosdepth","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrentp%2Fmosdepth","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrentp%2Fmosdepth/lists"}