{"id":13763630,"url":"https://github.com/ParkerLab/ataqv","last_synced_at":"2025-05-10T17:30:54.751Z","repository":{"id":52251032,"uuid":"45886241","full_name":"ParkerLab/ataqv","owner":"ParkerLab","description":"A toolkit for QC and visualization of ATAC-seq results.","archived":false,"fork":false,"pushed_at":"2023-02-17T17:14:26.000Z","size":55856,"stargazers_count":61,"open_issues_count":3,"forks_count":10,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-02-12T23:44:55.867Z","etag":null,"topics":["atac-seq","quality-control","software","tool","visualization"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ParkerLab.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2015-11-10T03:59:20.000Z","updated_at":"2024-01-23T02:33:16.000Z","dependencies_parsed_at":"2022-08-30T19:51:00.103Z","dependency_job_id":"e7fbdb71-e33f-4c62-b86e-2ad1cb09884f","html_url":"https://github.com/ParkerLab/ataqv","commit_stats":null,"previous_names":[],"tags_count":24,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ParkerLab%2Fataqv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ParkerLab%2Fataqv/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ParkerLab%2Fataqv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ParkerLab%2Fataqv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ParkerLab","download_url":"https://codeload.github.com/ParkerLab/ataqv/tar.gz/refs/he
ads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253453191,"owners_count":21911054,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["atac-seq","quality-control","software","tool","visualization"],"created_at":"2024-08-03T15:00:54.429Z","updated_at":"2025-05-10T17:30:49.741Z","avatar_url":"https://github.com/ParkerLab.png","language":"C++","funding_links":[],"categories":["QC"],"sub_categories":[],"readme":"####################################\nataqv: ATAC-seq QC and visualization\n####################################\n\n***********\nWhat is it?\n***********\n\nA toolkit for measuring and comparing ATAC-seq results, made in the\n`Parker lab`_ at the University of Michigan. 
We wrote it to help us\nunderstand how well our ATAC-seq assays had worked, and to make it\neasier to spot differences that might be caused by library prep or\nsequencing.\n\nThe main program, ``ataqv``, examines your aligned reads and reports some\nbasic metrics, including:\n\n* reads mapped in proper pairs\n* optical or PCR duplicates\n* reads mapping to autosomal or mitochondrial references\n* the ratio of short to mononucleosomal fragment counts\n* mapping quality\n* various kinds of problematic alignments\n\nIf you also have a file of peaks called on your data, that file can be\nexamined to report read coverage of the peaks.\n\nWith a file of transcription start sites, ataqv can report a TSS\nenrichment metric based on the transposition activity around those\nlocations.\n\nThe report is printed as plain text to standard output, and detailed\nmetrics are written to JSON files for further processing.\n\nA web-based visualization and comparison tool and a script to prepare\nthe JSON output for it are also provided. The web viewer includes\ninteractive tables of the metrics and plots of fragment length,\ndistance from a fragment length reference distribution, mapping\nquality, counts of reads overlapping peaks, and peak territory.\n\nWeb viewer demo: https://parkerlab.github.io/ataqv/demo/\n\n******************\nWhere does it run?\n******************\n\nIt's tested on Linux and Macs. 
It may compile and run on other UNIX\nsystems.\n\n****\nHelp\n****\n\nIf you have questions or suggestions, mail us at\n`parkerlab-software@umich.edu`_, or file a `GitHub issue`_.\n\n******\nCiting\n******\n\nAtaqv is now published in Cell Systems: https://doi.org/10.1016/j.cels.2020.02.009\n\n***************\nGetting started\n***************\n\nThere are several ways to get ``ataqv`` running on your system:\ninstall a binary package; install it with `Homebrew`_ or `Linuxbrew`_;\nor build it from source.\n\nBinary packages (Linux only)\n============================\n\nWe provide several Linux binary packages under `recent releases on\nGitHub`_. Install ``.deb`` files with ``dpkg``, ``.rpm`` files with\n``dnf`` or ``yum``, or download and extract the ``ataqv-x.x.x.tar.gz``\nfile and add the full path to the resulting ``ataqv-x.x.x/bin``\nsubdirectory to your PATH environment variable.\n\nHomebrew (Mac or Linux)\n=======================\n\nThe easiest way to install ataqv from source is via `Homebrew`_ on\nMacs, or `Linuxbrew`_ on Linux, using our `tap`_. 
At a shell prompt::\n\n  brew tap ParkerLab/tap\n  brew install ataqv\n\nBuilding from source manually\n=============================\n\nPrerequisites\n-------------\n\nTo build ataqv, you need:\n\n* Linux or a Mac (it may work on other UNIX systems, but it's untested)\n* C++11 compiler (gcc 4.9 or newer, or clang on OS X)\n* `Boost`_\n* `HTSlib`_\n\nThe ``mkarv`` script that collects ataqv results and sets up a web\napplication to visualize them requires Python 2.7 or newer.\n\nTo run the test suite, you'll also need `LCOV`_, which can be\ninstalled via `Homebrew`_ or `Linuxbrew`_.\n\nOn Debian-based Linux distributions, you can install dependencies\nwith::\n\n  sudo apt install libboost-all-dev libhts-dev ncurses-dev libtinfo-dev zlib1g-dev lcov\n\nand the latest supported option among::\n\n  sudo apt install libstdc++-6-dev\n  sudo apt install libstdc++-5-dev\n  sudo apt install libstdc++-4.9-dev\n\nBuilding\n--------\n\nAt your shell prompt::\n\n  git clone https://github.com/ParkerLab/ataqv\n  cd ataqv\n  make\n\nIf Boost and htslib are not available in default system locations (for\nexample if you're using environment modules, or compiling in your home\ndirectory) you'll probably need to give ``make`` some hints via the\n``CPPFLAGS`` and ``LDFLAGS`` variables::\n\n  make CPPFLAGS=\"-I/path/to/boost/include -I/path/to/htslib/include\" LDFLAGS=\"-L/path/to/boost/lib -L/path/to/htslib/lib\"\n\nIf the environment variables ``BOOST_ROOT`` or ``HTSLIB_ROOT`` are set\nto directories containing ``include`` and ``lib`` subdirectories, the\ncompiler configuration can be made simpler::\n\n  make BOOST_ROOT=/path/to/boost HTSLIB_ROOT=/path/to/htslib\n\nOr you can specify directories in BOOST_INCLUDE, BOOST_LIB,\nHTSLIB_INCLUDE, and HTSLIB_LIB separately.\n\nIf you use custom locations like this, you will probably need to set\nLD_LIBRARY_PATH for the shared libraries to be found at runtime::\n\n  export 
LD_LIBRARY_PATH=/path/to/boost/lib:/path/to/htslib/lib:$LD_LIBRARY_PATH\n\nDependency notes\n----------------\n\nBoost\n^^^^^\n\nIf your Boost installation used their \"tagged\" layout, the libraries\nwill include metadata in their names; on Linux this usually just means\nthat they'll have a ``-mt`` suffix to indicate multithreading\nsupport. Specify ``BOOST_TAGGED=yes`` in your make commands to link\nwith those.\n\nHTSlib\n^^^^^^\n\nIf HTSlib was built to use libcurl, you'll need to link with that as\nwell::\n\n  make HTSLIBCURL=yes\n\nInstallation\n------------\n\nThe Makefile supports the common `DESTDIR` and `prefix` variables. To\ninstall to /usr/local::\n\n  make install prefix=/usr/local\n\nSupport for the `Environment Modules`_ system is also included. You\ncan install to the modules tree by defining the ``MODULES_ROOT`` and\n``MODULEFILES_ROOT`` variables. If your modules are kept under\n``/opt/modules``, with their accompanying module files under\n``/opt/modulefiles``, run::\n\n  make install-module MODULES_ROOT=/opt/modules MODULEFILE_ROOT=/opt/modulefiles\n\nAnd then you should be able to run ``module load ataqv`` to have\neverything available in your environment.\n\n*****\nUsage\n*****\n\nPrerequisites\n=============\n\nYou'll need to have a BAM file containing alignments of your ATAC-seq\nreads to your reference genome. If you want accurate duplication\nmetrics, you'll also need to have marked duplicates in that BAM\nfile. If you have a BED file containing peaks called on your data,\nataqv can produce some additional metrics using that.\n\nVerifying ataqv results with data from a variety of common tools is on\nour to-do list, but so far, we've only used `bwa`_, `Picard's\nMarkDuplicates`_, and `MACS2`_ for these steps. A pipeline like ours\ncan be generated with the included ``make_ataqv_pipeline`` script. 
Its\noutput product starts from a BAM file of aligned reads, marks\nduplicates and calls peaks, then runs ataqv and produces a web viewer\nfor the output.\n\nRunning\n=======\n\nThe main program is ataqv, which is run as follows::\n  \n  ataqv [options] organism alignment-file\n  \n  where:\n      organism is the subject of the experiment, which determines the list of autosomes\n      (see \"Reference Genome Configuration\" below).\n  \n      alignment-file is a BAM file with duplicate reads marked.\n  \n  Basic options\n  -------------\n  \n  --help: show this usage message.\n  --verbose: show more details and progress updates.\n  --version: print the version of the program.\n  --threads \u003cn\u003e: the maximum number of threads to use (right now, only for calculating TSS enrichment).\n  \n  Optional Input\n  --------------\n  \n  --peak-file \"file name\"\n      A BED file of peaks called for alignments in the BAM file. Specify \"auto\" to use the\n      BAM file name with \".peaks\" appended, or if the BAM file contains read groups, to\n      assume each read group has a peak file whose name is the read group ID with \".peaks\"\n      appended. If you specify a single filename instead of \"auto\" with read groups, the \n      same peaks will be used for all reads -- be sure this is what you want.\n  \n  --tss-file \"file name\"\n      A BED file of transcription start sites for the experiment organism. If supplied,\n      a TSS enrichment score will be calculated according to the ENCODE data standards.\n      This calculation requires that the BAM file of alignments be indexed.\n  \n  --tss-extension \"size\"\n      If a TSS enrichment score is requested, it will be calculated for a region of \n      \"size\" bases to either side of transcription start sites. The default is 1000bp.\n  \n  --excluded-region-file \"file name\"\n      A BED file containing excluded regions. 
Peaks or TSS overlapping these will be ignored.\n      May be given multiple times.\n  \n  Output\n  ------\n  \n  --metrics-file \"file name\"\n      The JSON file to which metrics will be written. The default filename will be based on\n      the BAM file, with the suffix \".ataqv.json\".\n  \n  --log-problematic-reads\n      If given, problematic reads will be logged to a file per read group, with names\n      derived from the read group IDs, with \".problems\" appended. If no read groups\n      are found, the reads will be written to one file named after the BAM file.\n\n  --tabular-output\n      If given, the metrics file output will be a tabular (TSV) text file, not JSON. This\n      output CANNOT be used to generate the HTML report, and excludes several metrics that\n      would otherwise be included in the JSON output (e.g., the full fragment length\n      distribution, the full TSS coverage curve, and the full mapping quality distribution).\n      This option is not recommended when analyzing bulk ATAC-seq data, but may be useful\n      when analyzing single nucleus ATAC-seq data with large numbers of distinct cell\n      barcodes (say, \u003e100k); in such a case this option should substantially reduce memory\n      usage, reduce runtime, and avoid the need to parse a large JSON file in downstream\n      analysis, while still outputting the metrics commonly used to QC single nucleus\n      ATAC-seq data (TSS enrichment, read counts, and mitochondrial read counts, amongst others).\n\n  --less-redundant\n      If given, output a subset of metrics that should be less redundant. If this flag is used,\n      the same flag should be passed to mkarv when making the viewer.\n      \n  Metadata\n  --------\n\n  The following options provide metadata to be included in the metrics JSON file.\n  They make it easier to compare results in the ataqv web interface.\n\n  --name \"name\"\n    A label to be used for the metrics when there are no read groups. 
If there are read\n    groups, each will have its metrics named using its ID field. With no read groups and\n    no --name given, your metrics will be named after the alignment file.\n\n  --ignore-read-groups\n    Even if read groups are present in the BAM file, ignore them and combine metrics\n    for all reads under a single sample and library named with the --name option. This\n    also implies that a single peak file will be used for all reads; see the --peak option.\n\n  --nucleus-barcode-tag \"nucleus_barcode_tag\"\n    Data is single-nucleus, with the barcode stored in this BAM tag.\n    In this case, metrics will be collected per barcode.\n\n  --description \"description\"\n    A short description of the experiment.\n\n  --url \"URL\"\n    A URL for more detail on the experiment (perhaps using a DOI).\n\n  --library-description \"description\"\n    Use this description for all libraries in the BAM file, instead of using the DS\n    field from each read group.\n\n  Reference Genome Configuration\n  ------------------------------\n\n  ataqv includes lists of autosomes for several organisms:\n\n    Organism  Autosomal References\n     -------  ------------------\n         fly  2R 2L 3R 3L 4\n       human  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22\n       mouse  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19\n         rat  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20\n        worm  I II III IV V\n       yeast  I II III IV V VI VII VIII IX X XI XII XIII XIV XV XVI\n\n    The default autosomal reference lists contain names with \"chr\" prefixes\n    (\"chr1\") and without (\"1\"). If you need a different set of autosomes, you can\n    supply a list with --autosomal-reference-file.\n\n  --autosomal-reference-file \"file name\"\n    A file containing autosomal reference names, one per line. 
The names must match\n    the reference names in the alignment file exactly, or the metrics based on counts\n    of autosomal alignments will be wrong.\n\n  --mitochondrial-reference-name \"name\"\n    If the reference name for mitochondrial DNA in your alignment file is not \"chrM\",.\n    use this option to supply the correct name. Again, if this name is wrong, all the\n    measurements involving mitochondrial alignments will be wrong.\n\nWhen run, ataqv prints a human-readable summary to its standard\noutput, and writes complete metrics to the JSON file named with the\n`--metrics-file` option.\n\nThe JSON output can be incorporated into a web application that\npresents tables and plots of the metrics, and makes it easy to compare\nresults across samples or experiments. Use the ``mkarv`` script to\ncreate a local instance of the result viewer (run ``mkarv -h`` for complete instructions). A web server is not\nrequired, though you can use one to publish your result viewer\ninstance.\n\nGiven several BAM files (mapped to hg19) and accompanying broadPeak files (along with hg19 TSS files and blacklist), an example workflow might be::\n\n  $ # first, run ataqv on each bam file to generate JSON files as well as human-readable output\n  $ ataqv --peak-file /lab/work/porchard/atacseq/macs2/sample_1_peaks.broadPeak --name sample_1 --metrics-file /lab/work/porchard/atacseq/ataqv/sample_1.ataqv.json.gz --excluded-region-file /lab/work/porchard/atacseq/data/mappability/hg19.blacklist.bed.gz --tss-file /lab/work/porchard/atacseq/data/tss/hg19.tss.refseq.bed.gz --ignore-read-groups human /lab/work/porchard/atacseq/mark_duplicates/sample_1.md.bam \u003e /lab/work/porchard/atacseq/ataqv/sample_1.ataqv.out\n  $ ataqv --peak-file /lab/work/porchard/atacseq/macs2/sample_2_peaks.broadPeak --name sample_2 --metrics-file /lab/work/porchard/atacseq/ataqv/sample_2.ataqv.json.gz --excluded-region-file /lab/work/porchard/atacseq/data/mappability/hg19.blacklist.bed.gz --tss-file 
/lab/work/porchard/atacseq/data/tss/hg19.tss.refseq.bed.gz --ignore-read-groups human /lab/work/porchard/atacseq/mark_duplicates/sample_2.md.bam \u003e /lab/work/porchard/atacseq/ataqv/sample_2.ataqv.out\n  $ ataqv --peak-file /lab/work/porchard/atacseq/macs2/sample_3_peaks.broadPeak --name sample_3 --metrics-file /lab/work/porchard/atacseq/ataqv/sample_3.ataqv.json.gz --excluded-region-file /lab/work/porchard/atacseq/data/mappability/hg19.blacklist.bed.gz --tss-file /lab/work/porchard/atacseq/data/tss/hg19.tss.refseq.bed.gz --ignore-read-groups human /lab/work/porchard/atacseq/mark_duplicates/sample_3.md.bam \u003e /lab/work/porchard/atacseq/ataqv/sample_3.ataqv.out\n  $\n  $ # run mkarv on the JSON files to generate the interactive web viewer (in this case, SRR891268 will be used as the reference sample in the viewer):\n  $ mkarv my_fantastic_experiment /lab/work/porchard/atacseq/ataqv/sample_1.ataqv.json.gz /lab/work/porchard/atacseq/ataqv/sample_2.ataqv.json.gz /lab/work/porchard/atacseq/ataqv/sample_3.ataqv.json.gz\n  $\n  $ # to see the viewer, open the file my_fantastic_experiment/index.html in your web browser\n\nExample\n=======\n\nThe ataqv package includes a script that will set up and run our\nentire ATAC-seq pipeline on some sample data.\n\nYou'll need to have installed ataqv itself, plus Picard tools,\nsamtools, and MACS2 to run the pipeline. On a Mac, you can obtain\neverything with::\n\n  $ brew install ataqv picard-tools samtools\n  $ pip install MACS2\n\nOn Linux, installation of the dependencies is probably specific to\nyour environment and is left as an exercise for the reader. 
On Debian,\n``apt-get install picard-tools samtools`` followed by installing MACS2\nwith ``pip install MACS2`` should be enough.\n\nOnce you have the prerequisite programs installed, you can run the\nexample pipeline with::\n\n  $ run_ataqv_example /output/path\n\nComparing your results to others\n================================\n\nPart of this project will be publishing ataqv output for as many\nATAC-seq experiments as we can get our hands on, so we can compare\nthem and learn how changes to the protocol affect the output. Watch\nour `GitHub docs`_ for updates.\n\n***********\nPerformance\n***********\n\nIt's not currently concurrent, so don't allocate it more than a single\nprocessor. Memory usage should typically be no more than a few hundred\nmegabytes.\n\nAnecdotally, processing a 41GB BAM file containing 1,126,660,186\nalignments of the data from the ATAC-seq paper took just under 20\nminutes and 2GB of memory. Adding peak metrics extended the run time\nto almost 40 minutes, but it still used the same amount of memory.\n\n.. _Parker lab: http://theparkerlab.org/\n.. _Boost: http://www.boost.org/\n.. _HTSlib: http://www.htslib.org/\n.. _LCOV: http://ltp.sourceforge.net/coverage/lcov.php\n.. _Homebrew: http://brew.sh/\n.. _Linuxbrew: http://linuxbrew.sh/\n.. _tap: https://github.com/ParkerLab/homebrew-tap\n.. _Environment Modules: https://en.wikipedia.org/wiki/Environment_Modules_%28software%29\n.. _Github issue: https://github.com/ParkerLab/ataqv/issues\n.. _recent releases on GitHub: https://github.com/ParkerLab/ataqv/releases\n.. _bwa: http://bio-bwa.sourceforge.net/\n.. _Picard's MarkDuplicates: https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates\n.. _MACS2: https://github.com/taoliu/MACS/\n.. _Github docs: https://parkerlab.github.io/ataqv/\n.. 
_parkerlab-software@umich.edu: mailto:parkerlab-software@umich.edu?subject=ataqv\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FParkerLab%2Fataqv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FParkerLab%2Fataqv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FParkerLab%2Fataqv/lists"}