{"id":32207309,"url":"https://github.com/tushiqi/manorm2","last_synced_at":"2026-02-20T16:02:06.913Z","repository":{"id":55611877,"uuid":"232816231","full_name":"tushiqi/MAnorm2","owner":"tushiqi","description":"MAnorm2 for Normalizing and Comparing ChIP-seq Samples","archived":false,"fork":false,"pushed_at":"2022-11-03T07:35:04.000Z","size":42916,"stargazers_count":32,"open_issues_count":4,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-22T05:50:56.829Z","etag":null,"topics":["chip-seq","differential-analysis","empirical-bayes","r-package","winsorize-values"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tushiqi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-01-09T13:31:31.000Z","updated_at":"2025-07-18T19:12:56.000Z","dependencies_parsed_at":"2023-01-21T12:01:39.479Z","dependency_job_id":null,"html_url":"https://github.com/tushiqi/MAnorm2","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tushiqi/MAnorm2","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tushiqi%2FMAnorm2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tushiqi%2FMAnorm2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tushiqi%2FMAnorm2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tushiqi%2FMAnorm2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tushiqi","download_url":"https://codeload.github.com/tushiqi/MAnorm2/tar.gz/refs/heads/master","
sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tushiqi%2FMAnorm2/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280389295,"owners_count":26322507,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-22T02:00:06.515Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chip-seq","differential-analysis","empirical-bayes","r-package","winsorize-values"],"created_at":"2025-10-22T05:51:12.027Z","updated_at":"2025-10-22T05:51:15.209Z","avatar_url":"https://github.com/tushiqi.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\r\nTitle: MAnorm2 1.2.2\r\nAuthor: Shiqi Tu\r\nDate: 2022-10-28\r\nContact: tushiqi@picb.ac.cn\r\n---\r\n\r\n\r\n# Introduction\r\n\r\nMAnorm2 is designed for normalizing and comparing\r\n[ChIP-seq](https://en.wikipedia.org/wiki/ChIP-sequencing) signals across\r\nindividual samples or groups of samples. The latest version of MAnorm2 is\r\nalways available in the [CRAN](https://cran.r-project.org) repository and\r\ncan thus be easily installed by typing `install.packages(\"MAnorm2\")` in an\r\nR session.\r\n\r\nFor older versions of MAnorm2, select a package from under the `dist` folder,\r\ndownload it, and type `install.packages(\"/path/to/the/package\", repos = NULL)`.\r\nIn this way, you may need to pre-install some dependencies of MAnorm2. 
The\r\ncurrent dependencies of the latest MAnorm2 version include locfit (\u003e= 1.5.9),\r\nscales (\u003e= 0.3.0), and statmod (\u003e= 1.4.34). All of these packages are available in\r\nthe [CRAN](https://cran.r-project.org) repository. For dependencies of other\r\nMAnorm2 versions, refer to the `Imports` field in the corresponding\r\n`DESCRIPTION` file.\r\n\r\nThe sections below briefly describe\r\nthe application scope of MAnorm2 and its capabilities. For the full\r\ndocumentation of MAnorm2, download the HTML version of its vignette from\r\n[here](https://github.com/tushiqi/MAnorm2/tree/master/utility/vignette-MAnorm2)\r\nand use a browser to open it, or type the following code in an R session\r\nafter installing MAnorm2:\r\n\r\n```r\r\nbrowseVignettes(\"MAnorm2\")\r\n```\r\n\r\n\r\n# Format of Input Data\r\n\r\nTo use the machinery implemented in MAnorm2, you need to prepare a\r\ntable that profiles the ChIP-seq signal in each of a list of genomic intervals\r\nfor each of a set of ChIP-seq samples. 
The following table provides such an\r\ninstance:\r\n\r\n| chrom|  start|    end| s1.read\\_cnt| s2.read\\_cnt| s1.occupancy| s2.occupancy|\r\n|-----:|------:|------:|------------:|------------:|------------:|------------:|\r\n|  chr1|  28112|  29788|          115|            4|            1|            0|\r\n|  chr1| 164156| 166417|          233|          194|            1|            1|\r\n|  chr1| 166417| 168417|          465|          577|            1|            1|\r\n|  chr1| 168417| 169906|           15|           34|            0|            1|\r\n\r\n(See the `H3K27Ac` dataset bundled with MAnorm2 for another example, and type\r\n`library(MAnorm2); ?H3K27Ac` in an R session for a detailed description of it.)\r\n\r\nTo be specific, each row of the table represents a genomic interval; each of\r\nthe `read_cnt` variables corresponds to a ChIP-seq sample and gives the numbers\r\nof reads from the sample that fall within the genomic intervals (i.e., the raw\r\nread counts); the `occupancy` variables correspond to the `read_cnt`\r\nvariables one by one and specify the occupancy status of each genomic interval\r\nin each sample. Note that an occupancy status of 1 indicates the interval\r\nis enriched with reads in the sample (compared with, for example,\r\nthe surrounding genomic regions or the corresponding input sample). In\r\npractice, the occupancy status of a genomic interval in a certain ChIP-seq\r\nsample could be determined by its overlap with the\r\n[peaks](https://genomebiology.biomedcentral.com/articles/10.1186/gb-2008-9-9-r137)\r\nof the sample. 
Note also that MAnorm2 refers to an interval as occupied by a\r\nsample if the interval is enriched with reads in the sample.\r\n\r\n[MAnorm2_utils](https://github.com/tushiqi/MAnorm2_utils) is specifically\r\ndesigned to coordinate with MAnorm2, and we strongly recommend using it to\r\ncreate input tables for MAnorm2.\r\n\r\n\r\n# Application Scope\r\n\r\nAlthough MAnorm2 has been designed to process ChIP-seq data, it can in\r\nprinciple be applied to the analysis of any type of data with a similar\r\nstructure, including\r\n[DNase-seq](https://en.wikipedia.org/wiki/DNase-Seq),\r\n[ATAC-seq](https://en.wikipedia.org/wiki/ATAC-seq) and\r\n[RNA-seq](https://en.wikipedia.org/wiki/RNA-Seq) data.\r\nThe only difficulty with such extensions is how to define\r\n\"peaks\" naturally for specific data types.\r\n\r\nMost of the peak callers originally devised for ChIP-seq data\r\n(e.g., [MACS 1.4](https://pypi.org/project/MACS/)) also\r\nwork for DNase-seq and ATAC-seq data. For RNA-seq data, each row of the input\r\ntable should stand for a gene, and we recommend setting a cutoff (e.g., 20) on the\r\n*raw read count* to define \"peak\" genes.\r\n\r\n\r\n# Continuous Distribution\r\n\r\nIn spite of the discrete nature of read counts, MAnorm2 uses a continuous\r\ndistribution to model ChIP-seq data by first transforming raw read counts into\r\nraw signal intensities. By default, MAnorm2 performs the transformation by\r\nsimply adding an offset count to each raw count and taking a base-2 logarithm.\r\nPractical ChIP-seq data sets, however, may be associated with various\r\nconfounding factors, including batch effects, local sequence compositions and\r\nbackground signals measured by input samples. For this reason, functions in\r\nMAnorm2 have been designed to be independent of the specific transformation\r\nemployed. 
Any method for correcting for confounding factors can therefore be\r\napplied before invoking MAnorm2, as long as the resulting signal intensities\r\ncan be approximately modeled as following the normal distribution (in\r\nparticular, consider carefully whether it is necessary to apply a logarithmic\r\ntransformation in the final step).\r\n\r\nThe primary reason MAnorm2 models ChIP-seq signals as\r\ncontinuous random variables is that the mathematical theory of count\r\ndistributions is far less tractable than that of the normal distribution.\r\nFor example, current statistical methods based on the negative binomial\r\ndistribution frequently rely on approximations of various kinds.\r\nSpecifically, variance (or dispersion) estimates for individual genomic\r\nintervals are typically treated as known parameters, and their uncertainty\r\ncan hardly be incorporated into the statistical tests for identifying\r\ndifferential signals.\r\n\r\nBesides, after an extensive correction for confounding factors,\r\nthe resulting data range is almost certainly not limited to non-negative\r\nintegers, and the data may have lost their discrete nature and be more akin\r\nto a continuous distribution. Moreover, transforming read counts towards the\r\nnormal distribution unlocks the application of a large repository of mature\r\nstatistical methods that were initially developed for analyzing continuous\r\nmeasurements (e.g., intensity data from microarray experiments). 
Refer to the\r\n[voom](https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-2-r29)\r\narticle for a detailed discussion of this topic.\r\n\r\n\r\n# Normalization\r\n\r\nMAnorm2 implements a robust method for normalizing raw signal intensities\r\nacross any number of ChIP-seq samples.\r\n\r\nTechnically, it considers the common peak regions of two ChIP-seq samples\r\n(i.e., the genomic intervals occupied by both samples; see also the\r\n[Format of Input Data](#format-of-input-data) section) to have globally invariant\r\nsignals between them. Based on this assumption, MAnorm2 applies a linear\r\ntransformation to the raw signal intensities of one of the two samples such\r\nthat\r\n\r\n 1. the resulting M values (differences in signal intensities between the two\r\n    samples) at the common peak regions have an arithmetic mean of 0;\r\n 2. the [sample Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#For_a_sample)\r\n    between the resulting M and A values at the common peak regions is 0 (A\r\n    values refer to average signal intensities across the two samples).\r\n\r\nThis procedure normalizes a pair of ChIP-seq samples. It can be\r\nextended to the normalization of any number of samples so that the normalized\r\nsignal intensities are comparable across all of them. For more information\r\nregarding the normalization method implemented in MAnorm2, type the\r\nfollowing code in an R session after installing it:\r\n\r\n```r\r\nlibrary(MAnorm2)\r\n?normalize\r\n?normBioCond\r\n```\r\n\r\n\r\n# Differential and Hypervariable Analyses\r\n\r\nMAnorm2 provides a self-contained system of utility functions for calling\r\ndifferential ChIP-seq signals between two or more biological conditions,\r\neach with or without replicate samples. It also implements a method named\r\nHyperChIP for calling hypervariable ChIP-seq signals across samples. 
Note\r\nthat the framework implemented in MAnorm2 for differential and hypervariable\r\nanalyses requires the input signal intensities to have been normalized but\r\nis independent of the specific normalization method.\r\nThis means that any normalization and/or bias\r\ncorrection tool can be combined with\r\nMAnorm2, as long as the resulting signal measurements are suitable for\r\nmodeling by a normal distribution. For example, for highly regular samples,\r\nyou might want to perform the normalization based on their *size factors*\r\n(refer to the `normalizeBySizeFactors` function in MAnorm2 for details).\r\n\r\nTechnically, MAnorm2 implements an S3 class named \"bioCond\" for\r\ngrouping ChIP-seq samples belonging to the same biological condition,\r\nand it provides a number of functions for handling objects of this\r\nclass. Taking advantage of these functions, you can\r\n\r\n 1. call genomic intervals with differential ChIP-seq signals between\r\n    two or more bioConds;\r\n 2. call genomic intervals with hypervariable ChIP-seq signals across multiple\r\n    samples or bioConds;\r\n 3. perform hierarchical clustering on a set of ChIP-seq samples or bioConds by\r\n    measuring the distance (i.e., dissimilarity) between each pair of them.\r\n\r\nNote that, for the samples grouped into a bioCond, MAnorm2 models the\r\nrelationship between observed mean signal intensities at individual intervals\r\nand the associated (observed) variances. In practice, this modeling strategy\r\ncan compensate for the lack of sufficient replicates for deriving accurate\r\nvariance estimates for individual intervals. 
Each of the above\r\nanalyses takes advantage of the modeled mean-variance trend to improve\r\nvariance estimation.\r\n\r\nFor an overview of the interface functions provided by MAnorm2, type the\r\nfollowing code in an R session after installing it:\r\n\r\n```r\r\nlibrary(MAnorm2)\r\n?MAnorm2\r\n```\r\n\r\n\r\n# Citation\r\n\r\nTo cite the MAnorm2 package in publications, please use\r\n\r\n\u003e Tu, S., et al.,\r\n\u003e *MAnorm2 for quantitatively comparing groups of ChIP-seq samples*.\r\n\u003e Genome Res, 2021. **31**(1): p. 131-145.\r\n\r\nIf you have performed MA normalization with a pseudo-reference profile as\r\nbaseline, or have employed a Winsorization-based robust parameter estimation\r\nframework, or have performed a hypervariable analysis,\r\nplease additionally cite\r\n\r\n\u003e Chen, H., et al.,\r\n\u003e *HyperChIP: identification of hypervariable signals across ChIP-seq or ATAC-seq samples*.\r\n\u003e Genome Biol, 2022. **23**(1): p. 62.\r\n\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftushiqi%2Fmanorm2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftushiqi%2Fmanorm2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftushiqi%2Fmanorm2/lists"}