{"id":25138566,"url":"https://github.com/neurogenomics/epicompare","last_synced_at":"2026-01-16T01:49:39.580Z","repository":{"id":41474407,"uuid":"418478647","full_name":"neurogenomics/EpiCompare","owner":"neurogenomics","description":"Comparison, benchmarking \u0026 QC of epigenetic datasets","archived":false,"fork":false,"pushed_at":"2025-12-02T15:03:00.000Z","size":32124,"stargazers_count":17,"open_issues_count":15,"forks_count":3,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-12-05T13:19:00.754Z","etag":null,"topics":["benchmark","benchmarking","bioconductor","bioconductor-package","comparison","epigenetics","genetics","html","interactive-reporting","r-package"],"latest_commit_sha":null,"homepage":"https://doi.org/doi:10.18129/B9.bioc.EpiCompare","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/neurogenomics.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-10-18T11:53:37.000Z","updated_at":"2025-12-02T14:33:28.000Z","dependencies_parsed_at":"2023-12-20T13:25:45.554Z","dependency_job_id":"36a1f6c9-34d7-4f3b-9c38-a5e5339adb94","html_url":"https://github.com/neurogenomics/EpiCompare","commit_stats":{"total_commits":225,"total_committers":10,"mean_commits":22.5,"dds":0.5066666666666666,"last_synced_commit":"997e1a10a9e65d296b3423c74b95ad560134baef"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/neurogenomics/EpiCompare","repository_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neurogenomics%2FEpiCompare","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neurogenomics%2FEpiCompare/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neurogenomics%2FEpiCompare/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neurogenomics%2FEpiCompare/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/neurogenomics","download_url":"https://codeload.github.com/neurogenomics/EpiCompare/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neurogenomics%2FEpiCompare/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28474505,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T00:15:39.755Z","status":"ssl_error","status_checked_at":"2026-01-16T00:15:32.174Z","response_time":62,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","benchmarking","bioconductor","bioconductor-package","comparison","epigenetics","genetics","html","interactive-reporting","r-package"],"created_at":"2025-02-08T17:17:06.191Z","updated_at":"2026-01-16T01:49:39.564Z","avatar_url":"https://github.com/neurogenomics.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\ntitle: \"\u0026#9878;\u003ccode\u003eEpiCompare\u003c/code\u003e\u0026#9878;\u003cbr\u003eQC and Benchmarking of Epigenomic Datasets\"\nauthor: \"`r rworkflows::use_badges(add_doi = 'https://doi.org/10.1101/2022.07.22.501149',\n                                  add_bioc_release = TRUE,\n                                  add_bioc_download_month = TRUE,\n                                  add_bioc_download_total = TRUE,\n                                  add_bioc_download_rank = TRUE)`\" \ndate: \"\u003ch5\u003e\u003ci\u003eUpdated\u003c/i\u003e: `r format(Sys.Date(), '%b-%d-%Y')`\u003c/h5\u003e\"\noutput: \n  github_document\n---\n\n```{r, echo=FALSE, include=FALSE}\npkg \u003c- read.dcf(\"DESCRIPTION\", fields = \"Package\")[1]\ntitle \u003c- read.dcf(\"DESCRIPTION\", fields = \"Title\")[1]\ndescription \u003c- read.dcf(\"DESCRIPTION\", fields = \"Description\")[1]\nURL \u003c- read.dcf('DESCRIPTION', fields = 'URL')[1]\nowner \u003c- tolower(strsplit(URL,\"/\")[[1]][4]) \n```\n\n\n# Introduction\n\n`EpiCompare` is an R package for comparing multiple epigenomic datasets\nfor quality control and benchmarking purposes. 
The function outputs a\nreport in HTML format consisting of three sections:\n\n1.  **General Metrics**: Metrics on peaks (percentage of blacklisted and\n    non-standard peaks, and peak widths) and fragments (duplication\n    rate) of samples.\n2.  **Peak Overlap**: Frequency, percentage, statistical significance of \n    overlapping and non-overlapping peaks. This also includes Upset, \n    precision-recall and correlation plots. \n3.  **Functional Annotation**: Functional annotation (ChromHMM, ChIPseeker\n    and enrichment analysis) of peaks. Also includes peak enrichment\n    around Transcription Start Site.\n\n*Note*: Peaks located in blacklisted regions and non-standard chromosomes are \nremoved from the files prior to analysis. \n\n# Installation\n\n## Standard\n\nTo install `EpiCompare` use:\n\n```r\nif (!require(\"BiocManager\", quietly = TRUE)) install.packages(\"BiocManager\")\nBiocManager::install(\"EpiCompare\") \n```\n## All dependencies\n\n\u003cdetails\u003e\n\u003csummary\u003e\u0026#128072; \u003cstrong\u003eDetails\u003c/strong\u003e\u003c/summary\u003e\n\nInstalling all *Imports* and *Suggests* will allow you to use the full functionality of `EpiCompare` right away, without having to stop and install extra dependencies later on. 
\n\nTo install these packages as well, use:\n\n```R\nBiocManager::install(\"EpiCompare\", dependencies=TRUE) \n```\n\nNote that this will increase installation time, \nbut it means that you won't have to worry about installing any R packages\nwhen using functions with certain suggested dependencies.\n\n\u003c/details\u003e\n\n## Development \n\n\u003cdetails\u003e\n\u003csummary\u003e\u0026#128072; \u003cstrong\u003eDetails\u003c/strong\u003e\u003c/summary\u003e\nTo install the development version of `EpiCompare`, use:\n\n```R\nif (!require(\"remotes\")) install.packages(\"remotes\")\nremotes::install_github(\"neurogenomics/EpiCompare\")\n```\n\u003c/details\u003e\n\n## Citation\n \nIf you use ``r pkg``, please cite: \n\n\u003c!-- Modify this by editing the file: inst/CITATION  --\u003e\n\u003e `r citation(pkg)$textVersion`\n\n\n# Documentation\n\n## [EpiCompare website](https://neurogenomics.github.io/EpiCompare) \n## [Docker/Singularity container](https://neurogenomics.github.io/EpiCompare/articles/docker)  \n## [Bioconductor page](https://doi.org/doi:10.18129/B9.bioc.EpiCompare)\n\n### :warning: Note on documentation versioning\n\nThe documentation in this README and the [GitHub Pages website](https://neurogenomics.github.io/EpiCompare/)\npertains to the *development* version of `EpiCompare`.\nOlder versions of `EpiCompare` may have slightly different documentation\n(e.g. available functions, parameters). 
For documentation in older versions of \n`EpiCompare`, please see the **Documentation** section of the relevant\n version on [Bioconductor](https://doi.org/doi:10.18129/B9.bioc.EpiCompare) \n \n\n# Usage\n\nLoad package and example datasets.\n\n```r\nlibrary(EpiCompare)\ndata(\"encode_H3K27ac\") # example peakfile\ndata(\"CnT_H3K27ac\") # example peakfile\ndata(\"CnR_H3K27ac\") # example peakfile\ndata(\"CnT_H3K27ac_picard\") # example Picard summary output\ndata(\"CnR_H3K27ac_picard\") # example Picard summary output\n```\n\nPrepare input files: \n\n```r\n# create named list of peakfiles \npeakfiles \u003c- list(\"CnT\"=CnT_H3K27ac, \n                  \"CnR\"=CnR_H3K27ac) \n# set ref file and name \nreference \u003c- list(\"ENCODE_H3K27ac\" = encode_H3K27ac) \n# create named list of Picard summary\npicard_files \u003c- list(\"CnT\"=CnT_H3K27ac_picard, \n                     \"CnR\"=CnR_H3K27ac_picard) \n```\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003e\u0026#128072; Tips on importing user-supplied files\u003c/strong\u003e\u003c/summary\u003e\n\n`EpiCompare::gather_files` is helpful for identifying and importing \npeak or picard files.\n```r\n# To import BED files as GRanges object\npeakfiles \u003c- EpiCompare::gather_files(dir = \"path/to/peaks/\", \n                                      type = \"peaks.stringent\")\n# EpiCompare alternatively accepts paths (to BED files) as input \npeakfiles \u003c- list(sample1=\"/path/to/peaks/file1_peaks.stringent.bed\", \n                  sample2=\"/path/to/peaks/file2_peaks.stringent.bed\")\n# To import Picard summary output txt file as data frame\npicard_files \u003c- EpiCompare::gather_files(dir = \"path/to/peaks\", \n                                         type = \"picard\")\n```\n\u003c/details\u003e\n\nRun `EpiCompare()`:\n\n```r\nEpiCompare::EpiCompare(peakfiles = peakfiles,\n                       genome_build = list(peakfiles=\"hg19\",\n                                           
reference=\"hg38\"),\n                       genome_build_output = \"hg19\", \n                       picard_files = picard_files,\n                       reference = reference,\n                       run_all = TRUE,\n                       output_dir = tempdir())\n```\n\n#### Required Inputs\n\nThese input parameters must be provided:\n\n\u003cdetails\u003e\n\u003csummary\u003e\u0026#128072; \u003cstrong\u003eDetails\u003c/strong\u003e\u003c/summary\u003e\n\n-   `peakfiles` : Peakfiles you want to analyse. EpiCompare accepts\n    peakfiles as GRanges objects and/or as paths to BED files. Files must\n    be listed and named using `list()`.\n    E.g. `list(\"name1\"=peakfile1, \"name2\"=peakfile2)`.\n-   `genome_build` : A named list indicating the human genome build used to \n    generate each of the following inputs:\n    -   `peakfiles` : Genome build for the `peakfiles` input. Assumes genome build\n    is the same for each element in the `peakfiles` list.\n    -   `reference` : Genome build for the `reference` input.\n    -   `blacklist` : Genome build for the `blacklist` input. \u003cbr\u003e\n    E.g. `genome_build = list(peakfiles=\"hg38\", reference=\"hg19\", blacklist=\"hg19\")`\n-   `genome_build_output` : Genome build to standardise all inputs to. Liftovers \n    will be performed automatically as needed. Default is \"hg19\".\n-   `blacklist` : Peakfile as GRanges object specifying genomic regions\n    that have anomalous and/or unstructured signals independent of the\n    cell-line or experiment. For human hg19 and hg38 genome, use\n    built-in data `data(hg19_blacklist)` and `data(hg38_blacklist)`\n    respectively. 
For mouse mm10 genome, use built-in data `data(mm10_blacklist)`.\n-   `output_dir` : Path to the directory where all\n    `EpiCompare` outputs will be saved.\n    \n\u003c/details\u003e\n\n#### Optional Inputs\n\nThe following input files are optional: \n\n\u003cdetails\u003e\n\u003csummary\u003e\u0026#128072; \u003cstrong\u003eDetails\u003c/strong\u003e\u003c/summary\u003e\n\n-   `picard_files` : A list of summary metrics output from \n    [Picard](https://broadinstitute.github.io/picard/). *Picard MarkDuplicates* \n    can be used to identify the duplicate reads among the alignments. This tool \n    generates a summary output, normally with the ending \n    *.markdup.MarkDuplicates.metrics.txt*. If this input is provided, metrics on \n    fragments (e.g. mapped fragments and duplication rate) will be included \n    in the report. Files must be in data.frame format and listed using `list()` \n    and named using `names()`. To import Picard duplication metrics (.txt file) \n    into R as a data frame, use\n    `picard \u003c- read.table(\"/path/to/picard/output\", header = TRUE, fill = TRUE)`.\n-   `reference` : Reference peak file(s) used in `stat_plot` and\n    `chromHMM_plot`. Each file must be a `GRanges` object, listed and named\n    using `list(\"reference_name\" = GRanges_object)`. If more than one reference \n    is specified, `EpiCompare` outputs individual reports for each reference.\n    However, please note that this can take a while. 
\n    \n\u003c/details\u003e\n\n#### Optional Plots\n\nBy default, these plots will not be included in the report unless set to `TRUE`.\nTo turn on all features at once, simply use the `run_all=TRUE` argument:\n\n\u003cdetails\u003e\n\u003csummary\u003e\u0026#128072; \u003cstrong\u003eDetails\u003c/strong\u003e\u003c/summary\u003e\n\n-   `upset_plot` : Upset plot of overlapping peaks between samples.\n-   `stat_plot` : Included only if a `reference` dataset is provided.\n    The plot shows statistical significance (p/q-values) of sample peaks\n    that are overlapping/non-overlapping with the `reference` dataset.\n-   `chromHMM_plot` : ChromHMM annotation of peaks. If a `reference`\n    dataset is provided, ChromHMM annotation of overlapping and\n    non-overlapping peaks with the `reference` is also included in the\n    report.\n-   `chipseeker_plot` : ChIPseeker annotation of peaks.\n-   `enrichment_plot` : KEGG pathway and GO enrichment analysis of\n    peaks.\n-   `tss_plot` : Peak frequency around (+/- 3000bp) the transcription\n    start site. Note that it may take a while to generate this plot for\n    large sample sizes.\n-   `precision_recall_plot` : Plot showing the precision-recall score across \n    the peak calling stringency thresholds. \n-   `corr_plot` : Plot showing the correlation between the quantiles when the\n    genome is binned at a set size. These quantiles are based on the intensity \n    of the peak, dependent on the peak caller used (q-value for MACS2). \n\n\u003c/details\u003e\n\n#### Other Options\n\n\u003cdetails\u003e\n\u003csummary\u003e\u0026#128072; \u003cstrong\u003eDetails\u003c/strong\u003e\u003c/summary\u003e\n\n-   `chromHMM_annotation` : Cell-line annotation for ChromHMM. Default\n    is K562. 
Options are:\n    -   \"K562\" = K-562 cells\n    -   \"Gm12878\" = Cellosaurus cell-line GM12878\n    -   \"H1hesc\" = H1 Human Embryonic Stem Cell\n    -   \"Hepg2\" = Hep G2 cell\n    -   \"Hmec\" = Human Mammary Epithelial Cell\n    -   \"Hsmm\" = Human Skeletal Muscle Myoblasts\n    -   \"Huvec\" = Human Umbilical Vein Endothelial Cells\n    -   \"Nhek\" = Normal Human Epidermal Keratinocytes\n    -   \"Nhlf\" = Normal Human Lung Fibroblasts\n-   `interact` : By default, all heatmaps (percentage overlap and\n    ChromHMM heatmaps) in the report will be interactive. If set FALSE,\n    all heatmaps will be static. N.B. If `interact=TRUE`, interactive\n    heatmaps will be saved as html files, which may take time for larger\n    sample sizes.\n-   `output_filename` : By default, the report is named *EpiCompare.html*.\n    You can specify the file name of the report here.\n-   `output_timestamp` : By default FALSE. If TRUE, the filename of the\n    report includes the date.\n    \n\u003c/details\u003e\n\n#### Outputs\n\n`EpiCompare` outputs the following:\n\n1.  **HTML report**: A summary of all analyses saved in specified\n    `output_dir`\n2.  **EpiCompare_file**: if `save_output=TRUE`, all plots generated by\n    `EpiCompare` will be saved in *EpiCompare_file* directory also in\n    specified `output_dir`\n\nAn example report comparing ATAC-seq and DNase-seq can be found\n[here](https://neurogenomics.github.io/EpiCompare/articles/example_report)\n\n## Datasets \n\n`EpiCompare` includes several built-in datasets:\n\n\u003cdetails\u003e\n\u003csummary\u003e\u0026#128072; \u003cstrong\u003eDetails\u003c/strong\u003e\u003c/summary\u003e\n\n-   `encode_H3K27ac`: Human H3K27ac peak file generated with ChIP-seq using K562\ncell-line. Taken from [ENCODE](https://www.encodeproject.org/files/ENCFF044JNJ/)\nproject. For more information, run `?encode_H3K27ac`.  
\n-   `CnT_H3K27ac`: Human H3K27ac peak file generated with CUT\u0026Tag using K562 \ncell-line from [Kaya-Okur et al., (2019)](https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR8383507). For more\ninformation, run `?CnT_H3K27ac`. \n-   `CnR_H3K27ac`: Human H3K27ac peak file generated with CUT\u0026Run using K562 \ncell-line from [Meers et al., (2019)](https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR8581604).\nFor more details, run `?CnR_H3K27ac`. \n\n\u003c/details\u003e\n\n## Contact\n\n### [Neurogenomics Lab](https://www.neurogenomics.co.uk/inst/report/EpiCompare.html)\n\nUK Dementia Research Institute  \nDepartment of Brain Sciences  \nFaculty of Medicine  \nImperial College London  \n[GitHub](https://github.com/neurogenomics)  \n[DockerHub](https://hub.docker.com/orgs/neurogenomicslab)\n\n## Session Info\n\n\u003cdetails\u003e\n\u003csummary\u003e\u0026#128072; \u003cstrong\u003eDetails\u003c/strong\u003e\u003c/summary\u003e\n\n```{r Session Info}\nutils::sessionInfo()\n```\n\u003c/details\u003e\n\u003chr\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneurogenomics%2Fepicompare","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fneurogenomics%2Fepicompare","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneurogenomics%2Fepicompare/lists"}