{"id":15518977,"url":"https://github.com/darwinawardwinner/cd4-csaw","last_synced_at":"2025-04-23T04:32:59.326Z","repository":{"id":139219409,"uuid":"58002923","full_name":"DarwinAwardWinner/CD4-csaw","owner":"DarwinAwardWinner","description":"Reproducible reanalysis of a combined ChIP-Seq \u0026 RNA-Seq data set","archived":false,"fork":false,"pushed_at":"2019-08-09T16:53:57.000Z","size":66233,"stargazers_count":16,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-17T19:18:53.303Z","etag":null,"topics":["bioconductor","bioinformatics-pipeline","chip-seq","r","reproducible-research","rna-seq"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DarwinAwardWinner.png","metadata":{"files":{"readme":"README.mkdn","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2016-05-03T21:32:00.000Z","updated_at":"2020-05-28T11:47:49.000Z","dependencies_parsed_at":null,"dependency_job_id":"4bdc5551-0b14-43b2-add5-636c1551f92c","html_url":"https://github.com/DarwinAwardWinner/CD4-csaw","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DarwinAwardWinner%2FCD4-csaw","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DarwinAwardWinner%2FCD4-csaw/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DarwinAwardWinner%2FCD4-csaw/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DarwinAwardWinner%2FCD4-csaw/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/owners/DarwinAwardWinner","download_url":"https://codeload.github.com/DarwinAwardWinner/CD4-csaw/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250372058,"owners_count":21419710,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioconductor","bioinformatics-pipeline","chip-seq","r","reproducible-research","rna-seq"],"created_at":"2024-10-02T10:19:42.055Z","updated_at":"2025-04-23T04:32:58.910Z","avatar_url":"https://github.com/DarwinAwardWinner.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Re-analysis of a combined ChIP-Seq \u0026 RNA-Seq data set\n\nThis is the code for a re-analysis of a [GEO dataset][1] that I\noriginally analyzed for [this paper][2] using statistical methods that\nwere not yet available at the time, such as the\n[csaw Bioconductor package][3], which provides a principled way to\nnormalize windowed counts of ChIP-Seq reads and test them for\ndifferential binding. The original paper only analyzed binding within\npre-defined promoter regions. 
In addition, some improvements have\nbeen made to the RNA-seq analysis using newer features of [limma][4]\nsuch as quality weights.\n\nThis workflow downloads the sequence data and sample metadata from the\npublic GEO/SRA release, so anyone can download and run this code to\nreproduce the full analysis.\n\n## Workflow\n\n![Rule Graph](rulegraphs/rulegraph-all.png \"Rule graph of currently implemented workflow\")\n\n### Completed components\n\n* ChIP-seq\n  * Mapping with bowtie2\n  * Peak calling with MACS2 and Epic\n  * Fetching of [blacklists][5] from UCSC\n  * Generation of greylists from ChIP-Seq input samples\n  * IDR analysis of blacklist-filtered peak calls\n  * Computation of cross-correlation function for ChIP-Seq samples,\n    excluding blacklisted regions\n  * Counting in windows across the genome\n* RNA-seq\n  * Mapping with STAR \u0026 HISAT2\n  * Counting reads aligned to genes\n  * Alignment-free bias-corrected transcript quantification using Salmon \u0026 Kallisto\n  * Differential gene expression\n\n### Possible TODO components\n\n* Integrating RNA-seq and ChIP-seq\n  * hiAnnotator: http://bioconductor.org/packages/devel/bioc/html/hiAnnotator.html\n  * ChIPseeker: http://bioconductor.org/packages/devel/bioc/html/ChIPseeker.html\n  * mogsa: http://bioconductor.org/packages/release/bioc/html/mogsa.html\n* Gene set tests\n  * ToPASeq: http://bioconductor.org/packages/devel/bioc/html/ToPASeq.html\n  * mvGST: http://bioconductor.org/packages/devel/bioc/html/mvGST.html\n  * mgsa: http://bioconductor.org/packages/release/bioc/html/mgsa.html\n* QC Stuff\n  * ChIPQC: http://bioconductor.org/packages/release/bioc/html/ChIPQC.html\n  * MultiQC: http://multiqc.info/\n  * Rqc: http://www.bioconductor.org/packages/devel/bioc/html/Rqc.html\n* mixOmics: http://mixomics.org/\n* ica: https://cran.rstudio.com/web/packages/ica/index.html\n* Motif enrichment\n* pcaExplorer: https://bioconductor.org/packages/release/bioc/html/pcaExplorer.html\n\n## TODO Code 
cleanup\n\n* Remove unnecessary library() calls\n* Put spaces around equals signs\n\n## TODO Other\n\n* Document how to run the pipeline\n* Provide install script for R \u0026 Python packages.\n\n## Dependencies\n\n### Command-line tools\n\n* [ascp](http://downloads.asperasoft.com/en/downloads/50) Aspera\n  download client for downloading SRA files\n* [Bedtools](http://bedtools.readthedocs.io/en/latest/)\n* [Bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)\n  aligner\n* [Epic](https://github.com/endrebak/epic) peak caller\n* [fastq-tools](http://homes.cs.washington.edu/~dcjones/fastq-tools/)\n* [HISAT2](https://ccb.jhu.edu/software/hisat2/index.shtml) aligner\n* [IDR python script](https://github.com/nboley/idr)\n* [Kallisto](https://pachterlab.github.io/kallisto/about) RNA-seq\n  quantifier\n* [MACS2](https://github.com/taoliu/MACS) peak caller\n* [Picard tools](https://broadinstitute.github.io/picard/) for various\n  file manipulation utilities\n* [Salmon](http://salmon.readthedocs.io/en/latest/) RNA-seq quantifier\n  (devel version 0.7.3)\n* [Shoal](https://github.com/COMBINE-lab/shoal)\n* [Snakemake](https://bitbucket.org/snakemake/snakemake/wiki/Home) for\n  running the workflow\n* [SRA toolkit](https://github.com/ncbi/sra-tools) for extracting\n  reads from SRA files\n* [STAR](https://github.com/alexdobin/STAR) aligner\n* [UCSC command-line tools](http://hgdownload.cse.ucsc.edu/downloads.html#source_downloads)\n  (e.g. 
liftOver)\n\n### Programming languages and packages\n\n* [R](https://www.r-project.org/),\n  [Bioconductor](http://bioconductor.org/), and the following R\n  packages:\n    * From [CRAN](http://cran.r-project.org/): assertthat, doParallel,\n      dplyr, future, getopt, GGally, ggforce, ggfortify, ggplot2, ks,\n      lazyeval, lubridate, magrittr, MASS, Matrix, openxlsx, optparse,\n      parallel, purrr, RColorBrewer, readr, reshape2, rex, scales,\n      stringi, stringr\n    * From [Bioconductor](http://bioconductor.org/): annotate,\n      Biobase, BiocParallel, BSgenome.Hsapiens.UCSC.hg19,\n      BSgenome.Hsapiens.UCSC.hg38, ChIPQC, csaw, edgeR,\n      GenomicFeatures, GenomicRanges, GEOquery, limma, org.Hs.eg.db,\n      Rsamtools, Rsubread, rtracklayer, S4Vectors, SRAdb,\n      SummarizedExperiment, TxDb.Hsapiens.UCSC.hg19.knownGene,\n      tximport\n    * Installed manually:\n      [sleuth](http://pachterlab.github.io/sleuth/about),\n      [wasabi](https://github.com/COMBINE-lab/wasabi)\n* [Python 3](https://www.python.org/) and the following Python\n  packages: biopython, atomicwrites, numpy, pandas, plac, pysam, rpy2,\n  snakemake\n\n[1]: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE73214\n[2]: http://www.ncbi.nlm.nih.gov/pubmed/27170561\n[3]: https://bioconductor.org/packages/release/bioc/html/csaw.html\n[4]: https://bioconductor.org/packages/release/bioc/html/limma.html\n[5]: http://www.broadinstitute.org/~anshul/projects/encode/rawdata/blacklists/hg19-blacklist-README.pdf\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdarwinawardwinner%2Fcd4-csaw","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdarwinawardwinner%2Fcd4-csaw","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdarwinawardwinner%2Fcd4-csaw/lists"}