{"id":19630527,"url":"https://github.com/mahshaaban/integratingdatabioc","last_synced_at":"2026-02-19T19:31:40.171Z","repository":{"id":185878343,"uuid":"332156225","full_name":"MahShaaban/IntegratingDataBioc","owner":"MahShaaban","description":"Source code for the Bioconductor workshop \"Integrating gene expression and DNA-binding data using R/Bioconductor\"","archived":false,"fork":false,"pushed_at":"2021-02-27T03:59:54.000Z","size":1116,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-12T00:17:35.024Z","etag":null,"topics":["bioconductor-packages","chip-seq","gene-expression","rna-se","transcription-factors"],"latest_commit_sha":null,"homepage":"https://mahshaaban.github.io/IntegratingDataBioc/","language":"Dockerfile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MahShaaban.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-01-23T07:49:10.000Z","updated_at":"2021-02-27T13:31:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"38426e79-ebb5-4d84-8116-6c5a16c75622","html_url":"https://github.com/MahShaaban/IntegratingDataBioc","commit_stats":null,"previous_names":["mahshaaban/integratingdatabioc"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/MahShaaban/IntegratingDataBioc","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MahShaaban%2FIntegratingDataBioc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MahShaaban%2FIntegratingDataBioc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MahShaaban%2
FIntegratingDataBioc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MahShaaban%2FIntegratingDataBioc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MahShaaban","download_url":"https://codeload.github.com/MahShaaban/IntegratingDataBioc/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MahShaaban%2FIntegratingDataBioc/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29628776,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-19T18:02:07.722Z","status":"ssl_error","status_checked_at":"2026-02-19T18:01:46.144Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioconductor-packages","chip-seq","gene-expression","rna-se","transcription-factors"],"created_at":"2024-11-11T12:02:50.261Z","updated_at":"2026-02-19T19:31:40.140Z","avatar_url":"https://github.com/MahShaaban.png","language":"Dockerfile","funding_links":[],"categories":[],"sub_categories":[],"readme":"![.github/workflows/basic_checks.yaml](https://github.com/MahShaaban/IntegratingDataBioc/workflows/.github/workflows/basic_checks.yaml/badge.svg)\n\n- This workshop is scheduled for February, 27 and is hosted by\n[R-Ladies Tunisia](https://rladies.org/tunisia-rladies/) \nas part of 
the\n[R4Bioinfo](R4Bioinfo) series. [(Event Page).](https://www.meetup.com/rladies-tunis/events/276421511/)\n- Part of the materials was presented at \n[EuropBioc 2021](https://github.com/MahShaaban/targetShop). \n- The code is organized in a \n[BiocWorkshop](https://github.com/seandavi/BuildABiocWorkshop) format.\n\n# Integrating gene expression and DNA-binding data using R/Bioconductor\n\n## Description\n\nResearchers use ChIP binding data to identify potential transcription factor \nbinding sites. They use gene expression data from sequencing or microarrays to\nquantify the effect of the factor's over-expression or knockdown on \nits targets. Integrating the binding and expression data can therefore\nimprove the understanding of a transcription factor's function. In this\nworkshop, I present a complete workflow for integrating gene expression \n(microarray/RNA-seq)\nand DNA-binding data (ChIP-seq) to predict the combined function of \ntwo transcription factors using R/Bioconductor. The example used in \nthe workshop comes from real datasets of two functionally and evolutionarily related\ntranscription factors, YY1 and YY2, in HeLa cells. We will try to identify the\nfactor-specific and the shared targets of the factors in this particular cell \nline. Then we will use a technique to find out the aggregate functions of the \nfactors on their individual (inducer or repressor) and common targets \n(cooperative or competitive).\n\n## Pre-requisites\n\n- Basic knowledge of _R_ syntax\n- Familiarity with gene expression and DNA-binding data\n- Readings:\n[Tang et al., 2011](https://pubmed.ncbi.nlm.nih.gov/21940749/),\n[Wang et al., 2013](https://pubmed.ncbi.nlm.nih.gov/24263090/), and \n[Ahmed et al., 2020](https://pubmed.ncbi.nlm.nih.gov/32894066/)\n\n## Participation\n\nParticipants are expected to walk through the code (rmarkdown document). 
An \nintroduction at the beginning will cover the motivation of the\nanalysis and the relevant Bioconductor packages.\n\nThere are a few ways to run the code:\n\n1. Use [RStudio Cloud](https://rstudio.cloud/) to run a cloud instance of \nRStudio (free). Choose New Project \u003e New Project from Git Repository and paste\nthis URL in the pop-up box \n[MahShaaban/IntegratingDataBioc](https://github.com/MahShaaban/IntegratingDataBioc)\n\nPS: This option can be hit or miss because of limited resources on the free tier.\n\n2. Use RStudio locally to install the required packages and run the code after\ncloning this repo.\n\n```bash\ngit clone https://github.com/MahShaaban/IntegratingDataBioc\ncd IntegratingDataBioc\n# macOS only: open the workshop document in RStudio\nopen -a Rstudio vignette/workshop_code.Rmd \n```\n\n3. Use the docker image \n[mahshaaban/IntegratingDataBioc](https://hub.docker.com/repository/docker/mahshaaban/IntegratingDataBioc/)\nand knit the `Rmd` files in `vignettes/` from within RStudio.\n \n```bash\ndocker pull mahshaaban/target:latest\ndocker run -e PASSWORD=\u003ca_password\u003e -p 8787:8787 mahshaaban/target:latest\n```\n \nAn RStudio session will be accessible at \n[https://localhost:8787/](https://localhost:8787/)\nin the browser. The login username is always `rstudio` and the password is \n`\u003ca_password\u003e`.\n\nThe package is tested using the docker image (option 3) on GitHub \n[Actions](https://github.com/MahShaaban/IntegratingDataBioc/actions).\nTo make sure everything works on your end, run the following in RStudio:\n```r\nrcmdcheck::rcmdcheck(args = c(\"--no-manual\"), error_on = \"warning\", check_dir = \"check\")\n```\nThere should be no errors or warnings. 
\n\n## _R_ / _Bioconductor_ packages used\n\n- target\n\nData management\n\n- GenomicRanges\n- GenomicFeatures\n- rtracklayer\n- AnnotationDbi\n- readr\n- dplyr\n- tidyr\n- purrr\n\nAnnotation packages\n\n- TxDb.Hsapiens.UCSC.hg19.knownGene\n- org.Hs.eg.db\n\n## Time outline\n\n| Activity                                                  | Time |\n|-----------------------------------------------------------|------|\n| Bioconductor packages and classes                         | 20m  |\n| Introduction to the target package                        | 20m  |\n| Code-walkthrough: A use case of YY1 and YY2 in HeLa cells | 20m  |\n\n## Workshop goals and objectives\n\nThe workshop aims to teach participants how to use R/Bioconductor packages to \nread in differential expression and binding peak data, run a \npredictive analysis, and explore its output. I hope that by providing a \ncomplete, realistic example, participants will develop an understanding of the \nissues involved and the importance of integrating these two types of data. 
Ideally,\nparticipants will be able to adapt this code and the workflow to apply \nthis kind of analysis to their own datasets.\n\n## Learning goals\n\n- Learn to read differential expression and binding peak data into the \nappropriate R objects\n- Learn to use Bioconductor packages to extract the genomic annotation\n- Learn to prepare the expression and binding data for `target` analysis\n- Understand the `target` output through the package's visualization and testing \ntools\n\n## Learning objectives\n\n- Read data into R `data.frame`s and Bioconductor `GRanges` objects\n- Extract information from Bioconductor annotation packages TxDb and org.db\n- Apply the `target` analysis using the `associated_peaks` and `direct_targets` \nfunctions\n- Visualize the output using the cumulative distribution functions through\n`plot_predictions`\n- Test the results using the Kolmogorov-Smirnov (KS) test through `test_predictions`\n\nThis workshop is based on a workflow article: [a draft](https://github.com/MahShaaban/targetFlow)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmahshaaban%2Fintegratingdatabioc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmahshaaban%2Fintegratingdatabioc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmahshaaban%2Fintegratingdatabioc/lists"}