{"id":43928971,"url":"https://github.com/tidyomics/plyranges","last_synced_at":"2026-02-06T23:37:13.544Z","repository":{"id":37933913,"uuid":"101606084","full_name":"tidyomics/plyranges","owner":"tidyomics","description":"A grammar of genomic data transformation","archived":false,"fork":false,"pushed_at":"2025-11-20T14:44:53.000Z","size":3502,"stargazers_count":148,"open_issues_count":39,"forks_count":18,"subscribers_count":11,"default_branch":"devel","last_synced_at":"2025-11-20T16:21:14.246Z","etag":null,"topics":["bioconductor","data-analysis","dplyr","genomic-ranges","genomics","tidy-data"],"latest_commit_sha":null,"homepage":"https://tidyomics.github.io/plyranges/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tidyomics.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":".github/CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2017-08-28T05:12:39.000Z","updated_at":"2025-11-20T14:35:18.000Z","dependencies_parsed_at":"2025-06-24T02:26:33.280Z","dependency_job_id":"c5dcf179-c71c-4a62-88dc-f5cee10509be","html_url":"https://github.com/tidyomics/plyranges","commit_stats":null,"previous_names":["tidyomics/plyranges","sa-lee/plyranges"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tidyomics/plyranges","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidyomics%2Fplyranges","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidyomics%2Fplyranges/tags","relea
ses_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidyomics%2Fplyranges/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidyomics%2Fplyranges/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tidyomics","download_url":"https://codeload.github.com/tidyomics/plyranges/tar.gz/refs/heads/devel","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tidyomics%2Fplyranges/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29180580,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-06T23:15:33.022Z","status":"ssl_error","status_checked_at":"2026-02-06T23:15:09.128Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioconductor","data-analysis","dplyr","genomic-ranges","genomics","tidy-data"],"created_at":"2026-02-06T23:37:13.438Z","updated_at":"2026-02-06T23:37:13.538Z","avatar_url":"https://github.com/tidyomics.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, echo = FALSE, message=FALSE, warning=FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"README-\"\n)\n```\n\n# plyranges: fluent genomic data analysis \u003cimg id=\"plyranges_logo\" src=\"man/figures/logo.png\" align=\"right\" width = \"125\" /\u003e\n\n\u003c!-- badges: start --\u003e\n[![R-CMD-check-bioc](https://github.com/tidyomics/plyranges/workflows/R-CMD-check-bioc/badge.svg)](https://github.com/tidyomics/plyranges/actions?query=workflow%3AR-CMD-check-bioc)\n[![BioC status](http://www.bioconductor.org/shields/build/release/bioc/plyranges.svg)](https://bioconductor.org/checkResults/release/bioc-LATEST/plyranges)\n\u003c!-- badges: end --\u003e\n\n[plyranges](https://www.bioconductor.org/packages/release/bioc/html/plyranges.html) provides a consistent interface for importing and wrangling\ngenomics data from a variety of sources. The package defines a grammar of\ngenomic data transformation based on `dplyr` and the Bioconductor packages\n`IRanges`, `GenomicRanges`, and `rtracklayer`. 
It does this by providing a set\nof verbs for developing analysis pipelines based on _Ranges_ objects that\nrepresent genomic regions:\n\n* Modify genomic regions with the `mutate()` and `stretch()` functions.\n* Modify genomic regions while fixing the start/end/center coordinates with the `anchor_` family of functions.\n* Sort genomic ranges with `arrange()`.\n* Modify, subset, and aggregate genomic data with the `mutate()`,\n`filter()`, and `summarise()` functions.\n* Any of the above operations can be performed on partitions of the\ndata with `group_by()`.\n* Find nearest neighbour genomic regions with the `join_nearest_` family\nof functions.\n* Find overlaps between ranges with the `join_overlap_` family of functions.\n* Merge all overlapping and adjacent genomic regions with `reduce_ranges()`.\n* Merge the end points of all genomic regions with `disjoin_ranges()`.\n* Import and write common genomic data formats with the `read_/write_` family\nof functions.\n\nFor more details on the features of plyranges, read the\n[vignette](https://tidyomics.github.io/plyranges/articles/an-introduction.html).\nFor a complete case study on using plyranges to combine ATAC-seq and RNA-seq\nresults, read the [*fluentGenomics*\nworkflow](https://tidyomics.github.io/fluentGenomics).\n\nplyranges is part of the [tidyomics](https://github.com/tidyomics)\nproject, providing a `dplyr`-based interface for many types of\ngenomics datasets represented in Bioconductor.\n\n# Installation\n\n[plyranges](https://www.bioconductor.org/packages/release/bioc/html/plyranges.html) can be installed from the latest Bioconductor\nrelease:\n\n```{r, eval=FALSE}\n# install.packages(\"BiocManager\")\nBiocManager::install(\"plyranges\")\n```\n\nTo install the development version from GitHub:\n\n```{r, eval=FALSE}\nBiocManager::install(\"tidyomics/plyranges\")\n```\n\n# Quick overview\n\n## About `Ranges`\n\n`Ranges` objects can either represent sets of integers as `IRanges` (which have\nstart, end and 
width attributes) or represent genomic intervals (which have\nadditional attributes, sequence name, and strand) as `GRanges`.  In addition,\nboth types of `Ranges` can store information about their intervals as metadata\ncolumns (for example GC content over a genomic interval).\n\n`Ranges` objects follow the tidy data principle: each row of a `Ranges` object\ncorresponds to an interval, while each column will represent a variable about\nthat interval, and generally each object will represent a single unit of\nobservation (like gene annotations).\n\nWe can construct an `IRanges` object from a `data.frame` with a `start` or\n`width` using the `as_iranges()` method.\n\n```{r, message=FALSE}\nlibrary(plyranges)\ndf \u003c- data.frame(start = 1:5, width = 5)\nas_iranges(df)\n# alternatively with end\ndf \u003c- data.frame(start = 1:5, end = 5:9)\nas_iranges(df)\n```\n\nWe can also construct a `GRanges` object in a similar manner. Note that a\n`GRanges` object requires at least a seqnames column to be present in the\ndata.frame (but not necessarily a strand column).\n\n```{r}\ndf \u003c- data.frame(seqnames = c(\"chr1\", \"chr2\", \"chr2\", \"chr1\", \"chr2\"),\n                 start = 1:5,\n                 width = 5)\nas_granges(df)\n# strand can be specified with `+`, `*` (missing) and `-`\ndf$strand \u003c- c(\"+\", \"+\", \"-\", \"-\", \"*\")\nas_granges(df)\n```\n\n# Example: finding GWAS hits that overlap known exons\nLet's look at a more realistic example (taken from the HelloRanges vignette).\n\n```{r, include=FALSE}\ndir \u003c- system.file(package = \"HelloRangesData\", \"extdata/\")\ngenome \u003c- as_granges(read.delim(file.path(dir, \"hg19.genome\"),\n                     header = FALSE),\n                     seqnames = V1, start = 1L, width = V2)\n\ngwas \u003c- read_bed(file.path(dir, \"gwas.bed\"), genome_info = genome)\nexons \u003c- read_bed(file.path(dir, \"exons.bed\"), genome_info = genome)\n```\n\nSuppose we have two _GRanges_ objects: one containing 
coordinates of known\nexons and another containing SNPs from a GWAS.\n\nThe first and last 5 exons are printed below; there are two additional columns\ncorresponding to the exon name and a score.\n\nWe could check the number of exons per chromosome using `group_by` and\n`summarise`.\n```{r}\nexons\nexons %\u003e%\n  group_by(seqnames) %\u003e%\n  summarise(n = n())\n```\n\nNext we create a column representing the transcript_id with `mutate`:\n\n```{r}\nexons \u003c- exons %\u003e%\n  mutate(tx_id = sub(\"_exon.*\", \"\", name))\n```\n\nTo find all GWAS SNPs that overlap exons, we use `join_overlap_inner`. This\nwill create a new _GRanges_ with the coordinates of SNPs that overlap exons, as\nwell as metadata from both objects.\n\n```{r}\nolap \u003c- join_overlap_inner(gwas, exons)\nolap\n```\n\nFor each SNP we can count the number of times it overlaps a transcript.\n\n```{r}\nolap %\u003e%\n  group_by(name.x, tx_id) %\u003e%\n  summarise(n = n())\n```\n\nWe can also generate 2 bp splice sites on either side of the exon using\n`flank_left` and `flank_right`. We add a column indicating the side of flanking\nfor illustrative purposes. The `interweave` function pairs the left and right\nranges objects.\n\n```{r}\nleft_ss \u003c- flank_left(exons, 2L)\nright_ss \u003c- flank_right(exons, 2L)\nall_ss \u003c- interweave(left_ss, right_ss, .id = \"side\")\nall_ss\n```\n\n# Learning more\n\n- The [*fluentGenomics* workflow](https://sa-lee.github.io/fluentGenomics) package shows you how to combine differentially expressed genes and differential chromatin accessibility peaks using plyranges. 
It extends the [case study](https://github.com/mikelove/plyrangesTximetaCaseStudy) by Michael Love for using plyranges with [tximeta](https://bioconductor.org/packages/release/bioc/html/tximeta.html).\n\n- The [extended vignette in the plyrangesWorkshops package](https://github.com/sa-lee/plyrangesWorkshops) has a detailed\nwalk through of using plyranges for coverage analysis.\n\n- The [Bioc 2018 Workshop book](https://bioconductor.github.io/BiocWorkshops/fluent-genomic-data-analysis-with-plyranges.html) has worked examples of using `plyranges` to analyse publicly available genomics data.\n\n\n# Citation\n\nIf you found `plyranges` useful for your work please cite our\n[paper](http://dx.doi.org/10.1186/s13059-018-1597-8):\n\n```\n@ARTICLE{Lee2019,\n  title    = \"plyranges: a grammar of genomic data transformation\",\n  author   = \"Lee, Stuart and Cook, Dianne and Lawrence, Michael\",\n  journal  = \"Genome Biol.\",\n  volume   =  20,\n  number   =  1,\n  pages    = \"4\",\n  month    =  jan,\n  year     =  2019,\n  url      = \"http://dx.doi.org/10.1186/s13059-018-1597-8\",\n  doi      = \"10.1186/s13059-018-1597-8\",\n  pmc      = \"PMC6320618\"\n}\n```\n\n# Contributing\n\nWe welcome contributions from the R/Bioconductor community. We ask that\ncontributors follow the [code of conduct](.github/CODE_OF_CONDUCT.md) and the guide\noutlined [here](.github/CONTRIBUTING.md).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftidyomics%2Fplyranges","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftidyomics%2Fplyranges","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftidyomics%2Fplyranges/lists"}