{"id":17287651,"url":"https://github.com/ropensci/baseset","last_synced_at":"2025-09-07T02:05:53.788Z","repository":{"id":52850360,"uuid":"159389093","full_name":"ropensci/BaseSet","owner":"ropensci","description":"Provides classes for working with sets","archived":false,"fork":false,"pushed_at":"2025-02-18T23:01:29.000Z","size":7751,"stargazers_count":11,"open_issues_count":8,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-09-05T00:25:48.867Z","etag":null,"topics":["bioconductor","bioconductor-package","package","r","r-package","sets"],"latest_commit_sha":null,"homepage":"https://docs.ropensci.org/BaseSet","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ropensci.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":"codemeta.json","zenodo":null}},"created_at":"2018-11-27T19:37:01.000Z","updated_at":"2025-02-18T22:57:53.000Z","dependencies_parsed_at":"2025-02-17T19:23:06.286Z","dependency_job_id":"f68500cc-1c86-439d-8481-33c8bef76a16","html_url":"https://github.com/ropensci/BaseSet","commit_stats":{"total_commits":615,"total_committers":5,"mean_commits":123.0,"dds":"0.014634146341463428","last_synced_commit":"10bd65c23e00ee4e914a4383c1cfd560d735eb34"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/ropensci/BaseSet","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2FBaseSet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2FBaseSet/tags","releases_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/ropensci%2FBaseSet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2FBaseSet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ropensci","download_url":"https://codeload.github.com/ropensci/BaseSet/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2FBaseSet/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273986629,"owners_count":25202708,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-07T02:00:09.463Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioconductor","bioconductor-package","package","r","r-package","sets"],"created_at":"2024-10-15T10:05:51.963Z","updated_at":"2025-09-07T02:05:53.761Z","avatar_url":"https://github.com/ropensci.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\neditor_options: \n  chunk_output_type: console\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r setup, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n\u003c!-- badges: start --\u003e\n[![R-CMD-check](https://github.com/ropensci/BaseSet/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ropensci/BaseSet/actions/workflows/R-CMD-check.yaml)\n[![Codecov test coverage](https://codecov.io/gh/ropensci/BaseSet/graph/badge.svg)](https://app.codecov.io/gh/ropensci/BaseSet)\n[![Lifecycle: maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html#maturing)\n[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)\n[![rOpenSci](https://badges.ropensci.org/359_status.svg)](https://github.com/ropensci/software-review/issues/359)\n[![CRAN status](https://www.r-pkg.org/badges/version/BaseSet)](https://CRAN.R-project.org/package=BaseSet)\n\u003c!-- badges: end --\u003e\n\n# BaseSet\n\nThe goal of BaseSet is to facilitate working with sets in an efficient way. \nThe package implements methods to work on sets, such as intersection, union, complement, power set, cartesian product and other set operations, in a tidy way. \n\n\nThe package supports [classical](https://en.wikipedia.org/wiki/Set_(mathematics)) and [fuzzy](https://en.wikipedia.org/wiki/Fuzzy_set) sets. \nFuzzy sets are similar to classical sets, but the relationship between an element and the set carries some vagueness. \n\n\nIt also allows importing sets from several formats used in the life sciences. 
\nThese include the [GMT](https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29) and [GAF](https://geneontology.org/docs/go-annotation-file-gaf-format-2.1/) formats, and the [OBO format](https://obofoundry.org/) for ontologies.\n\nYou can save information about the elements, sets and their relationships on the object itself, \nfor instance the origin of a set, or categorical or numeric data associated with sets.\n\nSee BaseSet at work in the [examples](#Examples) below and in the vignettes. \nYou can also find [related packages](#Related-packages) and how they differ from BaseSet. \nIf you have questions or find bugs, [open an issue](https://github.com/ropensci/BaseSet/issues) (remember the [Code of Conduct](#Code-of-Conduct)).\n\n# Installation\n\nThe package depends on some packages from Bioconductor. To install some of its dependencies you first need to install `{BiocManager}`:\n\n```{r dep, eval = FALSE}\nif (!requireNamespace(\"BiocManager\", quietly = TRUE)) {\n  install.packages(\"BiocManager\")\n}\n```\n\nYou can install the latest version of BaseSet from [GitHub](https://github.com/ropensci/BaseSet) with:\n\n```{r eval=FALSE}\nBiocManager::install(\"ropensci/BaseSet\", \n                     dependencies = TRUE, build_vignettes = TRUE, force = TRUE)\n```\n\n \n# Examples {#Examples}\n\n```{r include=FALSE}\nlibrary(\"BaseSet\")\n```\n\n## Sets\n\nWe can create a set like this:\n\n```{r TidySet}\nsets \u003c- list(A = letters[1:5], B = c(\"a\", \"f\"))\nsets_analysis \u003c- tidySet(sets)\nsets_analysis\n```\n\nWe can perform typical operations such as union and intersection. 
You can name the resulting set or keep the default name:\n\n```{r union-intersection}\nunion(sets_analysis, sets = c(\"A\", \"B\")) \n# Or we can give a name to the new set\nunion(sets_analysis, sets = c(\"A\", \"B\"), name = \"D\")\n# Or the intersection\nintersection(sets_analysis, sets = c(\"A\", \"B\"))\n# Keeping the other sets:\nintersection(sets_analysis, sets = c(\"A\", \"B\"), name = \"D\", keep = TRUE) \n```\n\nWe can also compute the size of sets, among other things:\n\n```{r set_size}\nset_size(sets_analysis)\n```\n\nThe elements in one set that are not present in another:\n\n```{r subraction}\nsubtract(sets_analysis, set_in = \"A\", not_in = \"B\", keep = FALSE)\n```\n\nWe can also use any verb from [dplyr](https://cran.r-project.org/package=dplyr). We can add columns, filter rows, remove columns and add information about the sets:\n\n```{r dplyr}\nlibrary(\"magrittr\")\nset.seed(4673) # To make it reproducible on your machine\nsets_enriched \u003c- sets_analysis %\u003e% \n  mutate(Keep = sample(c(TRUE, FALSE), 7, replace = TRUE)) %\u003e% \n  filter(Keep == TRUE) %\u003e% \n  select(-Keep) %\u003e% \n  activate(\"sets\") %\u003e% \n  mutate(sets_origin = c(\"Reactome\", \"KEGG\"))\nsets_enriched\n\n# Activating sets makes the verb affect only them:\nelements(sets_enriched)\nrelations(sets_enriched)\nsets(sets_enriched)\n```\n\n## Fuzzy sets\n\nIn [fuzzy sets](https://en.wikipedia.org/wiki/Fuzzy_set) the elements are vaguely related to the set by a numeric value, usually between 0 and 1.\nThis implies that the association is not guaranteed.\n\n```{r fuzzy}\nrelations \u003c- data.frame(sets = c(rep(\"A\", 5), \"B\", \"B\"), \n                        elements = c(\"a\", \"b\", \"c\", \"d\", \"e\", \"a\", \"f\"),\n                        fuzzy = runif(7))\nfuzzy_set \u003c- tidySet(relations)\nfuzzy_set\n```\n\nThe operations performed on classical sets are also possible with fuzzy sets:\n\n```{r fuzzy-operations}\nunion(fuzzy_set, sets = c(\"A\", \"B\")) \n# Or we can give a name to the 
new set\nunion(fuzzy_set, sets = c(\"A\", \"B\"), name = \"D\")\n# Or the intersection\nintersection(fuzzy_set, sets = c(\"A\", \"B\"))\n# Keeping the other sets:\nintersection(fuzzy_set, sets = c(\"A\", \"B\"), name = \"D\", keep = TRUE) \n```\n\nAssuming that the fuzzy value is a probability, we can calculate the probability of a set having a given number of elements:\n\n```{r prob}\n# A set could be empty\nset_size(fuzzy_set)\n# The most probable size of each set:\nset_size(fuzzy_set) %\u003e% \n  group_by(sets) %\u003e% \n  filter(probability == max(probability))\n# Probability of belonging to several sets:\nelement_size(fuzzy_set)\n```\n\nWith fuzzy sets we can filter at a certain level (called an alpha cut):\n\n```{r alphaCut}\nfuzzy_set %\u003e% \n  filter(fuzzy \u003e 0.5) %\u003e% \n  activate(\"sets\") %\u003e% \n  mutate(sets_origin = c(\"Reactome\", \"KEGG\"))\n```\n\n# Related packages {#related}\n\nThere are several other packages related to sets, which partially overlap with BaseSet functionality:\n\n - [`{sets}`](https://CRAN.R-project.org/package=sets)  \n Implements a more general approach that can store functions or lists as elements of a set (while BaseSet only allows storing characters or factors), but it is harder to operate on in a tidy/long way. Also, intersection and union must happen between two different objects, while a single TidySet object (the class implemented in BaseSet) can store one or thousands of sets.\n\n - [`{GSEABase}`](https://bioconductor.org/packages/GSEABase/)  \n Implements a class to store sets and related information, but it doesn't support fuzzy sets and it is also quite slow, as it creates several classes for annotating each set. \n  \n - [`{BiocSet}`](https://bioconductor.org/packages/BiocSet/)  \n Implements a tidy class for sets but does not handle fuzzy sets. It also has less functionality for operating on sets, such as power sets and the cartesian product. 
BiocSet was influenced by the development of this package. \n\n - [`{hierarchicalSets}`](https://CRAN.R-project.org/package=hierarchicalSets)  \n This package focuses on clustering sets that are inside other sets and on visualizations. BaseSet, in contrast, focuses on storing and manipulating sets, including hierarchical sets.\n \n - [`{set6}`](https://cran.r-project.org/package=set6)  \n This package implements different classes for different types of sets, including fuzzy sets and conditional sets. However, it doesn't handle information associated with elements, sets or relationships. \n \n# Why this package? {#why}\n\nIn bioinformatics, enrichment methods are applied when looking for the impact of an experiment.\nThis involves obtaining several sets of genes from several resources and methods.\nUsually these curated sets of genes are taken at face value. \nHowever, there are several resources of sets and they [do not agree with each other](https://doi.org/10.1186/1471-2105-14-112); regardless, they are used without considering any uncertainty in the composition of the sets. \n\n\nFuzzy theory has long studied sets whose elements have degrees of membership and/or uncertainty. \nTherefore, one way to improve the methods involves using fuzzy methods and logic in this field. \nAs I couldn't find any package that provided methods for this, I set out to create it (after trying to [expand](https://github.com/llrs/GSEAdv) the existing one I knew).\n\nThis package is intended to be easy to use for someone who is working with collections of sets, but flexible about the methods and logic it can use. \nTo be consistent, the standard fuzzy logic is the default, but it might not be the right one for your data. \nConsider changing the defaults to match the framework the data was obtained with. \n\n# Code of Conduct {#CoC}\n\nPlease note that this package is released with a [Contributor\nCode of Conduct](https://ropensci.org/code-of-conduct/). 
\nBy contributing to this project, you agree to abide by its terms.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fropensci%2Fbaseset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fropensci%2Fbaseset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fropensci%2Fbaseset/lists"}