{"id":27948258,"url":"https://github.com/const-ae/treelabel","last_synced_at":"2025-05-07T14:57:13.074Z","repository":{"id":272955891,"uuid":"918280038","full_name":"const-ae/treelabel","owner":"const-ae","description":"Store and work with labels that exist in hierarchical relationships","archived":false,"fork":false,"pushed_at":"2025-04-02T13:52:34.000Z","size":1163,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-07T14:57:06.865Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/const-ae.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-17T15:57:22.000Z","updated_at":"2025-04-02T13:52:38.000Z","dependencies_parsed_at":"2025-01-17T17:33:06.188Z","dependency_job_id":"f653b46e-4b9c-43a7-bbdb-2ca5e059866c","html_url":"https://github.com/const-ae/treelabel","commit_stats":null,"previous_names":["const-ae/treelabel"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/const-ae%2Ftreelabel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/const-ae%2Ftreelabel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/const-ae%2Ftreelabel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/const-ae%2Ftreelabel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/const-ae","download_url"
:"https://codeload.github.com/const-ae/treelabel/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252902635,"owners_count":21822257,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-07T14:57:12.495Z","updated_at":"2025-05-07T14:57:13.058Z","avatar_url":"https://github.com/const-ae.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\noptions(width = 100)\n```\n\n# treelabel\n\n\u003c!-- badges: start --\u003e\n\u003c!-- badges: end --\u003e\n\nThe goal of treelabel is to store and work with labels that exist in a hierarchical relationship. \n\nThis is an alpha software release: feel free to play around with the code, and please provide feedback, but expect breaking changes to the API!\n\n## Motivation\n\n![](man/figures/celltype_tree.png)\n\nI work on single-cell RNA-seq data with gene expression profiles for thousands of cells. A common first step is to annotate each cell's *cell type*. The granularity of these cell type annotations can vary; one can classify cells broadly into *immune cells* or *epithelial cells* or one can be very detailed and distinguish within the immune cells *CD4 positive T regulatory cells* from *CD4 positive T follicular helper cells*. 
Choosing the best annotation level is difficult because one analysis may need broad cell types, whereas others require the highest possible resolution. The `treelabel` package provides an intuitive interface to store and work with these hierarchically related labels. \n\nDepending on the reference data and annotation method used for the cell typing, you often have multiple (partially) conflicting annotations. `treelabel` provides functions to build a consensus across different annotations and can integrate annotations at different resolutions. Furthermore, `treelabel` supports uncertainty scores associated with a label. For example, most automatic cell type scoring tools (like [Azimuth](https://azimuth.hubmapconsortium.org/) or [celltypist](https://www.celltypist.org/)) return a confidence score in addition to the cell type label. These scores enable a more precise selection of cells where you have sufficient confidence in the cell type label.\n\n## What this package is and what it is not\n\nThis package is purposefully kept generic and only makes the following assumptions:\n\n- Your labels have a tree-like relationship: the edges between the labels are directed, and there are no cycles.\n- The relation between a parent and a child can be phrased as *is a*. For example, a *`T cell` is an `Immune cell`*. \n- The scores can be logical or non-negative numbers.\n\nThis package does not provide any functionality to:\n\n- Assign cell types to cells based on the expression profile. Use any of the many available automatic cell type scoring tools (see [this](https://github.com/seandavi/awesome-single-cell?tab=readme-ov-file#cell-type-identification-and-classification) list for some suggestions) or do it manually using clustering plus marker gene expression.\n- Automatically harmonize cell type labels from different references (e.g., figure out that the `NK cells` from one dataset correspond to the `Natural killer cells` from another). You have to do this manually. 
There is an example further down in the README.\n- Provide the optimal cell type tree. You will probably want to define the tree for your analysis depending on the annotations available to you. As a reference, look at the [cell ontology project](https://cell-ontology.github.io/), which provides a large database of cell type label relationships and is used by the Human Cell Atlas.\n- Plot trees. For demonstration purposes, we will use the `igraph` plots (which are not very pretty), and for the plot on the top, I used the [D3](https://d3js.org/) JavaScript library (which is cumbersome to use from R). See the end of the README for an example of how to make pretty plots of trees with ggplot2.\n\n## Installation\n\nYou can install the development version of `treelabel` like this:\n\n``` r\ndevtools::install_github(\"const-ae/treelabel\")\n```\n\n# Documentation\n\n`treelabel` is built to be directly compatible with the `tidyverse`. \n\n```{r, eval=FALSE}\nlibrary(treelabel)\nlibrary(tidyverse)\n```\n```{r, include=FALSE}\ndevtools::load_all(\".\")\nlibrary(tidyverse)\noptions(\n  pillar.print_max = 5,\n  pillar.print_min = 5\n)\n```\n\n## Motivating example\n\nI will demonstrate a typical single-cell analysis workflow that takes a hierarchical set of labels, stores them in a `treelabel` vector, and analyzes the abundance changes of cell types.\n\nI will illustrate the process using the \"pbmcsca\" dataset from Seurat and score each cell using Azimuth. 
Note that `treelabel` is compatible with any data storage format (e.g., `SingleCellExperiment` or `Seurat`) and can handle both manual cell type labels based on clustering or automated cell scores produced by, for example, Azimuth or Celltypist.\n\n```{r message=FALSE, warning=FALSE, paged.print=FALSE, results='hide'}\n# This can take a minute to run through\nlibrary(Seurat)\n# Might need to call `SeuratData::InstallData(\"pbmcsca\")` first\npbmcsca \u003c- SeuratData::LoadData(\"pbmcsca\")\nazimuth_res \u003c- Azimuth::RunAzimuth(pbmcsca, reference = \"pbmcref\")\n# Select most important columns to make the output easier to read.\nazimuth_res@meta.data \u003c- select(azimuth_res@meta.data, c(\"orig.ident\", \"Experiment\", \"Method\", starts_with(\"predicted.celltype.\")))\n```\n\nTake a look at the Azimuth output. It contains six new columns that all start with \"predicted.celltype\". These columns contain the labels and confidence scores from the automated mapping. These will serve as the input to `treelabel` which turns the six columns into one!\n\n```{r, paged.print=FALSE}\nazimuth_res@meta.data |\u003e\n  as_tibble(rownames = \"cell_id\") \n```\n\nTo create the `treelabel` vector, we need to define our cell type hierarchy. 
We use the `igraph` package for this.\n\n```{r, paged.print=FALSE}\n# Define the cell type label hierarchy\npbmcsca_tree \u003c- igraph::graph_from_literal(\n  root - NK : `T cell` : Mono : DC : B,\n  `T cell` - `CD8 T` : `CD4 T`,\n  `CD4 T` - CTL : `CD4 Naive` : `CD4 TCM` : `CD4 TEM` : Treg,\n  `CD8 T` - `CD8 Naive` : `CD8 TCM` : `CD8 TEM`,\n  DC - cDC1 : cDC2 : pDC,\n  Mono - `CD14 Mono` : `CD16 Mono`\n)\n\n# Convert the Azimuth result into a treelabel vector: \n# * We first append '.label' to the Azimuth label columns to make the `pivot_longer` call simpler.\n# * We then filter the cell types to the ones we list in the `pbmcsca_tree`.\n# * Lastly, we convert the label and score columns into a treelabel vector.\ncelltype_annotations \u003c- azimuth_res@meta.data |\u003e\n  as_tibble(rownames = \"cell_id\") |\u003e\n  dplyr::rename_with(.cols = matches(\"^predicted.celltype.l\\\\d$\"), \\(x) paste0(x, \".label\")) |\u003e\n  pivot_longer(starts_with(\"predicted.celltype\"), names_prefix = \"predicted\\\\.celltype\\\\.\", names_sep = \"\\\\.\", names_to = c(\"level\", \".value\")) |\u003e\n  filter(label %in% igraph::V(pbmcsca_tree)$name) |\u003e\n  treelabel_from_dataframe(pbmcsca_tree, id = \"cell_id\", label = \"label\", score = \"score\", name = \"azimuth_celltypes\")\n```\n\nThe `azimuth_celltypes` column is an S3 vector (built on top of the `vctrs` package) which works a bit like a `factor`. Each entry contains the full information about the hierarchical cell type labels. 
By default, `treelabel` prints the most precise cell type label that is available, but the information about the other levels is still accessible.\n\n```{r, paged.print=FALSE}\nvec \u003c- head(celltype_annotations$azimuth_celltypes, n = 8)\nvec\n\n# The fourth element is a T cell, a CD8 T, and a CD8 TEM!\nvec[4]\ntl_eval(vec[4], `T cell`)\ntl_eval(vec[4], `CD8 T`)\ntl_eval(vec[4], `CD8 TEM`)\n```\n\nWe can join the `celltype_annotations` with the full metadata to replace the old `predicted.celltype` columns with our new `treelabel`.\n\n```{r, paged.print=FALSE}\nmeta_data \u003c- azimuth_res@meta.data |\u003e\n  as_tibble(rownames = \"cell_id\") |\u003e\n  dplyr::select(- starts_with(\"predicted.celltype\")) |\u003e\n  left_join(celltype_annotations, by = \"cell_id\")\n\nmeta_data\n```\n\nThe `pbmcsca` contains results from ten different sequencing experiments. We can, for example, count how often each cell type was seen per method.\n\n```{r, paged.print=FALSE}\n# Summing the confidence scores does not exactly give you the counts\nmeta_data |\u003e\n  summarize(as_tibble(tl_score_matrix(sum(azimuth_celltypes, na.rm=TRUE))), .by = c(Method)) \n\n# Instead, you can say that only labels where the score exceeds a threshold count.\nmeta_data |\u003e\n  mutate(azimuth_celltypes = tl_modify(azimuth_celltypes, .scores \u003e 0.8)) |\u003e\n  summarize(as_tibble(tl_score_matrix(sum(azimuth_celltypes, na.rm=TRUE))), .by = c(Method))\n```\n\nThe `test_abundance_changes` function makes it easy to test if the number of cells of a cell type changes between conditions. Importantly, you need to have multiple independent replicates. 
The `pbmcsca` dataset unfortunately does not have replicates, so I will just simulate random patient IDs to demonstrate how the function works.\n\n```{r, paged.print=FALSE}\n# Apply threshold and make Experiment a factor\ninput_dat \u003c- meta_data |\u003e\n  mutate(Experiment = as.factor(Experiment)) |\u003e\n  mutate(patient_id = sample(paste0(\"sample_\", 1:5), size = n(), replace = TRUE)) |\u003e\n  mutate(azimuth_celltypes = tl_modify(azimuth_celltypes, .scores \u003e 0.8)) \n\n# The function takes many arguments. See `?test_abundance_changes` for all details\ntest_abundance_changes(input_dat, design = ~ Experiment, aggregate_by = patient_id) \n\n# We can also run `test_abundance_changes` inside dplyr::reframe (an alternative to `summarize`)\n# and calculate the abundance changes for each Method separately.\n# Setting reference = `T cell` will test if the number of T cell subtypes changes as a proportion\n# of all T cells.\ninput_dat |\u003e\n  reframe(test_abundance_changes(data = across(everything()), design = ~ Experiment, aggregate_by = patient_id, \n                                 reference = `T cell`, contrast = cond(Experiment = 'pbmc2') - cond(Experiment = 'pbmc1')),\n          .by = Method) \n```\n\n### Compatibility with Bioconductor and Seurat\n\n`treelabel` works directly with the Bioconductor data structures `SingleCellExperiment`, `SummarizedExperiment`, and `DataFrame`.\n\n```{r}\n# Load an example SingleCellExperiment object\nsuppressMessages({\n  sce \u003c- ExperimentHub::ExperimentHub()[[\"EH2259\"]]\n})\n# Make a simple tree with only one level\nkang_tree \u003c- igraph::graph_from_edgelist(cbind(\"root\", levels(sce$cell)))\n# Add treelabel column to colData\ncolData(sce)$treelabel \u003c- treelabel(sce$cell, kang_tree)\ncolData(sce)\n```\n\nThe `treelabel` vectors can also be used with Seurat data. 
Here, we match the provided annotations to the names from the `pbmcsca_tree`.\n\n\n```{r eval=FALSE, include=TRUE}\n# Load pbmcsca again\nlibrary(Seurat)\npbmcsca \u003c- SeuratData::LoadData(\"pbmcsca\")\n```\n\n```{r}\n# We will re-use the `pbmcsca_tree` from above. The provided annotations in pbmcsca$CellType\n# are in a slightly different format, so we manually convert them.\nrename_pbmcsca_celltypes \u003c- c(\n  \"B cell\" = \"B\", \"CD14+ monocyte\" = \"CD14 Mono\", \"CD16+ monocyte\" = \"CD16 Mono\",\n  \"CD4+ T cell\" = \"CD4 T\", \"Cytotoxic T cell\" = \"CD8 T\",  \"Dendritic cell\" = \"DC\",\n  \"Megakaryocyte\" = \"Mono\", \"Natural killer cell\" = \"NK\", \n  \"Plasmacytoid dendritic cell\" = \"pDC\", \"Unassigned\" = \"root\"\n)\n\npbmcsca@meta.data$tl_manual \u003c- treelabel(rename_pbmcsca_celltypes[pbmcsca$CellType], pbmcsca_tree)\npbmcsca@meta.data[1:5,c(\"orig.ident\", \"CellType\", \"tl_manual\")]\n```\n\n\n## Technical documentation\n\nWe define our label hierarchy using [`igraph`](https://r.igraph.org/articles/igraph.html).\n\n```{r tree_plot}\ntree \u003c- igraph::graph_from_literal(\n  root - ImmuneCell : EndothelialCell : EpithelialCell,\n  ImmuneCell - TCell : BCell,\n  TCell - CD4_TCell : CD8_TCell\n)\nplot(tree, layout = igraph::layout_as_tree(tree, root = \"root\"),\n     vertex.size = 40, vertex.label.cex = 0.6)\n```\n\n### Constructors\n\nThe easiest way to make a `treelabel` vector is to make one from a character vector. 
You call the `treelabel` constructor and provide the labels and the reference tree.\n\n```{r}\nchar_vec \u003c- c(\"BCell\", \"EndothelialCell\", \"CD4_TCell\", NA, \"BCell\", \"EpithelialCell\", \"ImmuneCell\")\nvec \u003c- treelabel(char_vec, tree = tree)\nvec\n```\n\n\nIf you have some uncertainty associated with each label, you can also use a named `numeric` vector to make a `treelabel` vector.\n\n```{r}\nnum_vec \u003c- c(\"BCell\" = 0.99, \"EndothelialCell\" = 0.6, \"CD4_TCell\" = 0.8, NA, \"BCell\" = 0.78, \"EpithelialCell\" = 0.9, \"ImmuneCell\" = 0.4)\nvec \u003c- treelabel(num_vec, tree = tree)\nvec\n```\n\n\nSome tools provide the confidence scores for each vertex in the tree. In this case, you can provide the annotations as a `list` or a `data.frame`.\n\n```{r}\nlst \u003c- list(\n  c(BCell = 0.99, ImmuneCell = 1),\n  c(root = 1, EndothelialCell = 0.65),\n  c(CD4_TCell = 0.8, TCell = 0.95, ImmuneCell = 0.95),\n  NULL, # will be treated as NA\n  c(ImmuneCell = 0.4)\n)\n\nvec \u003c- treelabel(lst, tree)\nvec\n```\n\nLastly, you can convert a \"tidy\" data frame to a treelabel. The `treelabel_from_dataframe` function works differently from the other constructors, as it returns a `data.frame` with an ID column and a `treelabel` column. 
The function cannot directly return a `treelabel` vector because the order of the rows in the data.frame could be scrambled, in which case it is unclear how cells and elements in the treelabel relate.\n\n```{r}\ndf \u003c- data.frame(\n  cell_id = c(\"cell 1\", \"cell 1\", \"cell 2\", \"cell 3\", \"cell 3\", \"cell 3\"),\n  annot = c(\"BCell\", \"ImmuneCell\", NA, \"TCell\", \"CD4_TCell\", \"ImmuneCell\"),\n  confidence = c(0.99, 1, NA, 0.95, 0.8, 0.95)\n)\ndf \u003c- treelabel_from_dataframe(df, tree, id = \"cell_id\", label = \"annot\", score = \"confidence\")\ndf\n```\n\n\n### Working with the `treelabel` vector\n\nThe `treelabel` vectors can be indexed or concatenated like any regular R vector:\n\n```{r}\nvec\nlength(vec)\nvec[2]\nvec[1:4]\nc(vec, vec[1:3])\n```\n\nYou can extract the tree from a `treelabel` and the name of the tree root.\n\n```{r}\ntl_tree(vec)\ntl_tree_root(vec)\n```\n\nThe easiest way to get the score for a particular label inside a `treelabel` vector is to use `$`\n\n```{r}\nvec$ImmuneCell\nvec$CD4_TCell\n```\n\n### Testing the identity\n\nThe printing function builds on the `tl_name`, which returns the vertex furthest from the root that is not `NA`. We can change this threshold. For example, for the third cell the *CD4_TCell* label does not pass the `0.9` threshold, but the *TCell* label does.\n\n```{r, paged.print=FALSE}\ntibble(vec, tl_name(vec), tl_name(vec, threshold = 0.9))\n```\n\nYou can also evaluate arbitrary expressions using `tl_eval`.\n\n```{r, paged.print=FALSE}\ntibble(vec) |\u003e mutate(is_tcell = tl_eval(vec, TCell \u003e 0.9))\n```\n\n`treelabel` is clever about evaluating these expressions. If, for example, we ask if the cell might be a T cell (i.e., `TCell \u003e 0.2`), the second and fifth entries switch from `FALSE` to `NA`.\n\n```{r, paged.print=FALSE}\ntibble(vec) |\u003e mutate(maybe_tcell = tl_eval(vec, TCell \u003e 0.2))\n```\n\nTo understand why, let's look at how `treelabel` internally stores the data.  
Internally, the scores are stored as a matrix with one column for each label, and the scores that were not specified are stored as `NA`.\n\n```{r}\ntl_score_matrix(vec)\n```\n\nFor each missing element, we can give a lower and upper bound for the value. For the fifth element the confidence that it is an `ImmuneCell` is `tl_get(vec[5], \"ImmuneCell\")` = `r tl_get(vec[5], \"ImmuneCell\")`. This means that each child can also be at most `0.4`.\n\nThe general formula is that the score for a vertex `v` that is `NA` can be at most (in pseudocode): `max(0, score(parent(v)) - sum(children(parent(v)), na.rm=TRUE))`.\n\n```{r}\n# tl_atmost is clever\ntl_atmost(vec) |\u003e tl_score_matrix()\n# tl_atleast simply replaces `NA`'s with zeros\ntl_atleast(vec) |\u003e tl_score_matrix()\n```\n\nThe `tl_eval` function evaluates its arguments for `tl_atmost(x)` and `tl_atleast(x)`. If the results agree, that value is returned; if not, `tl_eval` returns `NA`. A word of caution: this function can give surprising results if multiple label references occur in the expression.\n\n```{r}\nt1 \u003c- treelabel(list(c(\"TCell\" = 0.8)), tree)\n# Ideally both function calls would return `NA`\ntl_eval(t1, CD4_TCell \u003e CD8_TCell) \ntl_eval(t1, CD4_TCell \u003c CD8_TCell) \n```\n\n\n### Arithmetic\n\nYou can combine two vectors or summarize across elements. You can do whatever calculations you want, and `treelabel` will try to make the right thing happen. You can also do problematic things like produce negative values. 
`treelabel` currently does not stop you, but this breaks one of the assumptions of `treelabel`.\n\n```{r, paged.print=FALSE}\nvec2 \u003c- treelabel(c(\"BCell\" = 0.8, \"EpithelialCell\" = 0.3, \"TCell\" = 0.9, \"CD8_TCell\" = 0.2, \"TCell\" = 0.8), tree)\ntibble(vec, vec2) |\u003e\n  mutate(arithmetic_mean = (vec + vec2) / 2,\n         geometric_mean = (vec * vec2)^(1/2),\n         rounding = round(vec))\n```\n\n### Modification\n\nYou can modify the elements of a `treelabel` vector. The easiest way is to use `if_else` (note that `ifelse` does not work!) and mix the content of two vectors. Alternatively, you can set elements to `NA`.\n\n```{r}\nhigh_quality_res \u003c- c(TRUE, FALSE, FALSE, FALSE, TRUE)\n# Combine two vectors or set one to 'NA'\nif_else(high_quality_res, vec, vec2)\nif_else(high_quality_res, vec, NA)\n```\n\nIf you want to modify the content within a tree, that is, change the values of individual vertices, you can use the `tl_modify` function.\n\n```{r}\n# The effect of tl_modify is best understood by considering the underlying score matrix\ntl_score_matrix(vec)[,1:3]\ntl_score_matrix(tl_modify(vec, ImmuneCell = 0.3))[,1:3]\ntl_score_matrix(tl_modify(vec, ImmuneCell = ImmuneCell / 3))[,1:3]\ntl_score_matrix(tl_modify(vec, ImmuneCell = root - ImmuneCell/3))[,1:3]\ntl_score_matrix(tl_modify(vec, ImmuneCell = NA, .propagate_NAs_down = TRUE))\ntl_score_matrix(tl_modify(vec, ImmuneCell = NA, .propagate_NAs_down = FALSE))\n```\n\n#### Tree modifications\n\nSometimes you don't want to change the values within the tree, but change the tree structure or only work on a selected branch. 
The `tl_tree_modify` function allows you to set a completely new tree structure and only retain values for vertices that occur in both the new and old tree.\n\n```{r}\nsubtree \u003c- igraph::graph_from_literal(\n  root - CD4_TCell : CD8_TCell : EndothelialCell : EpithelialCell\n)\n\ntl_score_matrix(vec)\ntl_tree_modify(vec, subtree) |\u003e tl_score_matrix()\n```\n\nSometimes, you only want to work on a single branch of the tree. You can do this using the `tl_tree_filter` and `tl_tree_cut` functions.\n\n```{r}\n# Select all T cells\ntl_tree_cut(vec, new_root = \"TCell\")\n\n# This does the same, but leaves the old root\ntl_tree_filter(vec, \\(names) grepl(\"TCell\", names))\n```\n\n\n\n### Consensus construction\n\n`treelabel` provides functions to make it easy to apply expressions across `treelabel` columns. These functions are built on top of [`dplyr::across`](https://dplyr.tidyverse.org/reference/across.html). They take as the first argument a specification of columns (e.g., `where(is_treelabel)` or `starts_with(\"label_\")`). 
The second argument is evaluated internally with `tl_eval`.\n\n```{r, paged.print=FALSE}\ndat \u003c- tibble(cell_id = paste0(\"cell_\", 1:5), vec, vec2)\n\ndat |\u003e mutate(is_immune = tl_across(where(is_treelabel), ImmuneCell \u003e 0.7))\ndat |\u003e mutate(immune_counts = tl_sum_across(where(is_treelabel), ImmuneCell \u003e 0.7))\ndat |\u003e mutate(mean_immune_score = tl_mean_across(where(is_treelabel), ImmuneCell))\ndat |\u003e filter(tl_if_all(where(is_treelabel), ImmuneCell \u003e 0.7))\n```\n\nIn addition, we can also work on the whole matrix of values per tree label and combine them.\n\n```{r, paged.print=FALSE}\ndat |\u003e  mutate(consensus = tl_mean_across(c(vec,vec2)))\n\ndat |\u003e \n  mutate(across(c(vec, vec2), \\(x) tl_modify(x, .scores \u003e 0.5))) |\u003e\n  mutate(consensus = tl_sum_across(c(vec,vec2)))\n```\n\n\n\n### Pretty plotting\n\nThe following visualization is inspired by the default tree visualization in D3.\n\n```{r}\n#' Calculate layout of tree using igraph and return results as two tibbles.\nprepare_tree_for_plotting \u003c- function(tree, tree_root = \"root\"){\n  tree \u003c- .make_tree(tree, root = tree_root)\n  \n  layout \u003c- igraph::layout_as_tree(tree, root = tree_root)\n  \n  children \u003c- lapply(igraph::V(tree), \\(v){\n    igraph::neighbors(tree, v, mode = \"out\")$name\n  })\n  \n  vertices \u003c- igraph::V(tree)$name\n  nodes \u003c- tibble(node = vertices,\n         distance_to_root = max(layout[,2]) - layout[,2],\n         position = layout[,1],\n         is_leaf = vapply(children, \\(x) length(x) == 0, FUN.VALUE = logical(1L)))\n  \n  edges \u003c- tibble(node = vertices,\n       child = unname(children)) |\u003e\n    unnest(child) |\u003e\n    left_join(nodes, by = c(\"node\" = \"node\")) |\u003e\n    left_join(nodes, by = c(\"child\" = \"node\"), suffix = c(\".node\", \".child\"))\n\n  list(nodes = nodes, edges = edges)\n}\n```\n\nMake the plot. 
The [ggbezier](https://github.com/const-ae/ggbezier) is not on CRAN yet.\n\n```{r ggplot_code, fig.height=2}\npl_tree \u003c- prepare_tree_for_plotting(tree)\n\nggplot(data = pl_tree$nodes, aes(x = distance_to_root, y = position)) +\n  ggbezier::geom_bezier(data = pl_tree$edges |\u003e  pivot_longer(c(ends_with(\".node\"), ends_with(\".child\")), names_sep = \"\\\\.\", names_to = c(\".value\", \"side\")),\n                        aes(x = distance_to_root, y = position, x_handle1 = distance_to_root - 0.4, \n                            x_handle2 = distance_to_root + 0.4, y_handle1 = position, y_handle2 = position, group = paste0(node, \"-\", child)),\n                        show_handles = FALSE, color = \"lightgrey\", linewidth = 0.3) +\n  geom_point(aes(color = I(ifelse(is_leaf, \"lightgrey\", \"#4e4e4e\")))) +\n  shadowtext::geom_shadowtext(aes(label = node, hjust = ifelse(is_leaf, 0, 1), x = distance_to_root + ifelse(is_leaf, 0.03, -0.03)), \n                              color = \"black\", bg.colour = \"white\") +\n  scale_x_continuous(expand = expansion(add = c(0.5, 0.9))) +\n  theme_void()\n```\n\n\n\n## Session Info\n\n```{r}\nsessionInfo()\n```\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconst-ae%2Ftreelabel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fconst-ae%2Ftreelabel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconst-ae%2Ftreelabel/lists"}