Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/slowkow/hlabud

🐶 hlabud: HLA genotype analysis in R
https://github.com/slowkow/hlabud

bioinformatics genetics hla hla-database immunoinformatics immunology

Last synced: 19 days ago
JSON representation

🐶 hlabud: HLA genotype analysis in R

Awesome Lists containing this project

README

        

---
title: "hlabud"
output:
md_document:
standalone: true
toc: false
variant: "markdown_github"
html_document:
toc: false
self_contained: true
---

# hlabud

[![R-CMD-check](https://github.com/slowkow/hlabud/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/slowkow/hlabud/actions/workflows/R-CMD-check.yaml)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.11093557.svg)](https://doi.org/10.5281/zenodo.11093557)

hlabud provides methods to retrieve sequence alignment data from [IMGTHLA] and convert the data into convenient R matrices ready for downstream analysis. See the [usage examples](https://slowkow.github.io/hlabud/articles/examples.html) to learn how to use the data with logistic regression and dimensionality reduction. We also share tips on how to [visualize the 3D molecular structure](https://slowkow.github.io/hlabud/articles/visualize-hla-structure.html) of HLA proteins and highlight specific amino acid residues.

[IMGTHLA]: https://github.com/ANHIG/IMGTHLA

For example, let's consider a simple question about two HLA genotypes.

What amino acid positions are different between two genotypes?

```{r}
library(hlabud)
a <- hla_alignments("DRB1")
a$release
dosage(a$onehot, c("DRB1*03:01:05", "DRB1*03:02:03"))
```

What nucleotides are different?

```{r}
n <- hla_alignments("DRB1", type = "nuc")
n$release
dosage(n$onehot, c("DRB1*03:01:05", "DRB1*03:02:03"))
```

# Installation

The quickest way to get hlabud is to install from GitHub:

```{r, eval=FALSE}
# install.packages("devtools")
devtools::install_github("slowkow/hlabud")
```

# Examples

See the [usage examples](https://slowkow.github.io/hlabud/articles/examples.html) to get some ideas for how to use hlabud in your analyses.

- [Get a one-hot encoded matrix for all HLA-DRB1 alleles](https://slowkow.github.io/hlabud/articles/examples.html#get-a-one-hot-encoded-matrix-for-all-hla-drb1-alleles)

- [Convert genotypes to a dosage matrix](https://slowkow.github.io/hlabud/articles/examples.html#convert-genotypes-to-a-dosage-matrix)

- [Logistic regression association for amino acid positions](https://slowkow.github.io/hlabud/articles/examples.html#logistic-regression-association-for-amino-acid-positions)

- [UMAP embedding of 3,516 HLA-DRB1 alleles](https://slowkow.github.io/hlabud/articles/examples.html#umap-embedding-of-3516-hla-drb1-alleles)

- [Get HLA allele frequencies from Allele Frequency Net Database (AFND)](https://slowkow.github.io/hlabud/articles/examples.html#get-hla-allele-frequencies-from-allele-frequency-net-database-afnd)

- [Compute HLA divergence with the Grantham distance matrix](https://slowkow.github.io/hlabud/articles/examples.html#compute-hla-divergence-with-the-grantham-distance-matrix)

- [Download and unpack all data from the latest IMGTHLA release](https://slowkow.github.io/hlabud/articles/examples.html#download-and-unpack-all-data-from-the-latest-imgthla-release)












# Citation

`hlabud` provides access to the data in IMGT/HLA database. Therefore, if you use `hlabud` then please cite the IMGT/HLA paper:

- Robinson J, Barker DJ, Georgiou X, Cooper MA, Flicek P, Marsh SGE. [IPD-IMGT/HLA Database.](https://pubmed.ncbi.nlm.nih.gov/31667505/) Nucleic Acids Res. 2020;48: D948–D955. https://doi.org/10.1093/nar/gkz950

`hlabud` also provides access to the data in Allele Frequency Net Database (AFND). Therefore, if you use `hlabud::hla_frequencies()` then please cite the AFND paper:

- Gonzalez-Galarza FF, McCabe A, Santos EJMD, Jones J, Takeshita L, Ortega-Rivera ND, et al. [Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data and new query tools.](https://pubmed.ncbi.nlm.nih.gov/31722398) Nucleic Acids Res. 2020;48: D783–D788. https://doi.org/10.1093/nar/gkz1029

Additionally, you can also cite the `hlabud` package like this:

- Slowikowski K. hlabud: HLA analysis in R. Zenodo. https://doi.org/10.5281/zenodo.11093557

# Related work

I recommend this article for anyone new to HLA, because the beautiful figures
help to build intuition:

- La Gruta NL, Gras S, Daley SR, Thomas PG, Rossjohn J. [Understanding the drivers of MHC restriction of T cell receptors.](https://pubmed.ncbi.nlm.nih.gov/29636542/) Nat Rev Immunol. 2018;18: 467–478.

Learn about the conventions for HLA nomenclature:

- Marsh SGE, Albert ED, Bodmer WF, Bontrop RE, Dupont B, Erlich HA, et al. [Nomenclature for factors of the HLA system, 2010.](https://pubmed.ncbi.nlm.nih.gov/20356336/) Tissue Antigens. 2010;75: 291–455.

[HATK] is set of Python scripts for processing and analyzing IMGT-HLA data.
Here is the related article:

- Choi W, Luo Y, Raychaudhuri S, Han B. [HATK: HLA analysis toolkit](https://pubmed.ncbi.nlm.nih.gov/32735319). Bioinformatics. 2021;37: 416–418. doi:10.1093/bioinformatics/btaa684

[HATK]: https://github.com/WansonChoi/HATK

For case-control analysis of HLA genotype data, consider the
[BIGDAWG](https://CRAN.R-project.org/package=BIGDAWG) R package available on
CRAN. Here is the related article:

- Pappas DJ, Marin W, Hollenbach JA, Mack SJ. [Bridging ImmunoGenomic Data Analysis Workflow Gaps (BIGDAWG): An integrated case-control analysis pipeline.](https://pubmed.ncbi.nlm.nih.gov/26708359) Hum Immunol. 2016;77: 283–287.

[HLAdivR] is another R package for calculating HLA divergence.

[HLAdivR]: https://github.com/rbentham/HLAdivR/