Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/slowkow/hlabud
🐶 hlabud: HLA genotype analysis in R
https://github.com/slowkow/hlabud
bioinformatics genetics hla hla-database immunoinformatics immunology
Last synced: 19 days ago
JSON representation
🐶 hlabud: HLA genotype analysis in R
- Host: GitHub
- URL: https://github.com/slowkow/hlabud
- Owner: slowkow
- License: gpl-3.0
- Created: 2023-04-05T19:51:58.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-05T18:50:41.000Z (7 months ago)
- Last Synced: 2024-10-16T04:08:07.658Z (2 months ago)
- Topics: bioinformatics, genetics, hla, hla-database, immunoinformatics, immunology
- Language: TeX
- Homepage: https://slowkow.github.io/hlabud/
- Size: 25.8 MB
- Stars: 16
- Watchers: 3
- Forks: 0
- Open Issues: 2
-
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
Awesome Lists containing this project
README
---
title: "hlabud"
output:
md_document:
standalone: true
toc: false
variant: "markdown_github"
html_document:
toc: false
self_contained: true
---# hlabud
[![R-CMD-check](https://github.com/slowkow/hlabud/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/slowkow/hlabud/actions/workflows/R-CMD-check.yaml)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.11093557.svg)](https://doi.org/10.5281/zenodo.11093557)hlabud provides methods to retrieve sequence alignment data from [IMGTHLA] and convert the data into convenient R matrices ready for downstream analysis. See the [usage examples](https://slowkow.github.io/hlabud/articles/examples.html) to learn how to use the data with logistic regression and dimensionality reduction. We also share tips on how to [visualize the 3D molecular structure](https://slowkow.github.io/hlabud/articles/visualize-hla-structure.html) of HLA proteins and highlight specific amino acid residues.
[IMGTHLA]: https://github.com/ANHIG/IMGTHLA
For example, let's consider a simple question about two HLA genotypes.
What amino acid positions are different between two genotypes?
```{r}
library(hlabud)
a <- hla_alignments("DRB1")
a$release
dosage(a$onehot, c("DRB1*03:01:05", "DRB1*03:02:03"))
```What nucleotides are different?
```{r}
n <- hla_alignments("DRB1", type = "nuc")
n$release
dosage(n$onehot, c("DRB1*03:01:05", "DRB1*03:02:03"))
```# Installation
The quickest way to get hlabud is to install from GitHub:
```{r, eval=FALSE}
# install.packages("devtools")
devtools::install_github("slowkow/hlabud")
```# Examples
See the [usage examples](https://slowkow.github.io/hlabud/articles/examples.html) to get some ideas for how to use hlabud in your analyses.
- [Get a one-hot encoded matrix for all HLA-DRB1 alleles](https://slowkow.github.io/hlabud/articles/examples.html#get-a-one-hot-encoded-matrix-for-all-hla-drb1-alleles)
- [Convert genotypes to a dosage matrix](https://slowkow.github.io/hlabud/articles/examples.html#convert-genotypes-to-a-dosage-matrix)
- [Logistic regression association for amino acid positions](https://slowkow.github.io/hlabud/articles/examples.html#logistic-regression-association-for-amino-acid-positions)
- [UMAP embedding of 3,516 HLA-DRB1 alleles](https://slowkow.github.io/hlabud/articles/examples.html#umap-embedding-of-3516-hla-drb1-alleles)
- [Get HLA allele frequencies from Allele Frequency Net Database (AFND)](https://slowkow.github.io/hlabud/articles/examples.html#get-hla-allele-frequencies-from-allele-frequency-net-database-afnd)
- [Compute HLA divergence with the Grantham distance matrix](https://slowkow.github.io/hlabud/articles/examples.html#compute-hla-divergence-with-the-grantham-distance-matrix)
- [Download and unpack all data from the latest IMGTHLA release](https://slowkow.github.io/hlabud/articles/examples.html#download-and-unpack-all-data-from-the-latest-imgthla-release)
# Citation
`hlabud` provides access to the data in IMGT/HLA database. Therefore, if you use `hlabud` then please cite the IMGT/HLA paper:
- Robinson J, Barker DJ, Georgiou X, Cooper MA, Flicek P, Marsh SGE. [IPD-IMGT/HLA Database.](https://pubmed.ncbi.nlm.nih.gov/31667505/) Nucleic Acids Res. 2020;48: D948–D955. https://doi.org/10.1093/nar/gkz950
`hlabud` also provides access to the data in Allele Frequency Net Database (AFND). Therefore, if you use `hlabud::hla_frequencies()` then please cite the AFND paper:
- Gonzalez-Galarza FF, McCabe A, Santos EJMD, Jones J, Takeshita L, Ortega-Rivera ND, et al. [Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data and new query tools.](https://pubmed.ncbi.nlm.nih.gov/31722398) Nucleic Acids Res. 2020;48: D783–D788. https://doi.org/10.1093/nar/gkz1029
Additionally, you can also cite the `hlabud` package like this:
- Slowikowski K. hlabud: HLA analysis in R. Zenodo. https://doi.org/10.5281/zenodo.11093557
# Related work
I recommend this article for anyone new to HLA, because the beautiful figures
help to build intuition:- La Gruta NL, Gras S, Daley SR, Thomas PG, Rossjohn J. [Understanding the drivers of MHC restriction of T cell receptors.](https://pubmed.ncbi.nlm.nih.gov/29636542/) Nat Rev Immunol. 2018;18: 467–478.
Learn about the conventions for HLA nomenclature:
- Marsh SGE, Albert ED, Bodmer WF, Bontrop RE, Dupont B, Erlich HA, et al. [Nomenclature for factors of the HLA system, 2010.](https://pubmed.ncbi.nlm.nih.gov/20356336/) Tissue Antigens. 2010;75: 291–455.
[HATK] is set of Python scripts for processing and analyzing IMGT-HLA data.
Here is the related article:- Choi W, Luo Y, Raychaudhuri S, Han B. [HATK: HLA analysis toolkit](https://pubmed.ncbi.nlm.nih.gov/32735319). Bioinformatics. 2021;37: 416–418. doi:10.1093/bioinformatics/btaa684
[HATK]: https://github.com/WansonChoi/HATK
For case-control analysis of HLA genotype data, consider the
[BIGDAWG](https://CRAN.R-project.org/package=BIGDAWG) R package available on
CRAN. Here is the related article:- Pappas DJ, Marin W, Hollenbach JA, Mack SJ. [Bridging ImmunoGenomic Data Analysis Workflow Gaps (BIGDAWG): An integrated case-control analysis pipeline.](https://pubmed.ncbi.nlm.nih.gov/26708359) Hum Immunol. 2016;77: 283–287.
[HLAdivR] is another R package for calculating HLA divergence.
[HLAdivR]: https://github.com/rbentham/HLAdivR/