https://github.com/murrellgroup/chmmairraanalyses
https://github.com/murrellgroup/chmmairraanalyses
Last synced: 9 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/murrellgroup/chmmairraanalyses
- Owner: MurrellGroup
- Created: 2025-02-18T19:11:35.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-08-20T13:29:12.000Z (9 months ago)
- Last Synced: 2025-08-20T15:29:44.545Z (9 months ago)
- Language: Jupyter Notebook
- Size: 6.98 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Analysis notebooks and scripts for "Detection of PCR chimeras in adaptive immune receptor repertoire sequencing using hidden Markov models"
## Description
Notebooks and scripts used to produces the figures and tables in the CHMMAIRRa paper. The figures generated from simulated data require only the databases already included in the repository. The figures generated from real data require preprocessing the datasets with [IgDiscover](https://gitlab.com/gkhlab/igdiscover22) according to the instructions in the [IgDiscover_preprocessing](./IgDiscover_preprocessing/) folder.
## Dependencies
Julia 1.10.5 for running the notebooks.
- I recommend using [juliaup](https://github.com/JuliaLang/juliaup) to install Julia
- All Julia package dependencies are listed in the [Manifest.toml](Manifest.toml) file.
- Re-create the environment with the following commands in julia:
```
using Pkg; Pkg.activate("."); Pkg.instantiate()
```
[IgDiscover v1.0.4](https://gitlab.com/gkhlab/igdiscover22) for preprocessing the real datasets.
## Notebooks
- [PCR_conditions.ipynb](./notebooks/PCR_conditions.ipynb) : Plots this paper's PCR parameter modification dataset.
- [databases.ipynb](./notebooks/databases.ipynb) : Plots pairwise edit distances between database V alleles.
- [simulations.ipynb](./notebooks/simulations.ipynb) : Simulation of TRB and IGH VDJ datasets to produce ROCs.
- [recombinations.ipynb](./notebooks/recombinations.ipynb) : Plots recombination information from real datasets. Produces the heatmaps, recombination percentage scatterplots, and database subsampling scatterplots.
- [benchmark_speed.ipynb](./notebooks/benchmark_speed.ipynb) : Benchmarks the speed of CHMMAIRRa, USEARCH, and VSEARCH on real and simulated datasets.
- [lineages.ipynb](./notebooks/lineages.ipynb) : Plots lineage information from a real dataset.
- [summarize_seqcounts.ipynb](./notebooks/summarize_seqcounts.ipynb) : Gathers sequence count data from all datasets.
## Scripts
- [run_CHMMAIRRa.jl](./scripts/run_CHMMAIRRa.jl) : Runs CHMMAIRRa on all 5 real datasets in the paper (4 published and 1 new).
- [run_CHMMAIRRa_db_subsampling.jl](./scripts/run_CHMMAIRRa_db_subsampling.jl) : Runs CHMMAIRRa on specific real TCR and IGH libraries with subsampled databases.
- [run_benckmarks.jl](./scripts/run_benckmarks.jl) : Runs CHMMAIRRa and uchime on varying sizes of simulated IGH and TRB datasets.
## IgDiscover preprocessing
- [IgDiscover_preprocessing](./IgDiscover_preprocessing/) : This folder contains instructions for preprocessing the real datasets with IgDiscover. One .md file for each of 5 datasets. Also contains descriptions of where to find the raw fastq data.