Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/StephenHwang/MEMO
MEMO: MEM-based pangenome indexing for k-mer queries
https://github.com/StephenHwang/MEMO
Last synced: 3 months ago
JSON representation
MEMO: MEM-based pangenome indexing for k-mer queries
- Host: GitHub
- URL: https://github.com/StephenHwang/MEMO
- Owner: StephenHwang
- License: mit
- Created: 2023-04-05T22:13:34.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-06-11T23:28:05.000Z (5 months ago)
- Last Synced: 2024-06-12T03:21:48.725Z (5 months ago)
- Language: Python
- Homepage:
- Size: 12.4 MB
- Stars: 10
- Watchers: 2
- Forks: 0
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-pangenomes - MEMO
README
# MEMO: MEM-based pangenome indexing for _k_-mer queries ![GitHub release (latest by date)](https://img.shields.io/github/v/release/StephenHwang/MEMO) ![GitHub](https://img.shields.io/github/license/StephenHwang/MEMO?color=green)
Maximal Exact Match Ordered (MEMO) is a pangenome indexing method based on maximal exact matches (MEMs) between genomes. A single MEMO index can handle arbitrary-length-_k_ _k_-mer queries over pangenomic windows. MEMO performs membership queries for per-genome _k_-mer presence/absence and conservation queries for the number of genomes containing the _k_-mers in a window. MEMO achieves smaller index sizes and faster queries than _k_-mer-based approaches like KMC3 and PanKmer. See the small example here on running MEMO for visualizing sequence conservation.
## Installation
### Docker/Singularity Container
MEMO is available as a Docker image on DockerHub.
```sh
### Docker:
docker pull hwangstephen/memo:latest
docker run hwangstephen/memo:latest memo -h
### Singularity:
singularity pull docker://hwangstephen/memo:latest
./memo_latest.sif memo -h
```### Build from source
MEMO relies on the following dependencies:
- Python:
- python (>=3.10)
- pandas
- plotnine
- pyarrow
- numba
- numpy
- Others:
- MONI
- samtools
- seqtkCompile MONI from repo:
```
sudo apt-get install -y build-essential cmake git python3 zlib1g-dev
git clone https://github.com/maxrossi91/moni
mkdir build
cd build
cmake ..
make
make install
```After downloading/building the required dependencies, clone and run MEMO from its repo:
```
git clone https://github.com/StephenHwang/MEMO.git
cd MEMO/src
./memo -h
```## Usage
### Index Creation
To create a MEMO conservation index, specify a list of genomes `-g` and an output location `-o` and prefix `-p`. To create the MEMO membership index, include the `-m` flag.
Each line in the `genome_list.txt` is the path to each genome in the pangenome; the first genome listed is the pangenome pivot.
```sh
./memo index \
-g genome_list.txt \
-o output_dir \
-p output_prefix
```### Querying k-mer membership and conservation
Once you have created your indexes, specify your length-_k_ `k`, genomic region `-r`, and the total number of genomes in your genome (inclusive of pivot) `-n`. Then run `memo query` for the conservation query. To run the membership query, include the `-m` flag.
```sh
./memo query \
-b index.parquet \
-k k \
-n num_genomes \
-r chr:start-end \
-o memo_c_out.txt
```### Visualizing sequence conservation
31-mer sequence conservation of the Human Leucocyte Antigen locus in the HPRC pangenome.
After the conservation query, use MEMO to visualize sequence conservation:
```sh
./memo view \
-i memo_c_out.txt \
-o out.png \
-n num_genomes \
-b num_bins
```## Citing MEMO
>Stephen Hwang, Nathaniel K. Brown, Omar Y. Ahmed, Katharine M. Jenike, Sam Kovaka, Michael C. Schatz, Ben Langmead. MEM-based pangenome indexing for k-mer queries (2024). bioRxiv.