https://github.com/pinellolab/crisprapido

use WFA2 to scan for CRISPR guide targets

# CRISPRapido

![CRISPRapido Logo](crisprapido.png)

CRISPRapido is a reference-free tool for comprehensive detection of CRISPR off-target sites in complete genome assemblies. Unlike traditional approaches that rely on a reference genome plus variant files, CRISPRapido analyzes haplotype-resolved assemblies directly, so it can identify potential off-targets arising from any form of genetic variation. It uses the Wavefront Alignment (WFA) algorithm and parallel processing to scan whole genomes quickly while accounting for both mismatches and DNA/RNA bulges. This is particularly valuable for therapeutic applications, where comprehensive off-target analysis is critical for safety assessment. CRISPRapido can process both complete assemblies and raw sequencing data, and it is implemented in Rust.

## Features

- Fast parallel scanning of genomic sequences
- Support for both gzipped and plain FASTA files
- Configurable mismatch and bulge tolerances
- Automatic reverse complement scanning
- PAF-format output compatible with downstream analysis tools
- Multi-threaded processing for improved performance
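
Reverse complement scanning means that a guide is also searched against the opposite strand. As an illustration only (this is not CRISPRapido's internal code, and `reverse_complement` is a hypothetical helper name), the operation can be sketched in Python:

```python
# Illustrative helper, not part of CRISPRapido.
# Maps each base to its complement; N stays N, and case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATCGGA"))  # TCCGAT
```

A hit reported on the `-` strand corresponds to the reverse-complemented guide matching the forward sequence of the target.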

## Installation

First build `WFA2-lib`, which is included as a submodule of this repository:

```bash
git clone --recursive https://github.com/pinellolab/crisprapido.git
cd crisprapido/WFA2-lib
make clean all
cd ..
```

Then, you can install CRISPRapido using Cargo:

```shell
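# Point to your pre-built WFA2-lib directory. An absolute path is safer here,
# because `cargo install --git` builds from its own checkout of the repository.
export WFA2LIB_PATH="$(pwd)/WFA2-lib"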

# Install CRISPRapido
cargo install --git https://github.com/pinellolab/crisprapido.git
```

### For Guix users

```bash
git clone --recursive https://github.com/pinellolab/crisprapido.git
cd crisprapido/WFA2-lib
guix shell -C -D -f guix.scm
export CC=gcc; make clean all
exit
cd ..
env -i bash -c 'WFA2LIB_PATH="./WFA2-lib" PATH=/usr/local/bin:/usr/bin:/bin ~/.cargo/bin/cargo install --path .'
```

## Usage

```bash
crisprapido -r <FASTA> -g <GUIDE> [OPTIONS]
```

### Required Arguments

- `-r, --reference <FASTA>`: Input reference FASTA file (supports `.fa` and `.fa.gz`)
- `-g, --guide <GUIDE>`: Guide RNA sequence (without PAM)

### Optional Arguments

- `-m, --max-mismatches <N>`: Maximum number of mismatches allowed (default: 4)
- `-b, --max-bulges <N>`: Maximum number of bulges allowed (default: 1)
- `-z, --max-bulge-size <N>`: Maximum size of each bulge in bp (default: 2)
- `-w, --window-size <N>`: Size of sequence window to scan (default: 4x guide length)
- `-t, --threads <N>`: Number of threads to use (default: number of logical CPUs)
- `--no-filter`: Disable all filtering (report every alignment)
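
When CRISPRapido is driven from a script, the flags above can be assembled programmatically. The following Python sketch is one possible wrapper, not part of the tool: `build_command` and `run_scan` are hypothetical helper names, the binary is assumed to be on `PATH`, and `run_scan` assumes that results are written to standard output.

```python
import subprocess
from typing import List, Optional


def build_command(reference: str, guide: str, max_mismatches: int = 4,
                  max_bulges: int = 1, max_bulge_size: int = 2,
                  window_size: Optional[int] = None,
                  threads: Optional[int] = None,
                  no_filter: bool = False) -> List[str]:
    """Assemble a crisprapido argument list from the documented flags."""
    cmd = ["crisprapido", "-r", reference, "-g", guide,
           "-m", str(max_mismatches),
           "-b", str(max_bulges),
           "-z", str(max_bulge_size)]
    # -w and -t are only passed when set, so the tool's own defaults apply.
    if window_size is not None:
        cmd += ["-w", str(window_size)]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if no_filter:
        cmd.append("--no-filter")
    return cmd


def run_scan(cmd: List[str]) -> str:
    """Run the command and return its output (assumes PAF goes to stdout)."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


print(" ".join(build_command("genome.fa", "ATCGATCGATCG", max_mismatches=3)))
# crisprapido -r genome.fa -g ATCGATCGATCG -m 3 -b 1 -z 2
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.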

## Output Format

CRISPRapido outputs results in the Pairwise Alignment Format (PAF), which is widely used for representing genomic alignments. Each line represents a potential off-target site with the following tab-separated fields:

| Column | Field | Description |
|--------|-------|-------------|
| 1 | Query name | "Guide" (the guide RNA sequence) |
| 2 | Query length | Length of the guide RNA |
| 3 | Query start | 0-based start position in the guide sequence (inclusive) |
| 4 | Query end | 0-based end position in the guide sequence (exclusive) |
| 5 | Strand | '+' (forward) or '-' (reverse complement) |
| 6 | Target name | Reference sequence name (e.g., chromosome) |
| 7 | Target length | Length of the target reference sequence |
| 8 | Target start | 0-based start position in reference (inclusive) |
| 9 | Target end | 0-based end position in reference (exclusive) |
| 10 | Matches | Number of matching bases |
| 11 | Block length | Total alignment block length |
| 12 | Mapping quality | Always 255 for CRISPRapido |
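
For downstream scripts, the twelve mandatory columns can be read into a typed record. This is a minimal Python sketch: `PafHit` and `parse_paf_columns` are hypothetical names, and the sample line is made up for illustration rather than taken from real output.

```python
from dataclasses import dataclass


@dataclass
class PafHit:
    query_name: str
    query_len: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_len: int
    target_start: int
    target_end: int
    matches: int
    block_len: int
    mapq: int


def parse_paf_columns(line: str) -> PafHit:
    """Parse the 12 mandatory PAF columns; any trailing tags are ignored."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError("expected at least 12 tab-separated PAF columns")
    return PafHit(f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
                  f[5], int(f[6]), int(f[7]), int(f[8]),
                  int(f[9]), int(f[10]), int(f[11]))


# Made-up example line (tab-separated).
hit = parse_paf_columns("Guide\t20\t0\t20\t-\tchr2\t1000\t100\t120\t20\t20\t255")
print(hit.target_name, hit.target_start, hit.target_end, hit.strand)
# chr2 100 120 -
```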

Additionally, CRISPRapido includes these custom tags:

| Tag | Description |
|-----|-------------|
| `as:i` | Alignment score (lower is better) |
| `nm:i` | Number of mismatches |
| `ng:i` | Number of gaps (indels) |
| `bs:i` | Biggest gap size in bases |
| `cg:Z` | CIGAR string representing alignment details |
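
Because a lower `as:i` value is better, candidate sites can be ranked by that tag. The Python sketch below parses the tags and sorts hits by alignment score. `parse_tags` and `rank_hits` are hypothetical helper names, and both sample lines (including the score of 0 for a perfect match) are invented for illustration.

```python
from typing import Dict, List, Tuple, Union


def parse_tags(fields: List[str]) -> Dict[str, Union[int, str]]:
    """Turn ['as:i:6', 'cg:Z:19=1X'] into {'as': 6, 'cg': '19=1X'}."""
    tags: Dict[str, Union[int, str]] = {}
    for field in fields:
        key, typ, value = field.split(":", 2)
        tags[key] = int(value) if typ == "i" else value
    return tags


def rank_hits(paf_text: str) -> List[Tuple[int, str, int, int, str]]:
    """Return (score, target, start, end, strand) tuples, best score first."""
    hits = []
    for line in paf_text.splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        tags = parse_tags(cols[12:])
        hits.append((tags["as"], cols[5], int(cols[7]), int(cols[8]), cols[4]))
    return sorted(hits)


# Invented example records (tab-separated).
example = (
    "Guide\t20\t0\t20\t+\tchr1\t5000\t100\t120\t19\t20\t255"
    "\tas:i:6\tnm:i:1\tng:i:0\tbs:i:0\tcg:Z:19=1X\n"
    "Guide\t20\t0\t20\t-\tchr2\t5000\t300\t320\t20\t20\t255"
    "\tas:i:0\tnm:i:0\tng:i:0\tbs:i:0\tcg:Z:20=\n"
)
for score, name, start, end, strand in rank_hits(example):
    print(score, name, start, end, strand)
```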

### Example Output

```
Guide 20 0 20 + chr1 248956422 10050 10070 19 20 255 as:i:6 nm:i:1 ng:i:0 bs:i:0 cg:Z:19=1X
```

This indicates:
- A 20bp guide RNA aligned to chromosome 1
- Position 10050-10070 on the forward strand
- 19 bases match with 1 mismatch (nm:i:1)
- No gaps (ng:i:0)
- Alignment score of 6 (as:i:6)
- CIGAR string shows 19 matches followed by 1 mismatch
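
The match and mismatch counts described above can be recovered from the `cg:Z` string itself. This Python sketch assumes the CIGAR uses only the extended operators `=` (match), `X` (mismatch), `I` (insertion) and `D` (deletion), and `summarize_cigar` is a hypothetical helper name.

```python
import re

# One CIGAR operation: a run length followed by an operator.
_CIGAR_OP = re.compile(r"(\d+)([=XID])")


def summarize_cigar(cigar: str) -> dict:
    """Count the bases covered by each CIGAR operator."""
    # Assumption: only =, X, I and D appear; anything else is rejected.
    if not re.fullmatch(r"(?:\d+[=XID])+", cigar):
        raise ValueError(f"unsupported CIGAR string: {cigar!r}")
    counts = {"=": 0, "X": 0, "I": 0, "D": 0}
    for length, op in _CIGAR_OP.findall(cigar):
        counts[op] += int(length)
    return counts


print(summarize_cigar("19=1X"))
# {'=': 19, 'X': 1, 'I': 0, 'D': 0}
```

Here `counts["="]` corresponds to column 10 (matches) and `counts["X"]` to the `nm:i` tag.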

### PAF Format Specification

For more details on the PAF format, see the [official specification](https://github.com/lh3/miniasm/blob/master/PAF.md) from the developers of miniasm.
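
Since PAF target coordinates are 0-based with an exclusive end, they map directly onto BED intervals. The Python sketch below converts one PAF line into a BED6 line for tools that expect BED input. `paf_to_bed` is a hypothetical helper, the sample line is invented, and placing the `as:i` value in the BED score column is a choice made for this illustration only.

```python
def paf_to_bed(line: str) -> str:
    """Convert one PAF line to BED6: chrom, start, end, name, score, strand."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 12:
        raise ValueError("expected at least 12 tab-separated PAF columns")
    score = "0"  # fallback when no as:i tag is present
    for tag in cols[12:]:
        if tag.startswith("as:i:"):
            score = tag[len("as:i:"):]
    # PAF columns: 6 = target name, 8/9 = target start/end, 1 = query, 5 = strand
    return "\t".join([cols[5], cols[7], cols[8], cols[0], score, cols[4]])


# Invented example record (tab-separated).
paf_line = ("Guide\t20\t0\t20\t+\tchr1\t5000\t100\t120\t19\t20\t255"
            "\tas:i:6\tnm:i:1\tng:i:0\tbs:i:0\tcg:Z:19=1X")
print(paf_to_bed(paf_line))
```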

## Example

```bash
crisprapido -r genome.fa -g ATCGATCGATCG -m 3 -b 1 -z 2
```

## Testing

Run the test suite:

```bash
# Point to your pre-built WFA2-lib directory
export WFA2LIB_PATH="./WFA2-lib"

cargo test
```

Enable debug output during development:

```bash
cargo run --features debug
```

## License

See the `LICENSE` file.

## Citation

Stay tuned!