https://github.com/danderson123/amira
A tool to detect acquired AMR genes directly from long read sequencing data.
- Host: GitHub
- URL: https://github.com/danderson123/amira
- Owner: Danderson123
- License: apache-2.0
- Created: 2022-12-01T13:40:28.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-04-10T17:06:16.000Z (about 1 month ago)
- Last Synced: 2025-04-15T15:04:52.780Z (about 1 month ago)
- Topics: amr, assembly, bacteria, bacterial-genome-analysis, epidemiology, genotyping, graph
- Language: Python
- Homepage:
- Size: 15.2 MB
- Stars: 8
- Watchers: 3
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# Amira
## Introduction
Amira is an AMR gene detection tool designed to work directly from bacterial long-read sequencing data. It makes it easy to reliably identify the AMR genes in a bacterial sample, reduces the time taken to get meaningful results, and detects AMR genes more accurately than assembly-based approaches.
## Overview
Amira leverages the full length of long read sequences to differentiate multi-copy genes by their local genomic context. This is done by first identifying the genes on each sequencing read and using the gene calls to construct a *de Bruijn* graph (DBG) in gene space. Following error correction, the reads containing different copies of multi-copy AMR genes can be clustered together based on their path in the graph, then assembled to obtain the nucleotide sequence.
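
The gene-space approach described above can be illustrated with a toy Python sketch. It is not Amira's implementation. It assumes each read has already been reduced to an ordered list of gene names on one strand (real reads also need strand handling). It uses windows of `k` consecutive genes as graph nodes and drops low-coverage nodes as a crude stand-in for Amira's error correction. It also groups reads by the genes flanking a gene of interest rather than by full graph paths. All function and gene names are hypothetical.

```python
from collections import defaultdict


def gene_kmers(read, k):
    """Return the consecutive windows of k genes (the graph nodes) along a read."""
    return [tuple(read[i:i + k]) for i in range(len(read) - k + 1)]


def build_gene_dbg(reads, k):
    """Build a toy de Bruijn graph in gene space.

    Nodes are k-gene windows, edges join windows that are adjacent on a read,
    and each node's coverage counts how many times it was observed.
    """
    coverage = defaultdict(int)
    edges = defaultdict(set)
    for read in reads:
        kmers = gene_kmers(read, k)
        for kmer in kmers:
            coverage[kmer] += 1
        for left, right in zip(kmers, kmers[1:]):
            edges[left].add(right)
    return dict(coverage), dict(edges)


def prune_low_coverage(coverage, edges, min_coverage):
    """Drop nodes seen fewer than min_coverage times (a crude stand-in for error correction)."""
    kept = {node for node, count in coverage.items() if count >= min_coverage}
    pruned_edges = {
        node: {nbr for nbr in nbrs if nbr in kept}
        for node, nbrs in edges.items()
        if node in kept
    }
    return {node: coverage[node] for node in kept}, pruned_edges


def cluster_by_context(reads, gene):
    """Group read indices by the genes immediately flanking `gene` (None at a read end)."""
    clusters = defaultdict(list)
    for index, read in enumerate(reads):
        for pos, name in enumerate(read):
            if name == gene:
                left = read[pos - 1] if pos > 0 else None
                right = read[pos + 1] if pos + 1 < len(read) else None
                clusters[(left, right)].append(index)
    return dict(clusters)


# Two copies of the hypothetical gene "amrX" in different genomic contexts.
reads = [
    ["geneA", "amrX", "geneB", "geneC"],
    ["geneA", "amrX", "geneB"],
    ["geneD", "amrX", "geneE"],
    ["geneF", "geneD", "amrX", "geneE"],
]
coverage, edges = build_gene_dbg(reads, k=3)
coverage, edges = prune_low_coverage(coverage, edges, min_coverage=2)
print(sorted(coverage))
# [('geneA', 'amrX', 'geneB'), ('geneD', 'amrX', 'geneE')]
print(cluster_by_context(reads, "amrX"))
# {('geneA', 'geneB'): [0, 1], ('geneD', 'geneE'): [2, 3]}
```

In this toy data, the two copies of `amrX` fall into separate clusters because their neighboring genes differ. In Amira, the reads in each cluster are then assembled to obtain that copy's nucleotide sequence.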
## Prerequisites
Amira requires the following software:
- **Python >=3.9,<3.13**.
- **Poetry** to manage the Python dependencies.
- **Pandora** to identify the genes on each sequencing read.
- **minimap2** for sequence alignment.
- **samtools** for processing alignments.
- **racon** for allele polishing.
- **Jellyfish** for cellular copy number estimation.
## Installation
Follow these steps to install Amira and its dependencies.
### With Singularity (preferred method)
Amira and its dependencies can be run through Singularity. First build the container with:
```bash
sudo singularity build amira.img Singularity.def
```
You can then run Amira with:
```bash
singularity exec amira.img amira --help
```
### With PyPI
Amira can be installed from PyPI. You will need to install the non-Python dependencies separately if you opt for this method.
```bash
pip install amira-amr
```
Amira can then be run with:
```bash
amira --help
```
### From source
#### Step 1: Clone the Amira Repository
Open a terminal and run the following command to clone the repository and navigate into it:
```bash
git clone https://github.com/Danderson123/Amira && cd Amira
```
#### Step 2: Install Poetry
Amira’s dependencies are managed with Poetry. Install Poetry by running:
```bash
pip install poetry
```
#### Step 3: Install Python Dependencies
Once Poetry is installed, use it to set up Amira’s Python dependencies:
```bash
poetry install
```
You will need to install the non-Python dependencies separately if you opt for this method.
### Installing Non-Python Dependencies
Amira requires Pandora, minimap2, samtools, racon and Jellyfish. Follow the links below for instructions on installing each tool:
- [Pandora installation guide](https://github.com/iqbal-lab-org/pandora?tab=readme-ov-file#installation)
- [minimap2 installation guide](https://github.com/lh3/minimap2)
- [samtools installation guide](https://www.htslib.org/download/)
- [racon installation guide](https://github.com/isovic/racon)
- [Jellyfish installation guide](https://github.com/gmarcais/Jellyfish)

After installation, make a note of the paths to these binaries as they will be required when running Amira.
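
A quick way to confirm the prerequisites before a run is a small Python check like the one below. It is a convenience sketch, not part of Amira. It assumes the tools are on your `PATH` under the executable names in `REQUIRED_TOOLS`; adjust those names, or replace them with full paths, if your binaries are named or installed differently. It also checks the Python >=3.9,<3.13 range listed under Prerequisites. The paths it prints are the ones to pass to options such as `--pandora-path` and `--minimap2-path`.

```python
import shutil
import sys

# Assumed executable names; edit these (or use full paths) to match your installation.
REQUIRED_TOOLS = ("pandora", "minimap2", "samtools", "racon", "jellyfish")


def python_version_supported(version=None):
    """Return True if (major, minor) satisfies the Python >=3.9,<3.13 requirement."""
    major_minor = tuple(version or sys.version_info[:2])
    return (3, 9) <= major_minor < (3, 13)


def locate_tools(tools=REQUIRED_TOOLS):
    """Map each tool name (or path) to its resolved executable path, or None if not found."""
    return {tool: shutil.which(tool) for tool in tools}


def report(tools=REQUIRED_TOOLS):
    """Print where each tool was found and return the names of any missing tools."""
    missing = []
    for tool, location in locate_tools(tools).items():
        print(f"{tool}: {location or 'NOT FOUND'}")
        if location is None:
            missing.append(tool)
    return missing


if __name__ == "__main__":
    print("Python version supported:", python_version_supported())
    missing = report()
    if missing:
        print("Missing tools:", ", ".join(missing))
```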
## Pre-built species-specific panRGs
[Pandora](https://github.com/iqbal-lab-org/pandora) uses species-specific reference pan-genomes (panRGs) to identify the genes on each sequencing read (see above for instructions to install Pandora). Click the relevant link below to download a panRG to run Amira on your favorite bacterial species. If we do not currently support a species you are interested in, we are more than happy to build one; please let us know via a GitHub issue!
* [*Escherichia coli*](https://drive.google.com/file/d/13c_bUXnBEs9iEPPobou7-xEgkz_t08YP/view?usp=sharing)
* [*Klebsiella pneumoniae*](https://drive.google.com/file/d/1DYG3QW3nrQfSckIX9Vjbhbqz5bRd9W3j/view?usp=drive_link)
* [*Enterococcus faecium*](https://drive.google.com/file/d/1AzzFNRbH6VXPj5CX2txlcxhW8AhL9HSh/view?usp=sharing)

**Note**: These panRGs can currently detect all acquired AMR genes in the NCBI Bacterial Antimicrobial Resistance Reference Gene Database as of 21st August 2024.
## Running Amira
Once you have installed the Python dependencies, Pandora, minimap2, samtools, racon and Jellyfish, and downloaded the panRG for the species you are interested in, Amira can be run with this command:
```bash
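amira --reads <READS> --output <OUTPUT_DIR> --species <SPECIES> --panRG-path <PANRG_PATH> --pandora-path <PANDORA_PATH> --minimap2-path <MINIMAP2_PATH> --samtools-path <SAMTOOLS_PATH> --cores <CORES>
```
## For developers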
Amira can also be run on the output of Pandora directly, or from JSON files listing the genes and gene positions on each sequencing read.
### Running with Pandora
After installing Pandora, you can call the genes on your sequencing reads using this command:
```bash
pandora map -t <THREADS> --min-gene-coverage-proportion 0.5 --max-covg 10000 -o pandora_map_output <PANRG_PATH> <READS>
```
Amira can then be run directly on the output of Pandora using this command:
```bash
amira --pandoraSam pandora_map_output/*.sam --pandoraConsensus pandora_map_output/pandora.consensus.fq.gz --panRG-path <PANRG_PATH> --reads <READS> --output amira_output --species <SPECIES> --minimum-length-proportion 0.5 --maximum-length-proportion 1.5 --cores <CORES> --racon-path <RACON_PATH> --minimap2-path <MINIMAP2_PATH> --samtools-path <SAMTOOLS_PATH>
```
### Running from JSON
To run Amira from the JSON files, you can use this command:
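```bash
amira --pandoraJSON <GENE_CALLS_JSON> --gene-positions <GENE_POSITIONS_JSON> --pandoraConsensus <PANDORA_CONSENSUS_FQ_GZ> --reads <READS> --output <OUTPUT_DIR> --panRG-path <PANRG_PATH> --species <SPECIES> --racon-path <RACON_PATH> --minimap2-path <MINIMAP2_PATH> --cores <CORES>
```
#### JSON example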
Some example JSON data can be downloaded from [here](https://drive.google.com/drive/folders/1mQ8JmzVhFiNkgRy5_1iFQrqV2TLNnlQ4). Amira can then be run using this command:
```bash
amira --pandoraJSON test_data/gene_calls_with_gene_filtering.json --gene-positions test_data/gene_positions_with_gene_filtering.json --pandoraConsensus test_data/pandora.consensus.fq.gz --reads test_data/SRR23044220_1.fastq.gz --output amira_output --species Escherichia_coli --panRG-path . --racon-path <RACON_PATH> --minimap2-path <MINIMAP2_PATH> --samtools-path <SAMTOOLS_PATH> --debug --cores <CORES>
```
### Additional options
For additional options and configurations, run:
```bash
amira --help
```
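If you drive Amira from a Python workflow, the "Running with Pandora" command shown earlier can be assembled as an argument list and passed to `subprocess.run`. The sketch below is hypothetical. The helper names, directory names and tool paths are placeholders, and only flags that appear in the commands above are used. Because `subprocess.run` does not invoke a shell, the helper expands `pandora_map_output/*.sam` itself and passes the matches after `--pandoraSam`, as the shell glob would. The species string follows the `Escherichia_coli` form used in the JSON example.

```python
import glob
import os
import subprocess


def collect_pandora_outputs(pandora_dir):
    """Return (sorted SAM paths, consensus FASTQ path) for a `pandora map` output directory."""
    sam_files = sorted(glob.glob(os.path.join(pandora_dir, "*.sam")))
    consensus = os.path.join(pandora_dir, "pandora.consensus.fq.gz")
    return sam_files, consensus


def amira_pandora_command(sam_files, consensus, *, reads, panrg, species, output,
                          cores, racon, minimap2, samtools):
    """Build the 'Running with Pandora' command line as an argument list."""
    if not sam_files:
        raise ValueError("no Pandora SAM files were supplied")
    return [
        "amira",
        # Equivalent to the shell expanding pandora_map_output/*.sam after the flag.
        "--pandoraSam", *sam_files,
        "--pandoraConsensus", consensus,
        "--panRG-path", panrg,
        "--reads", reads,
        "--output", output,
        "--species", species,
        "--minimum-length-proportion", "0.5",
        "--maximum-length-proportion", "1.5",
        "--cores", str(cores),
        "--racon-path", racon,
        "--minimap2-path", minimap2,
        "--samtools-path", samtools,
    ]


def run_amira(command):
    """Run Amira without a shell; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(command, check=True)


# Example with hypothetical paths (uncomment to run):
# sam_files, consensus = collect_pandora_outputs("pandora_map_output")
# run_amira(amira_pandora_command(
#     sam_files, consensus,
#     reads="reads.fastq.gz", panrg="path/to/panRG", species="Escherichia_coli",
#     output="amira_output", cores=8,
#     racon="racon", minimap2="minimap2", samtools="samtools",
# ))
```

Keyword-only arguments keep the many path parameters from being passed in the wrong order.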
## Citation
TBD
## Contributing
If you’d like to contribute to Amira, please follow these steps:
1. Fork the repository.
2. Create a new branch for your feature or bugfix (`git checkout -b feature-name`).
3. Commit your changes (`git commit -m "Description of feature"`).
4. Push to the branch (`git push origin feature-name`).
5. Submit a pull request.
## License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
## Contact
For questions, feedback or issues, please open an issue on GitHub or contact Daniel Anderson.
## Additional Resources
* [Pandora Documentation](https://github.com/iqbal-lab-org/pandora/wiki/Usage)