Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Chrisjrt/hafeZ
A tool for identifying active prophage elements through read mapping
- Host: GitHub
- URL: https://github.com/Chrisjrt/hafeZ
- Owner: Chrisjrt
- License: gpl-3.0
- Created: 2020-12-16T20:01:22.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2023-10-09T03:43:34.000Z (over 1 year ago)
- Last Synced: 2024-08-05T10:09:23.898Z (5 months ago)
- Language: Python
- Homepage:
- Size: 771 KB
- Stars: 11
- Watchers: 3
- Forks: 3
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-phages - hafeZ - A tool for identifying inducible prophage elements through read (Annotation / Prophage prediction)
README
# hafeZ
A tool for identifying inducible prophage elements through read mapping

"*I caught the happy virus last night when I was out singing beneath the stars.*"

-Hafez

# Installation
## Bioconda
```
mamba create -n hafeZ -c conda-forge -c bioconda -c defaults hafez
```
## Source
If installing from source, hafeZ requires the following dependencies to also be installed:
### Python dependencies
- pandas
- numpy
- matplotlib
- scipy
- Biopython
- pyrodigal
- pysam
### Other dependencies
- minimap2
- samtools
- mosdepth
- hmmer3
- blast
- hhsuite
# Quick start
## Help
To access the help menu use the `-h` option:
```
hafeZ.py -h
```
## Initial setup
hafeZ relies on a protein family database (pVOGs or PHROGs), which must be retrieved and formatted before first use. This can be done using the following command:
```
hafeZ.py -G hafeZ_db/ -T phrogs
```
```diff
- NOTE: Although both pvogs and phrogs are valid options for the -T/--db_type flag, DO NOT USE pvogs: the website hosting that database is currently down, so the download will fail.
- So, currently only use phrogs
```
## Illumina reads
hafeZ accepts Illumina reads in both .fastq and .fastq.gz format. To use hafeZ with Illumina reads:
```
hafeZ.py -f assembly.fasta -r1 read_1.fastq.gz -r2 read_2.fastq.gz -o output_folder -D hafeZ_db -T phrogs
```
## Test dataset
A test dataset can be obtained and run using the following:
```
mkdir test
wget -P test/ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR455/005/ERR4552545/ERR4552545_1.fastq.gz
wget -P test/ ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR455/005/ERR4552545/ERR4552545_2.fastq.gz
wget -P test/ "https://www.ebi.ac.uk/ena/browser/api/fasta/CP015406.2?download=true"
mv "test/CP015406.2?download=true" test/CP015406.2.fasta
./hafeZ.py -r1 test/ERR4552545_1.fastq.gz -r2 test/ERR4552545_2.fastq.gz -o test/output -O -f test/CP015406.2.fasta -t 8 -D hafeZ_db/ -Z -T phrogs
```
## Outputs
If a putative active prophage is found, hafeZ produces six main outputs:
- hafeZ_all_roi_seqs.fasta = file containing the DNA sequences of all the regions of interest identified
- hafeZ_summary_all_rois.tsv = file containing a summary of info related to each region of interest
- hafeZ_hmm_hits.tsv = file containing a list of all region of interest orfs and a description of the pvogs/phrogs they hit
- hafeZ_prophage_for_xxx.png = image of zscores per base within contigs where a region of interest was identified, with the region highlighted (one file per contig containing a region of interest)
- hafeZ_orfs_aa_XXX.faa = fasta file containing amino acid sequence of each orf within the roi
- hafeZ_orfs_dna_XXX.fasta = fasta file containing the dna sequence of each orf within the roi

N.B. if the -Z option is used, an additional output, zscores_for_contigXXX.png, will also be generated, showing the Z-scores of each contig examined (i.e. if the input genome contains 100 contigs, 100 zscore .png files will be output). This can be useful if the user wants to manually inspect for any potential rois that may be missed under default parameters.
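For readers unfamiliar with the Z-score plots, the sketch below shows the general idea of scoring per-base read depth against the rest of a contig. It uses the textbook z-score (deviation from the mean in units of standard deviation); hafeZ's actual statistic and thresholds are not described in this README and may differ (for example, a robust variant). The function name `coverage_zscores`, the toy depths, and the cut-off of 1.0 are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def coverage_zscores(depths):
    """Return the textbook z-score of each per-base depth value.

    Illustrative only: not necessarily the statistic hafeZ computes.
    """
    mu = mean(depths)
    sigma = pstdev(depths)
    if sigma == 0:
        # Perfectly flat coverage: no position deviates from the mean.
        return [0.0 for _ in depths]
    return [(d - mu) / sigma for d in depths]


# Toy depths (hypothetical): indices 3-5 have elevated coverage,
# as a region with extra read depth would.
depths = [10, 10, 10, 30, 30, 30, 10, 10, 10]
zscores = coverage_zscores(depths)

# Arbitrary illustrative cut-off, not a hafeZ default.
elevated = [i for i, z in enumerate(zscores) if z > 1.0]
print(elevated)
```

Positions whose score stands well above the rest of the contig are the kind of signal a user would look for when inspecting the per-contig plots by eye.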
If no putative active prophages are found hafeZ will output only an empty hafeZ_summary_all_rois.tsv file.
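Because an empty summary file is the signal for "nothing found", a downstream script may want to treat that case explicitly. The following is a minimal sketch, assuming the summary is a tab-separated file with a header row (suggested by the .tsv extension but not confirmed here); `load_roi_summary` is a hypothetical helper, and no particular column names are assumed.

```python
import csv
import os


def load_roi_summary(path):
    """Return summary rows as a list of dicts, or [] if no ROIs were reported.

    Assumes a tab-separated file whose first line is a header.
    """
    if os.path.getsize(path) == 0:
        # Zero-byte file: hafeZ found no putative active prophages.
        return []
    with open(path, newline="") as handle:
        # A header-only file also yields an empty list.
        return list(csv.DictReader(handle, delimiter="\t"))
```

A caller could then branch on the result, for example `rois = load_roi_summary("output_folder/hafeZ_summary_all_rois.tsv")` followed by `if not rois: ...`, where the path is a placeholder for the user's own output folder.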
## Caveat
hafeZ is currently only optimised to use paired-end Illumina reads as inputs. Future updates will allow use of single-end Illumina reads, nanopore reads, and PacBio reads, but these have not been optimised yet.
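Given the paired-end requirement, a wrapper script might check its inputs before launching a run. The sketch below is one possible approach, not part of hafeZ: `build_hafez_command` is a hypothetical helper that only assembles an argument list from flags shown earlier in this README (`-f`, `-r1`, `-r2`, `-o`, `-D`, `-T`, `-t`) and rejects read files that do not end in .fastq or .fastq.gz.

```python
FASTQ_SUFFIXES = (".fastq", ".fastq.gz")


def build_hafez_command(assembly, reads_1, reads_2, out_dir, db_dir,
                        db_type="phrogs", threads=None):
    """Build an argument list for a paired-end hafeZ run (hypothetical helper)."""
    for reads in (reads_1, reads_2):
        if not str(reads).endswith(FASTQ_SUFFIXES):
            raise ValueError(f"expected a .fastq or .fastq.gz file, got: {reads}")
    if str(reads_1) == str(reads_2):
        # Paired-end input needs two distinct read files.
        raise ValueError("reads_1 and reads_2 must be different files")

    cmd = [
        "hafeZ.py",
        "-f", str(assembly),
        "-r1", str(reads_1),
        "-r2", str(reads_2),
        "-o", str(out_dir),
        "-D", str(db_dir),
        "-T", db_type,
    ]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd


cmd = build_hafez_command(
    "assembly.fasta", "read_1.fastq.gz", "read_2.fastq.gz",
    "output_folder", "hafeZ_db", threads=8,
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; building a list rather than a single string avoids shell quoting problems with unusual file names.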
# Citation
If you publish results from hafeZ please cite the following:
hafeZ: Active prophage identification through read mapping (bioRxiv)
https://doi.org/10.1101/2021.07.21.453177