https://github.com/nylander/hmmer-parser

Last synced: about 2 months ago
JSON representation

Host: GitHub
URL: https://github.com/nylander/hmmer-parser
Owner: nylander
License: mit
Created: 2019-05-08T09:06:18.000Z (about 7 years ago)
Default Branch: main
Last Pushed: 2023-09-19T14:28:29.000Z (over 2 years ago)
Last Synced: 2023-09-19T17:59:20.874Z (over 2 years ago)
Language: Perl
Size: 45.9 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# hmmer-parser.pl

## USAGE

./hmmer-parser.pl [options] -i output-file-from-hmmsearch-or-nhmmer

## DESCRIPTION

Parses output from [hmmer](http://hmmer.org/) searches (hmmsearch or nhmmer).

Prints hits in fasta format with descriptions added to the fasta header (tab separated).

Will print to stdout or to file.

Prints hits in fasta format with descriptions added to the fasta header (tab separated).

Example output fasta header:

>TRINITY_DN52935_c2_g1_i1 Query:COI Identity:83.35 E-value:1.2e-277 Score:926.9 Result:PRESENT

A label "PRESENT" will be there if identity requirements are fulfilled:

if ( (percent identity in HSP >= PERCENTAGE) and (percent coverage of HSP to query >= COVERAGE) )

If not "PRESENT", then the tag can be labeled as "ABSENT" or "TRUNCATED" depending on the
values of percent identity in HSP and the percent coverage of HSP to query.

The values of PERCENTAGE and COVERAGE can be set by options -p and -c and will only affect
the tag "Result" in the output fasta headers.

## OPTIONS

-i Infile. Mandatory.
-m, -n= Maximum number of hits to show.
Default is "1".
-s= Sort output sequences on either "E-value", "Score", or "Identity".
"Score" is default.
-p= Minimum percentage for residual identity in alignment.
Default is "80".
-c= Minimum coverage ((length of query in alignment pair/original length of query)*100).
Default is "80".
-o Outfile.
-v Be verbose (or --noverbose).

## REQUIREMENTS

BioPerl, Bio::SearchIO::hmmer.

Example installation on Ubuntu 22.04:

$ sudo apt install \
libbio-perl-perl \
libbio-perl-run-perl \
libbio-searchio-hmmer-perl

## WORKED EXAMPLES

### hmmsearch (HMMER 3.3.2); one coi sequence against 24 coi sequences

$ scripts/fas2sto.pl data/ref-coi.fas > data/ref-coi.sto
$ hmmbuild --cpu 4 data/ref-coi.hmm data/ref-coi.sto
$ hmmsearch --cpu 4 data/ref-coi.hmm \
data/coi.fas > data/coi-vs-ref-coi.hmmsearch.out

Parse

$ ./hmmer-parser.pl \
-i data/coi-vs-ref-coi.hmmsearch.out \
-o data/coi-vs-ref-coi.hmmsearch.hmm-parser.fas

### nhmmer (HMMER 3.3.2); coi HMM-profile (calculated from a coi multiple sequence alignment) against one genome (nt)

$ wget -O - \
"https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/335/GCF_000002335.3_Tcas5.2/GCF_000002335.3_Tcas5.2_genomic.fna.gz" | \
gunzip -c > data/GCF_000002335.3.fna
$ scripts/fas2sto.pl data/ref-coi.fas > data/ref-coi.sto
$ hmmbuild --cpu 4 data/ref-coi.hmm data/ref-coi.sto
$ nhmmer --cpu 4 \
-o data/GCF_000002335.ref-coi-vs-GCF_000002335.3.nhmmer.out \
data/ref-coi.hmm \
data/GCF_000002335.3.fna

Parse

$ ./hmmer-parser.pl \
-i data/GCF_000002335.ref-coi-vs-GCF_000002335.3.nhmmer.out \
-o data/GCF_000002335.ref-coi-vs-GCF_000002335.3.nhmmer.hmm-parser.fas

## NOTES

Tested on output from hmmsearch and nhmmer from HMMer v.3.1b2 and v.3.3.2.
Beware of change in output format between HMMer versions.

## AUTHOR

Johan Nylander

## COMPANY

NRM

## LICENSE

MIT. See [LICENSE file](LICENSE)

## DOWNLOAD

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/nylander/hmmer-parser

Awesome Lists containing this project

README