Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/nchenche/herbifun


https://github.com/nchenche/herbifun

Last synced: 27 days ago
JSON representation

Awesome Lists containing this project

README

        

About
=====
- hmmbuilder.py : build hmm profile with an iterative search protocole
- requirement:
- hmmer (must be installed: sudo apt install hmmer)
- muscle (must be installed: sudo apt install muscle)
- usearch (already included)

- annotater.py: annotate your favorite proteins with your personnal specified rules
- require hmmer

---------------------------------------
Installation
---------------------------------------
1 - create a virtual environment:
virtualenv -p python2.7 ~/hmm_tools

2 - activate your virtual environment:
source ~/hmm_tools/bin/activate

From now, everything you will install for python will be specifically located in this environment (see https://virtualenv.pypa.io/en/stable).

3 - go in HMMbuilder-0.0.0/ and install the package:
python setup.py install

Now, both hmmbuilder.py and annotater.py should be executable. Let's test this.
After typing 'hmmbuilder.py -h' you should see:

"
usage: hmmbuilder.py [-h] -seqdb [SEQDB] -fasta [FASTA] -dir [DIR]
[-identity IDENTITY] [-cov COV] [-cval CVAL] [-ival IVAL]
[-acc ACC]

Iterative building of hmm profiles

optional arguments:
-h, --help show this help message and exit
-seqdb [SEQDB] Sequences used to learn hmm profile (fasta format)
-fasta [FASTA] Sequence(s) used as first seed (fasta format)
-dir [DIR] Output directory
-identity IDENTITY Sequence identity threshold to remove redundancy in
seeds'sequences
-cov COV Minimum percentage of coverage alignment between hmm hit
and hmm profile (0.0)
-cval CVAL hmmer conditional e-value cutoff (0.01)
-ival IVAL hmmer independant e-value cutoff (0.01)
-acc ACC hmmer mean probability of the alignment accuracy between
each residues of the target and the corresponding hmm
state (0.6)
"

After typing 'annotater.py -h' you should see:

"
usage: annotater.py [-h] -proteome [PROTEOME] -hmmdb [HMMDB] -rules [RULES]
-dir [DIR] [-cov COV] [-cval CVAL] [-ival IVAL] [-acc ACC]

Iterative building of hmm profiles

optional arguments:
-h, --help show this help message and exit
-proteome [PROTEOME] Proteome fasta file
-hmmdb [HMMDB] HMM profile database
-rules [RULES] File containing rules
-dir [DIR] Output directory
-cov COV Minimum percentage of coverage alignment between hmm
hit and hmm profile (0.0)
-cval CVAL hmmer conditional e-value cutoff (0.01)
-ival IVAL hmmer independant e-value cutoff (0.01)
-acc ACC hmmer mean probability of the alignment accuracy
between each residues of the target and the
corresponding hmm state (0.6)

"

Note: you can exit from the virtual environment by typing 'deactivate'. Once it's done, annotater.py and hmmbuilder.py won't be executable until you reactivate the virtual environment (source ~/hmm_tools/bin/activate).

---------------------------------------
Example of usage for hmmbuilder.py (datas in datas/)
---------------------------------------
Go to datas/ and type:
hmmbuilder.py -seqdb mgg_70-15_8.fasta -fasta A.msa -dir ./

The output directory will look like this:
A_hmmbuild_2018-08-14_18-10-50/
├── A.hmm -> (resulting hmm profile)
├── A_hmmbuild_2018-08-14_18-10-50.log -> (log file)
├── A.msa -> (list of sequences (fasta format) used for the resulting hmm)
├── A.seed -> (sequence alignment of A.msa)
└── runs_output -> (output files for each iteration)
├── A-1_nr.clw
├── A-1_nr.domtblout
├── A-1_nr.hmm
├── A-1_nr.msa
├── A-2_hybrid.msa
├── A-2_new.msa
├── A-2_nr.clw
├── A-2_nr.domtblout
├── A-2_nr.hmm
├── A-2_nr.msa
├── A-3_hybrid.msa
├── A-3_new.msa
├── A-3_nr.clw
├── A-3_nr.domtblout
├── A-3_nr.hmm
├── A-3_nr.msa
...

---------------------------------------
Example of usage for annotater.py (datas in datas/)
---------------------------------------
Note:

1 - you must have an HMM profile database generated.

For this, once you have generated all your desired hmm profiles, concatenate them:
cat A_hmmbuild_date-time/A.hmm AT_hmmbuild_date-time/AT.hmm KS_hmmbuild_date-time/KS.hmm PP_hmmbuild_date-time/PP.hmm > database.hmm
and then:
hmmpress database.hmm

2 - you'll need to create a file containing the rules.

The file must contain 3 fields separated by '|'.
- 1st field: class name you want to give to your protein
- 2nd field: Description name of your protein (or anything you want)
- 3rd field: domain(s) required to annotate your protein (each domain must be comma separated)

For instance, go to datas/ and type:
cat annotation.rules

#Class | Name | dom1,dom2
PKS | Polyketide Synthase | KS,AT,PP
PKS-like | Polyketide Synthase | KS, AT
dom_PP | PP-binding domain | PP
dom_KS | Ketoacyl synthase domain | KS
dom_AT | Acyltransferase domain | AT
NRPS | Non-Ribosomal Peptide Synthase | C,A,PP

Once you have all required files, you can type:
annotater.py -proteome mgg_70-15_8.fasta -hmmdb database.hmm -rules annotation.rules -dir annotated/

The ouputs are xml files in annotated/. For instance:
cat annotated/PKS_MGG_00241T0.xml

mgg_70-15_8.fasta
MGG_00241T0
PKS
MEPKANGQSMESTKLFLFGDQTIEFRFPDAQHCREVWATLSE...KALERFLS
2152

KS
424
381
804
486.3
1.3e-149


AT
92
914
1005
50.0
5.1e-17


AT
244
1016
1259
181.0
6.7e-57


PP
66
1728
1793
37.8
2.7e-13