https://github.com/nchenche/herbifun
- Host: GitHub
- URL: https://github.com/nchenche/herbifun
- Owner: nchenche
- Created: 2019-01-11T09:58:42.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2021-01-06T09:16:27.000Z (about 4 years ago)
- Last Synced: 2023-10-20T09:24:34.647Z (about 1 year ago)
- Language: Clarion
- Size: 10.3 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README
README
About
=====
- hmmbuilder.py: build an hmm profile with an iterative search protocol
  - requirements:
    - hmmer (must be installed: sudo apt install hmmer)
    - muscle (must be installed: sudo apt install muscle)
    - usearch (already included)
- annotater.py: annotate your favorite proteins with your own specified rules
  - requires hmmer

---------------------------------------
Installation
---------------------------------------
1 - create a virtual environment:
    virtualenv -p python2.7 ~/hmm_tools

2 - activate your virtual environment:
    source ~/hmm_tools/bin/activate

From now on, everything you install for python will be located in this environment (see https://virtualenv.pypa.io/en/stable).

3 - go to HMMbuilder-0.0.0/ and install the package:
    python setup.py install

Now, both hmmbuilder.py and annotater.py should be executable. Let's test this.
After typing 'hmmbuilder.py -h' you should see:

    usage: hmmbuilder.py [-h] -seqdb [SEQDB] -fasta [FASTA] -dir [DIR]
                         [-identity IDENTITY] [-cov COV] [-cval CVAL] [-ival IVAL]
                         [-acc ACC]

    Iterative building of hmm profiles

    optional arguments:
      -h, --help          show this help message and exit
      -seqdb [SEQDB]      Sequences used to learn hmm profile (fasta format)
      -fasta [FASTA]      Sequence(s) used as first seed (fasta format)
      -dir [DIR]          Output directory
      -identity IDENTITY  Sequence identity threshold to remove redundancy in
                          seeds'sequences
      -cov COV            Minimum percentage of coverage alignment between hmm hit
                          and hmm profile (0.0)
      -cval CVAL          hmmer conditional e-value cutoff (0.01)
      -ival IVAL          hmmer independant e-value cutoff (0.01)
      -acc ACC            hmmer mean probability of the alignment accuracy between
                          each residues of the target and the corresponding hmm
                          state (0.6)

After typing 'annotater.py -h' you should see:

    usage: annotater.py [-h] -proteome [PROTEOME] -hmmdb [HMMDB] -rules [RULES]
                        -dir [DIR] [-cov COV] [-cval CVAL] [-ival IVAL] [-acc ACC]

    Iterative building of hmm profiles

    optional arguments:
      -h, --help            show this help message and exit
      -proteome [PROTEOME]  Proteome fasta file
      -hmmdb [HMMDB]        HMM profile database
      -rules [RULES]        File containing rules
      -dir [DIR]            Output directory
      -cov COV              Minimum percentage of coverage alignment between hmm
                            hit and hmm profile (0.0)
      -cval CVAL            hmmer conditional e-value cutoff (0.01)
      -ival IVAL            hmmer independant e-value cutoff (0.01)
      -acc ACC              hmmer mean probability of the alignment accuracy
                            between each residues of the target and the
                            corresponding hmm state (0.6)

Note: you can exit from the virtual environment by typing 'deactivate'. Once it's done, annotater.py and hmmbuilder.py won't be executable until you reactivate the virtual environment (source ~/hmm_tools/bin/activate).
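
To make the -cov, -cval, -ival and -acc options from the help output above more concrete, here is a minimal sketch of how such cutoffs could be applied to one row of a hmmer --domtblout file. It is not taken from hmmbuilder.py or annotater.py. The function names, the example row, and the way coverage is computed (aligned profile positions divided by profile length, expressed as a fraction, with the profile as the query) are all assumptions made for illustration.

```python
# Sketch only: applying -cov/-cval/-ival/-acc style cutoffs to a hmmer
# --domtblout row. Function names and the coverage formula are assumptions,
# not the code used by hmmbuilder.py or annotater.py.

def parse_domtblout_line(line):
    """Extract the fields used below from one non-comment domtblout row."""
    f = line.split()
    return {
        "target": f[0],
        "query": f[3],
        "qlen": int(f[5]),          # query length (the profile, in hmmsearch)
        "c_evalue": float(f[11]),   # conditional e-value of the domain
        "i_evalue": float(f[12]),   # independent e-value of the domain
        "hmm_from": int(f[15]),
        "hmm_to": int(f[16]),
        "acc": float(f[21]),        # mean posterior probability of the alignment
    }


def passes_cutoffs(hit, cov=0.0, cval=0.01, ival=0.01, acc=0.6):
    """Return True when the hit satisfies all four cutoffs."""
    # Assumption: coverage is a fraction of the profile length.
    coverage = (hit["hmm_to"] - hit["hmm_from"] + 1) / float(hit["qlen"])
    return (coverage >= cov
            and hit["c_evalue"] <= cval
            and hit["i_evalue"] <= ival
            and hit["acc"] >= acc)


# Made-up row in domtblout column order (not real output).
EXAMPLE_ROW = ("protX - 500 DOM - 100 1e-25 90.0 0.1 1 1 1e-20 1e-19 "
               "88.0 0.1 11 90 120 200 115 205 0.95 made-up example")

hit = parse_domtblout_line(EXAMPLE_ROW)
print(passes_cutoffs(hit))            # default cutoffs from the help text
print(passes_cutoffs(hit, cov=0.9))   # stricter coverage requirement
```

With hmmscan the profile is the target rather than the query, so the profile length would come from the third column instead; which layout the scripts rely on is not stated in this README.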
---------------------------------------
Example of usage for hmmbuilder.py (data in datas/)
---------------------------------------
Go to datas/ and type:
    hmmbuilder.py -seqdb mgg_70-15_8.fasta -fasta A.msa -dir ./

The output directory will look like this:
A_hmmbuild_2018-08-14_18-10-50/
├── A.hmm -> (resulting hmm profile)
├── A_hmmbuild_2018-08-14_18-10-50.log -> (log file)
├── A.msa -> (list of sequences (fasta format) used for the resulting hmm)
├── A.seed -> (sequence alignment of A.msa)
└── runs_output -> (output files for each iteration)
    ├── A-1_nr.clw
    ├── A-1_nr.domtblout
    ├── A-1_nr.hmm
    ├── A-1_nr.msa
    ├── A-2_hybrid.msa
    ├── A-2_new.msa
    ├── A-2_nr.clw
    ├── A-2_nr.domtblout
    ├── A-2_nr.hmm
    ├── A-2_nr.msa
    ├── A-3_hybrid.msa
    ├── A-3_new.msa
    ├── A-3_nr.clw
    ├── A-3_nr.domtblout
    ├── A-3_nr.hmm
    ├── A-3_nr.msa
    ...

---------------------------------------
Example of usage for annotater.py (data in datas/)
---------------------------------------
Note:

1 - you must have an HMM profile database generated.
For this, once you have generated all your desired hmm profiles, concatenate them:
    cat A_hmmbuild_date-time/A.hmm AT_hmmbuild_date-time/AT.hmm KS_hmmbuild_date-time/KS.hmm PP_hmmbuild_date-time/PP.hmm > database.hmm
and then:
    hmmpress database.hmm

2 - you'll need to create a file containing the rules.
The file must contain 3 fields separated by '|'.
- 1st field: class name you want to give to your protein
- 2nd field: Description name of your protein (or anything you want)
- 3rd field: domain(s) required to annotate your protein (domains must be comma-separated)

For instance, go to datas/ and type:
    cat annotation.rules

    #Class | Name | dom1,dom2
    PKS | Polyketide Synthase | KS,AT,PP
    PKS-like | Polyketide Synthase | KS, AT
    dom_PP | PP-binding domain | PP
    dom_KS | Ketoacyl synthase domain | KS
    dom_AT | Acyltransferase domain | AT
    NRPS | Non-Ribosomal Peptide Synthase | C,A,PP

Once you have all required files, you can type:
    annotater.py -proteome mgg_70-15_8.fasta -hmmdb database.hmm -rules annotation.rules -dir annotated/

The outputs are xml files in annotated/. For instance:
    cat annotated/PKS_MGG_00241T0.xml

mgg_70-15_8.fasta
MGG_00241T0
PKS
MEPKANGQSMESTKLFLFGDQTIEFRFPDAQHCREVWATLSE...KALERFLS
2152
KS 424 381 804 486.3 1.3e-149
AT 92 914 1005 50.0 5.1e-17
AT 244 1016 1259 181.0 6.7e-57
PP 66 1728 1793 37.8 2.7e-13
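
The three-field rules format described above can be illustrated with a short sketch that parses the example annotation.rules content and checks which rules are satisfied by a set of detected domains. This is not the code of annotater.py. The function names are hypothetical, and the matching logic (a rule is satisfied when all of its required domains are present) is an assumption based on the description of the 3rd field.

```python
# Sketch only: parsing 'Class | Name | dom1,dom2' rules and matching them
# against detected domains. Names and matching logic are assumptions, not
# the implementation used by annotater.py.

def parse_rules(text):
    """Return a list of (class, name, required_domains) tuples."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#Class | Name | ...' header
        cls, name, doms = [part.strip() for part in line.split("|")]
        required = set(d.strip() for d in doms.split(",") if d.strip())
        rules.append((cls, name, required))
    return rules


def matching_classes(rules, found_domains):
    """List every class whose required domains are all among found_domains."""
    found = set(found_domains)
    return [cls for cls, _name, required in rules if required <= found]


RULES_TEXT = """#Class | Name | dom1,dom2
PKS | Polyketide Synthase | KS,AT,PP
PKS-like | Polyketide Synthase | KS, AT
dom_PP | PP-binding domain | PP
dom_KS | Ketoacyl synthase domain | KS
dom_AT | Acyltransferase domain | AT
NRPS | Non-Ribosomal Peptide Synthase | C,A,PP
"""

rules = parse_rules(RULES_TEXT)
# Domains listed for MGG_00241T0 in the example output: KS, AT, AT, PP
matches = matching_classes(rules, ["KS", "AT", "AT", "PP"])
print(matches)
```

In this sketch the protein satisfies the PKS rule as well as the less specific ones. Whether annotater.py reports every satisfied rule or only one of them is not stated in this README.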