https://github.com/gamcil/synthaser_scripts
Python scripts used in synthaser manuscript
- Host: GitHub
- URL: https://github.com/gamcil/synthaser_scripts
- Owner: gamcil
- Created: 2021-05-17T07:21:39.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2021-08-09T08:36:40.000Z (almost 5 years ago)
- Last Synced: 2025-10-25T00:40:28.472Z (8 months ago)
- Language: HTML
- Size: 17.2 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
## Scripts used in synthaser manuscript
| File | Description |
| ---- | ----------- |
| ``sum_bitscores.py`` | Script to sum bitscores of identical query/target sequence pairs in BLAST results. |
| ``extract_PKS.py`` | Script to extract PKS/NRPS sequences from MIBiG GenBank/JSON files. |
| ``mibig/*`` | synthaser output for MIBiG synthases |
| ``ks_domains/*`` | Files generated during KS domain network construction |
## Methods
### Comparison to MIBiG domain architectures
Download MIBiG database GenBank and JSON dumps and extract contents:

    wget https://dl.secondarymetabolites.org/mibig/mibig_json_2.0.tar.gz
    wget https://dl.secondarymetabolites.org/mibig/mibig_gbk_2.0.tar.gz
    tar xzvf mibig_json_2.0.tar.gz
    tar xzvf mibig_gbk_2.0.tar.gz

Retrieve all annotated PKS sequences using ``extract_PKS.py``:

    # Arguments: GenBank folder, JSON folder, output table with MIBiG metadata;
    # --fasta names the output FASTA file with PKS sequences
    python3 extract_PKS.py \
        mibig_gbk_2.0/ \
        mibig_json_2.0/ \
        mibig_table.tsv \
        --fasta mibig_synthases.fasta

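
To make the selection step more concrete, here is a rough sketch of filtering MIBiG JSON entries by biosynthetic class. It is not the logic of ``extract_PKS.py``: it assumes each entry stores its accession and class list under ``cluster.mibig_accession`` and ``cluster.biosyn_class``, which should be checked against the downloaded files, and the helper names and the ``WANTED`` set are hypothetical.

```python
import json
from pathlib import Path

# Assumed class labels; verify against the values used in the MIBiG JSON files.
WANTED = {"Polyketide", "NRP"}


def is_synthase_cluster(entry, wanted=WANTED):
    """Return True if the entry lists at least one wanted biosynthetic class."""
    classes = entry.get("cluster", {}).get("biosyn_class", [])
    return bool(wanted.intersection(classes))


def select_accessions(json_dir):
    """Collect accessions of matching entries from every JSON file in a folder."""
    accessions = []
    for path in sorted(Path(json_dir).glob("*.json")):
        with open(path) as handle:
            entry = json.load(handle)
        if is_synthase_cluster(entry):
            # Assumed key layout: entry["cluster"]["mibig_accession"]
            accessions.append(entry["cluster"]["mibig_accession"])
    return accessions
```

Under these assumptions, ``select_accessions("mibig_json_2.0/")`` would list the clusters whose GenBank records are worth scanning for synthase sequences.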
Set up synthaser:

    pip install --user synthaser

Run synthaser on PKS sequences, saving HTML plot and search session:

    synthaser search \
        --query_file mibig_synthases.fasta \
        --json_file mibig_synthases.json \
        --output mibig_predictions.txt \
        --plot mibig.html \
        --long_form

The MIBiG metadata table (``mibig_table.tsv``) was then merged with
the synthaser predictions table (``mibig_predictions.txt``). MIBiG domain
architectures were copied from the 'NRPS/PKS domains' tab of each
MIBiG entry, added to the table and compared to the predictions in the
synthaser output.
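
The merge and comparison could be reproduced along the following lines. This is a minimal sketch, not the procedure used for the manuscript: it assumes both tables are tab-separated with a header row and share a sequence identifier column, and that architectures are written as dash-separated domain names. The column names (``sequence``, ``mibig_domains``, ``synthaser_domains``) are hypothetical.

```python
import csv
import io


def read_tsv(handle, key):
    """Read a tab-separated table into a dict of rows keyed by column `key`."""
    return {row[key]: row for row in csv.DictReader(handle, delimiter="\t")}


def merge_tables(metadata, predictions):
    """Join two row dicts on shared identifiers, keeping only matched rows."""
    merged = []
    for seq_id in sorted(metadata.keys() & predictions.keys()):
        # Columns from the predictions table win if a name occurs in both.
        merged.append({**metadata[seq_id], **predictions[seq_id]})
    return merged


def split_domains(text):
    """Split a dash-separated architecture into upper-case domain names."""
    return [domain.strip().upper() for domain in text.split("-")]


def architectures_match(row, mibig_col="mibig_domains",
                        pred_col="synthaser_domains"):
    """Compare two architectures, ignoring case and surrounding whitespace."""
    return split_domains(row[mibig_col]) == split_domains(row[pred_col])
```

With real files, each table would be opened with ``open(path, newline="")`` and passed to ``read_tsv`` together with the name of its identifier column.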
### Creation of the Aspergillus KS network
Retrieve sequences from NCBI containing the ``cond_enzymes`` conserved
domain family, removing any unnecessary information from FASTA description
lines:

    esearch -db cdd -query 238201 | \
        elink -target protein | \
        efilter -query "Aspergillus[ORGN]" -source genbank | \
        efetch -format fasta | \
        sed 's/ .*$//g' - > synthases.faa

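
For readers less familiar with ``sed``, the header clean-up at the end of the pipeline is roughly equivalent to the Python sketch below. The function name is hypothetical. Unlike the ``sed`` expression, it only touches header lines, which gives the same result as long as sequence lines contain no spaces.

```python
def clean_fasta_headers(lines):
    """Yield FASTA lines with each header cut at its first space."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Keep only the identifier and drop the free-text description.
            line = line.split(" ", 1)[0]
        yield line
```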
Analyse sequences using synthaser:

    synthaser search \
        --query_file synthases.faa \
        --json_file synthases.json \
        --output architectures.txt \
        --long_form

Extract KS domains from the search session:

    # Arguments: session file, then output file prefix (e.g. synthases_KS.faa);
    # --mode domain selects domain extraction, --types KS restricts it to KS domains
    synthaser extract \
        synthases.json \
        synthases_ \
        --mode domain \
        --types KS

Build DIAMOND database from extracted KS domains:

    diamond makedb --in synthases_KS.faa --db KS

Perform all-vs-all alignments of the extracted KS domains:

    diamond blastp --query synthases_KS.faa \
        --db KS.dmnd \
        --more-sensitive \
        --outfmt "6 qseqid sseqid bitscore" \
        --out KS_alignments.tsv

Sum bitscores of all non-overlapping high-scoring segment pairs (HSPs):

    python3 sum_bitscores.py KS_alignments.tsv summed.tsv

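
The summation itself can be sketched as follows. This is an illustration, not the contents of ``sum_bitscores.py``: it assumes the three-column DIAMOND output (query, target, bitscore), adds up every row that shares the same query/target pair, and does not itself check whether HSPs overlap. The function names are hypothetical.

```python
import csv
from collections import defaultdict


def sum_bitscores(rows):
    """Sum bitscores over rows that share the same (query, target) pair."""
    totals = defaultdict(float)
    for query, target, bitscore in rows:
        totals[(query, target)] += float(bitscore)
    return dict(totals)


def read_rows(path):
    """Yield (query, target, bitscore) rows from a tab-separated file."""
    with open(path, newline="") as handle:
        yield from csv.reader(handle, delimiter="\t")


def write_totals(totals, path):
    """Write one tab-separated line per query/target pair."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        for (query, target), total in sorted(totals.items()):
            writer.writerow([query, target, total])
```

Under these assumptions, ``write_totals(sum_bitscores(read_rows("KS_alignments.tsv")), "summed.tsv")`` would produce a table with one edge weight per sequence pair, which is the shape a network import expects.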
The summed alignment table (``summed.tsv``) was then imported into Cytoscape
v3.7.2 to build a similarity network. Domain architecture predictions from
synthaser (``architectures.txt``) were imported and connected to their
corresponding nodes, which were then coloured based on an alphabetical ordering
of architectures.
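
The colouring scheme can also be illustrated outside Cytoscape. The sketch below is an assumption about how an alphabetical ordering could map to colours, not the Cytoscape configuration used for the manuscript: it sorts the distinct architectures and assigns each one the next entry of a hypothetical palette, cycling if there are more architectures than colours.

```python
# Hypothetical palette; any list of colour codes would do.
PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a"]


def colour_by_architecture(node_architectures, palette=PALETTE):
    """Map each node to a colour chosen by its architecture's sorted rank."""
    ordered = sorted(set(node_architectures.values()))
    colour_of = {
        arch: palette[index % len(palette)]
        for index, arch in enumerate(ordered)
    }
    return {node: colour_of[arch] for node, arch in node_architectures.items()}
```

Nodes that share an architecture always receive the same colour, so clusters of identically organised synthases stand out in the network.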