{"id":23505689,"url":"https://github.com/gamcil/synthaser_scripts","last_synced_at":"2026-02-20T22:40:56.907Z","repository":{"id":113168335,"uuid":"368092730","full_name":"gamcil/synthaser_scripts","owner":"gamcil","description":"Python scripts used in synthaser manuscript","archived":false,"fork":false,"pushed_at":"2021-08-09T08:36:40.000Z","size":18023,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-25T00:40:28.472Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gamcil.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-05-17T07:21:39.000Z","updated_at":"2025-02-20T04:55:10.000Z","dependencies_parsed_at":"2023-03-13T04:45:53.768Z","dependency_job_id":null,"html_url":"https://github.com/gamcil/synthaser_scripts","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gamcil/synthaser_scripts","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gamcil%2Fsynthaser_scripts","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gamcil%2Fsynthaser_scripts/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gamcil%2Fsynthaser_scripts/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gamcil%2Fsynthaser_scripts/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gamcil","download_url":"h
ttps://codeload.github.com/gamcil/synthaser_scripts/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gamcil%2Fsynthaser_scripts/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29667095,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-20T19:49:36.704Z","status":"ssl_error","status_checked_at":"2026-02-20T19:44:05.372Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-25T09:39:04.480Z","updated_at":"2026-02-20T22:40:56.888Z","avatar_url":"https://github.com/gamcil.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Scripts used in synthaser manuscript\n\n| File | Description |\n| ---- | ----------- |\n| ``sum_bitscores.py`` | Script to sum bitscores of identical query/target sequence pairs in BLAST results. |\n| ``extract_PKS.py`` | Script to extract PKS/NRPS sequences from MIBiG GenBank/JSON files. 
|\n| ``mibig/*`` | synthaser output for MIBiG synthases |\n| ``ks_domains/*`` | Files generated during KS domain network construction |\n\n## Methods\n### Comparison to MIBiG domain architectures\nDownload MIBiG database GenBank and JSON dumps and extract contents:\n\n\twget https://dl.secondarymetabolites.org/mibig/mibig_json_2.0.tar.gz\n\twget https://dl.secondarymetabolites.org/mibig/mibig_gbk_2.0.tar.gz\n\ttar xzvf mibig_json_2.0.tar.gz\n\ttar xzvf mibig_gbk_2.0.tar.gz\n\nRetrieve all annotated PKS sequences using ``extract_PKS.py``:\n\n\t# Arguments: GenBank folder, JSON folder, output table with MIBiG metadata;\n\t# --fasta sets the output FASTA file with PKS sequences\n\tpython3 extract_PKS.py \\\n\t\tmibig_gbk_2.0/ \\\n\t\tmibig_json_2.0/ \\\n\t\tmibig_table.tsv \\\n\t\t--fasta mibig_synthases.fasta\n\nSet up synthaser:\n\n\tpip install --user synthaser\n\nRun synthaser on PKS sequences, saving HTML plot and search session:\n\n\tsynthaser search \\\n\t\t--query_file mibig_synthases.fasta \\\n\t\t--json_file mibig_synthases.json \\\n\t\t--output mibig_predictions.txt \\\n\t\t--plot mibig.html \\\n\t\t--long_form\n\nThe MIBiG metadata table (``mibig_table.tsv``) was then merged with\nthe synthaser predictions table (``mibig_predictions.txt``). 
MIBiG domain\narchitectures were copied from the 'NRPS/PKS domains' tab of each\nMIBiG entry, added to the table and compared to the predictions in the\nsynthaser output.\n\n### Creation of the Aspergillus KS network\nRetrieve sequences from NCBI containing the ``cond_enzymes`` conserved\ndomain family, removing any unnecessary information from FASTA description\nlines:\n\n\tesearch -db cdd -query 238201 |\\\n\t\telink -target protein |\\\n\t\tefilter -query \"Aspergillus\"[ORGN] -source genbank |\\\n\t\tefetch -format fasta |\\\n\t\tsed 's/ .*$//g' - \u003e synthases.faa\n\nAnalyse sequences using synthaser:\n\n\tsynthaser search \\\n\t\t--query_file synthases.faa \\\n\t\t--json_file synthases.json \\\n\t\t--output architectures.txt \\\n\t\t--long_form\n\nExtract KS domains from the search session:\n\n\t# Arguments: session file, then output file prefix (e.g. synthases_KS.faa);\n\t# --mode domain specifies domain extraction, --types KS specifies KS domains\n\tsynthaser extract \\\n\t\tsynthases.json \\\n\t\tsynthases_ \\\n\t\t--mode domain \\\n\t\t--types KS\n\nBuild DIAMOND database from extracted KS domains:\n\n\tdiamond makedb --in synthases_KS.faa --db KS\n\nPerform all-vs-all alignments:\n\n\tdiamond blastp --query synthases_KS.faa \\\n\t\t--db KS.dmnd \\\n\t\t--more-sensitive \\\n\t\t--outfmt \"6 qseqid sseqid bitscore\" \\\n\t\t--out KS_alignments.tsv\n\nSum bitscores of all non-overlapping high-scoring segment pairs (HSPs):\n\n\tpython3 sum_bitscores.py KS_alignments.tsv summed.tsv\n\nThe summed alignment table (``summed.tsv``) was then imported into Cytoscape\nv3.7.2 to build a similarity network. 
Domain architecture predictions from\nsynthaser (``architectures.txt``) were imported and connected to their\ncorresponding nodes, which were then coloured based on an alphabetical ordering\nof architectures.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgamcil%2Fsynthaser_scripts","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgamcil%2Fsynthaser_scripts","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgamcil%2Fsynthaser_scripts/lists"}