{"id":38895300,"url":"https://github.com/vestalisvirginis/synphage","last_synced_at":"2026-01-17T14:58:51.684Z","repository":{"id":206382514,"uuid":"591363724","full_name":"vestalisvirginis/synphage","owner":"vestalisvirginis","description":"Pipeline to create phage genome synteny graphics from genbank files","archived":false,"fork":false,"pushed_at":"2024-10-28T11:10:16.000Z","size":87065,"stargazers_count":13,"open_issues_count":5,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-11-28T02:17:11.226Z","etag":null,"topics":["blastn","python3","synteny","unit-testing"],"latest_commit_sha":null,"homepage":"https://vestalisvirginis.github.io/synphage/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vestalisvirginis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-20T15:20:24.000Z","updated_at":"2025-08-05T18:25:38.000Z","dependencies_parsed_at":"2024-05-27T12:06:40.020Z","dependency_job_id":null,"html_url":"https://github.com/vestalisvirginis/synphage","commit_stats":null,"previous_names":["vestalisvirginis/synphage"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/vestalisvirginis/synphage","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vestalisvirginis%2Fsynphage","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vestalisvirginis%2Fsynphage/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vestalisvirginis%2Fsynphage/
releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vestalisvirginis%2Fsynphage/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vestalisvirginis","download_url":"https://codeload.github.com/vestalisvirginis/synphage/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vestalisvirginis%2Fsynphage/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28510928,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-17T13:38:16.342Z","status":"ssl_error","status_checked_at":"2026-01-17T13:37:44.060Z","response_time":85,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["blastn","python3","synteny","unit-testing"],"created_at":"2026-01-17T14:58:51.591Z","updated_at":"2026-01-17T14:58:51.672Z","avatar_url":"https://github.com/vestalisvirginis.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# synphage\n\nPipeline to create phage genome synteny graphics from genbank files\n\nThis library provides an intuitive tool for creating synteny graphics highlighting the conserved genes between multiple genome sequences.  
\nThis tool is primarily designed to work with phage genomes or other short sequences of interest, although it works with bacterial genomes as well.\n\nAlthough numerous synteny tools are available, none of them visualises gene conservation across multiple sequences at a glance, because cross-links are typically drawn only between two consecutive sequences for better readability.\n\nAs a result `synphage` was born.  \n\nIn addition to showing conserved genes across multiple sequences, this library is distinctive in that, when working on the same set of genomes, the initial BLAST and computation need to be run only once. Multiple graphics can then be generated from these data, comparing all the genomes or only a subset of genomes from the analysed dataset. Moreover, the generated data is also available to the user as a table, where individual genes or groups of genes can easily be checked by name for conservation or uniqueness.\n\n\n## Stats \n[![PyPI version](https://badge.fury.io/py/synphage.svg)](https://badge.fury.io/py/synphage)\n[![ci](https://github.com/vestalisvirginis/synphage/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/vestalisvirginis/synphage/actions/workflows/ci.yml)\n[![codecov](https://codecov.io/github/vestalisvirginis/synphage/graph/badge.svg?token=HX32HRFS3V)](https://codecov.io/github/vestalisvirginis/synphage)\n[![](https://img.shields.io/pypi/dm/synphage.svg?style=popout-square)](https://pypi.org/project/synphage/)\n[![License](https://img.shields.io/github/license/vestalisvirginis/synphage.svg?style=popout-square)](https://opensource.org/licenses/Apache-2.0)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n\n\n## Install\n\n`synphage` is available via [pip install](https://pypi.org/project/synphage/) or as a [docker image](https://hub.docker.com/r/vestalisvirginis/synphage).  
\nFor more detailed instructions, consult the [synphage installation guide](https://vestalisvirginis.github.io/synphage/installation/).  \n\n### Via pip \n``` bash\npip install synphage\n```\n[See complete documentation](https://vestalisvirginis.github.io/synphage/installation#/pip-install)\n\n### Via docker\n```bash\ndocker pull vestalisvirginis/synphage:\u003ctag\u003e\n```\n\u003e[!NOTE]\n\u003eReplace `\u003ctag\u003e` with the [latest image tag](https://hub.docker.com/r/vestalisvirginis/synphage/tags).  \n[See complete documentation](https://vestalisvirginis.github.io/synphage/installation#/docker-install)\n\n### Additional dependencies\n\nsynphage relies on one non-Python dependency that needs to be manually installed when synphage is installed with pip:\n- [Blast+](https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/) \u003e= 2.12.0  \n\nInstall `Blast+` using your package manager of choice, e.g. for Ubuntu Linux:\n``` bash\napt update\napt install -y ncbi-blast+\n```\n\nor by downloading an [executable](https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/) appropriate for your system. For help, check the complete [installation documentation](https://www.ncbi.nlm.nih.gov/books/NBK569861/).  \n\n\n## Usage\n\n### Setup  \n\nsynphage requires the user to specify the following environment variables:\n- `INPUT_DIR` : to specify the path to the folder containing the user's `GenBank files`. If not set, this path defaults to the temp folder. This path can also be modified at run time.  \n- `OUTPUT_DIR`: to specify the path to the folder where the data generated during the run will be stored. If not set, this path defaults to the temp folder.  \n- `EMAIL` (optional): to connect to the NCBI database.  \n- `API_KEY` (optional): to connect to the NCBI database and download files.  
\n\n\u003e[!TIP]\n\u003eThese variables can be set with a `.env` file located in your working directory (Dagster will automatically load them from the .env file when initialising the pipeline) or can be exported in the terminal before running synphage:  \n\u003e**.env**\n\u003e``` .env\n\u003eINPUT_DIR=path/to/my/data/\n\u003eOUTPUT_DIR=path/to/synphage/data\n\u003eEMAIL=user.email@email.com\n\u003eAPI_KEY=UserApiKey\n\u003e```\n\u003e**bash**\n\u003e``` bash\n\u003eexport INPUT_DIR=\u003cpath_to_data_folder\u003e\n\u003eexport OUTPUT_DIR=\u003cpath_to_synphage_folder\u003e\n\u003eexport EMAIL=user.email@email.com\n\u003eexport API_KEY=UserApiKey\n\u003e```\n\n\u003e[!NOTE]  \n\u003eFor docker users, `INPUT_DIR` defaults to `/user_files` and `OUTPUT_DIR` defaults to `/data`.  \n\u003e For more detailed explanations on using the `synphage docker image`, check our [documentation](https://vestalisvirginis.github.io/synphage/installation/#run-synphage-container).\n\n\n### Running Synphage\n\nA step-by-step example, performed on a group of closely related *Lactococcus* phages, is available in the [documentation](https://vestalisvirginis.github.io/synphage/phages/).\n\n#### Starting Dagster\n\n`synphage` uses [Dagster](https://dagster.io). In order to run synphage jobs, you need to start Dagster first.\n\nSet the environment variable `DAGSTER_HOME` to keep a record of your previous runs (optional). For more information, see the [Dagster documentation](https://docs.dagster.io/deployment/dagster-instance). 
\n\n```bash\nexport DAGSTER_HOME=\u003cdagster_home_directory\u003e\n\ndagster dev -h 0.0.0.0 -p 3000 -m synphage\n```\n\nFor docker users:\n```bash\ndocker run -p 3000 vestalisvirginis/synphage:\u003ctag\u003e\n```\nFor more information and options, check [running synphage container](https://vestalisvirginis.github.io/synphage/installation/#run-synphage-container).\n\n#### Running the jobs\n\nThe synphage pipeline is composed of `four steps` that need to be run `sequentially`.\n[See complete documentation](https://vestalisvirginis.github.io/synphage/pipeline)\n\n##### Step 1: Loading the data into the pipeline\nData is loaded into the pipeline from the `input_folder` set by the user `and/or` `downloaded` from the NCBI.  \n- `step_1a_get_user_data` : load user's data\n- `step_1b_download` : download data from the NCBI\n\n\u003e [!IMPORTANT]\n\u003e - Only one of the jobs is required to successfully run step 1.\n\u003e - Configuration is required for the `step_1b_download` job: `search_key`, which receives the keywords for querying the NCBI database.  
\n\n###### Query config options :\nField Name | Description | Default Value\n ------- | ----------- | ----\n`search_key` | Keyword(s) for NCBI query | Myoalterovirus\n\n\n\u003e [!TIP]\n\u003e Both jobs can be run if the user needs both local and downloaded files.\n\n##### Step 2: Data validation\nCompleteness of the data is validated at this step.\n- `step_2_make_validation` : perform checks and transformations on the dataset that are required for downstream processing\n\n\u003e [!IMPORTANT]\n\u003e This step is required and cannot be skipped.\n\n##### Step 3: Blasting the data\nThe BLAST is performed at this step of the pipeline and three different `options` are available:  \n- `step_3a_make_blastn` : run a Nucleotide BLAST on the dataset\n- `step_3b_make_blastp` : run a Protein BLAST on the dataset\n- `step_3c_make_all_blast` : run both Nucleotide and Protein BLAST simultaneously  \n\n\u003e [!IMPORTANT]\n\u003e Only one of the above jobs is required to successfully run step 3.\n\n\u003e [!TIP]\n\u003e The `step_3a_make_blastn` and `step_3b_make_blastp` jobs can also be run sequentially, for example when the user decides to run the second job based on the results obtained from the first one.\n\n\n##### Step 4: Synteny plot\nThe graph is created during this last step. Step 4 can be run multiple times with different configurations and different sets of data, as long as the data have been processed once through steps 1, 2 and 3.\n- `step_4_make_plot` : use data generated at step 3 and the genbank files to plot the synteny diagram  \n  \n\u003e [!IMPORTANT]\n\u003e Configuration is required for the `step_4_make_plot` job: `graph_type`, which receives either `blastn` or `blastp` as its value to specify which dataset to use for the plot. The default value is `blastn`. 
For more information about the configuration at step 4, check the [documentation](https://vestalisvirginis.github.io/synphage/configurations/).\n\n\u003e [!TIP]\n\u003e Different synteny plots can be generated from the same set of genomes. In this case the first three steps only need to be run once and the fourth step, `step_4_make_plot`, can be triggered separately for each graph.\n\u003e To modify the sequences to be plotted (selected sequences, order, orientation), the `sequences.csv` file generated at step 3 can be modified and saved under a different name. This new `.csv` file can be passed in the job configuration field `sequence_file`.\n\u003e \n\u003e *sequences.csv*\n\u003e ``` txt \n\u003e genome_1.gb,0\n\u003e genome_2.gb,1\n\u003e genome_3.gb,0\n\u003e ```\n\n###### Plotting config options\n\nThe appearance of the plot can be modified through the configuration. \n\nField Name | Description | Default Value\n ------- | ----------- | ----\n`title` | Generated plot file title | synteny_plot\n`graph_type` | Type of dataset to use for the plot | blastn\n`colours` | Gene identity colour bar | [\"#fde725\", \"#90d743\", \"#35b779\", \"#21918c\", \"#31688e\", \"#443983\", \"#440154\"] \n`gradient` | Nucleotide identity colour bar | #B22222\n`graph_shape` | Linear or circular representation | linear\n`graph_pagesize` | Output document format | A4\n`graph_fragments` | Number of fragments | 1\n`graph_start` | Sequence start | 1\n`graph_end` | Sequence end | length of the longest genome\n\n\n## Output\n\nsynphage's output consists of four to six main parquet files (depending on whether blastn and blastp were both executed) and the synteny graphic. 
However, all the data generated by the synphage pipeline are made available in your data directory.\n\n### Generated data architecture\n\n```\n.\n├── \u003cpath_to_synphage_folder\u003e/\n│   ├── download/\n│   ├── fs/\n│   ├── genbank/\n│   ├── gene_identity/\n│   │   ├── fasta_n/\n│   │   ├── blastn_database/\n│   │   └── blastn/\n│   ├── protein_identity/\n│   │   ├── fasta_p/\n│   │   ├── blastp_database/\n│   │   └── blastp/\n│   ├── tables/\n│   │   ├── genbank_db.parquet\n│   │   ├── processed_genbank_df.parquet\n│   │   ├── blastn_summary.parquet\n│   │   ├── blastp_summary.parquet\n│   │   ├── gene_uniqueness.parquet\n│   │   └── protein_uniqueness.parquet\n│   ├── sequences.csv\n│   └── synteny/\n│      ├── colour_table.parquet\n│      ├── synteny_graph.png\n│      └── synteny_graph.svg\n└── ...\n```\n\n\n### Tables\n\nThe `tables` folder contains the four to six main parquet files generated by the pipeline.\n1. `genbank_db.parquet` : original data parsed from the GenBank files. \n2. `processed_genbank_df.parquet` : data processed during the validation step. It contains two additional columns:\n   - `gb_type` : specifying what type of data is used as unique identifier of the coding elements\n   - `key`: unique identifier based on the columns: `filename`, `id` and `locus_tag`.\n3. `blastn_summary.parquet` : data parsed from the `blastn` output json files. It contains the collection of the best match for each sequence against each genome. The percentage of identity between two sequences is then used for calculating the plot cross-links between the sequences.  \n4. `blastp_summary.parquet` : data parsed from the `blastp` output json files. It contains the collection of the best match for each sequence against each genome. The percentage of identity between two sequences is then used for calculating the plot cross-links between the sequences.  \n5. 
`gene_uniqueness.parquet` : combines both `processed_genbank_df.parquet` and `blastn_summary.parquet` in a single parquet file, allowing the user to quickly know how many matches their sequence(s) of interest has/have retrieved. These data are then used to compute the colour code used for the synteny plot. The result of the computation is recorded in the `colour_table.parquet`. This file is over-written between each `plot` run. \n6. `protein_uniqueness.parquet` : combines both `processed_genbank_df.parquet` and `blastp_summary.parquet` in a single parquet file, allowing the user to quickly know how many matches their sequence(s) of interest has/have retrieved. These data are then used to compute the colour code used for the synteny plot. The result of the computation is recorded in the `colour_table.parquet`. This file is over-written between each `plot` run. \n\n\n### Synteny plot\n\nThe `synteny plot` is generated as `.svg file` and `.png file`, and contains the sequences indicated in the `sequences.csv` file. The genes are colour-coded according to their abundance (percentage) among the plotted sequences. 
The cross-links between consecutive sequences indicate the percentage of similarity between those two sequences.\n \n\n## Documentation\n\nVisit [https://vestalisvirginis.github.io/synphage/](https://vestalisvirginis.github.io/synphage/) for complete [installation instructions](https://vestalisvirginis.github.io/synphage/installation/), [guidelines to navigate the pipeline](https://vestalisvirginis.github.io/synphage/pipeline/) and a [step-by-step example](https://vestalisvirginis.github.io/synphage/phages/).\n\n\n## Support\n\n**Where to ask for help?**\n\nOpen a [discussion](https://github.com/vestalisvirginis/synphage/discussions).\n\n\n## Roadmap\n\n- ~~[x] create config options for the plot at run time~~\n- ~~[x] integrate the NCBI search~~\n- ~~[x] implement blastp~~  \n- [ ] create possibility to add ref sequence with special colour coding\n- [ ] create interactive plot  \n- [ ] Help us in a discussion?\n\n\n## Status\n\n`[2024-07-20]`   ✨ __New features!__  \n- `Checks` : to validate the quality of the data\n- `Blastp` is finally implemented\n\n  \n`[2024-01-11]`   ✨ __New feature!__   to simplify the addition of new sequences into the genbank folder\n - `download` : download genomes to be analysed from the NCBI database \n\n\n## Contributing \n\nWe accept different types of contributions, including some that don't require you to write a single line of code. 
For detailed instructions on how to get started with our project, see [CONTRIBUTING](CONTRIBUTING.md) file.\n\n\n## Authors\n- [vestalisvirginis](https://github.com/vestalisvirginis) / Virginie Grosboillot / 🇫🇷 \n\n\n\u003c!-- ## Contributors\n\u003ca href=\"https://github.com/vestalisvirginis/synphage/graphs/contributors\"\u003e\n  \u003cimg src=\"https://contrib.rocks/image?repo=vestalisvirginis/synphage\" /\u003e\n\u003c/a\u003e --\u003e\n\n## License\nApache License 2.0\nFree for commercial use, modification, distribution, patent use, private use.\nJust preserve the copyright and license.\n\n\n\u003e Made with ❤️ in Ljubljana 🇸🇮","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvestalisvirginis%2Fsynphage","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvestalisvirginis%2Fsynphage","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvestalisvirginis%2Fsynphage/lists"}