{"id":43932692,"url":"https://github.com/bionf/prottrace","last_synced_at":"2026-02-07T00:18:49.804Z","repository":{"id":47448024,"uuid":"101048153","full_name":"BIONF/protTrace","owner":"BIONF","description":"A simulation based framework to estimate the evolutionary traceability of protein.","archived":false,"fork":false,"pushed_at":"2021-09-30T15:37:41.000Z","size":26253,"stargazers_count":4,"open_issues_count":1,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-09-09T16:36:15.276Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BIONF.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-08-22T09:51:21.000Z","updated_at":"2025-07-05T20:02:54.000Z","dependencies_parsed_at":"2022-08-23T09:10:14.952Z","dependency_job_id":null,"html_url":"https://github.com/BIONF/protTrace","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/BIONF/protTrace","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BIONF%2FprotTrace","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BIONF%2FprotTrace/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BIONF%2FprotTrace/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BIONF%2FprotTrace/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BIONF","download_url":"https://codeload.github.com/BIONF/protTrace/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/BIONF%2FprotTrace/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29181310,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-06T23:15:33.022Z","status":"ssl_error","status_checked_at":"2026-02-06T23:15:09.128Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-07T00:18:49.109Z","updated_at":"2026-02-07T00:18:49.785Z","avatar_url":"https://github.com/BIONF.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# protTrace - A simulation based framework to estimate the evolutionary traceability of protein.\n[![language: Python](https://img.shields.io/badge/language-Python-blue.svg?style=flat)](https://www.python.org/)\n[![presented at: GCB2018](https://img.shields.io/badge/presented%20at-GCB2018-green.svg?style=flat)](http://gcb2018.de/)\n[![published in: BioRxiv](https://img.shields.io/badge/published%20in-BioRxiv-ff69b4.svg?style=flat)](https://doi.org/10.1101/302109)\n[![license: GPL-3.0](https://img.shields.io/badge/license-GNU--GPL3.0-lightgrey.svg)](https://opensource.org/licenses/GPL-3.0)\n\n# Table of Contents\n* [Scientific Context](#scientific-context)\n* [Workflow](#Workflow)\n* [Installation \u0026amp; Usage](#installation--usage)\n   * [protTrace \u0026amp; Accessory Software](#prottrace--accessory-software)\n   * [Configuring 
protTrace](#configuring-prottrace)\n   * [Calling protTrace](#calling-prottrace)\n* [Input Data](#input-data)\n* [Test Run](#test-run)\n* [WIKI](#wiki)\n* [Bugs](#bugs)\n* [Acknowledgements](#acknowledgements)\n* [License](#license)\n* [Contact](#contact)\n# Scientific context\n*ProtTrace* is a simulation-based approach to assess, for a given protein (the seed), over what evolutionary distances its orthologs can \nbe found by means of a significant sequence similarity. By doing so, it helps to differentiate between the true absence \nof an ortholog in a given species and its non-detection due to limited search sensitivity. *ProtTrace* was presented in 2018 at the German Conference on Bioinformatics (GCB). A high-resolution PDF of the corresponding poster is available [HERE](https://github.com/BIONF/protTrace/wiki/images/Poster-ProtTrace.v2.pdf).\n![Add Text](https://github.com/BIONF/protTrace/wiki/images/Poster-ProtTrace.v2.png \"The evolutionary traceability of a protein\")\n\n\n# Workflow\nThe workflow of protTrace to infer the evolutionary traceability of a seed protein is shown in the figure below (mouse over to see details). It consists of three main steps:\n  1. **Parameterization:** The compilation of an orthologous group for this protein. In the standard setting, OMA orthologous groups are used. The sequences in the orthologous group are then used to infer the parameters of the substitution process and of the insertion and deletion process. \n  1. **Traceability calculation:** The in-silico evolution of the seed protein using the simulation software [REvolver](https://academic.oup.com/mbe/article/29/9/2133/1074669), and the determination of the traceability curve.\n  1. 
**Visualization:** The inference of the traceability index for the protein in 233 species from all domains of life, and the generation of a colored tree.\nA high-resolution PDF of the image is available [HERE](https://github.com/BIONF/protTrace/wiki/images/Workflow-ProtTrace.v1.cap.pdf).\n\n![Alt Text](https://github.com/BIONF/protTrace/wiki/images/Workflow-ProtTrace.v1.png \"Workflow of the protTrace analysis. The workflow is divided into the categories Parameterization, Traceability calculation, and Visualization. Boxes in green denote input files, boxes in orange represent meta-information that is generated in the course of the analysis, and yellow boxes indicate output files that are generated as a result of an analysis. Arrows represent individual analysis steps, where the arrow style indicates whether the analysis step is obligatory (solid) or optional (dashed). Analysis steps that require calling external programs are indicated by the program name next to the corresponding arrow. Obligatory dependencies on 3rd party software are represented by bold face black program names; optional ones are indicated by grey font color.\") \n\n# Installation \u0026 Usage\nPlease refer to the [protTrace WIKI](https://github.com/BIONF/protTrace/wiki) for a full description of the installation and usage guidelines. The WIKI also explains how to set up a [virtual machine running protTrace](https://github.com/BIONF/protTrace/wiki/protTrace-VirtualMachine). Below, we provide a quick excerpt.\n\n*protTrace* is written in Python 2.7, with some helper scripts in Perl and R. Find below the 3rd party software that is required by protTrace:\n  * The ProtTrace package contains scripts written in different languages. In order to run ProtTrace you need the following resources:\n  * Python v2.7.13 or higher. 
**Note, ProtTrace will not run under Python 3**\n     * Install also the [DendroPy module](https://www.dendropy.org/) (can be done via [Conda](https://github.com/BIONF/protTrace/wiki/EnvironmentSetUp)).\n  * Perl v5 or higher including the following modules\n     * Getopt::Long\n     * List::Util\n     * LWP::Simple\n  * Java v1.7 or higher\n  * R v3 or higher\n  * [wget](https://www.gnu.org/software/wget/)\n  \n## protTrace \u0026 Accessory Software\n\n| Program name | Version | Description | Mandatory | BioConda |\n|------------- | ------- | ----------- | --------- | -------- |\n|[MAFFT](https://mafft.cbrc.jp/alignment/software/) |v6 or higher|Multiple sequence alignment|yes|[yes](https://bioconda.github.io/recipes/mafft/README.html)|\n|[NCBI Blast](https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web\u0026PAGE_TYPE=BlastDocs\u0026DOC_TYPE=Download)|v2.7 or higher|Sequence similarity based search|yes|[yes](https://bioconda.github.io/recipes/blast/README.html)|\n|[HMMER](http://hmmer.org/)|3.2 or higher|Sequence similarity based search using Hidden Markov Models|yes|[yes](https://bioconda.github.io/recipes/hmmer/README.html)|\n|[IQTREE](http://www.iqtree.org/)|1.6.7.1 or higher|Phylogenetic tree reconstruction|yes|[yes](https://anaconda.org/bioconda/iqtree)|\n|[HaMStR OneSeq](https://github.com/BIONF/hamstr)|v1 or higher|Targeted ortholog search|no|no|\n\nTo start, we suggest omitting the optional use of HaMStR, since this software comes with some strict naming conventions.\n\nOnce the dependencies are installed (we suggest using the [conda package management system](https://github.com/BIONF/protTrace/wiki/EnvironmentSetUp) for this), clone this repository to get a copy of *protTrace*.\n\n```\ngit clone https://github.com/BIONF/protTrace\n```\n\n## Configuring protTrace\nTo configure protTrace, move into the protTrace directory and run the [configure script](https://github.com/BIONF/protTrace/wiki#the-configuration-script)\n\n```\nperl 
bin/create_conf.pl -name=prog.conf -getOMA -getPfam\n```\n\nThis checks whether all dependencies are present, lets you set all parameters required for the protTrace run, and finally\ndownloads the required data from the [OMA database](https://omabrowser.org) and from the [Pfam database](http://pfam.xfam.org/). \n     * If you already have this data available, you can omit either or both of the options **-getOMA** and **-getPfam**. You will then have to tell protTrace via the *create_conf.pl* script\nwhere this data is located. \n     * **Make sure** to adhere to the [formatting requirements for the OMA data](https://github.com/BIONF/protTrace/wiki/PreparingOMA),\n     and that you have run *hmmpress* on the Pfam database.\n\nOnce everything is set, you are ready to run protTrace.\n\n## Calling protTrace\nEnter the protTrace directory and type\n```\npython bin/protTrace.py -h\n```\nThis should print\n```\nUSAGE:  protTrace.py -i \u003comaIdsFile\u003e | -f \u003cfastaSeqsFile\u003e -c \u003cconfigFile\u003e [-h]\n        -i              Text file containing protein OMA ids (1 id per line)\n        -f              List of input protein sequences in fasta format\n        -c              Configuration file for setting program's dependencies\n```\n# Input Data\n*protTrace* can use either OMA protein ids or a protein sequence in fasta format as input.\n\nIn `toy_example/` you can find two files, test.ids and test.fasta, for performing a test run with protTrace.\n\nWe describe the input in the section [Test Run](https://github.com/BIONF/protTrace/wiki#test-run) of our [WIKI](https://github.com/BIONF/protTrace/wiki/home).\n\n# Test Run\nWe provide two files in the directory *toy_example* for testing protTrace\n  1. *test.ids*: This file contains the OMA protein id of the yeast protein [DIM1](https://omabrowser.org/oma/info/YEAST05874/). To run this test:\n     1. create a config file **prot.conf** using the *create_conf.pl* script. 
We recommend leaving all values at their defaults to start\n     1. place the config file into the directory *toy_example*\n     1. enter the directory *toy_example* and run protTrace by typing\n     ```\n     python ../bin/protTrace.py -i test.ids -c prot.conf\n     ```\n     The output that will be generated by this run is described in the [WIKI](https://github.com/BIONF/protTrace/wiki#oma-id-as-input)\n  1. *test.fasta*: This file contains the protein sequence of human ZNT3. \n     1. create or modify the config file **prot.conf** using the *create_conf.pl* script. Make sure to set the entry **species** to **HUMAN** in the section \n     [General Options](https://github.com/BIONF/protTrace/wiki/Config-File#general-options)\n     1. place the config file into the directory *toy_example*\n     1. enter the directory *toy_example* and run protTrace by typing\n     ```\n     python ../bin/protTrace.py -f test.fasta -c prot.conf\n     ```\n     The output that will be generated by this run is described in the [WIKI](https://github.com/BIONF/protTrace/wiki#protein-sequence-as-input)\n\n# WIKI\nRead the [WIKI](https://github.com/BIONF/protTrace/wiki/home) to explore the functionality of protTrace.\n\n# Bugs\nBug reports, comments, and suggestions are highly appreciated. 
Please open an issue on GitHub or get in touch via email.\n\n# Acknowledgements\nWe would like to thank the members of the [Ebersberger group](http://www.bio.uni-frankfurt.de/43045195/ak-ebersberger) for many valuable suggestions and ...bug reports :)\n\n# Contributors\n* [Arpit Jain](https://github.com/aj87)\n* [Ingo Ebersberger](https://github.com/BIONF)\n* Dominik Perisa\n\n# License\nThis tool is released under the [GNU-GPL3.0 license](https://opensource.org/licenses/GPL-3.0).\n\n# How-To Cite\nArpit Jain, Arndt von Haeseler, Ingo Ebersberger The evolutionary Traceability of protein (2018) [BioRxiv](https://doi.org/10.1101/302109) \n\n# Contact\nIngo Ebersberger\nebersberger@bio.uni-frankfurt.de\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbionf%2Fprottrace","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbionf%2Fprottrace","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbionf%2Fprottrace/lists"}