https://github.com/fomightez/hhsuite3-binder
Repo for running command line-based HH-suite3 software in Jupyter environment provided via Binder.
bioinformatics bioinformatics-analysis bioinformatics-scripts genomics pandas pandas-dataframes protein-sequences python
- Host: GitHub
- URL: https://github.com/fomightez/hhsuite3-binder
- Owner: fomightez
- License: mit
- Created: 2020-11-17T18:22:08.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2023-11-09T20:33:50.000Z (over 1 year ago)
- Last Synced: 2024-12-28T13:24:24.198Z (6 months ago)
- Topics: bioinformatics, bioinformatics-analysis, bioinformatics-scripts, genomics, pandas, pandas-dataframes, protein-sequences, python
- Language: Jupyter Notebook
- Homepage:
- Size: 590 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# hhsuite3-binder
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/fomightez/hhsuite3-binder/main?filepath=index.ipynb)
*tl;dr:*
Click any `launch binder` badge on this page to run command line-based HH-suite3 software inside your browser.

------
***HH-suite3 for fast remote homology detection and deep protein annotation demonstrated in your browser via Jupyter.***
This repository is for running HH-suite3 programs, such as hhblits and hhsearch, or analyzing the output of HH-suite3 programs in the Jupyter environment provided by [MyBinder.org](https://mybinder.org/).
With regard to analyzing the output of HH-suite3 programs, having HH-suite3 working inside the Jupyter environment with interactive Python adds some convenient features that are illustrated. A utility script for moving command line-based HH-suite3 results files into Python is also demonstrated, along with several ways it can be used to mine additional information from the output of HH-suite3 programs using Python/Jupyter.

-------
Software
--------

The [HH-suite3](https://github.com/soedinglab/hh-suite/wiki) software will be installed already in each active session launched from this repository. The HH-suite3 software is available directly from the authors [here](https://github.com/soedinglab/hh-suite).
The HH-suite3 software references are listed in full [here](https://github.com/soedinglab/hh-suite/wiki#user-guide) under 'References'.
Users of HH-suite3 here should probably cite:
- Steinegger M, Meier M, Mirdita M, Vöhringer H, Haunsberger S J, and Söding J (2019)
HH-suite3 for fast remote homology detection and deep protein annotation, *BMC Bioinformatics* 20, 473. [doi: 10.1186/s12859-019-3019-7](https://doi.org/10.1186/s12859-019-3019-7)

***Clarifying Software Attribution: I, Wayne, am not involved in the HH-suite3 software at all. Those in [the lab of Johannes Söding](https://www.mpibpc.mpg.de/soeding) are the developers and source of HH-suite3. See their materials. I simply set up this repository to make the software usable on the command line without installation headaches and in a full-featured, browser-based computational environment.***
I, Wayne Decatur, wrote Jupyter/Python-based utilities for working with command line HH-suite3 results files; these are available [here](https://github.com/fomightez/sequencework/tree/master/hhsuite3-utilities) and are used in the notebooks in this repository to process the results and easily convert them to other Python-friendly forms.
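To give a feel for what moving results into Python involves, here is a minimal sketch that reads the summary hit table of an `.hhr` results file into a list of dictionaries. It is not the utility script linked above, and it does not reproduce that script's interface. The function name `parse_hhr_hits`, the field names, and the sample text are all hypothetical; the sample rows are made up for illustration and are not real search results. The sketch assumes the usual hit-table layout with the columns `No Hit Prob E-value P-value Score SS Cols Query HMM Template HMM`.

```python
import re

# One row of the summary hit table. The hit description may contain spaces,
# so the fixed numeric columns are matched from the right-hand side.
HIT_ROW = re.compile(
    r"^\s*(\d+)\s+(.+?)\s+"          # No, Hit (identifier plus description)
    r"(\d+\.\d+)\s+"                 # Prob
    r"(\S+)\s+(\S+)\s+"              # E-value, P-value
    r"(-?\d+\.\d+)\s+"               # Score
    r"(-?\d+\.\d+)\s+"               # SS
    r"(\d+)\s+"                      # Cols
    r"(\d+)-(\d+)\s+"                # Query HMM range
    r"(\d+)-(\d+)\s*\((\d+)\)\s*$"   # Template HMM range and template length
)


def parse_hhr_hits(text):
    """Return one dict per row of the summary hit table (hypothetical helper)."""
    hits = []
    in_table = False
    for line in text.splitlines():
        if line.lstrip().startswith("No Hit"):
            in_table = True          # header row of the hit table
            continue
        if in_table:
            if not line.strip():
                break                # a blank line ends the table
            match = HIT_ROW.match(line)
            if match:
                g = match.groups()
                hits.append({
                    "no": int(g[0]),
                    "hit": g[1],
                    "prob": float(g[2]),
                    "e_value": float(g[3]),
                    "p_value": float(g[4]),
                    "score": float(g[5]),
                    "ss": float(g[6]),
                    "cols": int(g[7]),
                    "query_start": int(g[8]),
                    "query_end": int(g[9]),
                    "template_start": int(g[10]),
                    "template_end": int(g[11]),
                    "template_length": int(g[12]),
                })
    return hits


# Made-up sample in the assumed layout; NOT real HH-suite3 output.
SAMPLE = """\
Query         example_query
Match_columns 120

 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 1abc_A Hypothetical protein     99.8 1.2E-20 3.4E-25  150.3   0.0  120    1-120     5-124 (130)
  2 2xyz_B Another made-up entry    85.1    0.01 2.5E-06   40.7   0.0   60   10-70     33-95  (210)

No 1
>1abc_A Hypothetical protein
"""

hits = parse_hhr_hits(SAMPLE)
for hit in hits:
    print(hit["no"], hit["hit"], hit["prob"], hit["e_value"])
```

A list of dictionaries like this can be handed to `pandas.DataFrame(hits)` to get a dataframe for filtering and sorting. For real work, the utilities linked above are the better starting point.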
Usage
-----

This repository is set up to allow running the command line version of HH-suite3 software after pressing the `launch binder` button above or below. The target use case is when you want to learn about using HH-suite3, especially `hhblits`. Importantly, the resources needed for `hhblits` to make a good HHM for a sequence go beyond what MyBinder provides. You'll need to find more computing resources to build on what you learn here. Instead of using a good representative of sequence space provided by the latest release of [the Uniclust30 database](https://uniclust.mmseqs.com/), we'll either use a smaller database as an example or bring in pre-made MSAs or HHMs. The Uniclust30 database (currently the `2020_06` version) is generously provided by the software authors via a webserver [here](https://toolkit.tuebingen.mpg.de/tools/hhblits) for making rich MSAs for a sequence.
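As a rough idea of what running `hhblits` against a smaller database looks like from Python, here is a minimal sketch. The file name `query.fasta`, the database prefix `databases/example_db`, and the helper `build_hhblits_command` are hypothetical placeholders and are not taken from the notebooks. The options used (`-i`, `-d`, `-o`, `-oa3m`, `-n`, `-cpu`) are standard `hhblits` options; check `hhblits -h` in a session for the full list.

```python
import shutil
import subprocess
from pathlib import Path


def build_hhblits_command(query, database, out_prefix, iterations=2, cpus=1):
    """Return the argument list for one hhblits search (hypothetical helper)."""
    return [
        "hhblits",
        "-i", str(query),                # input query sequence or alignment
        "-d", str(database),             # database prefix, not a single file
        "-o", f"{out_prefix}.hhr",       # results file with the hit list
        "-oa3m", f"{out_prefix}.a3m",    # resulting MSA in A3M format
        "-n", str(iterations),           # number of search iterations
        "-cpu", str(cpus),               # number of CPU cores to use
    ]


# Placeholder paths; substitute a query and database that exist in your session.
cmd = build_hhblits_command("query.fasta", "databases/example_db", "query")
print(" ".join(cmd))

# Only attempt the search when hhblits is installed and the query file exists.
if shutil.which("hhblits") and Path("query.fasta").exists():
    subprocess.run(cmd, check=True)
```

Inside a notebook, the same command can be run directly by prefixing it with `!` in a cell. Building the argument list in Python is mainly useful when looping over many queries.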
In the notebooks that can be launched, I have added some examples illustrating how to use the program, process the results easily with Python, and convert them to other forms. **Additionally, useful resources for using command line HH-suite3 or analyzing the output from the HH-suite3 programs are presented in those notebooks.** Alternatively, the notebook with most of the resources can be viewed statically [here?????](?????). The ['Credits/Resources'????? section right at the top](?????) is a good place to start.
**Is the Binder-launchable version too limiting for your needs?**
The authors have made the software installable via conda; see [here](https://anaconda.org/bioconda/hhsuite). If you need other installation options, such as one configured for a cluster, they are discussed [here](https://github.com/soedinglab/hh-suite/wiki#installation-of-the-hhsuite-and-its-databases).
Web-based automated searching for remote homologs via [HHpred](https://toolkit.tuebingen.mpg.de/tools/hhpred) is also available via [the MPI Bioinformatics Toolkit](https://toolkit.tuebingen.mpg.de/).
Related
-------

- My [hhsuite3-utilities sub-repo](https://github.com/fomightez/sequencework/tree/master/hhsuite3-utilities)
- [Collection of Jupyter notebooks by Gorbalenya-Lab that were used for their LAMPA paper](https://github.com/Gorbalenya-Lab/hh-suite-notebooks/). LAMPA paper: [LAMPA, LArge Multidomain Protein Annotator, and its application to RNA virus polyproteins. Gulyaeva AA, Sigorskih AI, Ocheredko ES, Samborskiy DV, Gorbalenya AE. Bioinformatics. 2020 May 1;36(9):2731-2739. doi: 10.1093/bioinformatics/btaa065. PMID: 32003788](https://pubmed.ncbi.nlm.nih.gov/32003788/), which describes using HH-suite to annotate multi-domain proteins where the domain boundaries aren't initially known.
Technical Details
-----------------

This repository is set up to make use of the Binder service offered by [MyBinder.org](https://mybinder.org/). See their site for more information about Binder.
I borrowed the 'warning' highlight/introductory text about notebooks at the top of the included notebook from Tim Sherratt's notebook [here](https://github.com/GLAM-Workbench/te-papa-api/blob/main/Exploring-the-Te-Papa-collection-API.ipynb).
Click the button below to begin using HH-suite3 (or BLAST, as well):
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/fomightez/hhsuite3-binder/main?filepath=index.ipynb)