https://github.com/eead-csic-compbio/get_homologues

GET_HOMOLOGUES: a versatile software package for pan-genome analysis
https://github.com/eead-csic-compbio/get_homologues

annotation bacteria clustering fasta genbank pangene pangenome plants transcriptome

Host: GitHub
URL: https://github.com/eead-csic-compbio/get_homologues
Owner: eead-csic-compbio
License: other
Created: 2015-11-26T14:42:05.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2026-01-26T15:58:43.000Z (4 months ago)
Last Synced: 2026-01-27T04:21:31.988Z (4 months ago)
Topics: annotation, bacteria, clustering, fasta, genbank, pangene, pangenome, plants, transcriptome
Language: Perl
Homepage:
Size: 80.4 MB
Stars: 123
Watchers: 15
Forks: 26
Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.txt
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md

## GET_HOMOLOGUES

A versatile software package for pan-genome analysis, including both GET_HOMOLOGUES and GET_HOMOLOGUES-EST. It includes algorithms designed for:

 * Clustering coding sequences in homologous (possibly orthologous) groups, on the grounds of sequence similarity. By default GET_HOMOLOGUES compares protein sequences, while GET_HOMOLOGUES-EST aligns nucleotide sequences (CDS or transcripts).

 * Definition of pan- and core-genomes by calculation of overlapping sets of protein or nucleotide sequences.
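The idea of deriving pan- and core-genomes from overlapping sets can be illustrated with a minimal sketch. This is not the GET_HOMOLOGUES implementation: the function name `summarize_pangenome`, the cluster identifiers and the genome labels are all hypothetical, and the input is assumed to be a mapping from each cluster to the set of genomes that contribute at least one sequence to it.

```python
def summarize_pangenome(clusters, genomes):
    """Return (core, pan) cluster ids for the given set of genomes.

    clusters: dict mapping a cluster id to the set of genome names
              that have at least one sequence in that cluster
              (hypothetical input format, for illustration only).
    genomes:  set of genome names to consider.
    """
    # Core: clusters with members in every genome under consideration.
    core = {cid for cid, members in clusters.items()
            if len(members & genomes) == len(genomes)}
    # Pan: clusters with members in at least one of those genomes.
    pan = {cid for cid, members in clusters.items() if members & genomes}
    return core, pan


# Toy data: four clusters across three made-up genomes A, B and C.
clusters = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B"},
    "c3": {"C"},
    "c4": {"A", "B", "C"},
}
core, pan = summarize_pangenome(clusters, {"A", "B", "C"})
accessory = pan - core  # clusters missing from at least one genome
print(sorted(core), sorted(pan), sorted(accessory))
```

With this toy input, the core-genome contains the two clusters shared by all three genomes, while the pan-genome contains all four clusters.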

GET_HOMOLOGUES has been used mostly with bacterial data (see citing [papers](https://scholar.google.es/scholar?start=0&hl=en&as_sdt=2005&cites=5259912818944685430)).

In contrast, GET_HOMOLOGUES-EST has been used mostly with plants (see citing [papers](https://scholar.google.es/scholar?oi=bibs&hl=en&cites=14330917787074873427&as_sdt=5)) and was originally benchmarked with genomes and transcriptomes of [*Arabidopsis thaliana* and *Hordeum vulgare*](http://floresta.eead.csic.es/plant-pan-genomes) and the pan-genomes of [*Brachypodium distachyon*](https://brachypan.jgi.doe.gov) and [*Brachypodium hybridum*](http://floresta.eead.csic.es/plant-pan-genomes/Bhybridum) ([press release](https://jgi.doe.gov/more-the-merrier-making-case-for-plant-pan-genomes)).

- [Installation](#installation)

- [Documentation](#documentation)

- [Citation](#citation)

- [Credits](#credits)

- [Graphical summary](#graphical-summary)

- [Related software](#related-software)

- [Bugs](#bugs)

- [Funding](#funding)

- [Badges](#badges)

## Installation

[![Build Status](https://github.com/eead-csic-compbio/get_homologues/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/eead-csic-compbio/get_homologues/actions/workflows/ci.yml)

[![Anaconda-Server Badge](https://anaconda.org/bioconda/get_homologues/badges/version.svg)](https://anaconda.org/bioconda/get_homologues)

[![DockerHub](https://badgen.net/badge/icon/docker?icon=docker&label)](https://hub.docker.com/r/csicunam/get_homologues)

Installation instructions, including the [bioconda package](https://anaconda.org/bioconda/get_homologues), are available in the [manual](http://eead-csic-compbio.github.io/get_homologues/manual/manual.html#SECTION00030000000000000000) and the [README.txt](./README.txt) file.

Check also the [Docker image](https://hub.docker.com/r/csicunam/get_homologues).

## Documentation

Manuals are available at:

|version|HTML|
|-------|----|
|original, for the analysis of bacterial pan-genomes|[manual](http://eead-csic-compbio.github.io/get_homologues/manual/)|
|EST, for the analysis of intra-species eukaryotic pan-genomes|[manual-est](http://eead-csic-compbio.github.io/get_homologues/manual-est/)|

In addition, two **tutorials** are available:

* [Pangenome analysis of plant transcripts and coding sequences](http://eead-csic-compbio.github.io/get_homologues/plant_pangenome/protocol.html), published in 2022.

* [From genomes to pangenomes: understanding variation among individuals and species](http://eead-csic-compbio.github.io/get_homologues/tutorial/pangenome_tutorial.html), first released in 2017, which includes step-by-step instructions for both bacterial and plant data.

There's also a 30-minute [video](https://www.youtube.com/watch?v=JcxQ97TzzzQ) covering GET_HOMOLOGUES, GET_HOMOLOGUES-EST and the related software [GET_PANGENES](https://github.com/Ensembl/plant-scripts/tree/master/pangenes).

## Citation

The original GET_HOMOLOGUES, **suitable for bacterial genomes**, was described in:

[Contreras-Moreira B, Vinuesa P (2013) Appl. Environ. Microbiol. 79:7696-7701](http://aem.asm.org/content/79/24/7696.long)

[Vinuesa P, Contreras-Moreira B (2015) Methods in Molecular Biology Volume 1231, 203-232](http://link.springer.com/protocol/10.1007%2F978-1-4939-1720-4_14)

GET_HOMOLOGUES-EST, adapted to the study of **intra-specific eukaryotic pan-genomes and pan-transcriptomes**, was described in:

[Contreras-Moreira B, Cantalapiedra CP et al (2017) Front. Plant Sci. 10.3389/fpls.2017.00184](http://journal.frontiersin.org/article/10.3389/fpls.2017.00184/full)

[Contreras-Moreira B, Rodriguez del Rio A et al (2022) Methods in Molecular Biology https://doi.org/10.1007/978-1-0716-2429-6_9](https://link.springer.com/protocol/10.1007/978-1-0716-2429-6_9)

## Credits 

GET_HOMOLOGUES is designed, created and maintained at the [Computational and Structural Biology](https://www.eead.csic.es/compbio) group at Estación Experimental de Aula Dei, Consejo Superior de Investigaciones Científicas (EEAD-CSIC) and at the [Center for Genomic Sciences](http://www.ccg.unam.mx/~vinuesa) of Universidad Nacional Autónoma de México (CCG/UNAM).

The program was written mostly by Bruno Contreras-Moreira and Pablo Vinuesa, with contributions from Carlos P Cantalapiedra, Alvaro Rodríguez del Rio, Rubén Sancho, Roland Wilhelm, David A Wilkinson and many others (see [CHANGES.txt](./CHANGES.txt)).

It also includes code and binaries from other authors:

* [OrthoMCL v1.4](http://www.orthomcl.org), PubMed=[12952885](https://pubmed.ncbi.nlm.nih.gov/12952885)

* [mcl v14-137](http://micans.org/mcl), PubMed=[11917018](https://pubmed.ncbi.nlm.nih.gov/11917018)

* [COGtriangles v2.1](https://sourceforge.net/projects/cogtriangles), PubMed=[20439257](https://pubmed.ncbi.nlm.nih.gov/20439257)

* [NCBI Blast-2.17.0+](https://blast.ncbi.nlm.nih.gov), PubMed=[9254694](https://pubmed.ncbi.nlm.nih.gov/9254694), [20003500](https://pubmed.ncbi.nlm.nih.gov/20003500)

* [BioPerl v1.5.2](https://bioperl.org), PubMed=[12368254](https://pubmed.ncbi.nlm.nih.gov/12368254)

* [HMMER 3.1b2](http://hmmer.org)

* [Pfam](http://pfam.xfam.org), PubMed=[19920124](https://pubmed.ncbi.nlm.nih.gov/19920124)

* [PHYLIP 3.695](http://evolution.genetics.washington.edu/phylip)

* [Transdecoder r20140704](http://transdecoder.github.io), PubMed=[23845962](https://pubmed.ncbi.nlm.nih.gov/23845962)

* [MVIEW 1.60.1](https://github.com/desmid/mview), PubMed=[9632837](https://pubmed.ncbi.nlm.nih.gov/9632837)

* [diamond 0.8.25](https://github.com/bbuchfink/diamond), PubMed=[25402007](https://pubmed.ncbi.nlm.nih.gov/25402007)

## Graphical summary

![**Legend.** Main features of GET_HOMOLOGUES.](./pics/summary.jpg)

![**Legend.** Flowchart and features of GET_HOMOLOGUES-EST.](./pics/EST.jpg)

## Related software

[GET_PHYLOMARKERS](https://github.com/vinuesa/get_phylomarkers) uses twin nucleotide & peptide clusters produced by GET_HOMOLOGUES to compute robust multi-gene and pangenome phylogenies. Check the [manual](https://vinuesa.github.io/get_phylomarkers), the [tutorial](https://link.springer.com/protocol/10.1007/978-1-0716-2429-6_9), and the [Docker image](https://hub.docker.com/r/csicunam/get_homologues).

A related piece of software, [GET_PANGENES](https://github.com/Ensembl/plant-scripts/tree/master/pangenes), was released in 2023. It takes FASTA and GFF files as input and explicitly considers gene collinearity by computing whole genome alignments.

## Bugs

The code is regularly patched in each release (see [CHANGES.txt](./CHANGES.txt)). We kindly ask you to report errors or bugs as [GitHub issues](https://github.com/eead-csic-compbio/get_homologues/issues) and to acknowledge the use of the software in scientific publications.

## Funding

Fundación ARAID, Consejo Superior de Investigaciones Científicas, DGAPA-PAPIIT UNAM, CONACyT, FEDER, MINECO, DGA-Obra Social La Caixa.

![logo CSIC](pics/logoCSIC.png) ![logo ARAID](pics/logoARAID.gif) ![logo UNAM](pics/logoUNAM.png)

## Badges

[GET_HOMOLOGUES](https://bio.tools/get_homologues) is part of the [INB/ELIXIR-ES](https://inb-elixir.es) resources portfolio:

![logo_ELIXIRES](pics/logoELIXIRES.png)
