https://github.com/eead-csic-compbio/get_homologues
GET_HOMOLOGUES: a versatile software package for pan-genome analysis
https://github.com/eead-csic-compbio/get_homologues
annotation bacteria clustering fasta genbank pangene pangenome plants transcriptome
Last synced: 4 months ago
JSON representation
GET_HOMOLOGUES: a versatile software package for pan-genome analysis
- Host: GitHub
- URL: https://github.com/eead-csic-compbio/get_homologues
- Owner: eead-csic-compbio
- License: other
- Created: 2015-11-26T14:42:05.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2026-01-26T15:58:43.000Z (4 months ago)
- Last Synced: 2026-01-27T04:21:31.988Z (4 months ago)
- Topics: annotation, bacteria, clustering, fasta, genbank, pangene, pangenome, plants, transcriptome
- Language: Perl
- Homepage:
- Size: 80.4 MB
- Stars: 123
- Watchers: 15
- Forks: 26
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.txt
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
## GET_HOMOLOGUES
A versatile software package for pan-genome analysis, including both GET_HOMOLOGUES and GET_HOMOLOGUES-EST. It includes algorithms designed for:
* Clustering coding sequences in homologous (possibly orthologous) groups, on the grounds of sequence similarity. By default GET_HOMOLOGUES compares protein sequences, while GET_HOMOLOGUES-EST aligns nucleotide sequences (CDS or transcripts).
* Definition of pan- and core-genomes by calculation of overlapping sets of protein or nucleotide sequences.
GET_HOMOLOGUES has been used mostly with bacterial data (see citing
[papers](https://scholar.google.es/scholar?start=0&hl=en&as_sdt=2005&cites=5259912818944685430)).
Instead, GET_HOMOLOGUES-EST has been used mostly with plants (see citing
[papers](https://scholar.google.es/scholar?oi=bibs&hl=en&cites=14330917787074873427&as_sdt=5)) and
was originally benchmarked with genomes and transcriptomes of
[*Arabidopsis thaliana* and *Hordeum vulgare*](http://floresta.eead.csic.es/plant-pan-genomes) and the pan-genomes of
[*Brachypodium distachyon*](https://brachypan.jgi.doe.gov) and
[*Brachypodium hybridum*](http://floresta.eead.csic.es/plant-pan-genomes/Bhybridum)
([press release](https://jgi.doe.gov/more-the-merrier-making-case-for-plant-pan-genomes)).
- [Installation](#installation)
- [Documentation](#documentation)
- [Citation](#citation)
- [Credits](#credits)
- [Graphical summary](#graphical-summary)
- [Related software](#related-software)
- [Bugs](#bugs)
- [Funding](#funding)
- [Badges](#badges)
## Installation
[](https://github.com/eead-csic-compbio/get_homologues/actions/workflows/ci.yml)
[](https://anaconda.org/bioconda/get_homologues)
[](https://hub.docker.com/r/csicunam/get_homologues)
Installation instructions, including the [bioconda package](https://anaconda.org/bioconda/get_homologues), are available in the
[manual](http://eead-csic-compbio.github.io/get_homologues/manual/manual.html#SECTION00030000000000000000)
and the [README.txt](./README.txt) file.
Check also the [Docker image](https://hub.docker.com/r/csicunam/get_homologues).
## Documentation
Manuals are available at:
|version|HTML|
|-------|----|
|original, for the analysis of bacterial pan-genomes|[manual](http://eead-csic-compbio.github.io/get_homologues/manual/)|
|EST, for the analysis of intra-species eukaryotic pan-genomes|[manual-est](http://eead-csic-compbio.github.io/get_homologues/manual-est/)|
In addition, there are two **tutorials** are available:
* [Pangenome analysis of plant transcripts and coding sequences](http://eead-csic-compbio.github.io/get_homologues/plant_pangenome/protocol.html), published in 2022.
* [From genomes to pangenomes: understanding variation among individuals and species](http://eead-csic-compbio.github.io/get_homologues/tutorial/pangenome_tutorial.html), which includes step by step instructions for both bacterial and plant data, first released in 2017.
There's also a 30min [video](https://www.youtube.com/watch?v=JcxQ97TzzzQ) covering GET_HOMOLOGUES, GET_HOMOLOGUES-EST
and the related software [GET_PANGENES](https://github.com/Ensembl/plant-scripts/tree/master/pangenes).
## Citation
The original GET_HOMOLOGUES, **suitable for bacterial genomes**, was described in:
[Contreras-Moreira B, Vinuesa P (2013) Appl. Environ. Microbiol. 79:7696-7701](http://aem.asm.org/content/79/24/7696.long)
[Vinuesa P, Contreras-Moreira B (2015) Methods in Molecular Biology Volume 1231, 203-232](http://link.springer.com/protocol/10.1007%2F978-1-4939-1720-4_14)
GET_HOMOLOGUES-EST, adapted to the study of **intra-specific eukaryotic pan-genomes and pan-transcriptomes**, was described in:
[Contreras-Moreira B, Cantalapiedra CP et al (2017) Front. Plant Sci. 10.3389/fpls.2017.00184](http://journal.frontiersin.org/article/10.3389/fpls.2017.00184/full)
[Contreras-Moreira B, Rodriguez del Rio A et al (2022) Methods in Molecular Biology https://doi.org/10.1007/978-1-0716-2429-6_9](https://link.springer.com/protocol/10.1007/978-1-0716-2429-6_9)
## Credits
GET_HOMOLOGUES is designed, created and maintained at the [Computational and
Structural Biology](https://www.eead.csic.es/compbio) group at Estación Experimental de Aula Dei,
Consejo Superior de Investigaciones Científicas (EEAD-CSIC) and at the
[Center for Genomic Sciences](http://www.ccg.unam.mx/~vinuesa)
of Universidad Nacional Autónoma de México (CCG/UNAM).
The program was written mostly by Bruno Contreras-Moreira and Pablo Vinuesa,
with contributions from Carlos P Cantalapiedra, Alvaro Rodríguez del Rio, Rubén
Sancho, Roland Wilhelm, David A Wilkinson and many others (see [CHANGES.txt](./CHANGES.txt)).
It also includes code and binaries from other authors:
* [OrthoMCL v1.4](www.orthomcl.org), PubMed:[12952885](https://pubmed.ncbi.nlm.nih.gov/12952885))
* [mcl v14-137](http://micans.org/mcl), PubMed=[11917018](https://pubmed.ncbi.nlm.nih.gov/11917018))
* [COGtriangles v2.1](https://sourceforge.net/projects/cogtriangles), PubMed=[20439257](https://pubmed.ncbi.nlm.nih.gov/20439257))
* [NCBI Blast-2.17.0+](https://blast.ncbi.nlm.nih.gov), PubMed=[9254694](https://pubmed.ncbi.nlm.nih.gov/9254694),[20003500](https://pubmed.ncbi.nlm.nih.gov/20003500)
* [BioPerl v1.5.2](https://bioperl.org), PubMed=[12368254](https://pubmed.ncbi.nlm.nih.gov/12368254))
* [HMMER 3.1b2](http://hmmer.org)
* [Pfam](http://pfam.xfam.org), PubMed=[19920124](https://pubmed.ncbi.nlm.nih.gov/19920124))
* [PHYLIP 3.695](http://evolution.genetics.washington.edu/phylip)
* [Transdecoder r20140704](http://transdecoder.github.io), PubMed=[23845962](https://pubmed.ncbi.nlm.nih.gov/23845962))
* [MVIEW 1.60.1](https://github.com/desmid/mview), PubMed=[9632837](https://pubmed.ncbi.nlm.nih.gov/9632837))
* [diamond 0.8.25](https://github.com/bbuchfink/diamond), PubMed=[25402007](https://pubmed.ncbi.nlm.nih.gov/25402007))
## Graphical summary


## Related software
[GET_PHYLOMARKERS](https://github.com/vinuesa/get_phylomarkers) uses
twin nucleotide & peptide clusters produced by GET_HOMOLOGUES to compute robust multi-gene and pangenome phylogenies.
Check the [manual](https://vinuesa.github.io/get_phylomarkers),
the [tutorial](https://link.springer.com/protocol/10.1007/978-1-0716-2429-6_9),
and the [Docker image](https://hub.docker.com/r/csicunam/get_homologues).
A related piece of software was released in 2023 called [GET_PANGENES](https://github.com/Ensembl/plant-scripts/tree/master/pangenes),
which takes FASTA and GFF files as input and explicitely considers gene collinearity by computing whole genome alignments.
## Bugs
The code is regularly patched (see [CHANGES.txt](./CHANGES.txt)) in each release.
We kindly ask you to report errors or bugs as
[GitHub issues](https://github.com/eead-csic-compbio/get_homologues/issues)
and to acknowledge the use of the software in scientific publications.
## Funding
Fundación ARAID, Consejo Superior de Investigaciones Científicas, DGAPA-PAPIIT UNAM, CONACyT, FEDER, MINECO, DGA-Obra Social La Caixa.
  
## Badges
[GET_HOMOLOGUES](https://bio.tools/get_homologues) is part of the [INB/ELIXIR-ES](https://inb-elixir.es) resources portfolio:
