{"id":18189938,"url":"https://github.com/kullrich/oggmap","last_synced_at":"2026-03-01T03:34:02.325Z","repository":{"id":204459278,"uuid":"711897263","full_name":"kullrich/oggmap","owner":"kullrich","description":"oggmap is a python package to extract orthologous maps (short: orthomap or in other words the evolutionary age of a given orthologous group) from OrthoFinder/eggNOG results. Oggmap results (gene ages per orthologous group) can be further used to calculate weigthed expression data (transcriptome evolutionary index) from scRNA sequencing objects. ","archived":false,"fork":false,"pushed_at":"2025-04-02T11:19:34.000Z","size":13263,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-11-08T00:11:52.306Z","etag":null,"topics":["evolution","evomed","phylogeny","phylotranscriptomics","single-cell-analysis","transcriptomics"],"latest_commit_sha":null,"homepage":"https://oggmap.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kullrich.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGES.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-10-30T11:53:31.000Z","updated_at":"2025-04-02T11:19:38.000Z","dependencies_parsed_at":"2023-11-01T15:28:55.380Z","dependency_job_id":"80fcd483-d049-4e42-8db3-a57a7eaa11ba","html_url":"https://github.com/kullrich/oggmap","commit_stats":null,"previous_names":["kullrich/oggmap"],"tags_count":0,"template":false,"templat
e_full_name":null,"purl":"pkg:github/kullrich/oggmap","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kullrich%2Foggmap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kullrich%2Foggmap/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kullrich%2Foggmap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kullrich%2Foggmap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kullrich","download_url":"https://codeload.github.com/kullrich/oggmap/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kullrich%2Foggmap/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29959390,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T01:47:18.291Z","status":"online","status_checked_at":"2026-03-01T02:00:07.437Z","response_time":124,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["evolution","evomed","phylogeny","phylotranscriptomics","single-cell-analysis","transcriptomics"],"created_at":"2024-11-03T04:04:37.468Z","updated_at":"2026-03-01T03:34:02.298Z","avatar_url":"https://github.com/kullrich.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# oggmap\n\n[![GitHub Workflow 
Status](https://img.shields.io/github/actions/workflow/status/kullrich/oggmap/build_check.yml?branch=main)](https://github.com/kullrich/oggmap/actions/workflows/build_check.yml)\n[![PyPI](https://img.shields.io/pypi/v/oggmap?color=blue)](https://pypi.org/project/oggmap/)\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/oggmap)](https://pypi.org/project/oggmap/)\n[![PyPI - Wheel](https://img.shields.io/pypi/wheel/oggmap)](https://pypi.org/project/oggmap/)\n[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)\n[![docs-badge](https://readthedocs.org/projects/oggmap/badge/?version=latest)](https://oggmap.readthedocs.io/en/latest/?badge=latest)\n[![DOI](https://img.shields.io/badge/DOI-10.1093/bioinformatics/btad657-blue)](https://doi.org/10.1093/bioinformatics/btad657)\n\n## orthologous maps - evolutionary age index\n\n[`oggmap`](https://github.com/kullrich/oggmap) is a python package to extract orthologous maps\n(short: `orthomap` or in other words the evolutionary age of a given orthologous group) from OrthoFinder or eggNOG results.\nOggmap results (gene ages per orthologous group) can be further used to calculate and visualize weighted expression data\n(transcriptome evolutionary index) from scRNA sequencing objects.\n\n![oggmap steps](docs/tutorials/img/oggmap_steps.png)\n![zebrafish example](docs/tutorials/img/zebrafish_tei.png)\n![nematode example](docs/tutorials/img/nematode_pi.png)\n\n## Documentation\n\nOnline documentation can be found [here](https://oggmap.readthedocs.io/en/latest/).\n\nWhen using `oggmap` in published research, please cite:\n\n- Ullrich KK, Glytnasi NE, \"oggmap: a Python package to extract gene ages per orthogroup and link them with single-cell RNA data\", Bioinformatics, 2023, 39(11). 
[https://doi.org/10.1093/bioinformatics/btad657](https://doi.org/10.1093/bioinformatics/btad657)\n\n## Installing `oggmap`\n\nMore installation options can be found [here](https://oggmap.readthedocs.io/en/latest/installation/index.html).\n\n### oggmap installation using conda and pip\n\nWe recommend installing `oggmap` in an independent conda environment to avoid dependency conflicts.\nCreate a new Python environment for `oggmap` and install its dependencies there.\n\nIf you do not have a working installation of Python 3.10 (or later),\nconsider installing [Anaconda](https://docs.anaconda.com/anaconda/install/) or\n[Miniconda](https://docs.conda.io/en/latest/miniconda.html).\n\nTo create and activate the environment run:\n\n```shell\n$ git clone https://github.com/kullrich/oggmap.git\n$ cd oggmap\n$ conda env create --file environment.yml\n$ conda activate oggmap_env\n```\n\nThen to install `oggmap` via PyPI:\n\n```shell\n$ pip install oggmap\n```\n\n## Quick usage\n\nDetailed tutorials on how to use `oggmap` can be found [here](https://oggmap.readthedocs.io/en/latest/tutorials/index.html).\n\n### Update/download local ncbi taxonomic database:\n\nThe following command downloads or updates your local copy of the\nNCBI taxonomy database (~150MB). The database is saved to the file given by `-dbname`\n(default: `taxadb.sqlite`).\n\n```shell\n$ oggmap ncbitax -u -outdir taxadb -type taxa -dbname taxadb.sqlite\n$ rm -rf taxadb\n```\n\n```python\n\u003e\u003e\u003e from oggmap import ncbitax\n\u003e\u003e\u003e update_parser = ncbitax.define_parser()\n\u003e\u003e\u003e update_args = update_parser.parse_args()\n\u003e\u003e\u003e update_args.outdir = 'taxadb'\n\u003e\u003e\u003e update_args.dbname = 'taxadb.sqlite'\n\u003e\u003e\u003e ncbitax.update_ncbi(update_args)\n```\n\n### Step 1 - Get query species taxonomic lineage information:\n\nYou can query a species' lineage information based on its name or its\ntaxID. 
For example `Danio rerio` with taxID `7955`:\n\n```shell\n$ oggmap qlin -q \"Danio rerio\" -dbname taxadb.sqlite\n$ oggmap qlin -qt 7955 -dbname taxadb.sqlite\n```\n\n```python\n\u003e\u003e\u003e from oggmap import qlin\n\u003e\u003e\u003e qlin.get_qlin(q='Danio rerio',\n...     dbname = 'taxadb.sqlite')\n\u003e\u003e\u003e qlin.get_qlin(qt='7955',\n...     dbname = 'taxadb.sqlite')\n```\n\nYou can get the query species topology as a tree.\nFor example for `Danio rerio` with taxID `7955`:\n\n```python\n\u003e\u003e\u003e from io import StringIO\n\u003e\u003e\u003e from Bio import Phylo\n\u003e\u003e\u003e from oggmap import qlin\n\u003e\u003e\u003e query_topology = qlin.get_lineage_topo(qt='7955',\n...     dbname='taxadb.sqlite')\n\u003e\u003e\u003e output = StringIO()\n\u003e\u003e\u003e Phylo.write(query_topology, output, \"newick\")\n\u003e\u003e\u003e output.getvalue().strip()\n```\n\n### Step 2 - Get query species orthomap from OrthoFinder results:\n\nThe following code extracts the `orthomap` for `Danio rerio` based on pre-calculated \nOrthoFinder results and ensembl release-113:\n\nOrthoFinder results (-S diamond_ultra_sens) using translated, longest-isoform coding sequences\nfrom ensembl release-113 have been archived and can be found\n[here](https://zenodo.org/record/7242264#.Y1p19i0Rowc).\n\n```shell\n# download OrthoFinder example:\n$ wget https://zenodo.org/records/14680521/files/ensembl_113_orthofinder_last_Orthogroups.GeneCount.tsv.zip\n$ wget https://zenodo.org/records/14680521/files/ensembl_113_orthofinder_last_Orthogroups.tsv.zip\n$ wget https://zenodo.org/records/14680521/files/ensembl_113_orthofinder_last_species_list.tsv    \n\n# extract orthomap:\n$ oggmap of2orthomap -seqname 7955.danio_rerio.pep -qt 7955 \\\\\n  -sl ensembl_113_orthofinder_last_species_list.tsv \\\\\n  -oc ensembl_113_orthofinder_last_Orthogroups.GeneCount.tsv.zip \\\\\n  -og ensembl_113_orthofinder_last_Orthogroups.tsv.zip \\\\\n  -dbname 
taxadb.sqlite\n```\n\n```python\n\u003e\u003e\u003e from oggmap import datasets, of2orthomap, qlin\n\u003e\u003e\u003e datasets.ensembl113_last(datapath='.')\n\u003e\u003e\u003e query_orthomap, orthofinder_species_list, of_species_abundance = of2orthomap.get_orthomap(\n...     seqname='7955.danio_rerio.pep',\n...     qt='7955',\n...     sl='ensembl_113_orthofinder_last_species_list.tsv',\n...     oc='ensembl_113_orthofinder_last_Orthogroups.GeneCount.tsv.zip',\n...     og='ensembl_113_orthofinder_last_Orthogroups.tsv.zip',\n...     out=None,\n...     quiet=False,\n...     continuity=True,\n...     overwrite=True,\n...     dbname='taxadb.sqlite')\n\u003e\u003e\u003e query_orthomap\n```\n\n### Step 3 - Map OrthoFinder gene names and scRNA gene/transcript names:\n\nThe following code extracts the gene to transcript table for `Danio rerio`:\n\nGTF file obtained from [here](https://ftp.ensembl.org/pub/release-105/gtf/danio_rerio/Danio_rerio.GRCz11.105.gtf.gz).\n\n```shell\n# to get GTF from Mus musculus on Linux run:\n$ wget https://ftp.ensembl.org/pub/release-113/gtf/mus_musculus/Mus_musculus.GRCm39.113.chr.gtf.gz\n# on Mac:\n$ curl https://ftp.ensembl.org/pub/release-113/gtf/mus_musculus/Mus_musculus.GRCm39.113.chr.gtf.gz --remote-name\n\n# create t2g from GTF:\n$ oggmap gtf2t2g -i Mus_musculus.GRCm39.113.chr.gtf.gz \\\\\n  -o Mus_musculus.GRCm39.113.chr.gtf.t2g.tsv \\\\\n  -g -b -p -v -s\n```\n\n```python\n\u003e\u003e\u003e from oggmap import datasets, gtf2t2g\n\u003e\u003e\u003e gtf_file = datasets.zebrafish_ensembl113_gtf(datapath='.')\n\u003e\u003e\u003e query_species_t2g = gtf2t2g.parse_gtf(\n...     gtf=gtf_file,\n...     g=True, b=True, p=True, v=True, s=True, q=True)\n\u003e\u003e\u003e query_species_t2g\n```\n\nNow import the scRNA dataset of the query species.\n\nexample: **Danio rerio** - [http://tome.gs.washington.edu](http://tome.gs.washington.edu)\n([Qiu et al. 
2022](https://www.nature.com/articles/s41588-022-01018-x))\n\n`AnnData` file can be found [here](https://doi.org/10.5281/zenodo.7243602).\n\n```python\n\u003e\u003e\u003e import scanpy as sc\n\u003e\u003e\u003e from oggmap import datasets, orthomap2tei\n\u003e\u003e\u003e # download zebrafish scRNA data here: https://doi.org/10.5281/zenodo.7243602\n\u003e\u003e\u003e # or download with datasets.qiu22_zebrafish(datapath='.')\n\u003e\u003e\u003e zebrafish_data = datasets.qiu22_zebrafish(datapath='.')\n\u003e\u003e\u003e zebrafish_data\n\u003e\u003e\u003e # check overlap of transcript table \u003cgene_id\u003e and scRNA data \u003cvar_names\u003e\n\u003e\u003e\u003e orthomap2tei.geneset_overlap(zebrafish_data.var_names, query_species_t2g['gene_id'])\n```\n\nThe `replace_by` helper function can be used to add a new column to the `orthomap` dataframe by matching e.g.\ngene isoform names and their corresponding gene names.\n\n```python\n\u003e\u003e\u003e # convert orthomap transcript IDs into GeneIDs and add them to orthomap\n\u003e\u003e\u003e query_orthomap['geneID'] = orthomap2tei.replace_by(\n...    x_orig = query_orthomap['seqID'],\n...    xmatch = query_species_t2g['transcript_id_version'],\n...    xreplace = query_species_t2g['gene_id'])\n\u003e\u003e\u003e # check overlap of orthomap \u003cgeneID\u003e and scRNA data\n\u003e\u003e\u003e orthomap2tei.geneset_overlap(zebrafish_data.var_names, query_orthomap['geneID'])\n```\n\n### Step 4 - Get transcriptome evolutionary index (TEI) values and add them to scRNA dataset:\n\nSince now the gene names correspond to each other in the `orthomap` and the scRNA adata object,\none can calculate the transcriptome evolutionary index (TEI) and add them to the scRNA dataset (adata object).\n\n```python\n\u003e\u003e\u003e # add TEI values to existing adata object\n\u003e\u003e\u003e orthomap2tei.get_tei(adata = zebrafish_data,\n...    gene_id = query_orthomap['geneID'],\n...    gene_age = query_orthomap['PSnum'],\n...    
keep = 'min',\n...    layer = None,\n...    add = True,\n...    obs_name = 'tei',\n...    boot = False,\n...    bt = 10,\n...    normalize_total = False,\n...    log1p = False,\n...    target_sum = 1e6)\n```\n\n### Step 5 - Downstream analysis\n\nOnce the gene age data has been added to the scRNA dataset,\none can e.g. plot the corresponding transcriptome evolutionary index (TEI) values\nby any given observation pre-defined in the scRNA dataset.\n\n#### Boxplot TEI per stage:\n\n```python\n\u003e\u003e\u003e sc.pl.violin(adata = zebrafish_data,\n...     keys = ['tei'],\n...     groupby = 'stage',\n...     rotation = 90,\n...     palette = 'Paired',\n...     stripplot = False,\n...     inner = 'box')\n```\n\n## oggmap via Command Line\n\n`oggmap` can also be used via the command line.\n\nCommand line documentation can be found [here](https://oggmap.readthedocs.io/en/latest/modules/oggmap.html).\n\n```shell\n$ oggmap -h\n```\n\n```\nusage: oggmap \u003csub-command\u003e\n\noggmap\n\noptions:\n  -h, --help            show this help message and exit\n\nsub-commands:\n  {cds2aa,gtf2t2g,ncbitax,of2orthomap,orthomcl2orthomap,plaza2orthomap,qlin}\n                        sub-commands help\n    cds2aa              translate CDS to AA and optional retain longest\n                        isoform \u003ccds2aa -h\u003e\n    gtf2t2g             extract transcript to gene table from GTF\n                        \u003cgtf2t2g -h\u003e\n    ncbitax             update local ncbi taxonomy database \u003cncbitax -h\u003e\n    of2orthomap         extract orthomap from OrthoFinder output for\n                        query species \u003cof2orthomap -h\u003e\n    orthomcl2orthomap   extract orthomap from orthomcl output for\n                        query species \u003corthomcl2orthomap -h\u003e\n    plaza2orthomap      extract orthomap from PLAZA gene family data\n                        for query species \u003cof2orthomap -h\u003e\n    qlin                get query lineage based on ncbi 
taxonomy \u003cqlin -h\u003e\n```\n\nTo retrieve e.g. the lineage information for `Danio rerio` run the following command:\n\n```shell\n$ oggmap qlin -q \"Danio rerio\" -dbname taxadb.sqlite\n```\n\n## Development Version\n\nTo work with the latest version [on GitHub](https://github.com/kullrich/oggmap):\nclone the repository and `cd` into its root directory.\n\n```shell\n$ git clone https://github.com/kullrich/oggmap.git\n$ cd oggmap\n```\n\nInstall `oggmap` into your current python environment:\n\n```shell\n$ pip install -e .\n```\n\n## Testing `oggmap`\n\n`oggmap` has an extensive test suite which is run each time a new contribution\nis made to the repository. To run the test suite locally run:\n\n```shell\n$ pytest tests\n```\n\n## Contributing Code\n\nIf you would like to contribute to `oggmap`, please file an issue so that one can establish a statement of need, avoid redundant work, and track progress on your contribution.\n\nBefore you do a pull request, you should always file an issue and make sure that someone from the `oggmap` developer team agrees that it's a problem, and is happy with your basic proposal for fixing it.\n\nOnce an issue has been filed and we've identified how to best orient your\ncontribution with package development as a whole,\n[fork](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo)\nthe [main repo](https://github.com/kullrich/oggmap/oggmap.git), branch off a\n[feature\nbranch](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-branches)\nfrom `main`,\n[commit](https://docs.github.com/en/desktop/contributing-and-collaborating-using-github-desktop/committing-and-reviewing-changes-to-your-project)\nand\n[push](https://docs.github.com/en/github/using-git/pushing-commits-to-a-remote-repository)\nyour changes to your fork and submit a [pull\nrequest](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/proposing-changes-to-your-work-with-pull-requests)\nfor `oggmap:main`.\n\nBy 
contributing to this project, you agree to abide by the Code of Conduct terms.\n\n## Bug reports\n\nPlease post problems or questions on the GitHub repository [issue tracker](https://github.com/kullrich/oggmap/issues).\nAlso check the closed issues, which may already answer your question.\n\n## Inquiry for collaboration or discussion\n\nPlease e-mail us if you would like to collaborate or discuss the project.\n\nPrincipal code developer: Kristian Ullrich\n\nE-mail address can be found [here](https://www.evolbio.mpg.de).\n\n## Code of Conduct - Participation guidelines\n\nThis repository adheres to the [Contributor Covenant](http://contributor-covenant.org) code of conduct for any interactions you have within this project (see [Code of Conduct](https://github.com/kullrich/oggmap/-/blob/master/CODE_OF_CONDUCT.md)).\n\nSee also the policy against sexualized discrimination, harassment and violence for the Max Planck Society [Code-of-Conduct](https://www.mpg.de/11961177/code-of-conduct-en.pdf).\n\nBy contributing to this project, you agree to abide by its terms.\n\n## References\n\nSee references [here](https://oggmap.readthedocs.io/en/latest/references/index.html).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkullrich%2Foggmap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkullrich%2Foggmap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkullrich%2Foggmap/lists"}