{"id":13513256,"url":"https://github.com/pachterlab/gget","last_synced_at":"2026-02-27T04:06:02.598Z","repository":{"id":37527192,"uuid":"488684164","full_name":"pachterlab/gget","owner":"pachterlab","description":"🧬 gget enables efficient querying of genomic reference databases","archived":false,"fork":false,"pushed_at":"2026-02-25T22:53:33.000Z","size":313977,"stargazers_count":1100,"open_issues_count":22,"forks_count":85,"subscribers_count":8,"default_branch":"main","last_synced_at":"2026-02-26T01:17:15.076Z","etag":null,"topics":["alphafold","alphafold2","archs4","blast","databases","enrichment-analysis","enrichr","ensembl","genomics","gget","ncbi","proteomics","reference","rna-seq","transcriptomics","uniprot"],"latest_commit_sha":null,"homepage":"https://gget.bio","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pachterlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["lauraluebbert"]}},"created_at":"2022-05-04T17:31:31.000Z","updated_at":"2026-02-25T22:53:14.000Z","dependencies_parsed_at":"2023-09-23T13:57:50.889Z","dependency_job_id":"b2903abe-0e0c-44ba-8cbb-3e6970fec26a","html_url":"https://github.com/pachterlab/gget","commit_stats":{"total_commits":2633,"total_committers":18,"mean_commits":"146.27777777777777","dds":"0.41929358146600837","last_synced_commit":"c4ed5e3b8eca1a9c7881f9fe45c43645a82c078c"},"previous_names":[],"tags_count":49,"template":false,"template_full_name":null,"purl":"pkg:github/pachterlab/gget","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pachterlab%2Fgget","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pachterlab%2Fgget/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pachterlab%2Fgget/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pachterlab%2Fgget/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pachterlab","download_url":"https://codeload.github.com/pachterlab/gget/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pachterlab%2Fgget/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29884515,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-26T23:51:21.483Z","status":"online","status_checked_at":"2026-02-27T02:00:06.759Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alphafold","alphafold2","archs4","blast","databases","enrichment-analysis","enrichr","ensembl","genomics","gget","ncbi","proteomics","reference","rna-seq","transcriptomics","uniprot"],"created_at":"2024-08-01T04:00:48.191Z","updated_at":"2026-02-27T04:06:02.559Z","avatar_url":"https://github.com/pachterlab.png","language":"Python","funding_links":["https://github.com/sponsors/lauraluebbert"],"categories":["Uncategorized",
"Python","Software packages"],"sub_categories":["Uncategorized","Other applications"],"readme":"# gget\n[![pypi version](https://img.shields.io/pypi/v/gget)](https://pypi.org/project/gget)\n[![Downloads](https://static.pepy.tech/personalized-badge/gget?period=total\u0026units=international_system\u0026left_color=grey\u0026right_color=brightgreen\u0026left_text=downloads)](https://pepy.tech/project/gget)\n[![Conda](https://img.shields.io/conda/dn/bioconda/gget?logo=Anaconda)](https://anaconda.org/bioconda/gget)\n[![license](https://img.shields.io/pypi/l/gget)](LICENSE)\n[![status](https://github.com/pachterlab/gget/actions/workflows/ci.yml/badge.svg)](https://github.com/pachterlab/gget/blob/main/tests/pytest_results_py3.12.txt)\n[![status](https://github.com/lauraluebbert/test_gget_alphafold/actions/workflows/CI_alphafold.yml/badge.svg)](https://github.com/lauraluebbert/test_gget_alphafold)\n![Code Coverage](https://img.shields.io/badge/Coverage-83%25-green.svg)\n\n`gget` is a free, open-source command-line tool and Python package that enables efficient querying of genomic databases. `gget` consists of a collection of separate but interoperable modules, each designed to facilitate one type of database query in a single line of code.\n\n![gget overview](https://github.com/pachterlab/gget/blob/main/figures/gget_overview.png?raw=true)\n\nIf you use `gget` in a publication, please [cite*](https://pachterlab.github.io/gget/en/cite.html):\n```\nLuebbert, L., \u0026 Pachter, L. (2023). Efficient querying of genomic reference databases with gget. Bioinformatics. https://doi.org/10.1093/bioinformatics/btac836\n```\nRead the article here: https://doi.org/10.1093/bioinformatics/btac836\n\n# Installation\n```bash\nuv pip install gget\n```\nor\n```bash\npip install --upgrade gget\n```\n\nInstall from source:\n```bash\ngit clone https://github.com/pachterlab/gget.git\ncd gget\nuv pip install .\n```\n\nFor use in JupyterLab / Google Colab:\n```python\n# Python\nimport gget\n```\n# [🔗 Manual](https://pachterlab.github.io/gget)\n\n# 🪄 Quick start guide\nCommand line:\n```bash\n# Fetch FTP links for all Homo sapiens reference and annotation files from the latest Ensembl release\n$ gget ref homo_sapiens\n\n# Get Ensembl IDs of human genes with \"ace2\" or \"angiotensin converting enzyme 2\" in their name/description\n$ gget search -s homo_sapiens 'ace2' 'angiotensin converting enzyme 2'\n\n# Look up gene ENSG00000130234 (ACE2) and its transcript ENST00000252519\n$ gget info ENSG00000130234 ENST00000252519\n\n# Fetch the amino acid sequence of the canonical transcript of gene ENSG00000130234\n$ gget seq --translate ENSG00000130234\n\n# Quickly find the genomic location of (the start of) that amino acid sequence\n$ gget blat MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS\n\n# BLAST (the start of) that amino acid sequence\n$ gget blast MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS\n\n# Align multiple nucleotide or amino acid sequences against each other (also accepts a path to a FASTA file)\n$ gget muscle MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS MSSSSWLLLSLVEVTAAQSTIEQQAKTFLDKFHEAEDLFYQSLLAS\n\n# Align one or more amino acid sequences against a reference containing one or more sequences (local BLAST; also accepts paths to FASTA files)\n$ gget diamond MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS -ref MSSSSWLLLSLVEVTAAQSTIEQQAKTFLDKFHEAEDLFYQSLLAS\n\n# Use Enrichr for an ontology analysis of a list of genes\n$ gget enrichr -db ontology ACE2 AGT AGTR1 ACE AGTRAP AGTR2 ACE3P\n\n# Get the human tissue expression of gene ACE2\n$ gget archs4 -w tissue ACE2\n\n# Get the protein structure (in PDB format) of ACE2 as stored in the Protein Data Bank (PDB ID returned by gget info)\n$ gget pdb 1R42 -o 1R42.pdb\n\n# Download virus genome datasets from NCBI Virus (e.g., Zika virus sequences)\n$ gget virus \"Zika virus\" --host \"Homo sapiens\" --nuc_completeness complete\n\n# Find Eukaryotic Linear Motifs (ELMs) in a protein sequence\n$ gget setup elm # setup only needs to be run once\n$ gget elm -o results MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS\n\n# Fetch a scRNA-seq count matrix (AnnData format) based on specified gene(s), tissue(s), and cell type(s) (default species: human)\n$ gget setup cellxgene # setup only needs to be run once\n$ gget cellxgene --gene ACE2 SLC5A1 --tissue lung --cell_type 'mucus secreting cell' -o example_adata.h5ad\n\n# Predict the protein structure of GFP from its amino acid sequence\n$ gget setup alphafold # setup only needs to be run once\n$ gget alphafold MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK\n```\nPython (JupyterLab / Google Colab):\n```python\nimport gget\ngget.ref(\"homo_sapiens\")\ngget.search([\"ace2\", \"angiotensin converting enzyme 2\"], \"homo_sapiens\")\ngget.info([\"ENSG00000130234\", \"ENST00000252519\"])\ngget.seq(\"ENSG00000130234\", translate=True)\ngget.blat(\"MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS\")\ngget.blast(\"MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS\")\ngget.muscle([\"MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS\", \"MSSSSWLLLSLVEVTAAQSTIEQQAKTFLDKFHEAEDLFYQSLLAS\"])\ngget.diamond(\"MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS\", reference=\"MSSSSWLLLSLVEVTAAQSTIEQQAKTFLDKFHEAEDLFYQSLLAS\")\ngget.enrichr([\"ACE2\", \"AGT\", \"AGTR1\", \"ACE\", \"AGTRAP\", \"AGTR2\", \"ACE3P\"], database=\"ontology\", plot=True)\ngget.archs4(\"ACE2\", which=\"tissue\")\ngget.pdb(\"1R42\", save=True)\ngget.virus(\"Zika virus\", host=\"Homo sapiens\", nuc_completeness=\"complete\")\n\ngget.setup(\"elm\") # setup only needs to be run once\northo_df, regex_df = gget.elm(\"MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS\")\n\ngget.setup(\"cellxgene\") # setup only needs to be run once\ngget.cellxgene(gene=[\"ACE2\", \"SLC5A1\"], tissue=\"lung\", cell_type=\"mucus secreting cell\")\n\ngget.setup(\"alphafold\") # setup only needs to be run once\ngget.alphafold(\"MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK\")\n```\nCall `gget` from R using [reticulate](https://rstudio.github.io/reticulate/):\n```r\nsystem(\"pip install gget\")\ninstall.packages(\"reticulate\")\nlibrary(reticulate)\ngget \u003c- import(\"gget\")\n\ngget$ref(\"homo_sapiens\")\ngget$search(list(\"ace2\", \"angiotensin converting enzyme 2\"), \"homo_sapiens\")\ngget$info(list(\"ENSG00000130234\", \"ENST00000252519\"))\ngget$seq(\"ENSG00000130234\", translate=TRUE)\ngget$blat(\"MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS\")\ngget$blast(\"MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS\")\ngget$muscle(list(\"MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS\", \"MSSSSWLLLSLVEVTAAQSTIEQQAKTFLDKFHEAEDLFYQSLLAS\"), out=\"out.afa\")\ngget$diamond(\"MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS\", reference=\"MSSSSWLLLSLVEVTAAQSTIEQQAKTFLDKFHEAEDLFYQSLLAS\")\ngget$enrichr(list(\"ACE2\", \"AGT\", \"AGTR1\", \"ACE\", \"AGTRAP\", \"AGTR2\", \"ACE3P\"), database=\"ontology\")\ngget$archs4(\"ACE2\", which=\"tissue\")\ngget$pdb(\"1R42\", save=TRUE)\ngget$virus(\"Zika virus\", host=\"Homo sapiens\", nuc_completeness=\"complete\")\n```\n#### [More tutorials](https://github.com/pachterlab/gget_examples)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpachterlab%2Fgget","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpachterlab%2Fgget","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpachterlab%2Fgget/lists"}