{"id":13707123,"url":"https://github.com/clindet/bget","last_synced_at":"2026-03-10T20:34:52.348Z","repository":{"id":56683696,"uuid":"202149340","full_name":"clindet/bget","owner":"clindet","description":"Portable command-line tool to query bioinformatics APIs, data, databases and files.","archived":false,"fork":false,"pushed_at":"2023-02-23T00:01:25.000Z","size":6423,"stargazers_count":89,"open_issues_count":6,"forks_count":20,"subscribers_count":3,"default_branch":"master","last_synced_at":"2026-01-14T18:30:28.384Z","etag":null,"topics":["bioinformatics","database","spider"],"latest_commit_sha":null,"homepage":"https://github.com/openbiox/bget","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/clindet.png","metadata":{"files":{"readme":"README.md","changelog":"ChangeLog","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null},"funding":{"github":null,"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":["https://github.com/openbiox/wiki/blob/master/static/img/QRcode.png"]}},"created_at":"2019-08-13T13:21:08.000Z","updated_at":"2026-01-10T12:16:42.000Z","dependencies_parsed_at":"2023-07-15T06:17:48.836Z","dependency_job_id":null,"html_url":"https://github.com/clindet/bget","commit_stats":null,"previous_names":["jhuanglab/bget","openanno/bget","openbiox/bget"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/clindet/bget","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clindet%2Fbget","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clindet%2Fbget/tags","releases_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/clindet%2Fbget/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clindet%2Fbget/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/clindet","download_url":"https://codeload.github.com/clindet/bget/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clindet%2Fbget/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30352882,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T15:55:29.454Z","status":"ssl_error","status_checked_at":"2026-03-10T15:54:58.440Z","response_time":106,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","database","spider"],"created_at":"2024-08-02T22:01:20.517Z","updated_at":"2026-03-10T20:34:52.183Z","avatar_url":"https://github.com/clindet.png","language":"Go","funding_links":["https://github.com/openbiox/wiki/blob/master/static/img/QRcode.png"],"categories":["Go","Data Manipulation and Querying"],"sub_categories":[],"readme":"\u003cimg src=\"https://img.shields.io/badge/lifecycle-experimental-orange.svg\" alt=\"Life cycle: experimental\"\u003e \u003ca href=\"https://godoc.org/github.com/clindet/bget\"\u003e\u003cimg src=\"https://godoc.org/github.com/clindet/bget?status.svg\" 
alt=\"GoDoc\"\u003e\u003c/a\u003e\n\n# bget\n\nbget is a portable tool with several sub-commands to query bioinformatics APIs, data, databases and files. The Golang `http` library, `wget`, `curl`, `axel`, `git`, and `rsync` are supported as query engines.\n\nSupported types:\n\n- Reference genomes\n- Source code of bioinformatics tools\n- Bioinformatics databases and files\n- Paper materials\n- ......\n\nDownstream tools:\n\n- [bioctl](https://github.com/clindet/bioctl): convert, format, and other functions\n- [bioextr](https://github.com/clindet/bioextr): text-mining functions\n\n## Prerequisites\n\nFor website spider (optional):\n\n- Headless Chrome is required for some websites whose pages are rendered by JavaScript. Windows users may need to create an alias for Chrome to make [chromedp](https://github.com/chromedp/chromedp) work.\n\n```bash\n# To resolve `[FATA] exec: \"google-chrome\": executable file not found in $PATH` error:\n# option 1: install Chrome in your OS\n## centos\nsudo yum install liberation-fonts\nsudo yum -y install libXss*\nsudo yum install libappindicator*\nwget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm\nsudo rpm -ivh google-chrome-stable_current_x86_64.rpm\n\n## ubuntu\nwget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb\nsudo dpkg -i google-chrome-stable_current_amd64.deb\nsudo apt install -f\n\n# option 2: run bget in the headless-shell docker container\ndocker run -d -p 9222:9222 --rm --name headless-shell -v /path_contains_bget/:/tmp/bget chromedp/headless-shell\ndocker exec -it headless-shell /bin/bash\n\n# increase the timeout for slow network access\nbget doi 10.1016/j.devcel.2017.03.001 --suppl --timeout 100\n```\n\n\nFor raw sequencing data query (optional):\n\n- [sra-tools](https://github.com/ncbi/sra-tools) for SRA and dbGAP databases: `bget i sratools`;\n- [pyega3](https://github.com/EGA-archive/ega-download-client) for EGA database: `pip3 install pyega3`;\n- 
[gdc-client](https://gdc.cancer.gov/access-data/gdc-data-transfer-tool) for GDC portal: `bget i gdc-client@1.5.0 -u`.\n\n## Installation\n\n```bash\n# windows\nwget https://github.com/clindet/bget/releases/download/v0.3.2/bget_0.3.2_Windows_64-bit.tar.gz\n\n# osx\nwget https://github.com/clindet/bget/releases/download/v0.3.2/bget_0.3.2_Darwin_64-bit.tar.gz\n\n# linux\nwget https://github.com/clindet/bget/releases/download/v0.3.2/bget_0.3.2_Linux_64-bit.tar.gz\n\n# get source and compile latest version\n# Golang Toolchain required: https://golang.org/dl/\ngo get -u github.com/clindet/bget\n```\n\n## Usage\n\nFor command-line output, see [here](https://clindet.github.io/bget/cli.html).\n\n### Query website API\n\n`bget api` can be used to query several website APIs, such as PubMed, Datasets2Tools, and the GDC portal.\n\nIn addition, you can use the downstream tool [bioctl](https://github.com/clindet/bioctl) to run simple sentence-level text mining on PubMed abstracts.\n\n```bash\n# NCBI eutils\n# query pubmed\nbget api ncbi -d pubmed -q B-ALL --format XML -e your_email@domain.com\n\n# query pubmed and convert the result to JSON, also extracting all URLs and calculating word connections\nbget api ncbi -q \"Galectins control MTOR and AMPK in response to lysosomal damage to induce autophagy OR MTOR-independent autophagy induced by interrupted endoplasmic reticulum-mitochondrial Ca2+ communication: a dead end in cancer cells. OR The PARK10 gene USP24 is a negative regulator of autophagy and ULK1 protein stability OR Coordinate regulation of autophagy and the ubiquitin proteasome system by MTOR.\" | bioctl cvrt --xml2json pubmed -\n\n# Datasets2Tools API\n# query canned analysis accession, e.g. DCA00000060.\nbget api dta -a DCA00000060\n# query dataset accession number, e.g. 
GSE31106\nbget api dta -s GSE31106 | bioctl fmt --json-pretty -\n# query via object type\nbget api dta --type dataset | bioctl fmt --json-pretty --indent 2 -\n# props of dataset accession, e.g. upregulated.\nbget api dta -g upregulated | json2csv -o out.csv\n\n# GDC portal API\n# retrieve projects meta info from GDC portal\nbget api gdc -p\nbget api gdc -p --json-pretty\nbget api gdc -p -q TARGET-NBL --json-pretty\nbget api gdc -p --format tsv \u003e tcga_projects.tsv\nbget api gdc -p --format csv \u003e tcga_projects.csv\nbget api gdc -p --from 1 --size 2\n# check GDC portal status (https://portal.gdc.cancer.gov/)\nbget api gdc -s\n# retrieve cases info from GDC portal\nbget api gdc -c\n# retrieve files info from GDC portal\nbget api gdc -f\n# retrieve annotations info from GDC portal\nbget api gdc -a\n# query manifest for gdc-client\nbget api gdc -m -q \"5b2974ad-f932-499b-90a3-93577a9f0573,556e5e3f-0ab9-4b6c-aa62-c42f6a6cf20c\" -o my_manifest.txt\nbget api gdc -m -q \"5b2974ad-f932-499b-90a3-93577a9f0573,556e5e3f-0ab9-4b6c-aa62-c42f6a6cf20c\" \u003e my_manifest.txt\nbget api gdc -m -q \"5b2974ad-f932-499b-90a3-93577a9f0573,556e5e3f-0ab9-4b6c-aa62-c42f6a6cf20c\" -n\n# query data (only small files are supported)\nbget api gdc -d -q \"5b2974ad-f932-499b-90a3-93577a9f0573\" -n\n\n# clinicaltrials.gov API\n# returns the date when the ClinicalTrials.gov dataset was posted.\nbget api cligov --info-dat-vers\n# returns the current version number of the ClinicalTrials.gov API\nbget api cligov --info-api-vers\n# returns detailed definitions.\nbget api cligov --info-api-defs\n# returns all available data elements for a single study record.\nbget api cligov --info-study-struct\n# returns all data elements.\nbget api cligov --info-study-fields\n# returns an annotated version of the Study Structure info URL.\nbget api cligov --info-study-stat\n# returns groups of weighted study fields, or \"search areas\"\nbget api cligov --info-search-area\n\n# query functions\nbget api 
cligov -q heart+attack --full-studies --format json\nbget api cligov -q heart+attack --fields NCTId,Condition,BriefTitle --study-fields\nbget api cligov -q heart+attack --field Condition --field-values\n\n# bio.tools API\n# query item detail\nbget api biots --tool signalp\n\n# search item\nbget api biots --name signalp\nbget api biots --topic Proteomics\nbget api biots --dtype 'Protein sequence'\nbget api biots --dfmt FASTA\nbget api biots --ofmt 'ClustalW format'\n\n# crossref\nbget api crf --doi 10.1073/pnas.1814397115\nbget api crf --doi 10.1073/pnas.1814397115 --xml2json --json-pretty --indent 1\n\n# mgrast\nbget api mgrast anno --info\n# retrieval of SwissProt taxonomy annotations with a cut-off 10^100 for dataset mgm4447943.3\nbget api mgrast anno --evalue 100 --type organism --source SwissProt --seq mgm4447943.3\n\nbget api mgrast compute --info\n\n# returns all data in the system. Warning: this request returns 8MB+ and takes 5+ seconds\nbget api covid19 --all\n# returns all countries and associated provinces\nbget api covid19 --cts\n# returns all cases by case type for a country from the first recorded case.\nbget api covid19 --ct --name China\n# returns all cases by case type for a country from the first recorded case.\nbget api covid19 --ct-d-one --name China\n# returns all cases by case type for a country.\nbget api covid19 --ct-d-one-total --name China\n# returns all cases by case type for a country from the first recorded case with the latest record being the live count.\nbget api covid19 --ct-st-d-one --name China --status confirmed\n# returns all cases by case type for a country from the first recorded case.\nbget api covid19 --ct-st-d-one-live --name China --status confirmed\n# returns all cases by case type for a country from the first recorded case\nbget api covid19 --ct-st-d-one-total --name China --status confirmed\n# returns all cases by case type for a country with the latest record being the live count.\nbget api covid19 --ct-st-live --name 
China --status confirmed\n# returns all cases by case type for a country.\nbget api covid19 --ct-st --name China --status confirmed\n# returns all cases by case type for a country.\nbget api covid19 --ct-st-total --name China --status confirmed\n# returns all cases of a country.\nbget api covid19 --ct-total --name China\n# returns all live cases by case type for a country.\nbget api covid19 --live-ct --name China\n# returns all live cases by case type for a country after a given date.\nbget api covid19 --live-ct-st-date --name China --status confirmed --date 2020-04-20T06:20:47Z\n# returns all live cases by case type for a country.\nbget api covid19 --live-ct-st --name China --status confirmed\n# a summary of new and total cases per country\nbget api covid19 --summary\nbget api covid19 --export\nbget api covid19 --webhook https://your_webhook.com\n```\n\n### Query DOI resources\n\n`bget doi` can be used to query DOI resources from websites and journals; the list of supported sources is continually growing.\n\n```bash\n## query the Zenodo website with 3 threads\nbget doi 10.5281/zenodo.3363060 10.5281/zenodo.3357455 10.5281/zenodo.3351812 -t 3\n\n## query fulltext of publications (a proxy may be needed)\nbget doi 10.1016/j.devcel.2017.03.001 10.1016/j.stem.2019.07.009 10.1016/j.celrep.2018.03.072 -t 2\n\n## query publications with supplementary files\nbget doi 10.1038/s41586-019-1844-5 --suppl\n\n# query PDF and metadata using PubMed IDs\ndois=`bget api ncbi --xml2json --json-pretty -q '30487223[pmid] or 30402350[pmid] or 29279377[pmid]' --size 3 -m 3 | grep / | grep 10. | sed 's/ .* \"//' | tr -d '\",' | sort -u` \u0026\u0026 echo ${dois} \u0026\u0026 bget doi ${dois} --print-meta --print-crossref\n```\n\nThe PDF of a manuscript can be obtained using EndNote or Sci-Hub. 
However, neither of these two approaches makes it easy to get the supplementary files of scientific papers.\n\n![doi demo](https://github.com/clindet/bget/raw/master/docs/static/doi.gif)\n\nHere, we are developing and sharing an open-source tool, bget, whose `doi` subcommand queries the supplementary files of scientific papers. Journals with high impact factors and large integrative publishers have higher priority in our development plan, see [here](http://clindet.github.io/bget/doi.html).\n\n**Warning**: We do not want to distribute any pirated resources or cause unnecessary network congestion. We hope this tool can provide an optional method to more easily query related files of scientific papers. Please use it in a non-invasive way (e.g. avoid high concurrency and long continuous requests). If you do not follow the policies of the relevant website (e.g. limits on continuous downloads or copyright restrictions), you will lose the authorization to use this tool.\n\n### Query files via alias key\n\n`bget i` can be used to query a set of files via an alias key, such as bwa, samtools, reffa/defuse, and db/annovar.\n\n```bash\n# download bwa source (with task env info)\nbget i bwa --verbose 2\n# get all available keys\nbget i -a\n# in JSON format\nbget i -a --format json\n# view all bwa and samtools available tags in table\nbget i bwa samtools -v\n# view all bwa and samtools available tags in json\nbget i bwa samtools -v --format json\n\n# force download defuse reference (with task env info and save log to file)\nbget i \"reffa/defuse@GRCh38 #97\" -t 10 -f\nbget i reffa/defuse@GRCh38 release=97 -t 10 -f\n# download annovar reference\nbget i db/annovar@clinvar_20170501 db/annovar@clinvar_20180603 builder=hg38\n\nbget i db/annovar -v --format text\nbget i db/annovar version='clinvar_20131105, clinvar_20140211, clinvar_20140303, clinvar_20140702, clinvar_20140902, clinvar_20140929, clinvar_20150330, clinvar_20150629, clinvar_20151201, clinvar_20160302, clinvar_20161128, clinvar_20170130, 
clinvar_20170501, clinvar_20170905, clinvar_20180603, avsnp150, avsnp147, avsnp144, avsnp142, avsnp138, cadd, caddgt10, caddgt20, cadd13, cadd13gt10, cadd13gt20, cg69, cg46, cosmic70, cosmic68wgs, cosmic68, cosmic67wgs, cosmic67, cosmic65, cosmic64, dbnsfp35a, dbnsfp33a, dbnsfp31a_interpro, dbnsfp30a, dbscsnv11, eigen, esp6500siv2_ea, esp6500siv2_aa, esp6500siv2_all, exac03nontcga, exac03nonpsych, exac03, fathmm, gerp++gt2, gme, gnomad_exome, gnomad_genome, gwava, hrcr1, icgc21, intervar_20170202, kaviar_20150923, ljb26_all, mcap, mitimpact2, mitimpact24, nci60, popfreq_max_20150413, popfreq_all_20150413, revel, regsnpintron' builder=hg19 -t 10 -f\n```\n\n### Query FASTQ/CEL files from GEO/SRA/EGA/dbGAP/GDC\n\n`bget seq` can be used to query files from [Gene Expression Omnibus (GEO)](https://www.ncbi.nlm.nih.gov/geo), [Sequence Read Archive (SRA)](https://www.ncbi.nlm.nih.gov/sra/), and [GDC Data Portal](https://portal.gdc.cancer.gov/).\n\n```bash\n# download files from SRA database using prefetch\nbget seq ERR3324530 SRR544879\n\n# download files from GEO database, auto download SRA acc list and run info\nbget seq GSE23543 GSM1098572 -t 2\n\n# download files from dbGaP database using krt files and prefetch\nbget seq dbgap.krt\n\n# download dataset from EGA database using pyega3\nbget seq EGAD00001000951\n\n# download file from EGA database using pyega3\nbget seq EGAF00000585895\n\n# download TCGA files by file id using gdc-client\nbget seq b7670817-9d6b-494e-9e22-8494e2fd430d\n\n# download TCGA files by manifest files using gdc-client\n# split for parallel\nsplit -a 3 --additional-suffix=.txt -l 100 gdc_manifest.2019-08-23-TCGA.txt -d\nfor i in x*.txt\ndo\n  head -n 1 x000.txt \u003e ${i}.tmp \u0026\u0026 cat ${i} \u003e\u003e ${i}.tmp \u0026\u0026 mv ${i}.tmp ${i}\ndone\nsed -i '1d' x000.txt\nbget seq *.txt -t 5\n\n# mixed input types are detected automatically (if you do not have a *.krt file or TCGA manifest, leave them out when testing)\nbget seq SRR544879 GSE23543 EGAD00001000951 
b7670817-9d6b-494e-9e22-8494e2fd430d dbgap.krt *.txt -t 5\n```\n\n### Query URLs\n\n`bget url` can be used to query files using URLs.\n\n```bash\nurls=\"https://dldir1.qq.com/weixin/Windows/WeChatSetup.exe,http://download.oray.com/pgy/windows/PgyVPN_4.1.0.21693.exe,https://dldir1.qq.com/qqfile/qq/PCQQ9.1.6/25786/QQ9.1.6.25786.exe\" \u0026\u0026 echo $urls | tr \",\" \"\\n\"\u003e /tmp/urls.list\n\nbget url ${urls}\nbget url https://dldir1.qq.com/weixin/Windows/WeChatSetup.exe https://dldir1.qq.com/qqfile/qq/PCQQ9.1.6/25786/QQ9.1.6.25786.exe --save-log\nbget url ${urls} -t 3 -o /tmp/download -f -g wget --save-log --verbose 2\nbget url ${urls} -t 2 -o /tmp/download --save-log --verbose 2\nbget url ${urls} -t 3 -o /tmp/download -g wget --resume\nbget url -l /tmp/urls.list -o /tmp/download -f -t 3\n\n# query github repo (support assets files)\nbget url Miachol/github_demo --github\nbget url PapenfussLab/gridss clindet/bget --with-github-assets -t 5 --github\nbget url PapenfussLab/gridss clindet/bget --only-github-assets -t 5 --github\nbget url PapenfussLab/gridss clindet/bget --with-github-assets --with-assets-versions v2.7.2,v0.1.3 -t 5 --github\n```\n\n## Maintainer\n\n- [@Jianfeng](https://github.com/Miachol)\n\n## License\n\nAcademic Free License version 3.0\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclindet%2Fbget","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fclindet%2Fbget","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclindet%2Fbget/lists"}