{"id":32204214,"url":"https://github.com/voisinneg/queryup","last_synced_at":"2025-10-22T04:52:22.457Z","repository":{"id":58454182,"uuid":"175028062","full_name":"VoisinneG/queryup","owner":"VoisinneG","description":"R client for the UniProt REST API","archived":false,"fork":false,"pushed_at":"2023-06-06T08:09:52.000Z","size":516,"stargazers_count":4,"open_issues_count":1,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-10-22T04:52:22.294Z","etag":null,"topics":["proteomics","r","rest-api","uniprot"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/VoisinneG.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2019-03-11T15:21:33.000Z","updated_at":"2023-11-14T12:35:38.000Z","dependencies_parsed_at":"2025-09-08T15:42:54.775Z","dependency_job_id":"a12daae9-6c64-44c9-9563-4fea910313bd","html_url":"https://github.com/VoisinneG/queryup","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/VoisinneG/queryup","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VoisinneG%2Fqueryup","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VoisinneG%2Fqueryup/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VoisinneG%2Fqueryup/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/VoisinneG%2Fqueryup/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/VoisinneG","download_url":"https://codeload.github.com/VoisinneG/queryup/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VoisinneG%2Fqueryup/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280382995,"owners_count":26321423,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-22T02:00:06.515Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["proteomics","r","rest-api","uniprot"],"created_at":"2025-10-22T04:52:21.417Z","updated_at":"2025-10-22T04:52:22.449Z","avatar_url":"https://github.com/VoisinneG.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"R package: queryup\n================\nGuillaume Voisinne\n2023 - 02 - 08\n\n[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/queryup)](https://cran.r-project.org/package=queryup)\n[![R-CMD-check](https://github.com/VoisinneG/queryup/workflows/R-CMD-check/badge.svg)](https://github.com/VoisinneG/queryup/actions)\n[![Codecov test\ncoverage](https://codecov.io/gh/VoisinneG/queryup/branch/master/graph/badge.svg)](https://app.codecov.io/gh/VoisinneG/queryup?branch=master)\n[![CRAN 
mirror\ndownloads](https://cranlogs.r-pkg.org/badges/queryup)](https://cran.r-project.org/package=queryup/)\n\nThe `queryup` R package aims to facilitate retrieving information from\nthe UniProt database using R. Programmatic access to the UniProt\ndatabase is performed by submitting queries to the [UniProt website REST\nAPI](https://www.uniprot.org/help/api_queries).\n\n## Install\n\nYou can install the package from CRAN using:\n\n``` r\ninstall.packages(\"queryup\")\n```\n\nAlternatively, you can install the package from GitHub using\ndevtools:\n\n``` r\ndevtools::install_github(\"VoisinneG/queryup\")\n```\n\n## Queries\n\nQueries combine different fields to identify matching database entries.\nHere, queries are submitted using the function `query_uniprot()`. In the\n`queryup` R package, a query must be formatted as a list containing\ncharacter vectors named after existing UniProt fields (available query\nfields can be found in the [API\ndocumentation](https://www.uniprot.org/help/query-fields) or in the\npackage data `query_fields$field`). An entry must match all query\nfields simultaneously. 
For instance, the following query uses the\nfield *gene_exact* to return the UniProt entries of all proteins\nencoded by gene *Pik3r1*:\n\n``` r\nlibrary(queryup)\n```\n\n``` r\nquery \u003c- list(\"gene_exact\" = \"Pik3r1\")\ndf \u003c- query_uniprot(query, show_progress = FALSE)\nhead(df)\n```\n\n    ##        Entry       Entry Name Gene Names Organism (ID)   Reviewed\n    ## 2 A0A096MNU6 A0A096MNU6_PAPAN     PIK3R1          9555 unreviewed\n    ## 3 A0A0D9RTM6 A0A0D9RTM6_CHLSB     PIK3R1         60711 unreviewed\n    ## 4 A0A1S3F3Z7 A0A1S3F3Z7_DIPOR     Pik3r1         10020 unreviewed\n    ## 5 A0A1U7Q814 A0A1U7Q814_MESAU     Pik3r1         10036 unreviewed\n    ## 6 A0A287DCB8 A0A287DCB8_ICTTR     PIK3R1         43179 unreviewed\n    ## 7 A0A2I2ZTD7 A0A2I2ZTD7_GORGO     PIK3R1          9595 unreviewed\n\nAvailable query fields can be listed using the package data\n`query_fields`:\n\n``` r\nquery_fields$field\n```\n\n    ##  [1] \"accession\"                                                \n    ##  [2] \"active\"                                                   \n    ##  [3] \"Refer to the page: Sequence Annotations\"                  \n    ##  [4] \"lit_author\"                                               \n    ##  [5] \"protein_name\"                                             \n    ##  [6] \"chebi\"                                                    \n    ##  [7] \"uniprot_id (/uniref), then uniref_cluster_90 (/uniprotkb)\"\n    ##  [8] \"xrefcount_pdb (or xref_count)\"                            \n    ##  [9] \"date_created\"                                             \n    ## [10] \"database, xref\"                                           \n    ## [11] \"ec\"                                                       \n    ## [12] \"Refer to the pages: Comments or Sequence Annotations\"     \n    ## [13] \"existence\"                                                \n    ## [14] \"family\"                                                   \n    ## [15] 
\"fragment\"                                                 \n    ## [16] \"gene\"                                                     \n    ## [17] \"gene_exact\"                                               \n    ## [18] \"go\"                                                       \n    ## [19] \"virus_host_name, virus_host_id\"                           \n    ## [20] \"accession_id\"                                             \n    ## [21] \"inchikey\"                                                 \n    ## [22] \"protein_name\"                                             \n    ## [23] \"interactor\"                                               \n    ## [24] \"keyword\"                                                  \n    ## [25] \"length\"                                                   \n    ## [26] \"mass\"                                                     \n    ## [27] \"cc_mass_spectrometry\"                                     \n    ## [28] \"date_modified\"                                            \n    ## [29] \"protein_name\"                                             \n    ## [30] \"organelle\"                                                \n    ## [31] \"organism_name, organism_id\"                               \n    ## [32] \"plasmid\"                                                  \n    ## [33] \"proteome\"                                                 \n    ## [34] \"proteomecomponent\"                                        \n    ## [35] \"sec_acc\"                                                  \n    ## [36] \"reviewed\"                                                 \n    ## [37] \"scope\"                                                    \n    ## [38] \"sec_acc\"                                                  \n    ## [39] \"sequence\"                                                 \n    ## [40] \"date_sequence_modified\"                                   \n    ## [41] \"strain\"                                        
           \n    ## [42] \"taxonomy_name, taxonomy_id\"                               \n    ## [43] \"tissue\"                                                   \n    ## [44] \"cc_webresource\"\n\n## Columns\n\nBy default, `query_uniprot()` returns a data frame with UniProt\naccession IDs, gene names, organism and Swiss-Prot review status. You\ncan choose which data columns to retrieve using the `columns` parameter.\n\n``` r\ndf \u003c- query_uniprot(query, \n                    columns = c(\"id\", \"sequence\", \"keyword\", \"gene_primary\"),\n                    show_progress = FALSE)\n```\n\n    ## Warning in (function (..., deparse.level = 1) : number of columns of result is\n    ## not a multiple of vector length (arg 881)\n\nSee the [API documentation](https://www.uniprot.org/help/return_fields)\nfor all available columns. They can also be listed using the package\ndata `return_fields`:\n\n``` r\nhead(return_fields)\n```\n\n    ##          field                      label\n    ## 1    accession                      Entry\n    ## 2           id                 Entry name\n    ## 3   gene_names                 Gene names\n    ## 4 gene_primary       Gene names (primary)\n    ## 5 gene_synonym       Gene names (synonym)\n    ## 6     gene_oln Gene names (ordered locus)\n\nNote that the values passed to `columns` and the names of the\ncorresponding columns in the output data frame do not necessarily match\n(they correspond to columns “field” and “label”, respectively, in the\npackage data `return_fields`).\n\n``` r\nnames(df)\n```\n\n    ## [1] \"Entry\"                \"Entry Name\"           \"Sequence\"            \n    ## [4] \"Keywords\"             \"Gene Names (primary)\"\n\nLet’s check the sequence and the UniProt keywords corresponding to the\nfirst entry:\n\n``` r\nas.character(df$Sequence[1])\n```\n\n    ## [1] 
\"MSAEGYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFSDGQEARPEEIGWLNGYNETTGERGDFPGTYVEYIGRKKISPPTPKPRPPRPLPVAPGSSKTEADVEQQALTLPDLAEQFAPPDVAPPLLIKLVEAIEKKGLECSTLYRTQSSGNLAELRQLLDCDTASVDLEMIDVHILADAFKRYLLDLPNPVIPAAVYSEMISLAQEVQSSEEYIQLLKKLIRSPSIPHQYWLTLQYLLKHFFKLSQTSSKNLLNARVLSEIFSPMLFRFSAASSDNTENLIKVIEILISTEWNERQPAPALPPKPPKPTTVANNGMNNNMSLQDAEWYWGDISREEVNEKLRDTADGTFLVRDASTKMHGDYTLTLRKGGNNKLIKIFHRDGKYGFSDPLTFNSVVELINHYRNESLAQYNPKLDVKLLYPVSKYQQDQVVKEDNIEAVGKKLHEYNTQFQEKSREYDRLYEEYTRTSQEIQMKRTAIEAFNETIKIFEEQCQTQERYSKEYIEKFKREGNEKEIQRIMHNYDKLKSRISEIIDSRRRLEEDLKKQAAEYREIDKRMNSIKPDLIQLRKTRDQYLMWLTQKGVRQKKLNEWLGNENTEDQYSLVEDDEDLPHHDEKTWNVGSSNRNKAENLLRGKRDGTFLVRESSKQGCYACSVVVDGEVKHCVINKTATGYGFAEPYNLYSSLKELVLHYQHTSLVQHNDSLNVTLAYPVYAQDSYFIFQGNMGRMHGNGHSM\"\n\n``` r\nas.character(df$Keywords[1])\n```\n\n    ## [1] \"Coiled coil;Protein transport;Reference proteome;Repeat;SH2 domain;SH3 domain;Stress response;Transport\"\n\n## Combining query fields\n\nOur first query returned many matches. We can build more specific\nqueries by using more than one query field. By default, matching entries\nmust satisfy all query fields simultaneously. Let’s retrieve the only\nSwiss-Prot reviewed protein entry encoded by gene *Pik3r1* in *Homo\nsapiens* (taxon: 9606):\n\n``` r\nquery \u003c- list(\"gene_exact\" = \"Pik3r1\", \n              \"reviewed\" = \"true\", \n              \"organism_id\" = \"9606\")\ndf \u003c- query_uniprot(query, show_progress = FALSE)\nprint(df)\n```\n\n    ##    Entry Entry Name  Gene Names Organism (ID) Reviewed\n    ## 2 P27986 P85A_HUMAN PIK3R1 GRB1          9606 reviewed\n\n## Multiple items per query field\n\nIt is also possible to look for entries that match different items\nwithin a single query field. Items within a given query field are\nmatched independently, so an entry only needs to match one of them. 
Hence, the following query will return all Swiss-Prot\nreviewed proteins encoded by either *Pik3r1* or *Pik3r2* in either *Mus\nmusculus* (taxon: 10090) or *Homo sapiens* (taxon: 9606):\n\n``` r\nquery \u003c- list(\"gene_exact\" = c(\"Pik3r1\", \"Pik3r2\"), \n              \"reviewed\" = \"true\", \n              \"organism_id\" = c(\"9606\", \"10090\"))\ndf \u003c- query_uniprot(query, show_progress = FALSE)\nprint(df)\n```\n\n    ##    Entry Entry Name  Gene Names Organism (ID) Reviewed\n    ## 2 O00459 P85B_HUMAN      PIK3R2          9606 reviewed\n    ## 3 O08908 P85B_MOUSE      Pik3r2         10090 reviewed\n    ## 4 P26450 P85A_MOUSE      Pik3r1         10090 reviewed\n    ## 5 P27986 P85A_HUMAN PIK3R1 GRB1          9606 reviewed\n\n## Queries with invalid entries\n\nIf a query containing invalid entries is sent to the UniProt REST API,\nan error message is returned and no information can be retrieved, even\nfor the other, potentially valid, entries. To overcome this limitation,\n`queryup` parses the error messages and removes invalid entries from the\nquery. As a result, `query_uniprot()` returns information for valid\nentries only:\n\n``` r\ninvalid_ids \u003c- c(\"P226\", \"CON_P22682\", \"REV_P47941\")\nvalid_ids \u003c- c(\"A0A0U1ZFN5\", \"P22682\")\nids \u003c- c(invalid_ids, valid_ids)\nquery \u003c- list(\"accession_id\" = ids)\nquery_uniprot(query)\n```\n\n    ## 3 invalid values were found (P226, CON_P22682, REV_P47941) and removed from the query.\n\n    ##        Entry     Entry Name Gene Names Organism (ID)   Reviewed\n    ## 2 A0A0U1ZFN5 A0A0U1ZFN5_RAT  Cbl c-Cbl         10116 unreviewed\n    ## 3     P22682      CBL_MOUSE        Cbl         10090   reviewed\n\n## Long queries\n\nBecause the UniProt REST API limits the size of queries, long queries\ncontaining more than a few hundred entries cannot be sent in a single\nrequest. To overcome this limitation, the `queryup` package splits long\nqueries into smaller ones. 
For instance, the dataset `uniprot_entries`\nthat is bundled with the `queryup` package contains information for 1000\nUniProt entries. We can retrieve the Ensembl IDs corresponding to\nthese entries using:\n\n``` r\nids \u003c- uniprot_entries$Entry\nquery \u003c- list(\"accession_id\" = ids)\ncolumns \u003c- c(\"gene_names\", \"xref_ensembl\")\ndf \u003c- query_uniprot(query, columns = columns, show_progress = FALSE)\nhead(df)\n```\n\n    ##        Entry                 Gene Names\n    ## 2 A0A087WPF7             Auts2 Kiaa0442\n    ## 3 A0A088MLT8 Iqcj-Schip1 Iqschfp Schip1\n    ## 4 A0A0B4J1F4                     Arrdc4\n    ## 5 A0A0B4J1G0               Fcgr4 Fcgr3a\n    ## 6 A0A0G2JDV3                 Gbp6 Mpa2l\n    ## 7 A0A0U1RPR8                     Gucy2d\n    ##                                                                Ensembl\n    ## 2 ENSMUST00000161226 [A0A087WPF7-1];ENSMUST00000161374 [A0A087WPF7-3];\n    ## 3                                   ENSMUST00000182006 [A0A088MLT8-1];\n    ## 4 ENSMUST00000048068 [A0A0B4J1F4-1];ENSMUST00000118110 [A0A0B4J1F4-2];\n    ## 5                                                  ENSMUST00000078825;\n    ## 6                                                           A0A0G2JDV3\n    ## 7                                                  ENSMUST00000206435;\n\n## Protein-protein interactions\n\n`queryup` can also be used to retrieve protein-protein interactions among\na set of UniProt entries:\n\n``` r\nids \u003c- sample(uniprot_entries$Entry, 400)\nquery \u003c- list(\"accession_id\" = ids, \n              \"interactor\" = ids)\ncolumns \u003c- \"cc_interaction\"\ndf \u003c- query_uniprot(query = query, columns = columns, show_progress = FALSE)\nhead(df)\n```\n\n    ##     Entry                                         Interacts with\n    ## 2  A2A259                                         Q2EG98; A2A259\n    ## 3  O88273                                                 O88273\n    ## 4  O88522 Q60680; O88351; 
O88522; Q924T7; P62991; P0CG48; P24772\n    ## 21 E9Q401                 Q6PHZ2; Q9Z2I2; Q8K4S1; E9Q401; P23327\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvoisinneg%2Fqueryup","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvoisinneg%2Fqueryup","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvoisinneg%2Fqueryup/lists"}