{"id":13593323,"url":"https://github.com/bjcairns/ukbschemas","last_synced_at":"2025-04-09T02:33:20.337Z","repository":{"id":54287879,"uuid":"175864824","full_name":"bjcairns/ukbschemas","owner":"bjcairns","description":"Use R to generate a database containing the UK Biobank data schemas from http://biobank.ctsu.ox.ac.uk/crystal/schema.cgi","archived":false,"fork":false,"pushed_at":"2021-02-26T10:06:01.000Z","size":223,"stargazers_count":20,"open_issues_count":8,"forks_count":3,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-11-06T14:42:35.224Z","etag":null,"topics":["r","r-package","rstats","sqlite","uk-biobank"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bjcairns.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-03-15T17:32:53.000Z","updated_at":"2024-08-28T20:01:51.000Z","dependencies_parsed_at":"2022-08-13T11:01:00.457Z","dependency_job_id":null,"html_url":"https://github.com/bjcairns/ukbschemas","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bjcairns%2Fukbschemas","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bjcairns%2Fukbschemas/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bjcairns%2Fukbschemas/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bjcairns%2Fukbschemas/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bjcairns","download_url":"https://codeload.github.com/bjcairns/ukbsch
emas/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247965844,"owners_count":21025447,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["r","r-package","rstats","sqlite","uk-biobank"],"created_at":"2024-08-01T16:01:19.183Z","updated_at":"2025-04-09T02:33:19.089Z","avatar_url":"https://github.com/bjcairns.png","language":"R","funding_links":[],"categories":["Misc"],"sub_categories":["Optical coherence tomography and fundus"],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n\n# Download the files?\ndo_dl \u003c- FALSE\nsave_load_file \u003c- \"~/ukbschemas-test-data/ukbschemas_db_test.sqlite\"\n\n```\n# ukbschemas\n\n\u003c!-- badges: start --\u003e\n[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)\n[![Build Status](https://travis-ci.com/bjcairns/ukbschemas.svg?token=tA2cYTLpigx5VuTgcHFd\u0026branch=master)](https://travis-ci.com/bjcairns/ukbschemas)\n\u003c!-- badges: end --\u003e\n\nThis R package can be used to create and/or load a database containing the [UK Biobank Data Showcase schemas](http://biobank.ctsu.ox.ac.uk/crystal/schema.cgi), which are data dictionaries describing the structure of the UK Biobank main dataset.\n\n## Installation\n\nYou can install the current 
version of ukbschemas from [GitHub](https://github.com/) with:\n\n```{r install, eval=FALSE}\n# install.packages(\"devtools\")\ndevtools::install_github(\"bjcairns/ukbschemas\")\n\nlibrary(ukbschemas)\n```\n\n## Examples\n\n```{r prelim, echo=FALSE, include=FALSE}\ntry(rm(list=c(\"db\",\"sch\")))\nlibrary(ukbschemas)\n```\n\nThe package supports two workflows. \n\n#### Save-Load workflow (recommended)\n\nThe recommended approach is to use `ukbschemas_db()` to download the schema tables and save them to an SQLite database, then use `load_db()` to load the tables from the database and store them as tibbles in a named list:\n\n```{r create, eval=do_dl}\ndb \u003c- ukbschemas_db(path = tempdir())\nsch \u003c- load_db(db = db)\n```\n\n```{r info, echo=FALSE, include=FALSE}\nif (do_dl) {\n  # file.path() builds a path that is valid on every platform\n  file \u003c- file.path(tempdir(), paste0(\"ukb-schemas-\", Sys.Date(), \".sqlite\"))\n  file.copy(file, save_load_file)\n}\n\nfinfo \u003c- file.info(save_load_file)\nfsize \u003c- round(finfo$size/1e6, 1)\nfmtime \u003c- finfo$mtime\n```\n\nBy default, the database is named `ukb-schemas-YYYY-MM-DD.sqlite` (where `YYYY-MM-DD` is the current date) and placed in the current working directory. (`path = tempdir()` in the above example puts it in the current temporary directory instead.) At the most recent compilation of the database (`r format(fmtime, \"%d %B %Y\")`), the size of the .sqlite database file produced by `ukbschemas_db()` was approximately `r fsize`MB.\n\nNote that without further arguments, `ukbschemas_db()` tidies up the database to give it a more consistent relational structure (the changes are summarised in the output of the first example, above). 
Alternatively, the raw data can be loaded with the `as_is` argument:\n\n```{r create_as_is, eval=FALSE}\ndb \u003c- ukbschemas_db(path = tempdir(), overwrite = TRUE, as_is = TRUE)\n```\n\nThe `overwrite` option allows the database file to be overwritten (if `TRUE`) or prevents this (`FALSE`); if it is not specified and the session is interactive (`interactive() == TRUE`), the user is prompted to decide.\n\n**Note:** If you have created a schemas database with an earlier version of ukbschemas, it should be possible to load that database with the latest version of `load_db()`, which (currently) should load any SQLite database, regardless of contents.\n\n#### Load-Save workflow\n\nThe second approach is to download the schemas and store them in memory in a list, and save them to a database only as required. \n\nThis is **not** recommended, because it is better (for everyone) not to download the schema files every time they are needed, and because the database assumes a certain structure that should be guaranteed when the database is saved. If you still want to take this approach, use:\n\n```{r inmemory, eval=FALSE}\nsch \u003c- ukbschemas()\ndb \u003c- save_db(sch, path = tempdir())\n```\n\n## Why R?\n\nThis package was originally written in bash (a Unix shell scripting language). However, R is more accessible and all dependencies are loaded when you install the package; there is no need to install any secondary software (not even SQLite).\n\n## Notes\n\n#### Design notes\n\n* All the encoding value tables (`esimpint`, `esimpstring`, `esimpreal`, `esimpdate`, `ehierint`, `ehierstring`) have been harmonised and combined into a single table `encvalues`. The `value` column in `encvalues` has type `TEXT`, but a `type` column has been added in case the value is not clear from context. 
The original type-specific tables have been deleted.\n* To avoid redundancy, category parent-child relationships have been moved to table `categories`, as column `parent_id`, from table `catbrowse` (which has been deleted).\n* Reference to the category to which a field belongs is in the `main_category` column in the `fields` schema, but has been renamed to `category_id` for consistency with the `categories` schema.\n* Details of several of the field properties (`value_type`, `stability`, `item_type`, `strata` and `sexed`) are available elsewhere on the Data Showcase. These have been added manually to tables `valuetypes`, `stability`, `itemtypes`, `strata` and `sexed`, and appropriate ID references have been renamed with the `_id` suffix in tables `fields` and `encodings`.\n* There are several columns in the tables which are not well-documented (e.g. `base_type` in `fields`, `availability` in `encodings` and `categories`, and others). Additional tables documenting these encoded values may be included in future versions (and suggestions are welcome).\n\n#### Known code issues\n\n* The UK Biobank data schemas are regularly updated as new data are added to the system. ukbschemas does not currently include a facility for updating the database; it is necessary to create a new database. \n* Because `readr::read_csv()` reads whole numbers as type `double`, not `integer` (allowing 64-bit integers without loss of information), column types in schemas loaded in R will differ depending on whether the schemas are loaded directly to R or first saved to a database. This should make little or no difference for most applications.\n* Any [other issues](https://github.com/bjcairns/ukbschemas/issues).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbjcairns%2Fukbschemas","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbjcairns%2Fukbschemas","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbjcairns%2Fukbschemas/lists"}