{"id":20125699,"url":"https://github.com/epiverse-trace/linelist","last_synced_at":"2025-05-06T17:34:38.857Z","repository":{"id":38421382,"uuid":"478606723","full_name":"epiverse-trace/linelist","owner":"epiverse-trace","description":"R package for handling linelist data","archived":false,"fork":false,"pushed_at":"2025-04-23T08:51:48.000Z","size":10655,"stargazers_count":8,"open_issues_count":9,"forks_count":4,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-04-23T08:53:23.395Z","etag":null,"topics":["data","data-structures","epidemiology","epiverse","outbreaks","r","r-package","sdg-3","structured-data"],"latest_commit_sha":null,"homepage":"https://epiverse-trace.github.io/linelist/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/epiverse-trace.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-04-06T14:56:19.000Z","updated_at":"2025-04-23T08:44:28.000Z","dependencies_parsed_at":"2023-10-10T14:40:36.941Z","dependency_job_id":"7eff8a3f-635b-473e-9a20-d95a46553001","html_url":"https://github.com/epiverse-trace/linelist","commit_stats":{"total_commits":198,"total_committers":5,"mean_commits":39.6,"dds":"0.18181818181818177","last_synced_commit":"3dc0acb8fa371a8b82adf918c7e91239f3392c0d"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epiverse-trace%2Flinelist","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epiverse-trace%2Flinelist/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epiverse-trace%2Flinelist/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epiverse-trace%2Flinelist/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/epiverse-trace","download_url":"https://codeload.github.com/epiverse-trace/linelist/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252734419,"owners_count":21796015,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-structures","epidemiology","epiverse","outbreaks","r","r-package","sdg-3","structured-data"],"created_at":"2024-11-13T20:09:22.218Z","updated_at":"2025-05-06T17:34:38.804Z","avatar_url":"https://github.com/epiverse-trace.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\r\noutput: github_document\r\n---\r\n\r\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\r\n\r\n```{r readmesetup, include = FALSE}\r\nknitr::opts_chunk$set(\r\n  collapse = TRUE,\r\n  comment = \"#\u003e\",\r\n  fig.path = \"man/figures/README-\",\r\n  out.width = \"100%\"\r\n)\r\n```\r\n\r\n# **linelist**: Tagging and Validating Epidemiological Data \u003cimg src=\"man/figures/logo.svg\" align=\"right\" width=\"120\" /\u003e\r\n\r\n\u003c!-- badges: start --\u003e\r\n[![Digital Public Good](https://raw.githubusercontent.com/epiverse-trace/linelist/main/man/figures/dpg_badge.png)](https://www.digitalpublicgoods.net/r/linelist)\r\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/license/mit)\r\n[![cran-check](https://badges.cranchecks.info/summary/linelist.svg)](https://cran.r-project.org/web/checks/check_results_linelist.html)\r\n[![R-CMD-check](https://github.com/epiverse-trace/linelist/workflows/R-CMD-check/badge.svg)](https://github.com/epiverse-trace/linelist/actions)\r\n[![codecov](https://codecov.io/gh/epiverse-trace/linelist/branch/main/graph/badge.svg?token=JGTCEY0W02)](https://app.codecov.io/gh/epiverse-trace/linelist)\r\n[![lifecycle-maturing](https://raw.githubusercontent.com/reconverse/reconverse.github.io/master/images/badge-maturing.svg)](https://www.reconverse.org/lifecycle.html#maturing)\r\n[![month-download](https://cranlogs.r-pkg.org/badges/linelist)](https://cran.r-project.org/package=linelist)\r\n[![total-download](https://cranlogs.r-pkg.org/badges/grand-total/linelist)](https://cran.r-project.org/package=linelist)\r\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.6532786.svg)](https://doi.org/10.5281/zenodo.6532786)\r\n\u003c!-- badges: end --\u003e\r\n\r\n*linelist* provides a safe entry point to the *Epiverse* software ecosystem,\r\nadding a foundational layer through *tagging*, *validation*, and *safeguarding*\r\nepidemiological data, to help make data pipelines more straightforward and\r\nrobust.\r\n\r\n## Installation\r\n\r\n### Stable version\r\n\r\nOur stable versions are released on CRAN and can be installed using:\r\n\r\n```{r, eval=FALSE}\r\ninstall.packages(\"linelist\")\r\n```\r\n\r\n::: {.pkgdown-devel}\r\n\r\n### Development version\r\n\r\nThe development version of linelist can be installed from\r\n[GitHub](https://github.com/) with:\r\n\r\n```{r, eval=FALSE}\r\nif (!requireNamespace(\"pak\", quietly = TRUE)) {\r\n  install.packages(\"pak\")\r\n}\r\npak::pak(\"epiverse-trace/linelist\")\r\n```\r\n\r\n:::\r\n\r\n## Usage\r\n\r\n```{r}\r\n#| fig.alt: \"Graphical summary of the linelist R package, with emphasis on these 4 key features: 1. Tag key epi variables, 2. Validate tagged data, 3. Safeguards vs accidental loss / alteration, 4. Robust data for stronger pipelines\"\r\n#| out.width: \"60%\"\r\nknitr::include_graphics(\"man/figures/linelist_infographics.png\")\r\n```\r\n\r\nlinelist works by tagging key epidemiological data in a `data.frame` or a\r\n`tibble` to facilitate and strengthen data pipelines. The resulting object is a\r\n`linelist` object, which extends `data.frame` (or `tibble`) by providing three\r\ntypes of features:\r\n\r\n1. a **tagging system** to identify key data, enabling access to these data using\r\n   their tags rather than actual names, which may change over time and across\r\n   datasets\r\n\r\n2. **validation** of the tagged variables (making sure they are present and of the\r\n   right type/class)\r\n\r\n3. **safeguards** against accidental losses of tagged variables in common data\r\n   handling operations\r\n\r\nThe short example below illustrates these different features. 
See the\r\n[Documentation](#documentation) section for more in-depth examples and details\r\nabout `linelist` objects.\r\n\r\n```{r}\r\n# load packages and a dataset for the example\r\n# -------------------------------------------\r\nlibrary(linelist)\r\nlibrary(dplyr)\r\n\r\ndataset \u003c- outbreaks::mers_korea_2015$linelist\r\nhead(dataset)\r\n\r\n# check known tagged variables\r\n# ----------------------------\r\ntags_names()\r\n\r\n# build a linelist\r\n# ----------------\r\nx \u003c- dataset %\u003e%\r\n  tibble() %\u003e%\r\n  make_linelist(\r\n    date_onset = \"dt_onset\", # date of onset\r\n    date_reporting = \"dt_report\", # date of reporting\r\n    occupation = \"age\" # mistake\r\n  )\r\nx\r\ntags(x) # check available tags\r\n```\r\n\r\n`validate_linelist()` will error if one of your tagged columns doesn't have the\r\ncorrect type:\r\n\r\n```{r, error = TRUE}\r\n# validation of tagged variables\r\n# ------------------------------\r\n## (this flags a likely mistake: occupation should not be an integer)\r\nvalidate_linelist(x)\r\n```\r\n\r\n```{r}\r\n# change tags: fix mistakes, add new ones\r\n# ---------------------------------------\r\nx \u003c- x %\u003e%\r\n  set_tags(\r\n    occupation = NULL, # tag removal\r\n    gender = \"sex\", # new tag\r\n    outcome = \"outcome\"\r\n  )\r\n\r\n# safeguards against actions losing tags\r\n# --------------------------------------\r\n## attempting to remove geographical info but removing dates by mistake\r\nx_no_geo \u003c- x %\u003e%\r\n  select(-(5:8))\r\n```\r\n\r\nFor stronger pipelines, you can even trigger errors upon loss:\r\n\r\n```{r, error = TRUE}\r\nlost_tags_action(\"error\")\r\n\r\nx_no_geo \u003c- x %\u003e%\r\n  select(-(5:8))\r\n\r\nx_no_geo \u003c- x %\u003e%\r\n  select(-(5:7))\r\n\r\n## to revert to default behaviour (warning upon loss)\r\nlost_tags_action()\r\n```\r\n\r\nAlternatively, tagged content can be accessed by tag rather than by position:\r\n\r\n```{r}\r\nx_no_geo %\u003e%\r\n  select(has_tag(c(\"date_onset\", 
\"outcome\")))\r\n\r\nx_no_geo %\u003e%\r\n  tags_df()\r\n```\r\n\r\nlinelist can also be connected to the incidence2 package for pipelines focused\r\non aggregated count data:\r\n\r\n```{r, fig.width=8, fig.height=6, fig.alt=\"Epicurves (daily incidence) by sex and outcome via the incidence2 R package.\"}\r\nlibrary(incidence2)\r\n\r\nx_no_geo %\u003e%\r\n  tags_df() %\u003e%\r\n  incidence(\"date_onset\", groups = c(\"gender\", \"outcome\")) %\u003e%\r\n  plot(\r\n    fill = \"outcome\",\r\n    angle = 45,\r\n    nrow = 2,\r\n    border_colour = \"white\",\r\n    legend = \"bottom\"\r\n  )\r\n```\r\n\r\n## Documentation\r\n\r\nMore detailed documentation can be found at:\r\nhttps://epiverse-trace.github.io/linelist/\r\n\r\nIn particular:\r\n\r\n* [A general introduction to linelist](https://epiverse-trace.github.io/linelist/articles/linelist.html)\r\n\r\n* [The reference manual](https://epiverse-trace.github.io/linelist/reference/index.html)\r\n\r\n## Getting help\r\n\r\nTo ask questions or give us some feedback, please use the GitHub\r\n[issues](https://github.com/epiverse-trace/linelist/issues) system.\r\n\r\n## Data privacy\r\n\r\nCase line lists may contain personally identifiable information (PII). While\r\nlinelist provides a way to store these data in R, it does not currently provide\r\ntools for data anonymization. The user is responsible for respecting individual\r\nprivacy and ensuring PII is handled with the required level of confidentiality,\r\nin compliance with applicable laws and regulations for storing and sharing PII.\r\n\r\nNote that PII is rarely needed for common analytics tasks, so in many\r\ninstances it is advisable to remove PII from the data before sharing them\r\nwith analytics teams.\r\n\r\n## Development\r\n\r\n### Lifecycle\r\n\r\nThis package is currently *maturing*, as defined by the [RECON software\r\nlifecycle](https://www.reconverse.org/lifecycle.html). 
This means that essential\r\nfeatures and mechanisms are present and stable, but minor breaking changes or\r\nfunction renames may still occur sporadically.\r\n\r\n### Contributions\r\n\r\nContributions are welcome via [pull requests](https://github.com/epiverse-trace/linelist/pulls).\r\n\r\n### Code of Conduct\r\n\r\nPlease note that the linelist project is released with a\r\n[Code of Conduct](https://github.com/epiverse-trace/.github/blob/main/CODE_OF_CONDUCT.md).\r\nBy contributing to this project, you agree to abide by its terms.\r\n\r\n### Notes\r\n\r\nThis package is a reboot of the RECON package\r\n[linelist](https://github.com/reconhub/linelist). Unlike its predecessor, the\r\nnew package focuses on the implementation of a `linelist` class. The data\r\ncleaning features of the original package will eventually be re-implemented for\r\n`linelist` objects, albeit likely in a separate package.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepiverse-trace%2Flinelist","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fepiverse-trace%2Flinelist","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepiverse-trace%2Flinelist/lists"}