{"id":22151699,"url":"https://github.com/epiverse-trace/simulist","last_synced_at":"2025-07-26T05:31:47.555Z","repository":{"id":191597453,"uuid":"684989592","full_name":"epiverse-trace/simulist","owner":"epiverse-trace","description":"An R package for simulating line lists","archived":false,"fork":false,"pushed_at":"2024-05-22T11:42:33.000Z","size":12138,"stargazers_count":4,"open_issues_count":7,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-05-22T15:52:03.150Z","etag":null,"topics":["epidemiology","epiverse","linelist","outbreaks","r","r-package"],"latest_commit_sha":null,"homepage":"https://epiverse-trace.github.io/simulist/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/epiverse-trace.png","metadata":{"files":{"readme":"README.Rmd","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-30T09:20:13.000Z","updated_at":"2024-05-31T12:34:15.369Z","dependencies_parsed_at":"2024-03-11T15:29:39.181Z","dependency_job_id":"e7b059ca-4ef3-4cf4-8572-7d1b5e69a434","html_url":"https://github.com/epiverse-trace/simulist","commit_stats":null,"previous_names":["epiverse-trace/simulist"],"tags_count":2,"template":false,"template_full_name":"epiverse-trace/packagetemplate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epiverse-trace%2Fsimulist","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epiverse-trace%2Fsimulist/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epiverse-trace%2Fsimulist/releases","manifests_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/epiverse-trace%2Fsimulist/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/epiverse-trace","download_url":"https://codeload.github.com/epiverse-trace/simulist/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":227652611,"owners_count":17799230,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["epidemiology","epiverse","linelist","outbreaks","r","r-package"],"created_at":"2024-12-02T00:35:49.095Z","updated_at":"2025-07-26T05:31:47.543Z","avatar_url":"https://github.com/epiverse-trace.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. Please edit that file. 
--\u003e\n\u003c!-- The code to render this README is stored in .github/workflows/render-readme.yaml --\u003e\n\u003c!-- Variables marked with double curly braces will be transformed beforehand: --\u003e\n\u003c!-- `packagename` is extracted from the DESCRIPTION file --\u003e\n\u003c!-- `gh_repo` is extracted via a special environment variable in GitHub Actions --\u003e\n\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# _simulist_: Simulate line list data \u003cimg src=\"man/figures/logo.svg\" align=\"right\" width=\"120\" /\u003e\n\n\u003c!-- badges: start --\u003e\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/license/mit)\n[![R-CMD-check](https://github.com/{{ gh_repo }}/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/{{ gh_repo }}/actions/workflows/R-CMD-check.yaml)\n[![Codecov test coverage](https://codecov.io/gh/{{ gh_repo }}/branch/main/graph/badge.svg)](https://app.codecov.io/gh/{{ gh_repo }}?branch=main)\n[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)\n[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10471459.svg)](https://doi.org/10.5281/zenodo.10471459)\n[![CRAN status](https://www.r-pkg.org/badges/version/simulist)](https://CRAN.R-project.org/package=simulist)\n[![CRAN downloads](https://cranlogs.r-pkg.org/badges/simulist)](https://cran.r-project.org/package=simulist)\n\u003c!-- badges: end --\u003e\n\n`{simulist}` is an R package to simulate individual-level infectious disease outbreak data, including line lists and contact tracing data. 
It can often be useful to have synthetic datasets like these available when demonstrating outbreak analytics techniques or testing new analysis methods.\n\n`{simulist}` is developed at the [Centre for the Mathematical Modelling of Infectious Diseases](https://www.lshtm.ac.uk/research/centres/centre-mathematical-modelling-infectious-diseases) at the [London School of Hygiene and Tropical Medicine](https://www.lshtm.ac.uk/) as part of [Epiverse-TRACE](https://data.org/initiatives/epiverse/).\n\n## Key features\n\n`{simulist}` allows you to simulate realistic line list and contact tracing data, with:\n\n:hourglass_flowing_sand: Parameterised epidemiological delay distributions \u003cbr\u003e\n:hospital: Population-wide or age-stratified hospitalisation and death risks \u003cbr\u003e\n:bar_chart: Uniform or age-structured populations \u003cbr\u003e\n:chart_with_upwards_trend: Constant or time-varying case fatality risk \u003cbr\u003e\n:clipboard: Customisable probability of case types and contact tracing follow-up \u003cbr\u003e\n\nPost-process simulated line list data for:\n\n:date: Real-time outbreak snapshots with right-truncation \u003cbr\u003e\n:memo: Messy data with inconsistencies, mistakes and missing values \u003cbr\u003e\n\n## Installation\n\nThe package can be installed from CRAN using\n\n```r\ninstall.packages(\"simulist\")\n```\n\nYou can install the development version of `{simulist}` from\n[GitHub](https://github.com/) with:\n\n``` r\n# check whether {pak} is installed\nif(!require(\"pak\")) install.packages(\"pak\")\npak::pak(\"epiverse-trace/simulist\")\n```\n\nAlternatively, install pre-compiled binaries from [the Epiverse TRACE R-universe](https://epiverse-trace.r-universe.dev/simulist)\n\n``` r\ninstall.packages(\"simulist\", repos = c(\"https://epiverse-trace.r-universe.dev\", \"https://cloud.r-project.org\"))\n```\n\n## Quick start\n\n```{r load-simulist}\nlibrary(simulist)\n```\n\nA line list can be simulated by calling `sim_linelist()`. 
The function provides sensible defaults to quickly generate an epidemiologically valid data set.\n\n```{r, sim-linelist-defaults}\nset.seed(1)\nlinelist \u003c- sim_linelist()\nhead(linelist)\n```\n\nHowever, to simulate a more realistic line list for a specific infectious disease outbreak, we can use previously estimated epidemiological parameters. These can be taken from the `{epiparameter}` R package if available, or, if they are not in the `{epiparameter}` database yet (such as the contact distribution for COVID-19), we can define them ourselves. Here we define a contact distribution, period of infectiousness, onset-to-hospitalisation delay, and onset-to-death delay. \n\n```{r load-epiparameter}\nlibrary(epiparameter)\n```\n\n```{r create-epidists}\n# create COVID-19 contact distribution\ncontact_distribution \u003c- epiparameter::epiparameter(\n  disease = \"COVID-19\",\n  epi_name = \"contact distribution\",\n  prob_distribution = create_prob_distribution(\n    prob_distribution = \"pois\",\n    prob_distribution_params = c(mean = 2)\n  )\n)\n\n# create COVID-19 infectious period\ninfectious_period \u003c- epiparameter::epiparameter(\n  disease = \"COVID-19\",\n  epi_name = \"infectious period\",\n  prob_distribution = create_prob_distribution(\n    prob_distribution = \"gamma\",\n    prob_distribution_params = c(shape = 1, scale = 1)\n  )\n)\n\n# create COVID-19 onset to hospital admission\nonset_to_hosp \u003c- epiparameter(\n  disease = \"COVID-19\",\n  epi_name = \"onset to hospitalisation\",\n  prob_distribution = create_prob_distribution(\n    prob_distribution = \"lnorm\",\n    prob_distribution_params = c(meanlog = 1, sdlog = 0.5)\n  )\n)\n\n# get onset to death from {epiparameter} database\nonset_to_death \u003c- epiparameter::epiparameter_db(\n  disease = \"COVID-19\",\n  epi_name = \"onset to death\",\n  single_epiparameter = TRUE\n)\n```\n\nTo simulate a line list for COVID-19 with a Poisson contact distribution with a 
mean number of contacts of 2 and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. The mean number of contacts and probability of infection determine the outbreak reproduction number (here, 2 × 0.5 = 1); if the resulting reproduction number is around one, we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation). \n\n***Warning***: the reproduction number of the simulation results from the contact distribution (`contact_distribution`) and the probability of infection (`prob_infection`); the number of infections is a binomial sample of the number of contacts for each case with the probability of infection (i.e. being sampled) given by `prob_infection`. If the average number of secondary infections from each primary case is greater than 1 then this can lead to the outbreak becoming extremely large. There is currently no depletion of susceptible individuals in the simulation model, so the maximum outbreak size (second element of the vector supplied to the `outbreak_size` argument) can be used to return a line list early without producing an excessively large data set.\n\n```{r sim-linelist}\nset.seed(1)\nlinelist \u003c- sim_linelist(\n  contact_distribution = contact_distribution,\n  infectious_period = infectious_period,\n  prob_infection = 0.5,\n  onset_to_hosp = onset_to_hosp,\n  onset_to_death = onset_to_death\n)\nhead(linelist)\n```\n\nIn this example, the line list is simulated using the default values for all other arguments (see `?sim_linelist`). The default hospitalisation risk is assumed to be 0.2 (i.e. there is a 20% probability an infected individual becomes hospitalised) and the start date of the outbreak is 1st January 2023. 
To modify either of these, we can specify them in the function.\n\n```{r sim-linelist-diff-args}\nlinelist \u003c- sim_linelist(\n  contact_distribution = contact_distribution,\n  infectious_period = infectious_period,\n  prob_infection = 0.5,\n  onset_to_hosp = onset_to_hosp,\n  onset_to_death = onset_to_death,\n  hosp_risk = 0.01,\n  outbreak_start_date = as.Date(\"2019-12-01\")\n)\nhead(linelist)\n```\n\nTo simulate a table of contacts of cases (i.e. to reflect a contact tracing dataset) we can use the same parameters defined for the example above.\n\n```{r, sim-contacts}\ncontacts \u003c- sim_contacts(\n  contact_distribution = contact_distribution,\n  infectious_period = infectious_period,\n  prob_infection = 0.5\n)\nhead(contacts)\n```\n\nIf both the line list and contacts table are required, they can be jointly simulated using the `sim_outbreak()` function. This uses the same inputs as `sim_linelist()` and `sim_contacts()` to produce a line list and contacts table of the same outbreak (the arguments also have the same default settings as the other functions).\n\n```{r, sim-outbreak}\noutbreak \u003c- sim_outbreak(\n  contact_distribution = contact_distribution,\n  infectious_period = infectious_period,\n  prob_infection = 0.5,\n  onset_to_hosp = onset_to_hosp,\n  onset_to_death = onset_to_death\n)\nhead(outbreak$linelist)\nhead(outbreak$contacts)\n```\n\n## Help \n\nTo report a bug please open an [issue](https://github.com/epiverse-trace/simulist/issues/new/choose).\n\n## Contribute \n\nContributions to `{simulist}` are welcomed. 
Please follow the [package contributing guide](https://github.com/epiverse-trace/.github/blob/main/CONTRIBUTING.md).\n\n## Code of Conduct\n\nPlease note that the `{simulist}` project is released with a \n[Contributor Code of Conduct](https://github.com/epiverse-trace/.github/blob/main/CODE_OF_CONDUCT.md).\nBy contributing to this project, you agree to abide by its terms.\n\n## Citing this package\n\n```{r message=FALSE, warning=FALSE}\ncitation(\"simulist\")\n```\n\n## Complementary R packages\n\n:package: :left_right_arrow: :package:  [{epiparameter}](https://epiverse-trace.github.io/epiparameter/) \u003cbr\u003e\n:package: :left_right_arrow: :package: [{epicontacts}](https://www.repidemicsconsortium.org/epicontacts/) \u003cbr\u003e\n:package: :left_right_arrow: :package: [{incidence2}](https://www.reconverse.org/incidence2/) \u003cbr\u003e\n:package: :left_right_arrow: :package: [{cleanepi}](https://epiverse-trace.github.io/cleanepi/) \u003cbr\u003e\n\n## Related projects\n\nThis project has some overlap with other R packages. Here we list these packages and provide a table of features and attributes that are present for each package to help decide which package is appropriate for each use-case.\n\nIn some cases the packages are dedicated to simulating line list and other epidemiological data (e.g. {simulist}), in others the line list simulation is one part of a wider R package (e.g. {EpiNow}). \n\n- [`{LLsim}`](https://github.com/jrcpulliam/LLsim) simulates line list data using a stochastic SIR model with a fixed population with observation and reporting delays. Line list data is generated in two steps, 1) the SIR model simulates the outbreak (`simpleSim()`), 2) the outbreak data is converted into a line list (`createLineList()`).\n- [`{simulacr}`](https://github.com/reconhub/simulacr) uses a branching process model to simulate cases and contacts for an outbreak. 
It simulates transmission of infections using other epidemiological R packages (`{epicontacts}` and `{distcrete}`) to parameterise and plot simulated data.\n- [`{epidict}`](https://github.com/R4EPI/epidict) is a package that can be used to simulate outbreak data, including line lists, in a DHIS2 format, and survey data that mimics the format by Kobo, using the function `gen_data()`. In addition, MSF outbreak data are available in this package as data dictionaries for Acute Jaundice Syndrome, Cholera, Measles and Meningitis, accessible through the function `msf_dict()`.\n- [`{EpiNow}`](https://github.com/epiforecasts/EpiNow) - a now deprecated R package - includes the `simulate_cases()` and `generate_pseudo_linelist()` functions for generating line list data.\n- [generative-nowcasting](https://github.com/adrian-lison/generative-nowcasting) is a set of R scripts and functions to perform epidemiological nowcasting. There are [functions to simulate line list data](https://github.com/adrian-lison/generative-nowcasting/blob/bf48e027e82ce9d42de468d0b708d010253b7475/code/utils/utils_simulate.R) within the repository, but the repository is not (and does not contain) an R package. Functions can be sourced. 
Cases are simulated with a renewal process and the simulation can incorporate epidemiological delays and ascertainment.\n\n\u003cdetails\u003e \n\n\u003csummary\u003e Table of line list simulator features \u003c/summary\u003e\n\n\n|                | {simulist}     | {LLsim}        | {simulacr}     | {epidict}      | {EpiNow}       | generative-nowcasting       | \n| -------------- | -------------- | -------------- | -------------- | -------------- | -------------- | -------------- |\n| Simulates line list | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |\n| Simulates contacts | :white_check_mark: | :white_check_mark: | :white_check_mark: | :x: | :x: | :x: |\n| Parameterised with epi distributions[^dist] | :white_check_mark: | :white_check_mark: | :white_check_mark: | :x: | :white_check_mark: | :white_check_mark: |\n| Interoperable with {epicontacts} | :white_check_mark: | :white_check_mark: | :white_check_mark: | :x: | :x: | :x: |\n| Explicit population size[^pop] | :x: | :white_check_mark: | :white_check_mark: | :x: | :x: | :x: |\n| R package | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :x: |\n| Actively maintained[^active] | :white_check_mark: | :x: | :x: | :x: | :x: | :white_check_mark: |\n| On CRAN | :white_check_mark: | :x: | :x: | :x: | :x: |  NA |\n| Unit testing[^tests] | :white_check_mark: | :white_check_mark: | :x: | :white_check_mark: | :x: | NA |\n\n[^dist]: In this context _Parameterised with epi distributions_ means that the simulation uses epidemiological distributions (e.g. 
serial interval, infectious period) to parameterise the model and the parameters of these epi distributions can be modified by the user.\n\n[^pop]: _Explicit population size_ refers to the simulation using a finite population size which is controlled by the user for the depletion of susceptible individuals in the model.\n\n[^active]: We define _Actively maintained_ as the repository having a commit to the main branch within the last 12 months.\n\n[^tests]: _Unit testing_ is ticked if the package contains any form of testing, this can use any testing framework, for example [{testthat}](https://CRAN.R-project.org/package=testthat) or [{tinytest}](https://CRAN.R-project.org/package=tinytest).\n\n\u003c/details\u003e\n\nIf there is another package with this functionality missing from the list that should be added, or if a package included in this list has been updated and the table should reflect this please contribute by making an [issue](https://github.com/epiverse-trace/simulist/issues) or a [pull request](https://github.com/epiverse-trace/simulist/pulls). \n\nOther R packages for simulating epidemic dynamics can be found in the _Epidemic simulation models_ section of the [Epidemiology CRAN task view](https://cran.r-project.org/web/views/Epidemiology.html).\n\nSome packages are related to {simulist} but do not simulate line list data. These include:\n\n- [`{outbreaks}`](https://CRAN.R-project.org/package=outbreaks) an R package containing a library of outbreak data sets, including line list data, for a variety of past and simulated outbreaks, e.g. 
Ebola and MERS.\n- [`{ringbp}`](https://github.com/epiforecasts/ringbp) an R package to simulate cases using an individual-level transmission model with contact tracing.\n- [`{epichains}`](https://github.com/epiverse-trace/epichains) an R package with functionality to simulate transmission chains using a branching process model.\n\nThe {outbreaks} package is useful if data from a past outbreak or generic line list data is required. The {ringbp} and {epichains} packages can be used to generate case data over time which can then be converted into a line list with some manual post-processing.\n\nAnother package for creating messy data is the [{messy}](https://CRAN.R-project.org/package=messy) package. This can be used, either independently or in combination with `messy_linelist()`, to create messy line list and contacts data. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepiverse-trace%2Fsimulist","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fepiverse-trace%2Fsimulist","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepiverse-trace%2Fsimulist/lists"}