https://github.com/openpharma/rbqmr
Tools for Risk-Based Quality Management in R
- Host: GitHub
- URL: https://github.com/openpharma/rbqmr
- Owner: openpharma
- License: other
- Created: 2022-10-05T14:08:38.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-11-01T09:31:54.000Z (8 months ago)
- Last Synced: 2024-11-01T10:20:42.800Z (8 months ago)
- Language: R
- Homepage: https://openpharma.github.io/rbqmR
- Size: 3.87 MB
- Stars: 10
- Watchers: 3
- Forks: 1
- Open Issues: 5
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
README
---
output: github_document
always_allow_html: true
bibliography: README.bib
---

```{r, setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "80%",
fig.height = 4,
fig.width = 6
)
library(magrittr)
library(kableExtra)
library(tidyverse)
devtools::load_all()
```

# rbqmR
[CRAN](https://CRAN.R-project.org/package=rbqmR)
[License](https://opensource.org/licenses/apache-2-0)
[R-CMD-check](https://github.com/openpharma/rbqmR/actions)
[Test coverage](https://github.com/openpharma/rbqmR/blob/_xml_coverage_reports/data/main/coverage.xml)

## Introduction
The purpose of the `rbqmR` package is to provide a repository of R-based tools for the implementation of risk-based quality management. Tools currently exist for:
* Dynamic Quality Tolerance Limits (QTLs) using Bayesian Hierarchical Models (ongoing)
* Observed-Minus-Expected methodology
* Observed/Expected methodology (ongoing)

This package is a work in progress. Its primary focus is dynamic QTLs. Other methodologies are included for completeness.
## Installation
You can install the development version of rbqmR from [GitHub](https://github.com/openpharma/rbqmR) with:
```{r, eval=FALSE}
# install.packages("devtools")
devtools::install_github("openpharma/rbqmR")
```

## Dynamic QTLs
We use example 2.7 of Berry et al [@BERRY], described on pages 52 to 63, modifying the context so that rather than being a meta-analysis of several different trials, we consider the data to represent the performance of different sites within a single trial. The exact metric being measured is immaterial, though it remains a summary of a binomial outcome.
```{r sourceData}
data(berrySummary)
berrySummary %>% kable(digits = c(0, 0, 0, 2))
```
The central tenet of the QTL methodology implemented in `rbqmR` is not that a current trial should behave in a similar fashion to a "similar" historical or control trial, but rather that all sites within the current trial should behave in a similar way. The justification for this assumption is that the trial's inclusion/exclusion criteria are designed to minimise heterogeneity amongst the study population (save for that induced by differences in treatment in comparative trials).

We fit the Bayesian Hierarchical Model described by Berry et al ...
```{r, fitModel}
fitted <- berrySummary %>%
fitBayesBinomialModel(n = Subjects, r = Events)
fitted
```

... and use the quantiles of the posterior distribution of the probability of an event to define the QTLs for this metric. This can be done in isolation (when the trial acts as its own control) or with reference to historical data obtained from similar previous studies.
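For example (a minimal sketch, not evaluated here; the choice of the 5th and 95th centiles is purely illustrative), such quantiles can be read directly from the posterior sample:

```{r, eval=FALSE}
# Central 90% interval of the posterior distribution of the event probability
fitted$tab %>%
  summarise(
    Lower = quantile(p, 0.05),
    Upper = quantile(p, 0.95)
  )
```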
### Examples of QTL evaluation rules
> When using a Bayesian Hierarchical Model, the probabilities associated with credible intervals are generally lower than those associated with similar frequentist models. This is because BHMs permit more sources of variation. Here, the BHM permits variation between the response rates at different sites, even when considering the overall event rate for the study. The corresponding frequentist analysis assumes that all sites share a common event rate, thus assuming there is no inter-site variation.
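For intuition, the pooled frequentist interval under that common-rate assumption can be sketched as follows (an illustration for comparison only, not part of `rbqmR`):

```{r, eval=FALSE}
# A pooled frequentist 95% CI, which assumes a single event rate
# common to all sites
with(berrySummary, binom.test(sum(Events), sum(Subjects)))$conf.int
```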
#### Comparison to constant value(s)
The `evaluatePointEstimateQTL` function allows an arbitrary scalar summary statistic (which defaults to the mean), derived from the estimate of the posterior distribution, to be compared with an arbitrary number of lower and upper limits.
For example, the code below defines a QTL based on the mean of the posterior distribution of the probability of an event. Call this probability $\hat{p}$. The warning limits are 0.5 and 0.8. The action limits are 0.4 and 0.9.
```{r}
berrySummary %>%
evaluatePointEstimateQTL(
posterior = fitted$tab,
metric = p,
observedMetric = ObservedResponse,
lower = c("warn" = 0.5, "action" = 0.4),
upper = c("warn" = 0.8, "action" = 0.9)
)
```
As with all `evaluateXXXXQTL` functions, the return value of `evaluatePointEstimateQTL` is a list. The `status` element indicates whether or not the QTL's limits have been breached. The `qtl` element gives the calculated value of the QTL metric and the `data` element returns a copy of the `data.frame` containing the site-level KRIs, augmented with a column indicating which, if any, of the various limits were breached by that site.

Both the `lower` and `upper` parameters are optional (though at least one must be given) and the number of limits, and their labels, are arbitrary.
```{r, eval=FALSE}
berrySummary %>%
evaluatePointEstimateQTL(
posterior = fitted$tab,
metric = p,
observedMetric = ObservedResponse,
upper = c("mild" = 0.6, "moderate" = 0.8, "severe" = 0.9)
)
```

If only one limit is defined, this can be provided as a scalar, in which case it is labelled `action`.
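As a minimal sketch (not evaluated; the limit value is illustrative only), a call with a single scalar limit, inspecting the elements of the returned list described above:

```{r, eval=FALSE}
# A single scalar limit is labelled "action" automatically
rv <- berrySummary %>%
  evaluatePointEstimateQTL(
    posterior = fitted$tab,
    metric = p,
    observedMetric = ObservedResponse,
    upper = 0.9
  )
rv$status # overall QTL status
rv$qtl    # the calculated value of the QTL metric
rv$data   # site-level KRIs, augmented with a Status column
```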
The function on which the QTL is based is specified by the `stat` parameter of `evaluatePointEstimateQTL` and can be a user-defined function. For example, the following code fragments define QTLs based on the median ...
```{r}
berrySummary %>%
evaluatePointEstimateQTL(
posterior = fitted$tab,
metric = p,
stat = median,
observedMetric = ObservedResponse,
upper = c("warn" = 0.7, "action" = 0.9)
)
```

... and the 10th centile of the posterior distribution of $\hat{p}$.
```{r}
berrySummary %>%
evaluatePointEstimateQTL(
posterior = fitted$tab,
metric = p,
stat = function(x) quantile(x, probs = 0.1),
observedMetric = ObservedResponse,
upper = c("warn" = 0.3, "action" = 0.8)
)
```

#### Based on the probability that the derived metric is in a given range
Suppose previous experience tells us that the event probability in this type of study should be between 0.50 and 0.75. We define the QTL by requiring the posterior probability that the event rate for a new participant lies in the range 0.5 to 0.75 inclusive to be at least 60%, with a warning limit of 80%. Individual sites are flagged if their response rate is either below 40% or above 85%.
```{r}
qtlProbInRange <- berrySummary %>%
evaluateProbabilityInRangeQTL(
posterior = fitted$tab,
metric = p,
observedMetric = ObservedResponse,
range = c(0.5, 0.75),
probs = c("warn" = 0.8, "action" = 0.6),
lower = 0.4,
upper = 0.85
)
qtlProbInRange
```

Again, the QTL is breached, since the probability that the study-level event rate is in the range [0.5, 0.75] is only `r sprintf("%5.2f", qtlProbInRange$qtl)`.
#### Using an arbitrary criterion
`evaluatePointEstimateQTL`, `evaluateProbabilityInRangeQTL` (and `evaluateSiteMetricQTL` discussed below) are wrappers around `evaluateCustomQTL`, which can be used to evaluate an arbitrary, user-defined QTL rule. `evaluateCustomQTL` takes the following parameters:
* `data`: a tibble containing site-level observed metrics
* `posterior`: a tibble containing the posterior distribution of the metric, usually obtained from one of the model-fitting functions such as `fitBayesBinomialModel`.
* `f`: a function whose first two parameters are `data` and `posterior`, in that order and with those names
* `statusCol`: the column in `data` that defines the status of the site (here, `Status`). Uses tidy evaluation.
* `...`: additional parameters passed to `f`.

Essentially, all that `evaluateCustomQTL` does is perform some basic checks on its parameter values and then return the value returned by `data %>% f(posterior, ...)`. So, for example, a simplified version of `evaluatePointEstimateQTL` that compares $\hat{p}$ to 0.6 might be ...
```{r, error=TRUE}
berrySummary %>%
evaluateCustomQTL(
posterior = fitted$tab,
f = function(data, posterior) {
rv <- list()
rv$qtl <- posterior %>%
summarise(qtl = mean(p)) %>%
pull(qtl)
rv$status <- ifelse(rv$qtl < 0.6, "OK", "Breach")
rv$data <- data %>% mutate(Status = ifelse(ObservedResponse < 0.6, "OK", "Breach"))
rv
}
)
```

### Without historical data
Especially early in the development of a new compound, project teams often say they have no idea what values of the metrics used to define QTLs might be appropriate. For conventionally defined QTLs this can be a problem: an inappropriately chosen QTL might lead to a breach when, in reality, there is no issue. We can avoid this problem by basing the QTL on the centiles of the posterior distribution of the metric obtained from the QTL model just fitted, recalling that the purpose of inclusion/exclusion criteria is to minimise heterogeneity.
The `evaluateSiteMetricQTL` function calculates thresholds (for example warning and action limits) by translating quantiles of the posterior distribution of the metric to the real-life scale on which the metric is measured. Again, we use the Berry data as an example.
```{r siteMetric1}
rvSiteMetrics <- berrySummary %>%
evaluateSiteMetricQTL(
posterior = fitted$tab,
metric = p,
observedMetric = ObservedResponse,
lower = c("action" = 0.05, "warn" = 0.2),
upper = c("action" = 0.95, "warn" = 0.8)
)
```

The `quantiles` element of the return value contains the mappings from quantiles of the posterior to observed values of the metric. For example, the first row of `rvSiteMetrics$quantiles` shows that the lower `r rvSiteMetrics$quantiles$Status[1]` limit is the `r 100*rvSiteMetrics$quantiles$Quantile[1]`th centile of the posterior, which corresponds to an event probability of `r sprintf("%4.3f", rvSiteMetrics$quantiles$p[1])`.
```{r}
rvSiteMetrics$quantiles
```

As before, the `data` element of the list contains a copy of the site metric dataset, augmented by a column (named `Status` by default) that allocates the observed site-level metrics (KRIs) to a band defined by the thresholds in `lower` and `upper`.
```{r}
rvSiteMetrics$data
```

> NB `rvSiteMetrics$data` contains incorrect values in the `Status` column. See [issue #12](https://github.com/openpharma/rbqmR/issues/12).
The `qtl` element of the return value contains counts of sites by threshold name.

```{r}
rvSiteMetrics$qtl
```

Finally, the `status` of the returned QTL is `"OK"` by default.
```{r}
rvSiteMetrics$status
However, `evaluateSiteMetricQTL` can be passed a function that applies an arbitrary rule to determine whether or not a breach has occurred; the function's only argument is the `qtl` element of `evaluateSiteMetricQTL`'s return value. For example, given that we have nine sites in our fictitious example and have calculated a 90% posterior credible interval, it is reasonable to expect one site to lie outside this range. So we might say that a QTL breach has occurred if two or more sites lie outside the credible interval:
```{r}
(berrySummary %>%
evaluateSiteMetricQTL(
posterior = fitted$tab,
metric = p,
observedMetric = ObservedResponse,
lower = c("action" = 0.5, "warn" = 0.6),
upper = c("action" = 0.9, "warn" = 0.8),
statusFunc = function(d) {
ifelse(
d %>%
dplyr::filter(Status == "action") %>%
dplyr::pull(N) >= 2, "action", "OK"
)
}
))$status
```

## Representing the evaluation of a QTL graphically
Take, as an example, a QTL that requires a study level metric to lie within a given range, as illustrated above.
```{r}
fitted$tab %>%
createQtlPlot(
targetRange = list("lower" = 0.5, "upper" = 0.75),
observedMetric = fitted$tab %>% summarise(Mean = mean(p)) %>% pull(Mean)
)
```

The site-level KRIs can be added to the plot to help focus attention where intervention is likely to have the largest effect.
```{r createQtlPlot}
fitted$tab %>%
createQtlPlot(
targetRange = list("lower" = 0.5, "upper" = 0.75),
observedMetric = fitted$tab %>% summarise(Mean = mean(p)) %>% pull(Mean),
actionLimits = list(
list(
"lower" = rvSiteMetrics$quantiles %>% filter(Threshold == "Upper", Status == "action") %>% pull(p),
"upper" = NA,
"alpha" = 0.3,
"colour" = "goldenrod1"
),
list(
"lower" = NA,
"upper" = rvSiteMetrics$quantiles %>% filter(Threshold == "Lower", Status == "action") %>% pull(p),
"alpha" = 0.3,
"colour" = "goldenrod1"
)
),
siteData = berrySummary,
siteSize = Subjects,
siteMetric = ObservedResponse
)
```

## Presenting KRIs
The KRIs associated with a QTL can be presented graphically. `rbqmR` includes the `createQtlBubblePlot` function, which presents the KRIs (possibly from multiple evaluations of a QTL) in the form of a bubble plot overlaid with a box-and-whisker plot. Site-level data may be grouped in an arbitrary fashion, or not at all. Optionally, reference lines that correspond to QTL thresholds may be added.
```{r, error=TRUE}
berrySummary %>%
add_column(Snapshot = "End of study") %>%
add_column(Location = c(rep("EU", 4), rep("US", 5))) %>%
createQtlBubblePlot(
x = Snapshot,
y = ObservedResponse,
size = Subjects,
group = Location,
boxWidth = 0.1,
limits = list(
list(
label = "QTL (85%)",
colour = "red",
type = "dashed",
y = 0.85,
x = 1.25,
vjust = -1
),
list(
label = "Sec Lim (75%)",
colour = "goldenrod",
type = "dotted",
y = 0.75,
x = 1.25,
vjust = 1.25
)
)
)
```

## Observed-Minus-Expected Methodology
We generate some random data similar to that used by [@GILBERT], after setting a seed for reproducibility.
In order to illustrate what happens when a QTL is breached, we set the probability that a participant reports an event to 0.13, even though the QTL process will assume the event rate is 0.10...
```{r}
set.seed(011327)
randomData <- tibble(
Subject = 1:400,
Event = rbinom(400, 1, 0.13)
)
```

... and create an observed-minus-expected table ...
```{r}
omeTable <- randomData %>%
createObservedMinusExpectedTable(
timeVar = Subject,
eventVar = Event,
eventArray = 1,
expectedRate = 0.1,
maxTrialSize = 400
)
```

... and plot the corresponding graph.
```{r}
omeTable %>%
createObservedMinusExpectedPlot()
```

We can see that the trial breached a warning limit. When did this first happen?
```{r}
omeTable %>%
filter(Status != "OK") %>%
head(1) %>%
select(-contains("Action"), -SubjectIndex) %>%
kable(
col.names = c("Subject", "Event", "Cumulative Events", "O - E", "Status", "Lower", "Upper"),
caption = "First breach of an action or warning limit"
) %>%
add_header_above(c(" " = 5, "Warning Limits" = 2))
```

## Observed / Expected Methodology
Katz et al [@KATZ] calculate the confidence interval for the ratio of two binomial random variables. We use this to determine QTLs for the ratio of observed over expected proportions. The variability associated with a ratio suggests that this methodology is likely to be useful only for large studies with low expected event rates.
We require historical trial data to implement this methodology.
Suppose we have data on 10,000 historical patients who have reported a given event at a rate of 1.4%. We are planning a 1500 patient trial and have no reason to suppose the event rate in the trial will be any different from what has been seen in the past.
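For intuition, the Katz interval that underpins this methodology can be sketched as follows (our own illustration rather than an `rbqmR` function; the trial counts below are hypothetical):

```{r, eval=FALSE}
# Katz et al: CI for the ratio of two binomial proportions, computed on the
# log scale: log(p1/p2) +/- z * sqrt((1-p1)/(n1*p1) + (1-p2)/(n2*p2))
katzCI <- function(x1, n1, x2, n2, alpha = 0.05) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  ratio <- p1 / p2
  se <- sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
  z <- qnorm(1 - alpha / 2)
  c(estimate = ratio, lower = ratio * exp(-z * se), upper = ratio * exp(z * se))
}

# Hypothetical: 21 events among 1500 trial patients (1.4%) against
# 140 events among 10000 historical patients (1.4%)
katzCI(x1 = 21, n1 = 1500, x2 = 140, n2 = 10000)
```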
```{r}
createObservedOverExpectedTable(
nHistorical = 10000,
historicalRate = 0.014,
expectedRate = 0.014,
nObservedRange = seq(50, 1500, 25)
) %>%
createObservedOverExpectedPlot()
```

As the trial is executed, the observed data can be added to the table and the plot.
```{r, error=TRUE}
observedData <- tibble(
NObserved = c(250, 500, 750, 1000),
ObservedRate = 100 * c(2, 9, 15, 16) / NObserved
)
createObservedOverExpectedTable(
nHistorical = 10000,
historicalRate = 0.014,
expectedRate = 0.014,
nObservedRange = seq(50, 1500, 25),
observedData = observedData,
observedRate = ObservedRate
) %>% createObservedOverExpectedPlot(observedRate = ObservedRate)
```

> TODO Need to check interpretation of parameters
# Beyond TransCelerate
At the time of writing (late 2022), the TransCelerate Quality Tolerance Limit Framework [@Transcelerate2020] lists metrics that are exclusively binary in nature. There are many other potential metrics that are non-binary and which may provide insight into the conduct of the trial. For example:
* The number of episodes of rescue medication (as opposed to the percentage or number of trial participants on rescue medication)
* Time to withdrawal of consent (as opposed to the percentage or number of trial participants with withdrawal of informed consent)

as well as other metrics that can't easily be dichotomised:
* Drug plasma levels
* Number of (S)AEs reported per time unit of drug exposure
* Time to respond to data queries

The Bayesian QTL framework implemented in `rbqmR` can easily be extended to include these other data types.
## Events per unit time
We use data on the numbers of Prussian cavalry officers kicked to death by horses [@Bortkiewicz1898] to illustrate the method.
```{r}
data("cavalryDeaths")
cavalryDeaths
```

We regard different cavalry Corps as "sites" and the number of years for which data were collected as "exposure".
```{r}
cavalrySummary <- cavalryDeaths %>%
group_by(Corps) %>%
summarise(
Deaths = sum(Deaths),
TotalTime = n(),
.groups = "drop"
) %>%
mutate(DeathRate = Deaths / TotalTime)
cavalrySummary
```

Although not necessary here, because data for all Corps were recorded for the same amount of time, the Poisson model used by `rbqmR` adjusts risk by total exposure at the site.
```{r, eval = FALSE}
getModelString("poisson")
```

```{r, echo = FALSE}
# nolint start
cat(getModelString("poisson"))
# nolint end
```

Fitting the model is straightforward.
```{r}
poissonFit <- cavalrySummary %>%
fitBayesPoissonModel(Deaths, TotalTime)
poissonFit$tab %>%
createQtlPlot(
metric = lambda,
siteData = cavalrySummary,
siteSize = TotalTime,
siteMetric = DeathRate
) +
labs(x = "Deaths per year")
```

# Environment
```{r}
sessionInfo()
```

# References