https://github.com/llrs/experdesign

Design experiments distributed in several batches
https://github.com/llrs/experdesign

batch cran experiment-design r r-package rstats

Last synced: 6 months ago
JSON representation

Design experiments distributed in several batches

Host: GitHub
URL: https://github.com/llrs/experdesign
Owner: llrs
License: other
Created: 2018-07-27T11:31:12.000Z (almost 7 years ago)
Default Branch: devel
Last Pushed: 2024-12-23T11:28:29.000Z (7 months ago)
Last Synced: 2025-01-11T15:43:56.195Z (6 months ago)
Topics: batch, cran, experiment-design, r, r-package, rstats
Language: R
Homepage: https://experDesign.llrs.dev
Size: 6.99 MB
Stars: 10
Watchers: 3
Forks: 1
Open Issues: 8
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
- Codemeta: codemeta.json

Awesome Lists containing this project

README

        ---

output: github_document

---

```{r, include = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "man/figures/README-",

  out.width = "100%"

)

```

# experDesign

[![CRAN status](https://www.r-pkg.org/badges/version/experDesign)](https://CRAN.R-project.org/package=experDesign)

[![CRAN checks](https://badges.cranchecks.info/worst/experDesign.svg)](https://cran.r-project.org/web/checks/check_results_experDesign.html)

[![R-CMD-check](https://github.com/llrs/experDesign/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/llrs/experDesign/actions/workflows/R-CMD-check.yaml)

[![Coverage status](https://codecov.io/gh/llrs/experDesign/branch/devel/graph/badge.svg)](https://app.codecov.io/github/llrs/experDesign)

[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)

[![Project Status: Active - The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)

[![JOSS](https://joss.theoj.org/papers/10.21105/joss.03358/status.svg)](https://doi.org/10.21105/joss.03358) 

[![DOI](https://zenodo.org/badge/142569201.svg)](https://zenodo.org/badge/latestdoi/142569201)

The goal of experDesign is to help you distribute your samples before an experiment but after they are collected. 

For example, checking for common problems in the data, and reducing or even preventing batch bias before performing an experiment, or measuring it once the experiment is performed.

It provides four main functions:

* `check_data()`: Check if there are any problems with the data.

* `design()`: Randomize the samples according to their variables.

* `replicates()`: Selects some samples for replicates and randomizes the samples (highly recommended).

* `spatial()`: Randomize the samples on a spatial grid.

There are other helpers. 

## Installation

To install the latest version on [CRAN](https://CRAN.R-project.org/package=experDesign) use:

```r

install.packages("experDesign")

```

::: {.pkgdown-devel}

You can install the development version of pkgdown from [GitHub](https://github.com/llrs/experDesign) with:

```r

# install.packages("devtools")

devtools::install_github("llrs/experDesign")

```

:::

## Example

We can use the survey dataset for the examples:

```{r show}

library("experDesign")

data(survey, package = "MASS") 

head(survey)

```

The dataset has numeric, categorical values and `NA` values.

### Checking initial data

We can check some issues from an experimental point of view via `check_data()`:

```{r check, warning=TRUE}

check_data(survey)

```

As you can see with the warnings we get a collections of problems.

In general, try to have at least 3 replicates for each condition and try to have all the data of each variable. 

### Picking samples for each batch

Imagine that we can only work in groups of 70, and we want to randomize by Sex, 

Smoke, Age, and by hand.  

There are `r choose(237, 70)` combinations of samples per batch in a 

this experiment. 

However, in some of those combinations all the right handed students are in the same batch making it impossible to compare the right handed students with the others and draw conclusions from it. 

We could check all the combinations to select those that allow us to do this comparison.

But as this would be too long with `experDesign` we can try to find the combination with the best design by comparing each combination with the original according to multiple statistics.

```{r design, fig.show='hold'}

# To reduce the variables used:

omit <- c("Wr.Hnd", "NW.Hnd", "Fold", "Pulse", "Clap", "Exer", "Height", "M.I")

(keep <- colnames(survey)[!colnames(survey) %in% omit])

head(survey[, keep])

# Set a seed for reproducibility

set.seed(87732135)

# Looking for groups at most of 70 samples.

index <- design(pheno = survey, size_subset = 70, omit = omit, iterations = 100)

index

```

We can transform then into a vector to append to the file or to pass to a colleague with:

```{r batch_names}

head(batch_names(index))

# Or via inspect() to keep it in a matrix format:

head(inspect(index, survey[, keep]))

```

# Previous work

The CRAN task View of [Experimental Design](https://CRAN.R-project.org/view=ExperimentalDesign) includes many packages relevant for designing an experiment before collecting data, but none of them provides how to manage them once the samples are already collected.

Two packages allow to distribute the samples on batches:

- The [OSAT](https://bioconductor.org/packages/release/bioc/html/OSAT.html) package handles categorical 

variables but not numeric data. It doesn't work with our data.

 - The [minDiff](https://github.com/m-Py/minDiff) package reported in [Stats.SE](https://stats.stackexchange.com/a/326015/105234), handles both 

numeric and categorical data. But it can only optimize for two nominal criteria.

It doesn't work for our data.

 - The [Omixer](https://bioconductor.org/packages/Omixer/) package handles both 

numeric and categorical data (converting categorical variables to numeric). But both the same way either Pearson's Chi-squared Test if there are few samples or Kendall's correlation. It does allow to protect some spots from being used.

If you are still designing the experiment and do not have collected any data [DeclareDesign](https://cran.r-project.org/package=DeclareDesign) might be relevant for you. But specially the [randomizr](https://cran.r-project.org/package=randomizr) packages  which makes common forms of random assignment and sampling.

Question in [Bioinformatics.SE](https://bioinformatics.stackexchange.com/q/4765/48) I made before developing the package.

# Other

Please note that this project is released with a [Contributor Code of Conduct](https://www.contributor-covenant.org/version/1/0/0/code-of-conduct/).

By participating in this project you agree to abide by its terms.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/llrs/experdesign

Awesome Lists containing this project

README