https://github.com/openpharma/datafaker

DataFakeR is an R package that helps you generate samples of fake data that preserve specified assumptions about the original data.

Host: GitHub
URL: https://github.com/openpharma/datafaker
Owner: openpharma
License: other
Created: 2021-09-02T10:47:32.000Z (almost 4 years ago)
Default Branch: master
Last Pushed: 2023-04-26T09:27:11.000Z (about 2 years ago)
Last Synced: 2023-10-25T13:37:03.838Z (over 1 year ago)
Language: R
Homepage: https://openpharma.github.io/DataFakeR/articles/main.html
Size: 966 KB
Stars: 24
Watchers: 5
Forks: 5
Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE


---
output: github_document
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = TRUE,          # evaluate code chunks
  echo = TRUE,          # show source code
  message = TRUE,       # show messages
  warning = TRUE,       # show warnings
  fig.width = 8,        # default plot width
  fig.height = 6,       # default plot height
  dpi = 200,            # plot resolution
  fig.align = "center", # figure alignment
  fig.path = "man/figures/README-"
)
library(DataFakeR)
set.seed(123)
options(tibble.width = Inf)
```

# DataFakeR 

[![version](https://img.shields.io/static/v1.svg?label=github.com&message=v.0.1.3&color=ff69b4)](https://openpharma.github.io/DataFakeR/)

[![lifecycle](https://img.shields.io/badge/lifecycle-experimental-success.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)

## Overview

DataFakeR is an R package that helps you generate samples of fake data that preserve specified assumptions about the original data.

## DataFakeR 0.1.3 is now available!

## Installation

- From CRAN:

```
install.packages("DataFakeR")
```

- Latest version from GitHub:

```
remotes::install_github("openpharma/DataFakeR")
```

## Learning DataFakeR

If you are new to DataFakeR, start with the **[Welcome Page](https://openpharma.github.io/DataFakeR/articles/main.html)**.

There you will find a list of articles that guide you through the package functionality.

## Usage

### Configure schema YAML structure

```
# schema_books.yml
public:
  tables:
    books:
      nrows: 10
      columns:
        book_id:
          type: char(8)
          formula: !expr paste0(substr(author, 1, 4), substr(title, 1, 4), substr(bought, 1, 4))
        author:
          type: varchar
          spec: name
        title:
          type: varchar
          spec: book
          spec_params:
            add_second: true
        genre:
          type: varchar
          values: [Fantasy, Adventure, Horror, Romance]
        bought:
          type: date
          range: ['2020-01-02', '2021-06-01']
        amount:
          type: smallint
          range: [1, 99]
          na_ratio: 0.2
        purchase_id:
          type: varchar
      check_constraints:
        purchase_id_check:
          column: purchase_id
          expression: !expr purchase_id == paste0('purchase_', bought)
    borrowed:
      nrows: 30
      columns:
        book_id:
          type: char(8)
          not_null: true
        user_id:
          type: char(10)
      foreign_keys:
        book_id_fkey:
          columns: book_id
          references:
            columns: book_id
            table: books
```
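
The `formula` and `expression` entries in the schema are written as R expressions that refer to other columns of the same table. As a rough illustration in base R only, the sketch below shows what those two expressions compute for a single row. The sample values are made up, and how DataFakeR evaluates the expressions internally is not shown here.

```r
# Made-up values for one row of `books` (illustrative only)
author <- "Jane Doe"
title  <- "Hunting With The Storm"
bought <- as.Date("2020-05-17")

# Same expression as the `book_id` formula in the schema
book_id <- paste0(substr(author, 1, 4), substr(title, 1, 4), substr(bought, 1, 4))

# A value that satisfies the `purchase_id_check` expression
purchase_id <- paste0("purchase_", bought)

print(book_id)
print(purchase_id)
print(purchase_id == paste0("purchase_", bought))
```

Both `substr()` and `paste0()` coerce the date to its character form, which is why the first four characters of `bought` are the year.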

### Define custom simulation methods if needed

```{r}
# Generate `n` random book titles; `add_second = TRUE` inserts a word from `second`
books <- function(n, add_second = FALSE) {
  first <- c("Learning", "Amusing", "Hiding", "Symbols", "Hunting", "Smile")
  second <- c("Of", "On", "With", "From", "In", "Before")
  third <- c("My", "Your", "The", "Common", "Mysterious", "A")
  fourth <- c("Future", "South", "Technology", "Forest", "Storm", "Dreams")
  second_res <- NULL
  if (add_second) {
    second_res <- sample(second, n, replace = TRUE)
  }
  # `paste()` drops the zero-length `second_res` when `add_second` is FALSE
  paste(
    sample(first, n, replace = TRUE), second_res,
    sample(third, n, replace = TRUE), sample(fourth, n, replace = TRUE)
  )
}

# Custom `book` spec for character columns: forwards `spec_params` to `books()`
simul_spec_character_book <- function(n, unique, spec_params, ...) {
  spec_params$n <- n

  DataFakeR::unique_sample(
    do.call(books, spec_params),
    spec_params = spec_params, unique = unique
  )
}

# Register the method under the name `book`, matching `spec: book` in the schema
set_faker_opts(
  opt_simul_spec_character = opt_simul_spec_character(book = simul_spec_character_book)
)
```
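
The custom method relies on `do.call()` to turn the `spec_params` list into function arguments. The sketch below isolates that step in base R with a hypothetical stand-in generator (`toy_titles` is not part of DataFakeR), assuming `spec_params` arrives as a named list built from the `spec_params:` section of the schema.

```r
# Hypothetical stand-in for a generator such as `books()`; deterministic on purpose
toy_titles <- function(n, add_second = FALSE) {
  titles <- rep("Hunting", n)
  if (add_second) {
    titles <- paste(titles, "With")
  }
  paste(titles, "The Storm")
}

# Assumed shape of the parameters taken from the schema (`add_second: true`)
spec_params <- list(add_second = TRUE)

# Add the requested sample size, then call the generator with the named list
spec_params$n <- 3
titles <- do.call(toy_titles, spec_params)

print(titles)
```

Because the list elements are matched to arguments by name, any additional key under `spec_params:` would need a matching argument in the generator.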

### Source schema (and check table and column dependencies)

```{r}
options("dfkr_verbose" = TRUE) # set `dfkr_verbose` option to see the workflow progress
sch <- schema_source("schema_books.yml")
```

```{r tbls_dep}
# Dependencies between tables
schema_plot_deps(sch)
```

```{r books_dep}
# Dependencies between columns of the `books` table
schema_plot_deps(sch, "books")
```

### Run data simulation

```{r}
sch <- schema_simulate(sch)
```

### Check the results

```{r}
schema_get_table(sch, "books")
```

```{r}
schema_get_table(sch, "borrowed")
```
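
Beyond printing the tables, you may want to confirm that the simulated data respects the schema. The base R sketch below uses two small hand-made data frames as stand-ins so that it runs on its own; in a real session you would replace them with the results of `schema_get_table(sch, "books")` and `schema_get_table(sch, "borrowed")`. The checks mirror the `foreign_keys`, `not_null`, and `range` settings from the schema, and the names `books_tbl` and `borrowed_tbl` are illustrative.

```r
# Hand-made stand-ins for the simulated tables (illustrative values only)
books_tbl <- data.frame(
  book_id = c("A1", "B2", "C3"),
  amount = c(5L, NA, 42L),
  stringsAsFactors = FALSE
)
borrowed_tbl <- data.frame(
  book_id = c("A1", "A1", "C3"),
  stringsAsFactors = FALSE
)

# Foreign key: every borrowed book must exist in `books`
fk_ok <- all(borrowed_tbl$book_id %in% books_tbl$book_id)

# not_null: no missing `book_id` in `borrowed`
not_null_ok <- !anyNA(borrowed_tbl$book_id)

# range: non-missing amounts must fall between 1 and 99
range_ok <- all(books_tbl$amount >= 1 & books_tbl$amount <= 99, na.rm = TRUE)

print(c(fk = fk_ok, not_null = not_null_ok, range = range_ok))
```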

## Acknowledgment

**The package was created thanks to support from [Roche](https://www.roche.com/) and contributions from the RWD Insights Engineering Team.**

Special thanks to:

- Adam Foryś for technical support and numerous suggestions for the current and future implementation of the package.

- Adam Leśniewski for challenging the limitations of the package by providing multiple real-world test scenarios (and a wonderful hex sticker!).

- Paweł Kawski for outlining the initial assumptions for the package based on real-world medical data.

- Kamil Wais for highlighting the need for the package and its relevance to real-world applications.

## Lifecycle

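DataFakeR 0.1.3 is at the experimental stage. If you find a bug, please post an issue on our GitHub page at [https://github.com/openpharma/DataFakeR/issues](https://github.com/openpharma/DataFakeR/issues).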

## Getting help

There are two main ways to get help with `DataFakeR`:

1. Reach the package author via email: [email protected].

2. Post an issue on our GitHub page at [https://github.com/openpharma/DataFakeR/issues](https://github.com/openpharma/DataFakeR/issues).
