https://github.com/openpharma/datafaker
DataFakeR is an R package designed to help you generate a sample of fake data that preserves specified assumptions about the original data.
- Host: GitHub
- URL: https://github.com/openpharma/datafaker
- Owner: openpharma
- License: other
- Created: 2021-09-02T10:47:32.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2023-04-26T09:27:11.000Z (about 2 years ago)
- Last Synced: 2023-10-25T13:37:03.838Z (over 1 year ago)
- Language: R
- Homepage: https://openpharma.github.io/DataFakeR/articles/main.html
- Size: 966 KB
- Stars: 24
- Watchers: 5
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
README
---
output: github_document
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = TRUE,
  echo = TRUE, # echo code?
  message = TRUE, # Show messages
  warning = TRUE, # Show warnings
  fig.width = 8, # Default plot width
  fig.height = 6, # .... height
  dpi = 200, # Plot resolution
  fig.align = "center", # Figure alignment
  fig.path = "man/figures/README-"
)
library(DataFakeR)
set.seed(123)
options(tibble.width = Inf)
```

# DataFakeR
[Documentation](https://openpharma.github.io/DataFakeR/)
[Lifecycle: experimental](https://lifecycle.r-lib.org/articles/stages.html#experimental)

## Overview
DataFakeR is an R package designed to help you generate a sample of fake data that preserves specified assumptions about the original data.
## DataFakeR 0.1.3 is now available!
## Installation
- from CRAN

```
install.packages("DataFakeR")
```

- latest version from GitHub

```
remotes::install_github(
  "openpharma/DataFakeR"
)
```

## Learning DataFakeR
If you are new to DataFakeR, look at the **[Welcome Page](https://openpharma.github.io/DataFakeR/articles/main.html)**.
There you will find a list of articles that guide you through the package functionality.
## Usage
### Configure schema YAML structure
```
# schema_books.yml
public:
  tables:
    books:
      nrows: 10
      columns:
        book_id:
          type: char(8)
          formula: !expr paste0(substr(author, 1, 4), substr(title, 1, 4), substr(bought, 1, 4))
        author:
          type: varchar
          spec: name
        title:
          type: varchar
          spec: book
          spec_params:
            add_second: true
        genre:
          type: varchar
          values: [Fantasy, Adventure, Horror, Romance]
        bought:
          type: date
          range: ['2020-01-02', '2021-06-01']
        amount:
          type: smallint
          range: [1, 99]
          na_ratio: 0.2
        purchase_id:
          type: varchar
      check_constraints:
        purchase_id_check:
          column: purchase_id
          expression: !expr purchase_id == paste0('purchase_', bought)
    borrowed:
      nrows: 30
      columns:
        book_id:
          type: char(8)
          not_null: true
        user_id:
          type: char(10)
      foreign_keys:
        book_id_fkey:
          columns: book_id
          references:
            columns: book_id
            table: books
```

### Define custom simulation methods if needed
```{r}
# Generate `n` random book titles from fixed word lists;
# `add_second` controls whether a word from `second` is included
books <- function(n, add_second = FALSE) {
  first <- c("Learning", "Amusing", "Hiding", "Symbols", "Hunting", "Smile")
  second <- c("Of", "On", "With", "From", "In", "Before")
  third <- c("My", "Your", "The", "Common", "Mysterious", "A")
  fourth <- c("Future", "South", "Technology", "Forest", "Storm", "Dreams")
  second_res <- NULL
  if (add_second) {
    second_res <- sample(second, n, replace = TRUE)
  }
  paste(
    sample(first, n, replace = TRUE), second_res,
    sample(third, n, replace = TRUE), sample(fourth, n, replace = TRUE)
  )
}

# Wrap `books()` as the simulation method for character columns with `spec: book`
simul_spec_character_book <- function(n, unique, spec_params, ...) {
  spec_params$n <- n
  DataFakeR::unique_sample(
    do.call(books, spec_params),
    spec_params = spec_params, unique = unique
  )
}

# Register the method under the spec name `book`
set_faker_opts(
  opt_simul_spec_character = opt_simul_spec_character(book = simul_spec_character_book)
)
```
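
Before wiring a custom generator into a schema, it can help to call it on its own and confirm that it returns exactly `n` character values in the expected format. The sketch below does this in base R for a hypothetical ISBN-like generator written in the same style as `books()`. The name `isbn_like`, the string format, and the `prefix` parameter are illustrative assumptions and are not part of DataFakeR.

```r
# Hypothetical generator in the style of `books()`: returns `n` ISBN-like strings.
# `prefix` is an illustrative parameter of the kind that could be set via `spec_params`.
isbn_like <- function(n, prefix = "978") {
  # Draw n integers and zero-pad them so every body is nine digits wide
  body <- sprintf("%09d", sample.int(999999999, n, replace = TRUE))
  paste0(prefix, "-", body)
}

set.seed(123)
ids <- isbn_like(5)

# Basic checks a generator should pass before it is registered
stopifnot(
  is.character(ids),
  length(ids) == 5,
  all(grepl("^978-[0-9]{9}$", ids))
)
```

A wrapper with the `function(n, unique, spec_params, ...)` signature used for `simul_spec_character_book` could then call such a generator and be registered through `opt_simul_spec_character()` under a spec name of your choice, mirroring the `book` example.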
### Source schema (and check table and column dependencies)
```{r}
options("dfkr_verbose" = TRUE) # set `dfkr_verbose` option to see the workflow progress
sch <- schema_source("schema_books.yml")
```

```{r tbls_dep}
schema_plot_deps(sch)
```

```{r books_dep}
schema_plot_deps(sch, "books")
```

### Run data simulation

```{r}
sch <- schema_simulate(sch)
```

### Check the results

```{r}
schema_get_table(sch, "books")
```

```{r}
schema_get_table(sch, "borrowed")
```

## Acknowledgment
**The package was created thanks to [Roche](https://www.roche.com/) support and contributions from the RWD Insights Engineering Team.**

Special thanks to:

- [Adam Foryś](mailto:[email protected]) for technical support and numerous suggestions for the current and future implementation of the package.
- [Adam Leśniewski](mailto:[email protected]) for challenging the limitations of the package by providing multiple real-world test scenarios (and a wonderful hex sticker!).
- [Paweł Kawski](mailto:[email protected]) for indicating the initial assumptions about the package based on real-world medical data.
- [Kamil Wais](mailto:[email protected]) for highlighting the need for the package and its relevance to real-world applications.

## Lifecycle
DataFakeR 0.1.3 is at an experimental stage. If you find a bug, please post an issue on the GitHub page.
## Getting help
There are two main ways to get help with `DataFakeR`:
1. Reach the package author via email: [email protected].
2. Post an issue on our GitHub page at [https://github.com/openpharma/DataFakeR/issues](https://github.com/openpharma/DataFakeR/issues).