Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tomaztk/datasetR
Generate datasets for R projects
https://github.com/tomaztk/datasetR
data data-frame data-science r-language r-programming sample sample-data sample-data-generator
Last synced: 3 months ago
JSON representation
Generate datasets for R projects
- Host: GitHub
- URL: https://github.com/tomaztk/datasetR
- Owner: tomaztk
- Created: 2023-05-18T15:48:04.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-08-17T04:03:00.000Z (about 1 year ago)
- Last Synced: 2024-06-04T23:03:46.996Z (5 months ago)
- Topics: data, data-frame, data-science, r-language, r-programming, sample, sample-data, sample-data-generator
- Language: R
- Homepage:
- Size: 3.31 MB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- jimsghstars - tomaztk/datasetR - Generate datasets for R projects (R)
README
# datasetR: Generating random dataset for R
The `datasetR` helps you generate a random datasets for your R project. It provides a preset random list of values with different data types (interval, ordinal, nominal values). It includes also a function for imputing a NULL, NA or missing values.
## Installing
`datasetR` package can be installed and downloaded from this Github repository using **devtools** package. More on
[devtools package](https://www.rstudio.com/products/rpackages/devtools/).
Installing is done using:``` r
library(devtools)
install_github("tomaztk/datasetR")
```## Getting started
Comes prepacked with a main function `dsR()` that will help you generate the dataset. But first, let's create a list with 20 different variable types and a random values.
```
library(datasetR)
set_of_val <- set_of_val
```
And you will get a starting set of values:
![Set of Values](man/figures/img1_set_of_vals.png)## Data Types
Understand the predefined list of values for constructing the datasets.
**Types** explained:
1. ms: multi-class type of nominal data; all values are equal and no ordering can be done. Available:
* color
* continents
* imaginary ZIP codes
* capital cities
2. od: ordinal data; values can be assigned order and comparison can be created. Available:
* clothing size
* classes from 1 to 6
3. li: likert scale data; questionnaire type of data and values can be sorted. Available:
* three-value scale for expressing opinion
* five-value scale for expressing opinion
* seven-value scale for expressing opinion
4. bo: boolean data; with values of TRUE and FALSE
5. bi: binary data with two outcomes. Available:
* character type of values Yes and No
* integer type of values 0 and 1
6. le: single character values of alphabet letters. Available:
* lowercase letters
* uppercase letters
7. gu: alpha-numeric character string of 16 bytes length, v4; known as Globally unique identifier (GUID or UUID)
8. te: numeric data of temperature with no specific unit of measure. Available:
* temperature (pref. °C) with range from -20 to 35
* temperature (pref. °C) with range from 1 to 130 (integer type)
9. mo: numeric data of money with no specific unit.### Generating your random dataset
With the following example, the code will create a dataframe of 100 rows with total of 8 variables. The 8 variables will be type:
1. 3 x multi-class (nominal with multiple classes; characters or numbers)
2. 4 x two-class (nominal with two (binary) class ; characters or numbers)
3. 1 x interval (integer)The dimensions of the dataset is 8 variables and 100 rows of sampled data.
``` r
library(datasetR)
library(dplyr)my_dataset <- dsR(vr="ms:3;bi:4;ii:1", nr=100);
```### Generating your desired
When you want to create a desired dataset, use the `vr`parameter and construct the string for the values.
The string is annotated as **type** : _number of variables_ . When stating multiple types, make sure to separated them with semi-colon.```r
test_data <- dsR(vr="od:1;ms:1;bi:1;ii:1", nr=10);
```And following statements will generate the dataset of the same dimension.
```r
test_data <- dsR(vr="od:1;od:1", nr=10);
test_data <- dsR(vr="od:2", nr=10);
```### Adding missing values to your dataset
When you want to skew your dataset, you can add some missing values to your desired data. By using `addMissingValues` on a desired dataset and desired column, the values will get replaced by `NA` values.
In addition, the parameter `pc` is for percent of values for given dataframe.column that you want to replace.```r
my_dataset$ii_1 <- addMissingValues(my_dataset, ii_1, pc = 10)```
## Community and distribution
You are welcome to submit suggestions and report bugs: https://github.com/tomaztk/datasetR/issues
Thanks goes to all the of these [contributors](https://github.com/tomaztk/datasetR/graphs/contributors)!
## Documentation
Documentation available [https://tomaztk.github.io/datasetR](https://tomaztk.github.io/datasetR) and created with [pkgdown](https://github.com/r-lib/pkgdown).