{"id":21074019,"url":"https://github.com/tomaztk/datasetr","last_synced_at":"2025-05-16T06:31:08.111Z","repository":{"id":167175886,"uuid":"642454445","full_name":"tomaztk/datasetR","owner":"tomaztk","description":"Generate datasets for R projects","archived":false,"fork":false,"pushed_at":"2023-08-17T04:03:00.000Z","size":3466,"stargazers_count":6,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-08-13T07:11:16.589Z","etag":null,"topics":["data","data-frame","data-science","r-language","r-programming","sample","sample-data","sample-data-generator"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tomaztk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-05-18T15:48:04.000Z","updated_at":"2023-07-30T08:55:29.000Z","dependencies_parsed_at":"2024-02-19T18:34:12.364Z","dependency_job_id":null,"html_url":"https://github.com/tomaztk/datasetR","commit_stats":null,"previous_names":["tomaztk/datasetr"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomaztk%2FdatasetR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomaztk%2FdatasetR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomaztk%2FdatasetR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomaztk%2FdatasetR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tomaztk","download_url":"https://codeload.github.
com/tomaztk/datasetR/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225411520,"owners_count":17470245,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-frame","data-science","r-language","r-programming","sample","sample-data","sample-data-generator"],"created_at":"2024-11-19T19:14:12.053Z","updated_at":"2024-11-19T19:14:12.754Z","avatar_url":"https://github.com/tomaztk.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# datasetR: Generating random datasets for R\n\nThe `datasetR` package helps you generate random datasets for your R project. It provides a preset list of random values of different data types (interval, ordinal, and nominal). It also includes a function for inserting NULL, NA, or missing values.\n\n## Installing\n\nThe `datasetR` package can be installed from this GitHub repository using the **devtools** package. More on the\n[devtools package](https://www.rstudio.com/products/rpackages/devtools/).\n\nInstall it with:\n\n``` r\nlibrary(devtools)\ninstall_github(\"tomaztk/datasetR\")\n```\n\n## Getting started\n\nThe package comes with a main function, `dsR()`, that helps you generate the dataset. 
But first, let's create a list of 20 different variable types with random values.\n\n``` r\nlibrary(datasetR)\nset_of_val \u003c- set_of_val\n```\n\nYou will get a starting set of values:\n\n![Set of Values](man/figures/img1_set_of_vals.png)\n\n## Data Types\n\nThis section describes the predefined list of values used to construct the datasets.\n\n**Types** explained:\n\n1. ms: multi-class type of nominal data; all values are of equal rank and cannot be ordered. Available:\n\t*\tcolor\n\t*\tcontinents\n\t*\timaginary ZIP codes\n\t*\tcapital cities\n2. od: ordinal data; values can be ordered and compared. Available:\n\t*\tclothing size\n\t*\tclasses from 1 to 6\n3. li: Likert scale data; questionnaire-type data whose values can be sorted. Available:\n\t*\tthree-value scale for expressing opinion\n\t*\tfive-value scale for expressing opinion\n\t*\tseven-value scale for expressing opinion\n4. bo: boolean data with values TRUE and FALSE\n5. bi: binary data with two outcomes. Available:\n\t*\tcharacter values Yes and No\n\t*\tinteger values 0 and 1\n6. le: single-character values of alphabet letters. Available:\n\t*\tlowercase letters\n\t*\tuppercase letters\n7. gu: alphanumeric character string representing a 16-byte value, version 4; known as a globally unique identifier (GUID or UUID)\n8. te: numeric temperature data with no specific unit of measure. Available:\n\t*\ttemperature (preferably °C) ranging from -20 to 35\n\t*\ttemperature (preferably °C) ranging from 1 to 130 (integer type)\n9. mo: numeric data of money with no specific unit.\n\n### Generating your random dataset\n\nThe following example creates a data frame of 100 rows with a total of 8 variables. The 8 variables are of these types:\n\n1. 3 x multi-class (nominal with multiple classes; characters or numbers)\n2. 4 x two-class (nominal with two (binary) classes; characters or numbers)\n3. 
1 x interval (integer)\n\nThe resulting dataset has 8 variables and 100 rows of sampled data.\n\n``` r\nlibrary(datasetR)\nlibrary(dplyr)\n\nmy_dataset \u003c- dsR(vr=\"ms:3;bi:4;ii:1\", nr=100)\n```\n\n### Generating your desired dataset\n\nTo create a dataset of your own design, use the `vr` parameter to build a string describing the variables.\nEach entry in the string has the form **type**:_number of variables_. When specifying multiple types, separate them with semicolons.\n\n```r\ntest_data \u003c- dsR(vr=\"od:1;ms:1;bi:1;ii:1\", nr=10)\n```\n\nThe following two statements generate datasets of the same dimensions.\n\n```r\ntest_data \u003c- dsR(vr=\"od:1;od:1\", nr=10)\ntest_data \u003c- dsR(vr=\"od:2\", nr=10)\n```\n\n\n### Adding missing values to your dataset\n\nTo make your dataset less clean, you can add missing values to it. Calling `addMissingValues` on a dataset and a column replaces some of that column's values with `NA`.\nThe `pc` parameter sets the percentage of values in the given column to replace.\n\n```r\nmy_dataset$ii_1 \u003c- addMissingValues(my_dataset, ii_1, pc = 10)\n```\n\n## Community and distribution\n\nYou are welcome to submit suggestions and report bugs: https://github.com/tomaztk/datasetR/issues\n\nThanks go to all of these [contributors](https://github.com/tomaztk/datasetR/graphs/contributors)!\n\n\n## Documentation\n\nDocumentation is available at [https://tomaztk.github.io/datasetR](https://tomaztk.github.io/datasetR) and was created with [pkgdown](https://github.com/r-lib/pkgdown).\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomaztk%2Fdatasetr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftomaztk%2Fdatasetr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomaztk%2Fdatasetr/lists"}