{"id":14966017,"url":"https://github.com/antononcube/raku-data-generators","last_synced_at":"2025-10-25T13:31:02.492Z","repository":{"id":41267860,"uuid":"421071170","full_name":"antononcube/Raku-Data-Generators","owner":"antononcube","description":"This Raku package has functions for generating random strings, words, pet names, vectors, and tabular datasets.","archived":false,"fork":false,"pushed_at":"2024-07-30T17:06:27.000Z","size":987,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-01-31T07:21:21.699Z","etag":null,"topics":["raku","rakulang","random-date","random-date-generator","random-generation","random-name-generator","random-variates","random-word","random-word-generator"],"latest_commit_sha":null,"homepage":"https://raku.land/zef:antononcube/Data::Generators","language":"Raku","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"artistic-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/antononcube.png","metadata":{"files":{"readme":"README-work.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-25T15:03:25.000Z","updated_at":"2024-07-30T17:06:30.000Z","dependencies_parsed_at":"2024-03-27T01:24:25.025Z","dependency_job_id":"4b572066-8d90-4fc8-9418-3786469c36d6","html_url":"https://github.com/antononcube/Raku-Data-Generators","commit_stats":{"total_commits":133,"total_committers":3,"mean_commits":"44.333333333333336","dds":0.1428571428571429,"last_synced_commit":"b98126ea1951222efee14395b5a7d38aeefb1a57"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/antononcube%2FRaku-Data-Generators","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-Data-Generators/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-Data-Generators/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-Data-Generators/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/antononcube","download_url":"https://codeload.github.com/antononcube/Raku-Data-Generators/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238147568,"owners_count":19424285,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["raku","rakulang","random-date","random-date-generator","random-generation","random-name-generator","random-variates","random-word","random-word-generator"],"created_at":"2024-09-24T13:35:42.098Z","updated_at":"2025-10-25T13:30:56.992Z","avatar_url":"https://github.com/antononcube.png","language":"Raku","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Raku Data::Generators\n\n[![Actions Status](https://github.com/antononcube/Raku-Data-Generators/actions/workflows/linux.yml/badge.svg)](https://github.com/antononcube/Raku-Data-Generators/actions) \n[![Actions Status](https://github.com/antononcube/Raku-Data-Generators/actions/workflows/macos.yml/badge.svg)](https://github.com/antononcube/Raku-Data-Generators/actions) \n[![Actions 
Status](https://github.com/antononcube/Raku-Data-Generators/actions/workflows/windows.yml/badge.svg)](https://github.com/antononcube/Raku-Data-Generators/actions)\n\n[![License: Artistic-2.0](https://img.shields.io/badge/License-Artistic%202.0-0298c3.svg)](https://opensource.org/licenses/Artistic-2.0)\n\nThis Raku package has functions for generating random strings, words, pet names, vectors, arrays, and\n(tabular) datasets. \n\n### Motivation\n\nThe primary motivation for this package is to have simple, intuitively named functions\nfor generating random vectors (lists) and datasets of different objects.\n\nAlthough Raku has fairly good support for random vector generation, commands\nlike the following are assumed to be easier to use:\n\n```{raku, eval = FALSE}\nsay random-string(6, chars =\u003e 4, ranges =\u003e [ \u003cy n Y N\u003e, \"0\"..\"9\" ] ).raku;\n```\n\n------\n\n## Random strings\n\nThe function `random-string` generates random strings.\n\nHere is a random string:\n\n```perl6\nuse Data::Generators;\nrandom-string\n```\n\nHere we generate a vector of six random strings, each of length 4, with characters drawn from the specified ranges:\n\n```perl6\nsay random-string(6, chars =\u003e 4, ranges =\u003e [ \u003cy n Y N\u003e, \"0\"..\"9\" ] ).raku;\n```\n\n------\n\n## Random words\n\nThe function `random-word` generates random words.\n\nHere is a random word:\n\n```perl6\nrandom-word\n```\n\nHere we generate a list with 12 random words:\n\n```perl6\nrandom-word(12)\n```\n\nHere we generate a table of random words of different types:\n\n```perl6\nuse Data::Reshapers;\nmy @dfWords = do for \u003cAny Common Known Stop\u003e -\u003e $wt { $wt =\u003e random-word(6, type =\u003e $wt) };\nsay to-pretty-table(@dfWords);\n```\n\n**Remark:** `Whatever` can be used instead of `'Any'`.\n\n**Remark:** The function `to-pretty-table` is from the package \n[Data::Reshapers](https://modules.raku.org/dist/Data::Reshapers:cpan:ANTONOV).\n\nAll word data can be retrieved 
with the resources object:\n\n```perl6\nmy $ra = Data::Generators::ResourceAccess.instance();\n$ra.get-word-data().elems;\n```\n\n\n------\n\n## Random pet names\n\nThe function `random-pet-name` generates random pet names.\n\nThe pet names are taken from publicly available data of pet license registrations in\nthe years 2015–2020 in Seattle, WA, USA. See [DG1].\n\nHere is a random pet name:\n\n```perl6\nrandom-pet-name\n```\n\nThe following command generates a list of six random pet names:\n\n```perl6\nsrand(32);\nrandom-pet-name(6).raku\n```\n\nThe named argument `species` can be used to specify the species of the random pet names \n(according to the species-name relationships in [DG1]).\n\nHere we generate a table of random pet names of different species:\n\n```perl6\nmy @dfPetNames = do for \u003cAny Cat Dog Goat Pig\u003e -\u003e $wt { $wt =\u003e random-pet-name(6, species =\u003e $wt) };\nsay to-pretty-table(@dfPetNames);\n```\n\n**Remark:** `Whatever` can be used instead of `'Any'`.\n\nThe named argument (adverb) `weighted` can be used to make the random pet name choice \nweighted by the known real-life numbers of occurrences:\n\n```perl6\nsrand(32);\nsay random-pet-name(6, :weighted).raku\n```\n\nThe weights used correspond to the counts from [DG1].\n\n**Remark:** The implementation of `random-pet-name` is based on the Mathematica implementation\n[`RandomPetName`](https://resources.wolframcloud.com/FunctionRepository/resources/RandomPetName),\n[AAf1].\n\nAll pet data can be retrieved with the resources object:\n\n```perl6\nmy $ra = Data::Generators::ResourceAccess.instance();\n$ra.get-pet-data()\u003e\u003e.elems\n```\n\n------\n\n## Random pretentious job titles\n\nThe function `random-pretentious-job-title` generates random pretentious job titles.\n\nHere is a random pretentious job title:\n\n```perl6\nrandom-pretentious-job-title\n```\n\nThe following command generates a list of six random pretentious job 
titles:\n\n```perl6\nrandom-pretentious-job-title(6).raku\n```\n\nThe named argument `number-of-words` can be used to control the number of words in the generated job titles.\n\nThe named argument `language` can be used to control the language of the generated job titles.\nAt this point, only Bulgarian and English are supported.\n\nHere we generate pretentious job titles using different languages and numbers of words per title:\n\n```perl6\nmy $res = random-pretentious-job-title(12, number-of-words =\u003e Whatever, language =\u003e Whatever);\nsay to-pretty-table($res.rotor(3));\n```\n\n**Remark:** `Whatever` can be used as the value of the named arguments `number-of-words` and `language`.\n\n**Remark:** The implementation uses the job title phrases in https://www.bullshitjob.com . \nIt is, more or less, based on the Mathematica implementation \n[`RandomPretentiousJobTitle`](https://resources.wolframcloud.com/FunctionRepository/resources/RandomPretentiousJobTitle),\n[AAf2].\n\n------\n\n## Random reals\n\nThis module provides the function `random-real` that can be used to generate lists of real numbers\nusing the uniform distribution.\n\nHere is a random real:\n\n```perl6\nsay random-real(); \n```\n\nHere is a random real between 0 and 20:\n\n```perl6\nsay random-real(20); \n```\n\nHere are six random reals between -2 and 12:\n\n```perl6\nsay random-real([-2,12], 6);\n```\n\nHere is a 4-by-3 array of random reals between -3 and 3:\n\n```perl6\nsay random-real([-3,3], [4,3]);\n```\n\n\n**Remark:** The signature design follows Mathematica's function\n[`RandomReal`](https://reference.wolfram.com/language/ref/RandomReal.html).\n\n\n------\n\n## Random variates\n\nThis module provides the function `random-variate` that can be used to generate lists of real numbers\nusing distribution specifications.\n\nHere are examples:\n\n```perl6\nsay random-variate(BernoulliDistribution.new(:p(0.3)), 1000).BagHash.Hash; \n```\n\n```perl6\nsay 
random-variate(BinomialDistribution.new(:n(10), :p(0.2)), 10); \n```\n\n```perl6\nsay random-variate(NormalDistribution.new( µ =\u003e 10, σ =\u003e 20), 5); \n```\n\n```perl6\nsay random-variate(UniformDistribution.new(:min(2), :max(60)), 5);\n```\n\n**Remark:** Only Normal distribution and Uniform distribution are implemented at this point.\n\n**Remark:** The signature design follows Mathematica's function\n[`RandomVariate`](https://reference.wolfram.com/language/ref/RandomVariate.html).\n\nHere is an example of 2D array generation:\n\n```perl6\nsay random-variate(NormalDistribution.new, [3,4]);\n```\n\n------\n\n## Random tabular datasets\n\nThe function `random-tabular-dataset` can be used to generate tabular *datasets*.\n\n**Remark:** In this module a *dataset* is (usually) an array of arrays of pairs.\nThe dataset data structure closely resembles Mathematica's data structure \n[`Dataset`](https://reference.wolfram.com/language/ref/Dataset.html), [WRI2]. \n\n**Remark:** The programming languages R and S have a data structure called \"data frame\" that\ncorresponds to a dataset. 
(In the Python world the package `pandas` provides data frames.)\nData frames, though, are column-centric, not row-centric like datasets.\nFor example, data frames do not allow a column to have elements of heterogeneous types.\n\nHere are basic calls:\n\n```{perl6, eval=FALSE}\nrandom-tabular-dataset();\nrandom-tabular-dataset(Whatever):row-names;\nrandom-tabular-dataset(Whatever, Whatever);\nrandom-tabular-dataset(12, 4);\nrandom-tabular-dataset(Whatever, 4);\nrandom-tabular-dataset(Whatever, \u003cCol1 Col2 Col3\u003e):!row-names;\n```\n\nHere is an example of a generated tabular dataset whose column names are cat pet names:\n\n```perl6\nmy @dfRand = random-tabular-dataset(5, 3, column-names-generator =\u003e { random-pet-name($_, species =\u003e 'Cat') });\nsay to-pretty-table(@dfRand);\n```\n\nThe display function `to-pretty-table` is from\n[`Data::Reshapers`](https://modules.raku.org/dist/Data::Reshapers:cpan:ANTONOV).\n\n**Remark:** At this point only\n[*wide format*](https://en.wikipedia.org/wiki/Wide_and_narrow_data)\ndatasets are generated. (The long format implementation is high on my TODO list.)\n\n**Remark:** The signature design and implementation are based on the Mathematica implementation\n[`RandomTabularDataset`](https://resources.wolframcloud.com/FunctionRepository/resources/RandomTabularDataset),\n[AAf3].\n\n------\n\n## TODO\n\n1. 
[ ] TODO Random tabular datasets generation\n    - [X] DONE Row spec\n    - [X] DONE Column spec that takes columns count and column names\n    - [X] DONE Column names generator\n    - [X] DONE Wide form implementation only\n    - [X] DONE Generators of column values  \n      - [X] DONE Column-generator hash\n      - [X] DONE List of generators\n      - [X] DONE Single generator\n      - [X] DONE Turn \"generators\" that are lists into sampling pure functions\n    - [ ] TODO Long form implementation\n    - [ ] TODO Max number of values\n    - [ ] TODO Min number of values\n    - [ ] TODO Form (long or wide)\n    - [X] DONE Row names (automatic)\n    \n2. [X] DONE Random reals vectors generation\n\n3. [ ] TODO Figuring out how to handle and indicate missing values\n   \n4. [ ] TODO Random reals vectors generation according to distribution specs\n    - [X] DONE Uniform distribution\n    - [X] DONE Normal distribution\n    - [ ] TODO Poisson distribution\n    - [ ] TODO Skew-normal distribution\n    - [ ] TODO Triangular distribution\n    \n5. [X] DONE `RandomReal`-like implementation \n    - See `random-real`.\n\n6. 
[X] DONE Selection between `roll` and `pick` for:\n    - [X] DONE `RandomWord`  \n    - [X] DONE `RandomPetName`\n\n------\n\n## References\n\n### Articles\n\n[AA1] Anton Antonov,\n[\"Pets licensing data analysis\"](https://mathematicaforprediction.wordpress.com/2020/01/20/pets-licensing-data-analysis/), \n(2020), \n[MathematicaForPrediction at WordPress](https://mathematicaforprediction.wordpress.com).\n\n### Functions, packages\n\n[AAf1] Anton Antonov,\n[RandomPetName](https://resources.wolframcloud.com/FunctionRepository/resources/RandomPetName),\n(2021),\n[Wolfram Function Repository](https://resources.wolframcloud.com/FunctionRepository).\n\n[AAf2] Anton Antonov,\n[RandomPretentiousJobTitle](https://resources.wolframcloud.com/FunctionRepository/resources/RandomPretentiousJobTitle),\n(2021),\n[Wolfram Function Repository](https://resources.wolframcloud.com/FunctionRepository).\n\n[AAf3] Anton Antonov,\n[RandomTabularDataset](https://resources.wolframcloud.com/FunctionRepository/resources/RandomTabularDataset),\n(2021),\n[Wolfram Function Repository](https://resources.wolframcloud.com/FunctionRepository).\n\n[SHf1] Sander Huisman,\n[RandomString](https://resources.wolframcloud.com/FunctionRepository/resources/RandomString),\n(2021),\n[Wolfram Function Repository](https://resources.wolframcloud.com/FunctionRepository).\n\n[WRI1] Wolfram Research (2010), \n[RandomVariate](https://reference.wolfram.com/language/ref/RandomVariate.html), \nWolfram Language function.\n\n[WRI2] Wolfram Research (2014),\n[Dataset](https://reference.wolfram.com/language/ref/Dataset.html),\nWolfram Language function.\n\n### Data repositories\n\n[DG1] Data.Gov,\n[Seattle Pet 
Licenses](https://catalog.data.gov/dataset/seattle-pet-licenses),\n[catalog.data.gov](https://catalog.data.gov).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantononcube%2Fraku-data-generators","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fantononcube%2Fraku-data-generators","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantononcube%2Fraku-data-generators/lists"}