{"id":23029751,"url":"https://github.com/antononcube/raku-dsl-english-dataqueryworkflows","last_synced_at":"2025-08-14T12:34:45.956Z","repository":{"id":39002502,"uuid":"281157425","full_name":"antononcube/Raku-DSL-English-DataQueryWorkflows","owner":"antononcube","description":"Raku package for making data query workflows from natural language commands (in English.)","archived":false,"fork":false,"pushed_at":"2024-03-27T13:01:01.000Z","size":594,"stargazers_count":6,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-04-19T04:50:40.299Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Raku","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"artistic-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/antononcube.png","metadata":{"files":{"readme":"README-work.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2020-07-20T15:38:44.000Z","updated_at":"2024-04-19T04:50:40.300Z","dependencies_parsed_at":"2024-02-21T03:01:54.397Z","dependency_job_id":"1ed119f8-61ed-400d-8c72-2fa9a7194f4c","html_url":"https://github.com/antononcube/Raku-DSL-English-DataQueryWorkflows","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-DSL-English-DataQueryWorkflows","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-DSL-English-DataQueryWorkflows/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-DSL-English-DataQueryWorkflows/releases","manifests_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-DSL-English-DataQueryWorkflows/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/antononcube","download_url":"https://codeload.github.com/antononcube/Raku-DSL-English-DataQueryWorkflows/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229827651,"owners_count":18130395,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-15T14:16:59.516Z","updated_at":"2024-12-15T14:17:00.190Z","avatar_url":"https://github.com/antononcube.png","language":"Raku","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Query Workflows\n\n[![MacOS](https://github.com/antononcube/Raku-DSL-English-DataQueryWorkflows/actions/workflows/macos.yml/badge.svg)](https://github.com/antononcube/Raku-DSL-English-DataQueryWorkflows/actions/workflows/macos.yml)\n[![Linux](https://github.com/antononcube/Raku-DSL-English-DataQueryWorkflows/actions/workflows/linux.yml/badge.svg)](https://github.com/antononcube/Raku-DSL-English-DataQueryWorkflows/actions/workflows/linux.yml)\n[![Win64](https://github.com/antononcube/Raku-DSL-English-DataQueryWorkflows/actions/workflows/windows.yml/badge.svg)](https://github.com/antononcube/Raku-DSL-English-DataQueryWorkflows/actions/workflows/windows.yml)\n\n[![https://raku.land/zef:antononcube/DSL::English::DataQueryWorkflows](https://raku.land/zef:antononcube/DSL::English::DataQueryWorkflows/badges/version)](https://raku.land/zef:antononcube/DSL::English::DataQueryWorkflows)\n\n## 
Introduction\n\nThis Raku (Perl 6) package has grammar and action classes for the parsing and interpretation of natural\nDomain Specific Language (DSL) commands that specify data queries in the style of Structured Query Language (SQL) or\n[RStudio](https://rstudio.com)'s library [`tidyverse`](https://tidyverse.tidyverse.org).\n\nThe interpreters (actions) have as targets different programming languages (and packages in them.)\n\nThe currently implemented programming-language-and-package targets are:\nJulia::DataFrames, Mathematica, Python::pandas, R::base, R::tidyverse, Raku::Reshapers.\n\nThere are also interpreters to natural languages: Bulgarian, English, Korean, Russian, Spanish.\n\n------\n\n## Installation\n\nZef ecosystem:\n\n```\nzef install DSL::English::DataQueryWorkflows\n```\n\nGitHub:\n\n```\nzef install https://github.com/antononcube/Raku-DSL-English-DataQueryWorkflows.git\n```\n\n-------\n\n## Current state\n\nThe following diagram:\n\n- Summarizes the data wrangling capabilities envisioned for this package \n- Represents the Raku parsers and interpreters in this package with the hexagon\n- Indicates future plans with dashed lines\n\n\n![](https://raw.githubusercontent.com/antononcube/RakuForPrediction-book/main/Diagrams/DSLs-Interpreter-for-Data-Wrangling-August-2022-state.png)\n\n**Remark:** The grammar of this package is extended to parse Bulgarian DSL commands\nwith the package \n[\"DSL::Bulgarian\"](https://github.com/antononcube/Raku-DSL-Bulgarian), \n[AAp5].\n\n-------\n\n## Workflows considered\n\nThe following flow chart encompasses the data transformation workflows we consider:\n\n![](https://raw.githubusercontent.com/antononcube/ConversationalAgents/master/ConceptualDiagrams/Tabular-data-transformation-workflows.png)\n\nHere are some properties of the methodology / flow chart:\n\n- The flow chart is for tabular datasets, or for lists (arrays) or dictionaries (hashes) of tabular datasets\n- In the flow chart only the data loading and summary 
analysis are not optional\n- All other steps are optional\n- Transformations like inner-joins are represented by the block “Combine groups”\n- It is assumed that in real applications several iterations (loops) have to be run over the flow chart\n\nIn the world of the programming language R, the orange blocks represent the so-called\nSplit-Transform-Combine pattern;\nsee the article \"The Split-Apply-Combine Strategy for Data Analysis\" by Hadley Wickham, [HW1].\n\nFor more details on data query workflow design, see the article \n[\"Introduction to data wrangling with Raku\"](https://rakuforprediction.wordpress.com/2021/12/31/introduction-to-data-wrangling-with-raku/), \n[AA1] or its translation (and upgrade) in Bulgarian, [AA2].\n\n------\n\n## Examples\n\nHere is example code:\n\n```perl6\nuse DSL::English::DataQueryWorkflows;\n\nsay ToDataQueryWorkflowCode('select mass \u0026 height', 'R-tidyverse');\n```\n\nHere is a longer data wrangling command:\n\n```perl6\nmy $command = 'use starwars;\nselect species, mass \u0026 height;\ngroup by species;\narrange by the variables species and mass descending';\n```\n\nHere we translate that command into executable code for Julia, Mathematica, Python, R, and Raku:\n\n```perl6\n{say $_.key,  \":\\n\", $_.value, \"\\n\"} for \u003cJulia Mathematica Python R R::tidyverse Raku\u003e.map({ $_ =\u003e ToDataQueryWorkflowCode($command, $_ ) });\n```\n\nHere we translate to other human languages:\n\n```perl6\n{say $_.key,  \":\\n\", $_.value, \"\\n\"} for \u003cBulgarian English Korean Russian Spanish\u003e.map({ $_ =\u003e ToDataQueryWorkflowCode($command, $_ ) });\n```\n\nAdditional examples can be found in this file:\n[DataQueryWorkflows-examples.raku](./examples/DataQueryWorkflows-examples.raku).\n\n-------\n\n## Command line interface\n\nThe package provides the Command Line Interface (CLI) program `ToDataQueryWorkflowCode`.\nHere is its usage message:\n\n```shell\n\u003e ToDataQueryWorkflowCode --help\nTranslates natural language 
commands into data transformations programming code.\nUsage:\n  ToDataQueryWorkflowCode [-t|--target=\u003cStr\u003e] [-l|--language=\u003cStr\u003e] [-f|--format=\u003cStr\u003e] [-c|--clipboard-command=\u003cStr\u003e] \u003ccommand\u003e -- Main CLI signature.\n  ToDataQueryWorkflowCode [-l|--language=\u003cStr\u003e] [-f|--format=\u003cStr\u003e] [-c|--clipboard-command=\u003cStr\u003e] \u003ctarget\u003e \u003ccommand\u003e -- Easier target specification.\n  ToDataQueryWorkflowCode [-t|--target=\u003cStr\u003e] [-l|--language=\u003cStr\u003e] [-f|--format=\u003cStr\u003e] [-c|--clipboard-command=\u003cStr\u003e] [\u003cwords\u003e ...] -- Command given as a sequence of words.\n  \n    \u003ccommand\u003e                       A string with one or many commands (separated by ';').\n    -t|--target=\u003cStr\u003e               Target (programming language with optional library spec.) [default: 'Whatever']\n    -l|--language=\u003cStr\u003e             The natural language to translate from. [default: 'English']\n    -f|--format=\u003cStr\u003e               The format of the output, one of 'Whatever', 'code', 'hash', or 'raku'. [default: 'Whatever']\n    -c|--clipboard-command=\u003cStr\u003e    Clipboard command to use. [default: 'Whatever']\n    \u003ctarget\u003e                        Programming language.\n    [\u003cwords\u003e ...]                   Words of a data query.\n\n\nDetails:\n    If --target is 'Whatever' then:\n        1. It is attempted to use the environmental variable RAKU_DSL_DATAQUERYWORKFLOWS_TARGET\n        2. If RAKU_DSL_DATAQUERYWORKFLOWS_TARGET is not defined then 'R::tidyverse' is used.\n    If --clipboard-command is the empty string then no copying to the clipboard is done.\n    If --clipboard-command is 'Whatever' then:\n        1. 
It is attempted to use the environment variable CLIPBOARD_COPY_COMMAND.\n            If CLIPBOARD_COPY_COMMAND is defined and it is the empty string then no copying to the clipboard is done.\n        2. If the variable CLIPBOARD_COPY_COMMAND is not defined then:\n            - 'pbcopy' is used on macOS\n            - 'clip.exe' on Windows\n            - 'xclip -selection clipboard' on Linux.\n```\n\nHere is an example invocation:\n\n```shell\n\u003e ToDataQueryWorkflowCode Python \"use the dataset dfTitanic; group by passengerSex; show counts\"\nobj = dfTitanic.copy()\nobj = obj.groupby([\"passengerSex\"])\nprint(obj.size())\n```\n\n-------\n\n## Testing\n\nThere are three types of unit tests for:\n\n1. Parsing abilities; see [example](./t/Basic-commands.rakutest)\n\n2. Interpretation into correct expected code; see [example](./t/Basic-commands-R-tidyverse.rakutest)\n\n3. Data transformation correctness; see tests in:\n   - [R](https://github.com/antononcube/R-packages/tree/master/DataQueryWorkflowsTests), [AAp2]\n   - [WL](https://github.com/antononcube/ConversationalAgents/blob/master/UnitTests/WL/DataQueryWorkflows-Unit-Tests.wlt), [AAp3]\n   - [Python](https://github.com/antononcube/ConversationalAgents/blob/master/UnitTests/Python/DataQueryWorkflows-Unit-Tests.py), [AAp4]\n\nThe unit tests R-package [AAp2] can be used to test both R and Python translations and equivalence between them.\n\nThere is a similar WL package, [AAp3].\n(The WL unit tests package *can* have unit tests for Julia, Python, R -- not implemented yet.)\n\n------\n\n## On naming of translation packages\n\nWL has a `System` context where usually the built-in functions reside. WL adepts know this, but others do not.\n(Every WL package provides a context for its functions.)\n\nMy naming convention for the translation files so far is `\u003cprogramming language\u003e::\u003cpackage name\u003e`. 
\nAnd I do not want to break that invariant.\n\nKnowing the package is not essential when invoking the functions. For example, `ToDataQueryWorkflowCode[_,\"R\"]` produces\nthe same results as `ToDataQueryWorkflowCode[_,\"R-base\"]`, etc.\n\n------\n\n## Versions\n\nThe original version of this Raku package was developed/hosted at\n[ [AAp1](https://github.com/antononcube/ConversationalAgents/tree/master/Packages/Perl6/DataQueryWorkflows) ].\n\nA dedicated GitHub repository was made in order to make the installation with Raku's `zef` more direct.\n(As shown above.)\n\n------\n\n## TODO\n\n- [X] DONE Implement SQL actions.\n\n- [ ] TODO Implement [Swift::TabularData](https://developer.apple.com/documentation/tabulardata) actions.\n  \n- [ ] TODO Implement [Raku::Dan](https://github.com/p6steve/raku-Dan) actions.\n\n- [ ] TODO Make sure \"round trip\" translations work. \n\n------\n\n## References\n\n### Articles\n\n[AA1] Anton Antonov,\n[\"Introduction to data wrangling with Raku\"](https://rakuforprediction.wordpress.com/2021/12/31/introduction-to-data-wrangling-with-raku/),\n(2021),\n[RakuForPrediction at WordPress](https://rakuforprediction.wordpress.com).\n\n[AA2] Anton Antonov,\n[\"Увод в обработката на данни с Raku\"](https://rakuforprediction.wordpress.com/2022/05/24/увод-в-обработката-на-данни-с-raku/),\n(2022),\n[RakuForPrediction at WordPress](https://rakuforprediction.wordpress.com).\n\n[HW1] Hadley Wickham, \n[\"The Split-Apply-Combine Strategy for Data Analysis\"](https://www.jstatsoft.org/article/view/v040i01), \n(2011), \n[Journal of Statistical Software](https://www.jstatsoft.org/).\n\n### Notebooks\n\n[AAn1] Anton Antonov,\n[\"Standard Data Wrangling Commands (in Raku)\"](./doc/Standard-data-wrangling-commands.md), \n(2022),\n[Raku-DSL-English-DataQueryWorkflows at GitHub](https://github.com/antononcube/Raku-DSL-English-DataQueryWorkflows).   
\n(Markdown notebook.)\n\n[AAn2] Anton Antonov,\n[\"Standard Data Wrangling Commands (in Python)\"](https://github.com/antononcube/RakuForPrediction-book/blob/main/Notebooks/Jupyter/Standard-data-wrangling-commands-Python.ipynb),\n(2022),\n[RakuForPrediction-book at GitHub](https://github.com/antononcube/RakuForPrediction-book).   \n(Jupyter notebook.)\n\n### Packages\n\n[AAp1] Anton Antonov,\n[Data Query Workflows Raku Package](https://github.com/antononcube/ConversationalAgents/tree/master/Packages/Perl6/DataQueryWorkflows),\n(2020),\n[ConversationalAgents at GitHub/antononcube](https://github.com/antononcube/ConversationalAgents).\n\n[AAp2] Anton Antonov,\n[Data Query Workflows Tests](https://github.com/antononcube/R-packages/tree/master/DataQueryWorkflowsTests),\n(2020),\n[R-packages at GitHub/antononcube](https://github.com/antononcube/R-packages).\n\n[AAp3] Anton Antonov,\n[Data Query Workflows Mathematica Unit Tests](https://github.com/antononcube/ConversationalAgents/blob/master/UnitTests/WL/DataQueryWorkflows-Unit-Tests.wlt),\n(2020),\n[ConversationalAgents at GitHub/antononcube](https://github.com/antononcube/ConversationalAgents).\n\n[AAp4] Anton Antonov,\n[Data Query Workflows Python Unit Tests](https://github.com/antononcube/ConversationalAgents/blob/master/UnitTests/Python/DataQueryWorkflows-Unit-Tests.py),\n(2020),\n[ConversationalAgents at GitHub/antononcube](https://github.com/antononcube/ConversationalAgents).\n\n[AAp5] Anton Antonov,\n[DSL::Bulgarian Raku package](https://github.com/antononcube/Raku-DSL-Bulgarian),\n(2022),\n[GitHub/antononcube](https://github.com/antononcube).\n\n### Videos\n\n[AAv1] Anton Antonov,\n[\"Multi-language Data-Wrangling Conversational Agent\"](https://www.youtube.com/watch?v=pQk5jwoMSxs),\n(2020),\n[Wolfram Technology Conference 2020, YouTube/Wolfram](https://www.youtube.com/channel/UCJekgf6k62CQHdENWf2NgAQ).\n\n[AAv2] Anton Antonov, \n[\"Raku for Prediction\"](https://conf.raku.org/talk/157), \n(2021), \n[The Raku 
Conference 2021](https://conf.raku.org/).\n\n[AAv3] Anton Antonov,\n[\"Doing it like a Cro (Raku data wrangling Shortcuts demo)\"](https://www.youtube.com/watch?v=wS1lqMDdeIY),\n(2021),\n[Anton Antonov's channel at YouTube](https://www.youtube.com/channel/UC5qMPIsJeztfARXWdIw3Xzw).\n\n[AAv4] Anton Antonov,\n[\"FOSDEM2022 Multi language Data Wrangling and Acquisition Conversational Agents (in Raku)\"](https://www.youtube.com/watch?v=pQk5jwoMSxs),\n(2022),\n[Anton Antonov's channel at YouTube](https://www.youtube.com/channel/UC5qMPIsJeztfARXWdIw3Xzw).\n\n[AAv5] Anton Antonov,\n[\"Implementing Machine Learning algorithms in Raku\" at TRC-2022](https://conf.raku.org/talk/170),\n(2022),\n[The Raku Conference 2022](https://conf.raku.org/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantononcube%2Fraku-dsl-english-dataqueryworkflows","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fantononcube%2Fraku-dsl-english-dataqueryworkflows","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantononcube%2Fraku-dsl-english-dataqueryworkflows/lists"}