{"id":23029634,"url":"https://github.com/antononcube/raku-data-reshapers","last_synced_at":"2025-08-14T12:34:44.367Z","repository":{"id":40529435,"uuid":"401241845","full_name":"antononcube/Raku-Data-Reshapers","owner":"antononcube","description":"Raku package with data reshaping functions for different data structures (full arrays, Red tables, Text::CSV tables.)","archived":false,"fork":false,"pushed_at":"2024-06-11T23:13:59.000Z","size":230,"stargazers_count":4,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-11-07T03:42:15.591Z","etag":null,"topics":["data","data-transformation","data-wrangling","rakulang"],"latest_commit_sha":null,"homepage":"https://raku.land/zef:antononcube/Data::Reshapers","language":"Raku","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"artistic-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/antononcube.png","metadata":{"files":{"readme":"README-work.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-08-30T06:39:35.000Z","updated_at":"2024-06-11T23:14:02.000Z","dependencies_parsed_at":"2024-01-16T23:26:39.883Z","dependency_job_id":"ce43ef12-5d6a-4bc8-be08-7cb828310d8e","html_url":"https://github.com/antononcube/Raku-Data-Reshapers","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-Data-Reshapers","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-Data-Reshapers/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRa
ku-Data-Reshapers/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-Data-Reshapers/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/antononcube","download_url":"https://codeload.github.com/antononcube/Raku-Data-Reshapers/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229827652,"owners_count":18130395,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-transformation","data-wrangling","rakulang"],"created_at":"2024-12-15T14:16:19.689Z","updated_at":"2024-12-15T14:16:20.470Z","avatar_url":"https://github.com/antononcube.png","language":"Raku","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Raku Data::Reshapers\n\n[![MacOS](https://github.com/antononcube/Raku-Data-Reshapers/actions/workflows/macos.yml/badge.svg)](https://github.com/antononcube/Raku-Data-Reshapers/actions/workflows/macos.yml)\n[![Linux](https://github.com/antononcube/Raku-Data-Reshapers/actions/workflows/linux.yml/badge.svg)](https://github.com/antononcube/Raku-Data-Reshapers/actions/workflows/linux.yml)\n[![Win64](https://github.com/antononcube/Raku-Data-Reshapers/actions/workflows/windows.yml/badge.svg)](https://github.com/antononcube/Raku-Data-Reshapers/actions/workflows/windows.yml)\n[![https://raku.land/zef:antononcube/Data::Reshapers](https://raku.land/zef:antononcube/Data::Reshapers/badges/version)](https://raku.land/zef:antononcube/Data::Reshapers)\n[![License: 
Artistic-2.0](https://img.shields.io/badge/License-Artistic%202.0-0298c3.svg)](https://opensource.org/licenses/Artistic-2.0)\n\nThis Raku package has data reshaping functions for different data structures that are \ncoercible to full arrays.\n\nThe supported data structures are:\n  - Positional-of-hashes\n  - Positional-of-arrays\n \nThe most important data reshaping operations provided by the package for those data structures are:\n\n- Cross tabulation, `cross-tabulate`\n- Long format conversion, `to-long-format`\n- Wide format conversion, `to-wide-format`\n- Join across (aka `SQL JOIN`), `join-across`\n- Transpose, `transpose`\n\nThe first four operations are fundamental in data wrangling and data analysis; \nsee [AA1, Wk1, Wk2, AAv1-AAv2].\n\n(Transposing tabular data is, of course, also fundamental, but it can also be seen as a\nbasic functional programming operation.)\n\nThere are other reshaping functions for:\n\n- Flattening and tallying\n- Simple and stratified (dataset) splitting\n- Taking, renaming, and deleting table columns\n- Table column separation\n\nAn overview is given in a segment of the presentation \n[\"TRC 2022 Implementation of ML algorithms in Raku\"](https://youtu.be/efRHfjYebs4?si=-KHucA8exZ8Cxx-w\u0026t=1335),\n[AAv4]. \n\nMore detailed explanations of the data wrangling methodology and workflows are given in the article\n[\"Introduction to data wrangling with Raku\"](https://rakuforprediction.wordpress.com/2021/12/31/introduction-to-data-wrangling-with-raku/), [AA2]. \n(And its Bulgarian version [AA3].)\n\nThis package is one of the translation targets of the interpreter(s) provided by the package\n[\"DSL::English::DataQueryWorkflows\"](https://github.com/antononcube/Raku-DSL-English-DataQueryWorkflows), [AAp2].\n\n------\n\n## Usage examples\n\n### Cross tabulation\n\nMaking contingency tables -- or cross tabulation -- is a fundamental statistics and data analysis operation,\n[Wk1, AA1]. 
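\n\nFor intuition, here is a minimal sketch of what cross tabulation computes, written in plain Raku without this package: it counts every combination of two column values in a hash of hashes. The records and the column names `sex` and `class` are hypothetical, and the sketch is only an illustration, not the implementation of `cross-tabulate`.\n\n```perl6\n# Hypothetical records (not from the Titanic dataset)\nmy @records = (\n    { sex =\u003e 'male',   class =\u003e '1st' },\n    { sex =\u003e 'female', class =\u003e '1st' },\n    { sex =\u003e 'male',   class =\u003e '3rd' },\n    { sex =\u003e 'male',   class =\u003e '1st' },\n);\n\n# Count each (sex, class) combination; nested hash entries autovivify\nmy %table;\n%table{.\u003csex\u003e}{.\u003cclass\u003e}++ for @records;\n\n# Print one line per sex value, with the counts per class value\nfor %table.keys.sort -\u003e $sex {\n    say $sex, ': ', %table{$sex}.sort.map({ .key ~ ' =\u003e ' ~ .value }).join(', ');\n}\n# female: 1st =\u003e 1\n# male: 1st =\u003e 2, 3rd =\u003e 1\n```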
\n\nHere is an example using the \n[Titanic](https://en.wikipedia.org/wiki/Titanic) \ndataset (provided by this package through the function `get-titanic-dataset`):\n\n```perl6\nuse Data::Reshapers;\n\nmy @tbl = get-titanic-dataset();\nmy $res = cross-tabulate( @tbl, 'passengerSex', 'passengerClass');\nsay $res;\n```\n\n```perl6\nto-pretty-table($res);\n```\n\n### Long format\n\nConversion to long format allows column names to be treated as data.\n\n(More precisely, when converting to long format, the specified column names of a tabular dataset become values\nin a dedicated column, e.g. \"Variable\" in the long format.)\n\n```perl6\nmy @tbl1 = @tbl.pick(3);\n.say for @tbl1;\n```\n\n```perl6\n.say for to-long-format( @tbl1 );\n```\n\n```perl6\nmy @lfRes1 = to-long-format( @tbl1, 'id', [], variablesTo =\u003e \"VAR\", valuesTo =\u003e \"VAL2\" );\n.say for @lfRes1;\n```\n\n### Wide format\n\nHere we transform the long format result `@lfRes1` above into wide format -- \nthe result has the same records as `@tbl1`:\n\n```perl6\nto-pretty-table( to-wide-format( @lfRes1, 'id', 'VAR', 'VAL2' ) );\n```\n\n### Transpose\n\nUsing the cross tabulation result above:\n\n```perl6\nmy $tres = transpose( $res );\n\nto-pretty-table($res, title =\u003e \"Original\");\n```\n\n```perl6\nto-pretty-table($tres, title =\u003e \"Transposed\");\n```\n\n------\n\n## Type system\n\nEarlier versions of the package implemented a type \"deduction\" system. 
\nCurrently, the type system is provided by the package \n[\"Data::TypeSystem\"](https://github.com/antononcube/Raku-Data-TypeSystem), [AAp1].\n\nThe type system conventions follow those of Mathematica's \n[`Dataset`](https://reference.wolfram.com/language/ref/Dataset.html) \n-- see the presentation \n[\"Dataset improvements\"](https://www.wolfram.com/broadcast/video.php?c=488\u0026p=4\u0026disp=list\u0026v=3264).\n\nHere we get the Titanic dataset, change the \"passengerAge\" column values to be numeric, \nand show the dataset's dimensions:\n\n```perl6\nmy @dsTitanic = get-titanic-dataset(headers =\u003e 'auto');\n@dsTitanic = @dsTitanic.map({$_\u003cpassengerAge\u003e = $_\u003cpassengerAge\u003e.Numeric; $_}).Array;\ndimensions(@dsTitanic)\n```\n\nHere is a sample of the dataset's records:\n\n```perl6\nto-pretty-table(@dsTitanic.pick(5).List, field-names =\u003e \u003cid passengerAge passengerClass passengerSex passengerSurvival\u003e)\n```\n\nHere is the type of a single record:\n\n```perl6\nuse Data::TypeSystem;\ndeduce-type(@dsTitanic[12])\n```\n\nHere is the type of a single record's values:\n\n```perl6\ndeduce-type(@dsTitanic[12].values.List)\n```\n\nHere is the type of the whole dataset:\n\n```perl6\ndeduce-type(@dsTitanic)\n```\n\nHere is the type of \"values only\" records:\n\n```perl6\nmy @valArr = @dsTitanic\u003e\u003e.values\u003e\u003e.Array;\ndeduce-type(@valArr)\n```\n\nHere is the type of records that contain only string values:\n\n```perl6\nmy @strValArr = delete-columns(@dsTitanic, 'passengerAge')\u003e\u003e.values\u003e\u003e.Array;\ndeduce-type(@strValArr)\n```\n\n------\n\n## TODO\n\n1. [X] DONE Simpler, more convenient interface.\n\n   - ~~Currently, a user has to specify four different namespaces\n     in order to be able to use all package functions.~~\n    \n2. [ ] TODO More extensive long format tests.\n\n3. [ ] TODO More extensive wide format tests.\n\n4. 
[X] DONE Implement verifications for:\n   \n    - See the type system implementation -- it has all of the functionalities listed here.\n    \n    - [X] DONE Positional-of-hashes\n      \n    - [X] DONE Positional-of-arrays\n       \n    - [X] DONE Positional-of-key-to-array-pairs\n    \n    - [X] DONE Positional-of-hashes, each record of which has:\n      \n       - [X] Same keys \n       - [X] Same type of values of corresponding keys\n      \n    - [X] DONE Positional-of-arrays, each record of which has:\n    \n       - [X] Same length\n       - [X] Same type of values of corresponding elements\n\n5. [X] DONE Implement \"nice tabular visualization\" using \n   [Pretty::Table](https://gitlab.com/uzluisf/raku-pretty-table)\n   and/or\n   [Text::Table::Simple](https://github.com/ugexe/Perl6-Text--Table--Simple).\n\n6. [X] DONE Document examples using pretty tables.\n\n7. [X] DONE Implement the transpose operation for:\n    - [X] hash of hashes\n    - [X] hash of arrays\n    - [X] array of hashes\n    - [X] array of arrays\n    - [X] array of key-to-array pairs \n\n8. [X] DONE Implement to-pretty-table for:\n   - [X] hash of hashes\n   - [X] hash of arrays\n   - [X] array of hashes\n   - [X] array of arrays\n   - [X] array of key-to-array pairs\n\n9. [ ] TODO Implement join-across:\n   - [X] DONE inner, left, right, outer\n   - [X] DONE single key-to-key pair\n   - [X] DONE multiple key-to-key pairs\n   - [X] DONE optional fill-in of missing values\n   - [ ] TODO handling collisions\n\n10. [X] DONE Implement semi- and anti-join\n\n11. [ ] TODO Implement conversion to long format for:\n    - [ ] TODO hash of hashes\n    - [ ] TODO hash of arrays\n\n12. [ ] TODO Speed/performance profiling.\n    - [ ] TODO Come up with profiling tests\n    - [ ] TODO Comparison with R\n    - [ ] TODO Comparison with Python\n   \n13. 
[ ] TODO Type system.\n    - [X] DONE Base type (Int, Str, Numeric)\n    - [X] DONE Homogeneous list detection\n    - [X] DONE Association detection\n    - [X] DONE Struct discovery\n    - [ ] TODO Enumeration detection\n    - [X] DONE Dataset detection\n       - [X] List of hashes\n       - [X] Hash of hashes\n       - [X] List of lists\n\n14. [X] DONE Refactor the type system into a separate package.\n\n15. [X] DONE \"Simple\" or fundamental functions \n    - [X] `flatten`\n    - [X] `take-drop`\n    - [X] `tally`\n       - Currently in \"Data::Summarizers\".\n       - Can easily be \"implemented\" on the spot with `.BagHash.Hash`.\n    \n------\n\n## References\n\n### Articles\n\n[AA1] Anton Antonov,\n[\"Contingency tables creation examples\"](https://mathematicaforprediction.wordpress.com/2016/10/04/contingency-tables-creation-examples/), \n(2016), \n[MathematicaForPrediction at WordPress](https://mathematicaforprediction.wordpress.com).\n\n[AA2] Anton Antonov,\n[\"Introduction to data wrangling with Raku\"](https://rakuforprediction.wordpress.com/2021/12/31/introduction-to-data-wrangling-with-raku/),\n(2021),\n[RakuForPrediction at WordPress](https://rakuforprediction.wordpress.com).\n\n[AA3] Anton Antonov,\n[\"Увод в обработката на данни с Raku\"](https://rakuforprediction.wordpress.com/2022/05/24/увод-в-обработката-на-данни-с-raku/),\n(2022),\n[RakuForPrediction at WordPress](https://rakuforprediction.wordpress.com).\n\n[Wk1] Wikipedia entry, [Contingency table](https://en.wikipedia.org/wiki/Contingency_table).\n\n[Wk2] Wikipedia entry, [Wide and narrow data](https://en.wikipedia.org/wiki/Wide_and_narrow_data).\n\n### Functions, repositories\n\n[AAf1] Anton Antonov,\n[CrossTabulate](https://resources.wolframcloud.com/FunctionRepository/resources/CrossTabulate),\n(2019),\n[Wolfram Function Repository](https://resources.wolframcloud.com/FunctionRepository).\n\n[AAf2] Anton 
Antonov,\n[LongFormDataset](https://resources.wolframcloud.com/FunctionRepository/resources/LongFormDataset),\n(2020),\n[Wolfram Function Repository](https://resources.wolframcloud.com/FunctionRepository).\n\n[AAf3] Anton Antonov,\n[WideFormDataset](https://resources.wolframcloud.com/FunctionRepository/resources/WideFormDataset),\n(2021),\n[Wolfram Function Repository](https://resources.wolframcloud.com/FunctionRepository).\n\n[AAf4] Anton Antonov,\n[RecordsSummary](https://resources.wolframcloud.com/FunctionRepository/resources/RecordsSummary),\n(2019),\n[Wolfram Function Repository](https://resources.wolframcloud.com/FunctionRepository).\n\n[AAp1] Anton Antonov,\n[Data::TypeSystem Raku package](https://github.com/antononcube/Raku-Data-TypeSystem),\n(2023),\n[GitHub/antononcube](https://github.com/antononcube).\n\n[AAp2] Anton Antonov,\n[DSL::English::DataQueryWorkflows Raku package](https://github.com/antononcube/Raku-DSL-English-DataQueryWorkflows),\n(2022-2024),\n[GitHub/antononcube](https://github.com/antononcube).\n\n### Videos\n\n[AAv1] Anton Antonov,\n[\"Multi-language Data-Wrangling Conversational Agent\"](https://www.youtube.com/watch?v=pQk5jwoMSxs),\n(2020),\n[YouTube channel of Wolfram Research, Inc.](https://www.youtube.com/channel/UCJekgf6k62CQHdENWf2NgAQ).\n(Wolfram Technology Conference 2020 presentation.)\n\n[AAv2] Anton Antonov,\n[\"Data Transformation Workflows with Anton Antonov, Session #1\"](https://www.youtube.com/watch?v=iXrXMQdXOsM),\n(2020),\n[YouTube channel of Wolfram Research, Inc.](https://www.youtube.com/channel/UCJekgf6k62CQHdENWf2NgAQ).\n\n[AAv3] Anton Antonov,\n[\"Data Transformation Workflows with Anton Antonov, Session #2\"](https://www.youtube.com/watch?v=DWGgFsaEOsU),\n(2020),\n[YouTube channel of Wolfram Research, Inc.](https://www.youtube.com/channel/UCJekgf6k62CQHdENWf2NgAQ).\n\n[AAv4] Anton Antonov,\n[\"TRC 2022 Implementation of ML algorithms in 
Raku\"](https://youtu.be/efRHfjYebs4?si=-KHucA8exZ8Cxx-w),\n(2022),\n[YouTube/@AAA4Prediction](https://www.youtube.com/@AAA4prediction).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantononcube%2Fraku-data-reshapers","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fantononcube%2Fraku-data-reshapers","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantononcube%2Fraku-data-reshapers/lists"}