https://github.com/antononcube/raku-data-summarizers
Raku package with summarization functions for different types of data.
- Host: GitHub
- URL: https://github.com/antononcube/raku-data-summarizers
- Owner: antononcube
- License: artistic-2.0
- Created: 2021-10-24T15:03:36.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2024-08-02T13:33:08.000Z (almost 2 years ago)
- Last Synced: 2025-02-12T05:38:52.427Z (over 1 year ago)
- Language: Raku
- Size: 101 KB
- Stars: 1
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README-work.md
- License: LICENSE
README
# Raku Data::Summarizers
This Raku package has data summarizing functions for different data structures that are
coercible to full arrays.
The supported data structures (so far) are:
- 1D Arrays
- 1D Lists
- Positional-of-hashes
- Positional-of-arrays
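
As a rough illustration of the four shapes listed above, here is a minimal sketch in plain Raku. The variable names and values are made up for this example and are not taken from the package.

```perl6
# Illustrative values only; all names here are hypothetical.
my @array-1d  = 3, 1, 4, 1, 5;                               # 1D Array
my $list-1d   = (3, 1, 4, 1, 5);                             # 1D List
my @of-hashes = { a => 1, b => 'x' }, { a => 2, b => 'y' };  # Positional-of-hashes
my @of-arrays = [1, 'x'], [2, 'y'];                          # Positional-of-arrays

# Show the container type and the type of each row.
say @array-1d.^name;
say $list-1d.^name;
say @of-hashes.map({ .^name });
say @of-arrays.map({ .^name });
```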
------
## Usage examples
### Setup
Here we load the Raku modules
[`Data::Generators`](https://modules.raku.org/dist/Data::Generators:cpan:ANTONOV),
[`Data::Reshapers`](https://modules.raku.org/dist/Data::Reshapers:cpan:ANTONOV),
[`Text::Plot`](https://modules.raku.org/dist/Text::Plot:cpan:ANTONOV),
and this module,
[`Data::Summarizers`](https://github.com/antononcube/Raku-Data-Summarizers):
```perl6
use Data::Generators;
use Data::Reshapers;
use Text::Plot;
use Data::Summarizers;
```
### Summarize vectors
Here we generate a numerical vector and place `NaN`, `Whatever`, and `Nil` values in it:
```perl6
my @vec = [^1001].roll(12);
@vec = @vec.append( [NaN, Whatever, Nil]);
@vec .= pick(@vec.elems);
@vec
```
Here we summarize the vector generated above:
```perl6
records-summary(@vec)
```
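
To give a sense of what a summary of such a vector might contain, here is a minimal sketch in plain Raku. It is not the implementation of `records-summary`: the helper names `quantile` and `simple-summary`, the choice of linear-interpolation quantiles, and the single `Other` count for `NaN` and non-numeric entries are all assumptions made for this example.

```perl6
# Hypothetical helpers for illustration; they are not part of Data::Summarizers.

# Linear-interpolation quantile of an already sorted list of numbers.
sub quantile(@sorted, $p) {
    my $h  = (@sorted.elems - 1) * $p;
    my $lo = $h.floor;
    my $hi = $h.ceiling;
    @sorted[$lo] + ($h - $lo) * (@sorted[$hi] - @sorted[$lo])
}

sub simple-summary(@data) {
    # Keep defined numbers only; NaN is the one number that is not equal to itself.
    my @nums  = @data.grep({ $_ ~~ Numeric:D and $_ == $_ }).sort;
    my $other = @data.elems - @nums.elems;
    return %(Other => $other) unless @nums;
    %(
        Min    => @nums.head,
        Q1     => quantile(@nums, 0.25),
        Median => quantile(@nums, 0.5),
        Mean   => @nums.sum / @nums.elems,
        Q3     => quantile(@nums, 0.75),
        Max    => @nums.tail,
        Other  => $other,
    )
}

say simple-summary([5, 3, NaN, 1, Whatever, 4, 2, Nil]);
```

Counting the excluded entries separately keeps missing values visible in the summary instead of silently dropping them.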
### Summarize tabular datasets
Here we generate a random tabular dataset with 16 rows and 3 columns and display it:
```perl6
srand(32);
my $tbl = random-tabular-dataset(16,
                                 <Pet Ref Code>,  # column names; "Ref" is an assumed name for the middle column
                                 generators=>[random-pet-name(4), -> $n { ((^20).rand xx $n).List }, random-string(6)]);
to-pretty-table($tbl)
```
**Remark:** The values of the column "Pet" are sampled from a set of four pet names, and the values of the column
"Code" are sampled from a set of six strings.
Here we summarize the tabular dataset generated above:
```perl6
records-summary($tbl)
```
### Summarize collections of tabular datasets
Here is a hash of tabular datasets:
```perl6
my %group = group-by($tbl, 'Pet');
%group.pairs.map({ say("{$_.key} =>"); say to-pretty-table($_.value) });
```
Here is the summary of that collection of datasets:
```perl6
records-summary(%group)
```
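
The grouping step can also be sketched with Raku's built-in `classify`, which splits a list of row hashes into a hash keyed by one column's value. The rows below are made-up data, and `group-by` from `Data::Reshapers` may differ in details from this sketch.

```perl6
# Hypothetical rows; the real $tbl comes from random-tabular-dataset.
my @rows =
    { Pet => 'Rex',  Code => 'a1' },
    { Pet => 'Luna', Code => 'b2' },
    { Pet => 'Rex',  Code => 'c3' };

# classify returns a hash that maps each "Pet" value to the rows having it.
my %by-pet = @rows.classify({ $_<Pet> });

# Report the size of each group, in key order.
for %by-pet.sort(*.key) -> $p {
    say "{$p.key} => {$p.value.elems} row(s)";
}
```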
### Pareto principle statistic
Here is a vector of 200 random (normally distributed) numbers:
```perl6
my @vec = random-variate(NormalDistribution.new(30, 20), 200);
```
Here we compute the
[Pareto principle statistic](https://en.wikipedia.org/wiki/Pareto_principle)
and plot it:
```perl6
text-list-plot(pareto-principle-statistic(@vec))
```
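
The idea behind the statistic can be sketched in plain Raku, assuming it is the normalized cumulative sum of the values sorted in descending order. The helper `cumulative-shares` is hypothetical and is not the package's `pareto-principle-statistic`, which may handle its input differently.

```perl6
# Hypothetical helper, not part of Data::Summarizers.
sub cumulative-shares(@values) {
    # Largest contributions first.
    my @sorted = @values.sort({ $^b <=> $^a });
    my $total  = @sorted.sum;
    # Running totals via the triangular reduction [\+], divided by the grand total.
    ([\+] @sorted).map(* / $total).List
}

say cumulative-shares([50, 30, 10, 5, 5]);
```

In this small example the two largest of the five values account for 80% of the total, which is the kind of concentration the Pareto principle describes.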
### Skim
*TBD...*
------
## TODO
- [ ] User specified `NA` marker
- [ ] Tabular dataset summarization tests
- [ ] Skimmer
- [ ] Peek-er
------
## References
### Functions, repositories
[AAf1] Anton Antonov,
[RecordsSummary](https://resources.wolframcloud.com/FunctionRepository/resources/RecordsSummary),
(2019),
[Wolfram Function Repository](https://resources.wolframcloud.com/FunctionRepository).