{"id":20982442,"url":"https://github.com/juliaaplavin/flexigroups.jl","last_synced_at":"2025-08-12T18:16:09.702Z","repository":{"id":238906051,"uuid":"759481523","full_name":"JuliaAPlavin/FlexiGroups.jl","owner":"JuliaAPlavin","description":null,"archived":false,"fork":false,"pushed_at":"2024-12-16T16:15:03.000Z","size":64,"stargazers_count":4,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-21T07:36:06.765Z","etag":null,"topics":["data-manipulation","grouping"],"latest_commit_sha":null,"homepage":"","language":"Julia","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JuliaAPlavin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-18T17:57:03.000Z","updated_at":"2024-12-16T16:15:09.000Z","dependencies_parsed_at":"2024-05-08T20:12:12.827Z","dependency_job_id":null,"html_url":"https://github.com/JuliaAPlavin/FlexiGroups.jl","commit_stats":null,"previous_names":["juliaaplavin/flexigroups.jl"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliaAPlavin%2FFlexiGroups.jl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliaAPlavin%2FFlexiGroups.jl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliaAPlavin%2FFlexiGroups.jl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliaAPlavin%2FFlexiGroups.jl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JuliaAPlavin","do
wnload_url":"https://codeload.github.com/JuliaAPlavin/FlexiGroups.jl/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243384986,"owners_count":20282474,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-manipulation","grouping"],"created_at":"2024-11-19T05:45:39.100Z","updated_at":"2025-03-13T10:26:32.544Z","avatar_url":"https://github.com/JuliaAPlavin.png","language":"Julia","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FlexiGroups.jl\n\nArrange tabular or non-tabular datasets into groups according to a specified key function.\n\nThe main principle of `FlexiGroups` is that the result of a grouping operation is always a collection of groups, and each group is a collection of elements. Groups are typically indexed by the grouping key.\n\n## `group`/`groupview`/`groupmap`\n\n`group([keyf=identity], X; [restype=Dictionary])`: group elements of `X` by `keyf(x)`, returning a mapping from `keyf(x)` values to lists of the `x` values in each group.\n\nThe result is an (ordered) `Dictionary` by default, but this can be changed to the base `Dict` or another dictionary type.\n\nAs an alternative to dictionaries, specifying `restype=KeyedArray` (from `AxisKeys.jl`) results in a `KeyedArray`. 
Its `axiskeys` are the group keys.\n\n```julia\nxs = 3 .* [1, 2, 3, 4, 5]\ng = group(isodd, xs)\n# g == dictionary([true =\u003e [3, 9, 15], false =\u003e [6, 12]]) from Dictionaries.jl\n\n\ng = group(x -\u003e (a=isodd(x),), xs; restype=KeyedArray)\n# g == KeyedArray([[6, 12], [3, 9, 15]]; a=[false, true])\n```\n\n`groupview([keyf=identity], X; [restype=Dictionary])`: like the `group` function, but each group is a `view` of `X` and doesn't copy the input elements.\n\n`groupmap([keyf=identity], mapf, X; [restype=Dictionary])`: like `map(mapf, group(keyf, X))`, but more efficient. Supports a limited set of `mapf` functions: `length`, `first`/`last`, `only`, `rand`.\n\n# Margins\n\n`addmargins(dict; [combine=flatten])`: add margins to a grouping for all combinations of group key components.\n\nFor example, if a dataset is grouped by `a` and `b` (`keyf=x -\u003e (;x.a, x.b)` in `group`), this adds groups for each `a` value and for each `b` value separately.\n\n`combine` specifies how multiple groups are combined into margins. 
The default `combine=flatten` concatenates all relevant groups into a single collection.\n\n```julia\nxs = [(a=1, b=:x), (a=2, b=:x), (a=2, b=:y), (a=3, b=:x), (a=3, b=:x), (a=3, b=:x)]\n\n# basic grouping by unique combinations of a and b\ng = group(xs)\nmap(length, g) == dictionary([(a=1, b=:x) =\u003e 1, (a=2, b=:x) =\u003e 1, (a=2, b=:y) =\u003e 1, (a=3, b=:x) =\u003e 3])\n\n# add margins\ngm = addmargins(g)\nmap(length, gm) == dictionary([\n    (a=1, b=:x) =\u003e 1, (a=2, b=:x) =\u003e 1, (a=2, b=:y) =\u003e 1, (a=3, b=:x) =\u003e 3,  # original grouping result\n    (a=1, b=total) =\u003e 1, (a=2, b=total) =\u003e 2, (a=3, b=total) =\u003e 3,  # margins for all values of a\n    (a=total, b=:x) =\u003e 5, (a=total, b=:y) =\u003e 1,  # margins for all values of b\n    (a=total, b=total) =\u003e 6,  # total\n])\n```\n\n# More examples\n\nCompute the fraction of elements in each group:\n```julia\njulia\u003e using FlexiGroups, DataPipes\n\njulia\u003e x = rand(1:100, 100);\n\njulia\u003e @p x |\u003e\n       groupmap(_ % 3, length) |\u003e  # group by x % 3 and compute length of each group\n       FlexiGroups.addmargins(combine=sum) |\u003e  # append the margin - here, the total of all group lengths\n       __ ./ __[total]  # divide lengths by that total\n4-element Dictionaries.Dictionary{Union{Colon, Int64}, Float64}\n       2 │ 0.34\n       0 │ 0.33\n       1 │ 0.33\n   total │ 1.0\n```\n\nPerform per-group computations and combine into a single flat collection:\n```julia\njulia\u003e using FlexiGroups, FlexiMaps, DataPipes, StructArrays\n\njulia\u003e x = rand(1:100, 10)\n10-element Vector{Int64}:\n 70\n 57\n 57\n 69\n 61\n 74\n 31\n 39\n 48\n 96\n\n# regular flatmap: puts all elements of the first group first, then the second, and so on\n# the resulting order is different from the original `x` above\njulia\u003e @p x |\u003e\n       groupview(_ % 3) |\u003e\n       flatmap(StructArray(x=_, ind_in_group=eachindex(_)))\n10-element StructArray(::Vector{Int64}, 
::Vector{Int64}) with eltype NamedTuple{(:x, :ind_in_group), Tuple{Int64, Int64}}:\n (x = 70, ind_in_group = 1)\n (x = 61, ind_in_group = 2)\n (x = 31, ind_in_group = 3)\n (x = 57, ind_in_group = 1)\n (x = 57, ind_in_group = 2)\n (x = 69, ind_in_group = 3)\n (x = 39, ind_in_group = 4)\n (x = 48, ind_in_group = 5)\n (x = 96, ind_in_group = 6)\n (x = 74, ind_in_group = 1)\n\n# flatmap_parent: puts elements in the same order as they were in the parent `x` array above\njulia\u003e @p x |\u003e\n       groupview(_ % 3) |\u003e\n       flatmap_parent(StructArray(x=_, ind_in_group=eachindex(_)))\n10-element StructArray(::Vector{Int64}, ::Vector{Int64}) with eltype NamedTuple{(:x, :ind_in_group), Tuple{Int64, Int64}}:\n (x = 70, ind_in_group = 1)\n (x = 57, ind_in_group = 1)\n (x = 57, ind_in_group = 2)\n (x = 69, ind_in_group = 3)\n (x = 61, ind_in_group = 2)\n (x = 74, ind_in_group = 1)\n (x = 31, ind_in_group = 3)\n (x = 39, ind_in_group = 4)\n (x = 48, ind_in_group = 5)\n (x = 96, ind_in_group = 6)\n```\n\nPivot tables:\n```julia\njulia\u003e using FlexiGroups, DataPipes, StructArrays, AxisKeys\n\n# generate a simple table\njulia\u003e x = @p rand(1:100, 100) |\u003e map((value=_, mod3=_ % 3, mod5=_ % 5)) |\u003e StructArray\n100-element StructArray(::Vector{Int64}, ::Vector{Int64}, ::Vector{Int64}) with eltype NamedTuple{(:value, :mod3, :mod5), Tuple{Int64, Int64, Int64}}:\n (value = 29, mod3 = 2, mod5 = 4)\n (value = 93, mod3 = 0, mod5 = 3)\n (value = 1, mod3 = 1, mod5 = 1)\n (value = 57, mod3 = 0, mod5 = 2)\n (value = 2, mod3 = 2, mod5 = 2)\n ⋮\n\n# compute sum of `value`s grouped by `mod3` and `mod5`\njulia\u003e @p x |\u003e\n       group((; _.mod3, _.mod5); restype=KeyedArray) |\u003e\n       map(sum(_.value))\n2-dimensional KeyedArray(NamedDimsArray(...)) with keys:\n↓   mod3 ∈ 3-element Vector{Int64}\n→   mod5 ∈ 5-element Vector{Int64}\nAnd data, 3×5 Matrix{Int64}:\n      (0)  (1)  (2)  (3)  (4)\n (0)  390  372  378  258  225\n (1)  480  247  372  187  362\n 
(2)  475   82  352  318  203\n```\n\n# Alternatives\n\n- `SplitApplyCombine.jl` also provides `group` and `groupview` functions with similar basic semantics. Notable differences of `FlexiGroups` include:\n  - margins support;\n  - more flexibility in the return container type - various dictionaries, keyed arrays;\n  - group collection type is the same as the input collection type, when possible; for example, grouping a `StructArray` results in each group also being a `StructArray`;\n  - better return eltype and type inference;\n  - often performs faster and with fewer allocations.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuliaaplavin%2Fflexigroups.jl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjuliaaplavin%2Fflexigroups.jl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuliaaplavin%2Fflexigroups.jl/lists"}