https://github.com/fixedeffects/groupedarrays.jl
- Host: GitHub
- URL: https://github.com/fixedeffects/groupedarrays.jl
- Owner: FixedEffects
- License: other
- Created: 2021-07-24T18:58:22.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2023-06-21T14:59:50.000Z (about 3 years ago)
- Last Synced: 2025-10-21T11:57:25.768Z (8 months ago)
- Language: Julia
- Size: 88.9 KB
- Stars: 0
- Watchers: 0
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.md
## Installation
The package is registered in the [`General`](https://github.com/JuliaRegistries/General) registry and so can be installed at the REPL with
`] add GroupedArrays`.
## Introduction
`GroupedArray` is an `AbstractArray` whose elements are positive integers or `missing` values.
- `GroupedArray(x::AbstractArray)` returns a `GroupedArray` of the same length as the original array, where each distinct value is encoded as a distinct integer.
- `GroupedArray(xs::AbstractArray...)` returns a `GroupedArray` where each distinct combination of values is encoded as a distinct integer.
- By default (with `coalesce = false`), `GroupedArray` encodes `missing` values as a distinct `missing` category. With `coalesce = true`, `missing` is assigned an integer code like any other value.
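
The encoding described above can be illustrated in plain Julia. The sketch below is not the package's implementation: `encode_groups` is a hypothetical helper that uses a `Dict` to assign codes in order of first appearance, and it handles several vectors by zipping them into tuples first. It only shows the input/output behaviour the bullets describe.

```julia
# Hypothetical helper, NOT part of GroupedArrays.jl: assign integer codes
# 1, 2, ... to distinct values in order of first appearance.
function encode_groups(x::AbstractVector; coalesce::Bool = false)
    codes = Vector{Union{Int, Missing}}(undef, length(x))
    seen = Dict{Any, Int}()              # value => integer code
    for (i, v) in enumerate(x)
        if !coalesce && ismissing(v)
            codes[i] = missing           # keep missing as its own category
        else
            # get! stores a new code the first time a value is seen
            codes[i] = get!(seen, v, length(seen) + 1)
        end
    end
    return codes
end

p = repeat(["a", "b", missing], outer = 2)
single    = encode_groups(p)                    # missing stays missing
coalesced = encode_groups(p; coalesce = true)   # missing gets its own code

# Several vectors: treat each row (a tuple of values) as one value.
p1 = repeat(["a", "b"], outer = 3)
p2 = repeat(["d", "e"], inner = 3)
combined = encode_groups(collect(zip(p1, p2)))
```

A `Dict` keyed on arbitrary values is simple but pays a hashing cost per element; it is used here only to keep the sketch short.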
## Examples
```julia
using GroupedArrays
p = repeat(["a", "b", missing], outer = 2)
GroupedArray(p)
# 6-element GroupedArray{Int64, 1}:
# 1
# 2
# missing
# 1
# 2
# missing

# With coalesce = true, missing is assigned its own integer code
p = repeat(["a", "b", missing], outer = 2)
GroupedArray(p; coalesce = true)
# 6-element GroupedArray{Int64, 1}:
# 1
# 2
# 3
# 1
# 2
# 3

# Several vectors: each distinct combination of values gets its own code
p1 = repeat(["a", "b"], outer = 3)
p2 = repeat(["d", "e"], inner = 3)
GroupedArray(p1, p2)
# 6-element GroupedArray{Int64, 1}:
# 1
# 2
# 1
# 3
# 4
# 3
```
## Relation to other packages
- `GroupedArray` is similar to `PooledArray`, except that the pool is simply the set of integers from 1 to n, where n is the number of groups (`missing` is encoded as 0). This allows faster lookups in settings where the original group values are not themselves needed.
- The algorithm to group multiple vectors is taken from [DataFrames.jl](https://github.com/JuliaData/DataFrames.jl).
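
To illustrate why a pool of integers from 1 to n helps, here is a minimal sketch of a per-group mean that uses the codes directly as indices into a plain `Vector`. It works on an ordinary integer vector rather than on the package's types; `group_means`, `codes`, `obs`, and `ngroups` are hypothetical names, and treating 0 as the marker for `missing` follows the description above.

```julia
# Hypothetical data: integer group codes in 0:ngroups (0 marks missing)
# and one observation per element.
codes   = [1, 2, 0, 1, 3, 2]
obs     = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
ngroups = 3

# Because the codes are exactly 1:ngroups, a plain Vector can serve as the
# accumulator; the original group values never need to be hashed or compared.
function group_means(codes::AbstractVector{Int}, obs::AbstractVector{<:Real}, ngroups::Int)
    sums   = zeros(Float64, ngroups)
    counts = zeros(Int, ngroups)
    for (g, v) in zip(codes, obs)
        g == 0 && continue       # skip observations whose group is missing
        sums[g]   += v
        counts[g] += 1
    end
    return sums ./ counts
end

means = group_means(codes, obs, ngroups)
```

Each lookup is a single array access, which is the kind of saving the comparison with `PooledArray` points to.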