https://github.com/juliadata/pooledarrays.jl
A pooled representation for arrays with few unique elements
https://github.com/juliadata/pooledarrays.jl
Last synced: 3 months ago
JSON representation
A pooled representation for arrays with few unique elements
- Host: GitHub
- URL: https://github.com/juliadata/pooledarrays.jl
- Owner: JuliaData
- License: other
- Created: 2016-07-06T19:18:25.000Z (over 9 years ago)
- Default Branch: main
- Last Pushed: 2023-09-17T20:34:05.000Z (over 2 years ago)
- Last Synced: 2025-09-09T07:40:13.161Z (4 months ago)
- Language: Julia
- Homepage:
- Size: 136 KB
- Stars: 50
- Watchers: 7
- Forks: 13
- Open Issues: 18
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# PooledArrays.jl
[](https://github.com/JuliaData/PooledArrays.jl/actions?query=workflow%3ACI)
[](https://codecov.io/gh/JuliaData/PooledArrays.jl)
[](https://juliahub.com/ui/Packages/PooledArrays/vi11X?t=2)
[](https://juliahub.com/ui/Packages/PooledArrays/vi11X)
[](https://juliahub.com/ui/Packages/PooledArrays/vi11X)
A pooled representation of arrays for purposes of compression when there are few unique elements.
**Installation**: at the Julia REPL, `import Pkg; Pkg.add("PooledArrays")`
**Usage**:
Working with `PooledArray` objects does not differ from working with general
`AbstractArray` objects, with two exceptions:
* If you hold mutable objects in `PooledArray` it is not allowed to modify them
after they are stored in it.
* In multi-threaded context it is not safe to assign values that are not already
present in a `PooledArray`'s pool from one thread while either reading or
writing to the same array from another thread.
Keeping in mind these two restrictions, as a user, the only thing you need to
learn is how to create `PooledArray` objects. This is accomplished by passing
an `AbstractArray` to the `PooledArray` constructor:
```
julia> using PooledArrays
julia> PooledArray(["a" "b"; "c" "d"])
2×2 PooledMatrix{String, UInt32, Matrix{UInt32}}:
"a" "b"
"c" "d"
```
`PooledArray` performs compression by storing an array of reference integers and
a mapping from integers to its elements in a dictionary. In this way, if the
size of the reference integer is smaller than the size of the actual elements
the resulting `PooledArray` has a smaller memory footprint than the equivalent
`Array`. By default `UInt32` is used as a type of reference integers. However,
you can specify the reference integer type you want to use by passing it as a
second argument to the constructor. This is usually done when you know that you
will have only a few unique elements in the `PooledArray`.
```
julia> PooledArray(["a", "b", "c", "d"], UInt8)
4-element PooledVector{String, UInt8, Vector{UInt8}}:
"a"
"b"
"c"
"d"
```
Alternatively you can pass the `compress` and `signed` keyword arguments to the
`PooledArray` constructor to automatically select the reference integer type.
When you pass `compress=true` then the reference integer type is chosen to be
the smallest type that is large enough to hold all unique values in array. When
you pass `signed=true` the reference type is signed (by default it is unsigned).
```
julia> PooledArray(["a", "b", "c", "d"]; compress=true, signed=true)
4-element PooledVector{String, Int8, Vector{Int8}}:
"a"
"b"
"c"
"d"
```
**Maintenance**: PooledArrays is maintained collectively by the
[JuliaData collaborators](https://github.com/orgs/JuliaData/people).
Responsiveness to pull requests and issues can vary,
depending on the availability of key collaborators.
## Related Packages
- [IndirectArrays](https://github.com/JuliaArrays/IndirectArrays.jl)
- [CategoricalArrays](https://github.com/JuliaData/CategoricalArrays.jl)