https://github.com/denius/unitrangessortedsets.jl
Sorted set of unit ranges.
- Host: GitHub
- URL: https://github.com/denius/unitrangessortedsets.jl
- Owner: denius
- License: mit
- Created: 2022-12-21T02:41:13.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-02-15T15:01:41.000Z (over 2 years ago)
- Last Synced: 2025-01-20T22:57:54.111Z (over 1 year ago)
- Topics: datastructures, julia, julialang
- Language: Julia
- Homepage:
- Size: 183 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# UnitRangesSortedSets
A sorted set of `UnitRange`s. The ranges are kept in ascending order, and no range overlaps with another.
The main type is declared as `mutable struct UnitRangesSortedSet{K, TU} <: AbstractSet{TU}`.
`UnitRangesSortedSet` can be created like the standard `Set`:
```julia
UnitRangesSortedSet(somecontainer)
```
for example:
```julia
julia> using UnitRangesSortedSets
julia> UnitRangesSortedSet((1, 2, 4))
UnitRangesSortedSet{Int64} with 2 elements:
1:2
4:4
julia> UnitRangesSortedSet(('a':'z', 'α':'ω'))
UnitRangesSortedSet{Char} with 2 elements:
'a':'z'
'α':'ω'
julia> using Random
julia> Random.seed!(1234);
julia> UnitRangesSortedSet(rand(1:20, 10))
UnitRangesSortedSet{Int64} with 6 elements:
5:5
7:8
10:11
15:16
18:18
20:20
```
or with `push!`:
```julia
julia> urs = UnitRangesSortedSet{Int}()
UnitRangesSortedSet{Int64}()
julia> push!(urs, 1)
UnitRangesSortedSet{Int64} with 1 element:
1:1
julia> push!(urs, 2)
UnitRangesSortedSet{Int64} with 1 element:
1:2
julia> push!(urs, 10:12)
UnitRangesSortedSet{Int64} with 2 elements:
1:2
10:12
```
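The merging behaviour seen above (pushing `2` extends `1:1` to `1:2`) can be illustrated with a minimal sketch in plain Julia. The helper `insert_value!` is hypothetical and works on an ordinary `Vector{UnitRange{Int}}`; it is not the package's implementation, only one way to keep ranges sorted, non-overlapping, and merged when they touch.

```julia
# Hypothetical helper (not part of UnitRangesSortedSets): insert `x` into a
# sorted vector of non-overlapping, non-adjacent unit ranges.
function insert_value!(ranges::Vector{UnitRange{Int}}, x::Int)
    # Index of the last range starting at or before x (0 if there is none).
    i = something(findlast(r -> first(r) <= x, ranges), 0)
    i > 0 && x <= last(ranges[i]) && return ranges   # x is already covered
    joins_left  = i > 0 && last(ranges[i]) + 1 == x
    joins_right = i < length(ranges) && first(ranges[i + 1]) == x + 1
    if joins_left && joins_right
        ranges[i] = first(ranges[i]):last(ranges[i + 1])   # x bridges two ranges
        deleteat!(ranges, i + 1)
    elseif joins_left
        ranges[i] = first(ranges[i]):x                     # extend to the right
    elseif joins_right
        ranges[i + 1] = x:last(ranges[i + 1])              # extend to the left
    else
        insert!(ranges, i + 1, x:x)                        # isolated new range
    end
    return ranges
end

rs = UnitRange{Int}[]
insert_value!(rs, 1)
insert_value!(rs, 2)   # merges with 1:1
insert_value!(rs, 4)   # starts a separate range
```

The linear `findlast` scan keeps the sketch short; a real container would presumably use a sorted lookup instead.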
Iterating over the ranges in the set, or over their individual elements:
```julia
julia> for r in urs @show(r) end
r = 1:2
r = 10:12
julia> for r in urs, i in r @show(i) end
i = 1
i = 2
i = 10
i = 11
i = 12
julia> for i in Iterators.flatten(urs) @show(i) end
i = 1
i = 2
i = 10
i = 11
i = 12
julia> collect(urs)
2-element Vector{UnitRange{Int64}}:
1:2
10:12
```
Deleting elements and ranges:
```julia
julia> delete!(urs, 10:11)
UnitRangesSortedSet{Int64} with 2 elements:
1:2
12:12
julia> delete!(urs, 1)
UnitRangesSortedSet{Int64} with 2 elements:
2:2
12:12
```
# SubSet
It is possible to create a subset of a `UnitRangesSortedSet`, much like a `view` of an `Array`:
```julia
julia> urs = UnitRangesSortedSet((1:2, 10:12))
UnitRangesSortedSet{Int64} with 2 elements:
1:2
10:12
julia> ss = subset(urs, 0:10)
2-element subset(UnitRangesSortedSet{Int64}, DataStructures.Tokens.IntSemiToken(3):DataStructures.Tokens.IntSemiToken(4)):
1:2
10:10
```
The `subset` object is a static, iterable view of the container.
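What `subset(urs, 0:10)` yields can be pictured as clipping each stored range to the requested window. The sketch below uses only Base Julia; `clip_ranges` is a hypothetical helper, and unlike the package's view it builds a new vector, so it shows the resulting ranges rather than how the view is implemented.

```julia
# Hypothetical helper (not part of UnitRangesSortedSets): clip every range
# to `window` and drop the ranges that fall entirely outside it.
function clip_ranges(ranges::Vector{UnitRange{Int}}, window::UnitRange{Int})
    clipped = UnitRange{Int}[]
    for r in ranges
        c = intersect(r, window)          # overlap of two unit ranges
        isempty(c) || push!(clipped, c)   # keep only non-empty overlaps
    end
    return clipped
end

clip_ranges([1:2, 10:12], 0:10)   # leaves 1:2 and 10:10, as in the example above
```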
# Two types of `UnitRangesSortedSet`
The first type `UnitRangesSortedSet{K}` contains `SortedDict{K,K}`,
```julia
mutable struct UnitRangesSortedSet{K,TU} <: AbstractUnitRangesSortedContainer{K,TU}
ranges::SortedDict{K,K,FOrd}
end
```
where each entry of the dict stores `first(range)` as its key and `last(range)` as its value.
The second implementation `VecUnitRangesSortedSet{K}` is based on `Vector{K}`s:
```julia
mutable struct VecUnitRangesSortedSet{K,TU} <: AbstractUnitRangesSortedContainer{K,TU}
rstarts::Vector{K}
rstops::Vector{K}
end
```
where `rstarts::Vector{K}` and `rstops::Vector{K}` are the starts and stops of
the ranges respectively.
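To make the two-vector layout concrete, here is a minimal sketch of a membership test over sorted starts and stops. `TwoVecRanges` and `contains_value` are hypothetical names for illustration; only the field names are borrowed from the struct above, and the package's own lookup may differ.

```julia
# Hypothetical stand-in for the two-vector layout (not the package's type).
struct TwoVecRanges
    rstarts::Vector{Int}   # sorted first elements of the ranges
    rstops::Vector{Int}    # matching last elements
end

# Binary-search for the last range starting at or before `x`,
# then check that `x` does not lie beyond that range's stop.
function contains_value(s::TwoVecRanges, x::Int)
    i = searchsortedlast(s.rstarts, x)   # 0 when x precedes every range
    return i != 0 && x <= s.rstops[i]
end

s = TwoVecRanges([1, 10], [2, 12])   # represents 1:2 and 10:12
contains_value(s, 11)   # true
contains_value(s, 5)    # false
```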
The two implementations share a similar API but differ in speed.
Either one can be converted to the other using the appropriate constructor.
# Benchmarking
All benchmark results are in the file [test-bench-results.md](test/test-bench-results.md).
Main conclusions of benchmarking:
* iterating over `range`s, or consecutively element-wise, in any `AbstractUnitRangesSortedSet` is
much faster than in any other variant.
* both element-wise iteration and iteration over ranges are orders of magnitude faster in
`VecUnitRangesSortedSet` than in `UnitRangesSortedSet`.
* when created from elements in random order, `UnitRangesSortedSet` is vastly superior
to the `Vec` variant.
* when created from consecutive elements in order, `VecUnitRangesSortedSet` is an order of magnitude faster
than creating a set of the other type.
* in search operations (`in()`, `subset()`) the `VecUnitRangesSortedSet` variant is faster:
about twice as fast in Julia 1.6, and about 20-30% faster in Julia 1.8.
* if your values span a few million elements, `BitSet` is the best choice
for creation. `convert(UnitRangesSortedSet, someBitSetContainer)` then gives
fast iteration over the container.
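The last point relies on turning a `BitSet` into ranges. As a rough illustration of that idea in plain Julia, the sketch below collapses consecutive integers of a `BitSet` into unit ranges. `collapse_runs` is a hypothetical helper; with the package installed, the conversion itself is done by `convert(UnitRangesSortedSet, someBitSetContainer)`, whose internals may differ.

```julia
# Hypothetical helper (not part of UnitRangesSortedSets): collapse runs of
# consecutive integers in a BitSet into unit ranges.
function collapse_runs(bs::BitSet)
    out = UnitRange{Int}[]
    isempty(bs) && return out
    start = stop = first(bs)
    for x in bs                 # a BitSet iterates in ascending order
        if x > stop + 1         # gap found: close the current run
            push!(out, start:stop)
            start = x
        end
        stop = x
    end
    push!(out, start:stop)      # close the final run
    return out
end

collapse_runs(BitSet([5, 7, 8, 10, 11, 15, 16, 18, 20]))
```

Iterating the resulting ranges touches one item per run instead of one per element, which is where the faster iteration comes from.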
### Note
Julia's `UnitRange{T}` requires `T<:Real`, so for `Char` elements `StepRange{Char,UInt8}` is used instead, with a step of `oneunit(UInt8)` where needed.