Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.
Awesome Lists | Featured Topics | Projects
https://github.com/JuliaArrays/AxisArrays.jl

Performant arrays where each dimension can have a named axis with values
https://github.com/JuliaArrays/AxisArrays.jl
Last synced: 3 months ago
JSON representation
Performant arrays where each dimension can have a named axis with values
Host: GitHub
URL: https://github.com/JuliaArrays/AxisArrays.jl
Owner: JuliaArrays
License: other
Created: 2015-02-13T01:08:29.000Z (about 10 years ago)
Default Branch: master
Last Pushed: 2024-04-08T15:50:03.000Z (11 months ago)
Last Synced: 2024-11-09T03:29:25.406Z (3 months ago)
Language: Julia
Homepage: http://JuliaArrays.github.io/AxisArrays.jl/latest/
Size: 1.04 MB
Stars: 200
Watchers: 14
Forks: 42
Open Issues: 67
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project

awesome-sciml - JuliaArrays/AxisArrays.jl: Performant arrays where each dimension can have a named axis with values
README

        # AxisArrays.jl

[![][action-img]][action-url]

[![][pkgeval-img]][pkgeval-url]

[![][codecov-img]][codecov-url]

This package for the Julia language provides an array type (the `AxisArray`) that knows about its dimension names and axis values.

This allows for indexing by name without incurring any runtime overhead.

This permits one to implement algorithms that are oblivious to the storage order of the underlying arrays.

AxisArrays can also be indexed by the values along their axes, allowing column names or interval selections.

In contrast to similar approaches in [Images.jl](https://github.com/timholy/Images.jl) and [NamedArrays.jl](https://github.com/davidavdav/NamedArrays), this allows for type-stable selection of dimensions and compile-time axis lookup.  It is also better suited for regularly sampled axes, like samples over time.

Collaboration is welcome! This is still a work-in-progress. See [the roadmap](https://github.com/JuliaArrays/AxisArrays.jl/issues/7) for the project's current direction.

### Note about `Axis{}` and keywords

An `AxisArray` stores an object of type `Axis{:name}` for each dimension, 

containing both the name (a `Symbol`) and the "axis values" (an `AbstractVector`).

These types are what made compile-time lookup possible.

Instead of providing them explicitly, it is now possible to use keyword arguments

for both construction and indexing:

```julia

V = AxisArray(rand(10); row='a':'j')  # AxisArray(rand(10), Axis{:row}('a':'j'))

V[row='c'] == V[Axis{:row}('c')] == V[row=3] == V[3] 

```

### Note about `axes()` and `indices()`

The function `AxisArrays.axes` returns the tuple of such `Axis` objects. 

Since Julia version 0.7, `Base.axes(V) == (1:10,)` gives instead the range of possible 

ordinary integer indices. (This was called `Base.indices`.) Since both names are exported, 

this collision results in a warning if you try to use `axes` without qualification:

```julia

julia> axes([1,2])

WARNING: both AxisArrays and Base export "axes"; uses of it in module Main must be qualified

ERROR: UndefVarError: axes not defined

```

Packages that are upgrading to support Julia 0.7+ should:

* Replace all uses of the `axes` function with the fully-qualified `AxisArrays.axes`

* Replace all uses of the deprecated `indices` function with the un-qualified `axes`

* Immediately after `using AxisArrays`, add `const axes = Base.axes`

In the future, AxisArrays will be looking for a new name for its functionality.

This will allow you to use the idiomatic `Base` name and offers an easy upgrade

path to whatever the new name will be.

## Example of currently-implemented behavior:

```julia

julia> using Pkg; pkg"add AxisArrays Unitful"

julia> using AxisArrays, Unitful, Random

julia> fs = 40000; # Generate a 40kHz noisy signal, with spike-like stuff added for testing

julia> import Unitful: s, ms, µs

julia> rng = Random.MersenneTwister(123); # Seed a random number generator for repeatable examples

julia> y = randn(rng, 60*fs+1)*3;

julia> for spk = (sin.(0.8:0.2:8.6) .* [0:0.01:.1; .15:.1:.95; 1:-.05:.05] .* 50,

                  sin.(0.8:0.4:8.6) .* [0:0.02:.1; .15:.1:1; 1:-.2:.1] .* 50)

           i = rand(rng, round(Int,.001fs):1fs)

           while i+length(spk)-1 < length(y)

               y[i:i+length(spk)-1] += spk

               i += rand(rng, round(Int,.001fs):1fs)

           end

       end

julia> A = AxisArray(hcat(y, 2 .* y); time = (0s:1s/fs:60s), chan = ([:c1, :c2]))

2-dimensional AxisArray{Float64,2,...} with axes:

    :time, 0.0 s:2.5e-5 s:60.0 s

    :chan, Symbol[:c1, :c2]

And data, a 2400001×2 Array{Float64,2}:

  3.5708     7.14161

  6.14454   12.2891  

  3.42795    6.85591

  1.37825    2.75649

 -1.19004   -2.38007

 -1.99414   -3.98828

  2.9429     5.88581

 -0.226449  -0.452898

  0.821446   1.64289

 -0.582687  -1.16537

  ⋮                  

 -3.50593   -7.01187

  2.26783    4.53565

 -0.16902   -0.33804

 -3.84852   -7.69703

  0.226457   0.452914

  0.560809   1.12162

  4.67663    9.35326

 -2.41005   -4.8201  

 -3.71612   -7.43224

```

AxisArrays behave like regular arrays, but they additionally use the axis

information to enable all sorts of fancy behaviors. For example, we can specify

indices in *any* order, just so long as we annotate them with the axis name:

```julia

julia> A[time=4] # or A[Axis{:time}(4)]

1-dimensional AxisArray{Float64,1,...} with axes:

    :chan, Symbol[:c1, :c2]

And data, a 2-element Array{Float64,1}:

 1.37825

 2.75649

julia> A[chan = :c2, time = 1:5] # or A[Axis{:chan}(:c2), Axis{:time}(1:5)]

1-dimensional AxisArray{Float64,1,...} with axes:

    :time, 0.0 s:2.5e-5 s:0.0001 s

And data, a 5-element Array{Float64,1}:

  7.14161

 12.2891

  6.85591

  2.75649

 -2.38007

```

We can also index by the *values* of each axis using an `Interval` type that

selects all values between two endpoints `a .. b` or the axis values directly.

Notice that the returned AxisArray still has axis information itself... and it

still has the correct time information for those datapoints!

```julia

julia> A[40µs .. 220µs, :c1]

1-dimensional AxisArray{Float64,1,...} with axes:

    :time, 5.0e-5 s:2.5e-5 s:0.0002 s

And data, a 7-element Array{Float64,1}:

  3.42795

  1.37825

 -1.19004

 -1.99414

  2.9429  

 -0.226449

  0.821446

julia> AxisArrays.axes(ans, 1)

AxisArrays.Axis{:time,StepRangeLen{Quantity{Float64, Dimensions:{𝐓}, Units:{s}},Base.TwicePrecision{Quantity{Float64, Dimensions:{𝐓}, Units:{s}}},Base.TwicePrecision{Quantity{Float64, Dimensions:{𝐓}, Units:{s}}}}}(5.0e-5 s:2.5e-5 s:0.0002 s)

```

You can also index by a single value using `atvalue(t)`. 

This function is not needed for categorical axes like `:chan` here, 

as `:c1` is a `Symbol` which can't be confused with an integer index.

Using `atvalue()` will drop a dimension (like using a single integer). 

Indexing with an `Interval(lo, hi)` type retains dimensions, even

when the ends of the interval are equal (like using a range `1:1`):

```julia

julia> A[atvalue(2.5e-5s), :c1]

6.14453912336772

julia> A[2.5e-5s..2.5e-5s, :c1]

1-dimensional AxisArray{Float64,1,...} with axes:

    :time, 2.5e-5 s:2.5e-5 s:2.5e-5 s

And data, a 1-element Array{Float64,1}:

 6.14454

```

You can even index by multiple values by broadcasting `atvalue` over an array:

```julia

julia> A[atvalue.([2.5e-5s, 75.0µs])]

2-dimensional AxisArray{Float64,2,...} with axes:

    :time, Quantity{Float64, Dimensions:{𝐓}, Units:{s}}[2.5e-5 s, 7.5e-5 s]

    :chan, Symbol[:c1, :c2]

And data, a 2×2 Array{Float64,2}:

 6.14454  12.2891

 1.37825   2.75649

```

Sometimes, though, what we're really interested in is a window of time about a

specific index. One of the operations above (looking for values in the window from 40µs

to 220µs) might be more clearly expressed as a symmetrical window about a

specific index where we know something interesting happened. To represent this,

we use the `atindex` function:

```julia

julia> A[atindex(-90µs .. 90µs, 5), :c2]

1-dimensional AxisArray{Float64,1,...} with axes:

    :time_sub, -7.5e-5 s:2.5e-5 s:7.500000000000002e-5 s

And data, a 7-element Array{Float64,1}:

  6.85591

  2.75649

 -2.38007

 -3.98828

  5.88581

 -0.452898

  1.64289

```

Note that the returned AxisArray has its time axis shifted to represent the

interval about the given index!  This simple concept can be extended to some

very powerful behaviors. For example, let's threshold our data and find windows

about those threshold crossings.

```julia

julia> idxs = findall(diff(A[:,:c1] .< -15) .> 0);

julia> spks = A[atindex(-200µs 
2-dimensional 
    :time_sub, 
    :time_rep, 
And data, a 41×247 
   0.672063    7.25649 
  -1.65182     2.57487 
   4.46035     2.11313 
   5.25651    -2.19785 
   7.8537      0.345008 
   0.466816    0.643649 
  -6.0445    -13.2441 
  -9.21703   -13.2254 
 -16.1809    -22.7037 
 -23.2671    -31.2021 
   ⋮ 
  -0.301629 
   4.7182      1.18615 
  -2.43775    -0.151882 
   3.2482     -0.60967 
  -4.26967     2.24747 
  -0.752487    1.69446 
   4.64348    -3.94187 
   1.09652    -2.66999 
  -0.94341     2.60785 
```

.. 800µs, idxs), :c1] AxisArray{Float64,2,...} with axes: -0.0002 s:2.5e-5 s:0.0008 s Quantity{Float64, Dimensions:{𝐓}, Units:{s}}[0.162 s, 0.20045 s, 0.28495 s, 0.530325 s, 0.821725 s, 1.0453 s, 1.11967 s, 1.1523 s, 1.22085 s, 1.6253 s  …  57.0094 s, 57.5818 s, 57.8716 s, 57.8806 s, 58.4353 s, 58.7041 s, 59.1015 s, 59.1783 s, 59.425 s, 59.5657 s] Array{Float64,2}: 0.633375  …    1.54583     5.81194    -4.706 0.477408       3.09505     3.52478     4.13037 4.78372        1.23385     7.2525      3.57485 3.05933        0.965021    6.78414     5.94854 0.960533       0.812989    0.336715    0.303909 -3.67087   …    3.92978    -3.1242      0.789722 -4.60716        0.265144   -4.50987    -8.84897 -14.4409        -8.6664    -13.3457    -11.6213 -25.023        -15.9376    -28.0817    -16.996 -25.3787       -24.4914    -32.2599    -26.1118 ⋱                ⋮ 0.0683982   -4.36574        1.92362    -5.12333    -3.4431 4.40717       -4.51757    -8.64314     0.0800021 -1.40817       -3.38555    -2.23418     0.728549 0.471288  …    2.53395     0.468817   -3.65905 -3.13758        1.74967     4.5052     -0.145357 -1.20491        1.71429     1.81936     0.290158 -1.59213        7.15428    -0.539748    4.82309 0.521931      -3.80528     1.70421     3.40583 -3.34291   …    1.10584     4.31118     3.6404

By indexing with a repeated interval, we have *added* a dimension to the

output! The returned AxisArray's columns specify each repetition of the

interval, and each datapoint in the column represents a timepoint within that

interval, adjusted by the time of the theshold crossing. The best part here

is that the returned matrix knows precisely where its data came from, and has

labeled its dimensions appropriately. Not only is there the proper time

base for each waveform, but we also have recorded the event times as the axis

across the columns.

## Indexing

Two main types of Axes supported by default include:

* Categorical axis -- These are vectors of labels, normally symbols or

  strings. Elements or slices can be selected by elements or vectors

  of elements.

* Dimensional axis -- These are sorted vectors or iterators that can

  be selected by `Intervals`. These are commonly used for sequences of

  times or date-times. For regular sample rates, ranges can be used.

Here is an example with a Dimensional axis representing a time

sequence along rows and a Categorical axis of symbols for column

headers.

```julia

B = AxisArray(reshape(1:15, 5, 3), .1:.1:0.5, [:a, :b, :c])

B[row = (0.2..0.4)] # restrict the AxisArray along the time axis

B[0.0..0.3, [:a, :c]]   # select an interval and two of the columns

```

User-defined axis types can be added along with custom indexing

behaviors.

### Example: compute the intensity-weighted mean along the z axis

```julia

B = AxisArray(randn(100,100,100), :x, :y, :z)

Itotal = sumz = 0.0

for iter in CartesianIndices(Base.axes(B))  # traverses in storage order for cache efficiency

    global Itotal, sumz

    I = B[iter]  # intensity in a single voxel

    Itotal += I

    sumz += I * iter[axisdim(B, Axis{:z})]  # axisdim "looks up" the z dimension

end

meanz = sumz/Itotal

```

The intention is that all of these operations are just as efficient as they would be if you used traditional position-based indexing with all the inherent assumptions about the storage order of `B`.

[pkgeval-img]: https://juliaci.github.io/NanosoldierReports/pkgeval_badges/A/AxisArrays.svg

[pkgeval-url]: https://juliaci.github.io/NanosoldierReports/pkgeval_badges/report.html

[action-img]: https://github.com/JuliaArrays/AxisArrays.jl/workflows/Unit%20test/badge.svg

[action-url]: https://github.com/JuliaArrays/AxisArrays.jl/actions

[codecov-img]: https://codecov.io/github/JuliaArrays/AxisArrays.jl/coverage.svg?branch=master

[codecov-url]: https://codecov.io/gh/JuliaArrays/AxisArrays.jl