https://github.com/mcabbott/oddarrays.jl

Host: GitHub
URL: https://github.com/mcabbott/oddarrays.jl
Owner: mcabbott
Created: 2021-10-23T13:13:10.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2021-11-03T16:48:12.000Z (over 4 years ago)
Last Synced: 2025-06-06T14:40:43.535Z (about 1 year ago)
Language: Julia
Size: 29.3 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# OddArrays.jl

This defines a few array types whose storage is quite different from their values:

```julia
julia> using OddArrays

julia> Rotation(pi/6) * Rotation(pi/6)  # stores one angle
2×2 Rotation{Float64}, with theta = 1.0471975511965976:
 0.5       -0.866025
 0.866025   0.5

julia> Vandermonde([0,2,10])  # coefficient vector, scale 1
3×3 Vandermonde{Int64, Vector{Int64}}:
 1   0    0
 1   2    4
 1  10  100

julia> Full(2pi, 1, 3)  # one number
1×3 Full{Float64, 2}:
 6.28319  6.28319  6.28319

julia> Range(1,2,3)  # start, stop, length::Int
3-element Range{Float64}, with step = 0.5:
 1.0
 1.5
 2.0

julia> Outer([5,7], [1,10,100])  # here two vectors, allows two matrices etc.
2×3 Outer{Int64, Vector{Int64}, Vector{Int64}}, storing 5 numbers:
 5  50  500
 7  70  700

julia> Mask([1 missing 3], [40,50,60]')  # two arrays
1×3 Mask{Int64, 2, Matrix{Union{Missing, Int64}}, Adjoint{Int64, Vector{Int64}}}:
 1  50  3

julia> PDiagMat([1,20,300])  # stores the inverse too
3×3 PDiagMat{Float64, Vector{Float64}}:
 1.0   0.0    0.0
 0.0  20.0    0.0
 0.0   0.0  300.0
```

They exist for the purpose of checking that we know what we're doing with automatic differentiation. In particular, with reverse-mode AD, there are issues of how to make sure the gradient stays inside the right subspace, and how best to represent it.

One problem we'd like to avoid is:

```julia
julia> using Zygote, ChainRulesCore

julia> gradient(_det, Vandermonde([3,4]))  # type's special definition, accesses fields
((coeff = [-1.0, 1.0], scale = 2.0),)

julia> gradient(prod, Vandermonde([3,4]))  # generic rule, makes a matrix
([12.0 4.0; 12.0 3.0],)

julia> gradient(x -> _det(x) / prod(x), Vandermonde([3,4]))  # without projection
ERROR: MethodError: no method matching +(::Matrix{Float64}, ::NamedTuple{(:coeff, :scale), ...})
```

There was a similar problem accumulating gradients for `Diagonal`:

```julia
julia> pullback(sum, Diagonal([3,-4]))[2](1.0)[1]
2×2 Fill{Float64}, with entries equal to 1.0

julia> pullback(x -> 5 * x.diag[1], Diagonal([3,-4]))[2](1)[1]
(diag = [5.0, 0.0],)
```

... which we can solve by standardising on the "natural" form, i.e., converting both contributions to `dx::Diagonal`:

```julia
julia> gradient(x -> sum(x) + 5 * x.diag[1], Diagonal([3,-4]))  # with Zygote#1104 + CRC#446
2×2 Diagonal{Float64, Vector{Float64}}:
 6.0    ⋅ 
  ⋅    1.0
```

Perhaps arrays whose entries are nonlinear functions of their fields should instead standardise on the "structural" form?

## Structural gradients

The operation we want is the pullback of `collect`, which is called `uncollect` here. Some arrays have their own method; the (slow but not wrong) fallback uses Zygote's `pullback` and the array's own `getindex`.

```julia
julia> uncollect([0 1; 0 0], Vandermonde([3,4]))
[ Info: generic _uncollect
(coeff = [1.0, 0.0], scale = 3.0)
```
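For a type this small, the pullback of `collect` can also be written out by hand. The sketch below assumes that `Vandermonde(coeff)` with scale `s` has entries `s * coeff[i]^(j-1)`, which is consistent with the outputs printed above but is not taken from the package source; `vandermonde_uncollect` is a hypothetical name, not part of OddArrays. It reproduces the `(coeff = [1.0, 0.0], scale = 3.0)` given by the generic fallback.

```julia
# Assumed layout (not from the package source): entries V[i,j] = scale * coeff[i]^(j-1).
# Hypothetical hand-written pullback of `collect`, returning a structural cotangent.
function vandermonde_uncollect(dx::AbstractMatrix, coeff::AbstractVector, scale::Real)
    n = length(coeff)
    size(dx) == (n, n) || throw(DimensionMismatch("dx must be square with side $n"))
    dcoeff = zeros(n)
    dscale = 0.0
    for j in 1:n, i in 1:n
        # ∂V[i,j]/∂scale = coeff[i]^(j-1)
        dscale += dx[i, j] * coeff[i]^(j - 1)
        # ∂V[i,j]/∂coeff[i] = scale * (j-1) * coeff[i]^(j-2), and zero when j == 1
        if j > 1
            dcoeff[i] += dx[i, j] * scale * (j - 1) * coeff[i]^(j - 2)
        end
    end
    return (coeff = dcoeff, scale = dscale)
end

vandermonde_uncollect([0 1; 0 0], [3, 4], 1)  # (coeff = [1.0, 0.0], scale = 3.0)
```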

This is now called by `ProjectTo` for these arrays, which in turn is called by many generic rules, including the one for `prod`:

```julia
julia> gradient(x -> _det(x) / prod(x), Vandermonde([3,4]))
┌ Info: projecting to Tangent{Vandermonde}
└   typeof(dx) = Matrix{Float64}
[ Info: generic _uncollect
((coeff = [-0.1111111111111111, 0.0625], scale = -0.16666666666666666),)
```

Here `ProjectTo{OddArray}` saves the whole original array, because in general the gradient subspace depends on the point.

Although the possible perturbations of `θ` in `Rotation(θ)` span a 1-dimensional subspace of 2×2 matrices, which subspace it is depends on `θ` (and similarly for the other types). This is also why we cannot implement something like `+(::Matrix, ::Tangent{Rotation})`: by that stage, the original `θ` has been lost.

```julia
julia> uncollect([0 1; 0 0], Rotation(pi/3))
(theta = -0.5000000000000001,)

julia> uncollect([0 1; 0 0], Rotation(pi/4))
(theta = -0.7071067811865476,)
```
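To see concretely why the point matters, here is a hand-written version of the rotation case. It assumes `Rotation(θ)` collects to `[cos θ -sin θ; sin θ cos θ]` (which matches the `Rotation(pi/6) * Rotation(pi/6)` printout at the top), and the helper names are hypothetical. `uncollect` is then the Frobenius inner product of `dx` with `dR/dθ`, and the same `dx` gives `-cos θ`, a different number at each `θ`.

```julia
# Assumed layout: Rotation(θ) collects to [cos θ -sin θ; sin θ cos θ].
rotation_matrix(θ) = [cos(θ) -sin(θ); sin(θ) cos(θ)]

# dR/dθ, which is itself the rotation by θ + π/2
rotation_tangent(θ) = [-sin(θ) -cos(θ); cos(θ) -sin(θ)]

# Hypothetical uncollect: Frobenius inner product of dx with dR/dθ
rotation_uncollect(dx::AbstractMatrix, θ::Real) = (theta = sum(dx .* rotation_tangent(θ)),)

rotation_uncollect([0 1; 0 0], pi/3)  # theta = -cos(π/3), about -0.5
rotation_uncollect([0 1; 0 0], pi/4)  # theta = -cos(π/4), about -0.7071
```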

Can this go wrong? We like the "natural" `dx::Diagonal` because it can flow backwards into generic rules. For this to matter, the original `x::Diagonal` must have been the output of a function which has a generic rule. Here, there are methods for multiplication of `r::Rotation`, so a `Rotation` can be produced by `*`, which has a generic rule. That rule then fails, unless we opt out:

```julia
julia> gradient(x -> _getindex(x*x, 1,2), Rotation(pi/7))  # not good!
[ Info: *(::Rotation, ::Rotation)
ERROR: MethodError: no method matching *(::Tangent{Any, NamedTuple{(:theta,), Tuple{Float64}}}, ::Adjoint{Float64, Rotation{Float64}})

julia> gradient(x -> (_collect(Rotation(x))*_collect(Rotation(x)))[1,2], pi/7)  # desired result
(-1.2469796037174672,)

julia> ChainRulesCore.@opt_out ChainRulesCore.rrule(::typeof(*), ::Rotation, ::Rotation)

julia> gradient(x -> _getindex(x*x, 1,2), Rotation(pi/7))
[ Info: *(::Rotation, ::Rotation)
((theta = -1.2469796037174672,),)
```
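The desired value can be checked without any AD. Assuming the same `[cos θ -sin θ; sin θ cos θ]` layout, `R(θ) * R(θ) == R(2θ)`, whose `[1,2]` entry is `-sin(2θ)`, so the derivative is `-2cos(2θ)`. This sketch compares that formula with a central finite difference at `θ = π/7`; the function names are illustrative only.

```julia
# Assumed layout: R(θ) = [cos θ -sin θ; sin θ cos θ], so R(θ) * R(θ) == R(2θ)
# and the [1,2] entry of the product is -sin(2θ).
rotation_matrix(θ) = [cos(θ) -sin(θ); sin(θ) cos(θ)]
loss(θ) = (rotation_matrix(θ) * rotation_matrix(θ))[1, 2]

analytic = -2cos(2pi / 7)                             # d/dθ of -sin(2θ), at θ = π/7
h = 1e-6
numeric = (loss(pi/7 + h) - loss(pi/7 - h)) / (2h)    # central finite difference

println((analytic, numeric))  # both close to the -1.2469796037174672 shown above
```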

There is also a method `mul!(::Vector, ::Rotation, ::Vector)`, which doesn't cause problems since it never returns a `Rotation` matrix. The generic rule does make a full matrix before `uncollect` is called, and this can't be avoided by opting out:

```julia
julia> gradient(x -> (x * [1,0])[1], Rotation(pi/7))
[ Info: mul!(_, ::Rotation, _)
┌ Info: projecting to Tangent{Rotation}
└   typeof(dx) = Matrix{Float64} (alias for Array{Float64, 2})
┌ Info: uncollect(_, ::Rotation)
└   theta = -0.4338837391175581
((theta = -0.4338837391175581,),)

julia> ChainRulesCore.@opt_out ChainRulesCore.rrule(::typeof(*), ::Rotation, ::Vector)

julia> gradient(x -> (x * [1,0])[1], Rotation(pi/7))
[ Info: mul!(_, ::Rotation, _)
ERROR: Mutating arrays is not supported -- called setindex!(::Vector{Float64}, _...)
```
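A sketch of what the generic rule for `*` hands to `uncollect` in the first call above, under the same assumed layout: for `y = R * v`, the matrix cotangent is the outer product `ȳ * v'`, a dense 2×2 `Matrix`, and only afterwards is it projected onto `dR/dθ`. The names here are illustrative, not the package's functions.

```julia
using LinearAlgebra

rotation_tangent(θ) = [-sin(θ) -cos(θ); cos(θ) -sin(θ)]  # dR/dθ, same assumed layout

θ = pi / 7
v = [1.0, 0.0]
ȳ = [1.0, 0.0]                     # cotangent of y = R * v when the loss is y[1]
dR = ȳ * v'                        # generic rule for *: a dense 2×2 Matrix
dθ = dot(dR, rotation_tangent(θ))  # the later uncollect step: -sin(π/7)
```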

## Natural gradients

For some of these types, we can plausibly standardise on a "natural" gradient instead. Here we need other functions, mapping onto a different representation of the tangent space.

The most trivial example is probably `Diagonal` (an honorary `OddArray`). The two new functions we need in general, `restrict` & `naturalise`, work like this:

```julia
julia> restrict([1 2; 3 4], Diagonal)
2×2 Diagonal{Int64, Vector{Int64}}:
 1  ⋅
 ⋅  4

julia> uncollect([1 2; 3 4], Diagonal)
(diag = [1, 4],)

julia> naturalise(ans, Diagonal)
2×2 Diagonal{Int64, Vector{Int64}}:
 1  ⋅
 ⋅  4
```

These obey the following properties, all of them a bit trivial:

```julia
using Test, LinearAlgebra, OddArrays

x = Diagonal(rand(3))
dx = rand(3,3); dx2 = randn(3,3)

@testset "simple naturalise checks for x::$(typeof(x).name.name)" begin
  # doesn't forget things which uncollect remembers:
  @test uncollect(naturalise(uncollect(dx, x), x), x) ≈ uncollect(dx, x)
  # linearity (using that of uncollect):
  @test naturalise(uncollect(33 * dx, x), x) ≈ 33 * naturalise(uncollect(dx, x), x)
  @test naturalise(uncollect(dx + dx2, x), x) ≈ naturalise(uncollect(dx, x), x) + naturalise(uncollect(dx2, x), x)
  # this defines restrict in terms of naturalise:
  @test restrict(dx, x) ≈ naturalise(uncollect(dx, x), x)
end
```
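Since the `@testset` above needs the package, here is a self-contained sketch of the same checks for `Diagonal` alone, using only the `LinearAlgebra` and `Test` standard libraries. The one-argument functions `diag_uncollect`, `diag_naturalise` and `diag_restrict` are hypothetical stand-ins; the package's own methods take the point or type as a second argument.

```julia
using LinearAlgebra, Test

# Hypothetical stand-ins, specialised to Diagonal
diag_uncollect(dx::AbstractMatrix) = (diag = diag(dx),)   # structural cotangent
diag_naturalise(t::NamedTuple) = Diagonal(t.diag)         # back to a Diagonal matrix
diag_restrict(dx::AbstractMatrix) = Diagonal(diag(dx))    # direct projection

dx = rand(3, 3); dx2 = randn(3, 3)

@testset "Diagonal sketch" begin
    # naturalise keeps what uncollect remembers
    @test diag_uncollect(diag_naturalise(diag_uncollect(dx))).diag ≈ diag_uncollect(dx).diag
    # linearity
    @test diag_naturalise(diag_uncollect(33 * dx)) ≈ 33 * diag_naturalise(diag_uncollect(dx))
    @test diag_naturalise(diag_uncollect(dx + dx2)) ≈
          diag_naturalise(diag_uncollect(dx)) + diag_naturalise(diag_uncollect(dx2))
    # restrict == naturalise ∘ uncollect
    @test diag_restrict(dx) ≈ diag_naturalise(diag_uncollect(dx))
end
```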

These are also satisfied by `x = Full(2,3,3)`, if the action of `restrict` & `naturalise` is this:

```julia
julia> uncollect([10 0 5], Full(pi,1,3))
(value = 15, size = nothing)  # sum

julia> restrict([10 0 5], Full)  # doesn't depend on the point, just type
1×3 Full{Float64, 2}:
 5.0  5.0  5.0                # mean

julia> naturalise((value = 15, size = nothing), Full(pi,1,3))  # needs the size
1×3 Full{Float64, 2}:
 5.0  5.0  5.0                # value / length
```
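A plain-array sketch of these three maps, assuming a fill array whose only stored number is `value` (the `size` field is left out). `restrict` is the mean because that is the orthogonal projection onto the span of `ones`; `naturalise` needs the size to divide by. The `full_*` names are hypothetical.

```julia
# Hypothetical maps for a fill array whose only stored number is `value`.
full_uncollect(dx::AbstractArray) = (value = sum(dx),)                    # ∂(Σᵢ dxᵢ * value)/∂value
full_restrict(dx::AbstractArray) = fill(sum(dx) / length(dx), size(dx))   # the mean, everywhere
full_naturalise(t::NamedTuple, sz::Tuple) = fill(t.value / prod(sz), sz)  # needs the size

dx = [10 0 5]
t = full_uncollect(dx)                              # (value = 15,)
full_naturalise(t, size(dx))                        # [5.0 5.0 5.0]
full_restrict(dx) == full_naturalise(t, size(dx))   # true
```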

What those tests don't check is that `naturalise` maps into the cotangent subspace corresponding to the submanifold defined by the type.

This is a bit more involved to check, but it is the argument against `naturalise` returning, say, `[15 0 0]`, which is in the same equivalence class as `[10 0 5]` and `[5 5 5]` according to `uncollect`.

You can try the above tests with `x = Full(2.0, 3, 3; one=true)`, but the new check is this:

```julia
julia> restrict([10 0 5], Full(pi, 1, 3; one=true))
1×3 OneElement(::Int64):
 15  0  0

julia> subspacetest(Full, 2.0, 3, 3; one=true);
┌ Warning: naturalise(dw, Full) has components in 8 directions outside T*S
└ @ OddArrays ~/.julia/dev/OddArrays/src/OddArrays.jl:958

julia> subspacetest(Full, 2.0, 3, 3);
[ Info: naturalise(dw, Full) seems to live in T*S, as it should
```
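One way to phrase the T*S check, as a sketch and not a copy of `subspacetest` (which reports a count of directions): stack the tangent vectors of the submanifold as columns, project a candidate cotangent onto their span by least squares, and measure what is left over. For `Full(v, 1, 3)` the only tangent vector is `ones(1, 3)`. `outside_component` is a hypothetical helper.

```julia
using LinearAlgebra

# Hypothetical helper: size of the part of `dw` lying outside the span of `basis`,
# a list of tangent vectors with the same shape as `dw`.
function outside_component(dw::AbstractArray, basis::Vector{<:AbstractArray})
    B = hcat(vec.(basis)...)        # columns are the flattened tangent vectors
    coeffs = B \ vec(dw)            # least-squares coefficients in that basis
    return norm(vec(dw) - B * coeffs)
end

tangent_full = [ones(1, 3)]         # Full(v, 1, 3): every entry moves with v

outside_component([5.0 5.0 5.0], tangent_full)   # about 0: lives in T*S
outside_component([15.0 0.0 0.0], tangent_full)  # sqrt(150): the OneElement answer does not
```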

Less obviously, there is a correct projection for `Range` objects:

```julia
julia> uncollect([1,1,13], Range(1,2,3))
(start = 1.5, stop = 13.5, len = nothing)

julia> naturalise(ans, Range(1,2,3))
3-element Range{Float64}, with step = 6.0:
 -1.0
  5.0
 11.0

julia> ans ≈ restrict([1,1,13], Range)
true

julia> x = Range(0,ℯ,5); dx = rand(5); dx2 = rand(5);  # for above @testset

julia> subspacetest(Range, 1.2, 3.4, 5);
[ Info: naturalise(dw, Range) seems to live in T*S, as it should
```
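The `Range` projection is least squares in disguise. Assuming `Range(start, stop, n)` has entries `start*(1 - t) + stop*t` with `t = (0:n-1)/(n-1)` (which matches `Range(1,2,3)` printing `1.0, 1.5, 2.0`), the sketch below builds the two tangent vectors as columns of a matrix `B`. `uncollect` is `B' * dx`, `restrict` is the least-squares fit `B * (B \ dx)`, and `naturalise` solves the normal equations. All `range_*` names are hypothetical.

```julia
using LinearAlgebra

# Assumed layout: Range(start, stop, n) has entries start*(1 - t) + stop*t, t = (0:n-1)/(n-1).
# Columns of the returned matrix are ∂x/∂start and ∂x/∂stop.
function range_basis(n::Int)
    t = range(0, 1; length = n)
    return hcat(1 .- t, t)
end

function range_uncollect(dx::AbstractVector)
    g = range_basis(length(dx))' * dx
    return (start = g[1], stop = g[2])
end

# Least-squares fit of an equally spaced sequence to dx
function range_restrict(dx::AbstractVector)
    B = range_basis(length(dx))
    return B * (B \ float.(dx))
end

# The equally spaced vector whose uncollect is the given (start, stop) cotangent
function range_naturalise(t::NamedTuple, n::Int)
    B = range_basis(n)
    return B * ((B' * B) \ [t.start, t.stop])
end

dx = [1, 1, 13]
t = range_uncollect(dx)                      # (start = 1.5, stop = 13.5)
range_naturalise(t, 3)                       # about [-1.0, 5.0, 11.0]
range_naturalise(t, 3) ≈ range_restrict(dx)  # true
```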

For rotation matrices, an example of `restrict` which passes the above tests but fails `subspacetest` is this:

```julia
julia> restrict([1 2; 3 4], Rotation(pi/3))
2×2 AntiSymOne{Float64}:
  0.0      3.83013
 -3.83013  0.0

julia> x = Rotation(randn()); dx = randn(2,2); dx2 = randn(2,2);  # for above @testset

julia> subspacetest(Rotation, pi/3);
┌ Warning: naturalise(dw, Rotation) has components in 3 directions outside T*S
└ @ OddArrays ~/.julia/dev/OddArrays/src/OddArrays.jl:958
```

It's pretty that this cotangent lives in the Lie algebra (antisymmetric matrices), but that is in fact irrelevant.

The way to stay inside the submanifold is to use the dual part of a perturbed rotation, i.e. the derivative of `Rotation(θ)` with respect to `θ`, which you could represent as a scaled rotation matrix:

```julia
julia> using ForwardDiff: Dual

julia> Rotation(Dual(pi/3, 1000))
2×2 Rotation{Dual{Nothing, Float64, 1}}, with theta = Dual{Nothing}(1.0471975511965976,1000.0):
 Dual{Nothing}(0.5,-866.025)    Dual{Nothing}(-0.866025,-500.0)
 Dual{Nothing}(0.866025,500.0)   Dual{Nothing}(0.5,-866.025)
```

This can't be a `Rotation` struct. In fact that was obvious from the start: the cotangent representation has to be a vector space, but the sum of two rotation matrices is in general not a rotation matrix.
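A sketch of a `restrict` for rotations that does stay in T*S, assuming the same `[cos θ -sin θ; sin θ cos θ]` layout: orthogonally project `dx` onto the one-dimensional span of `dR/dθ`. The result is a scaled rotation matrix stored as a plain `Matrix`, and it has the same `θ`-cotangent as `dx`. This is not the package's `AntiSymOne` method, and the helper names are hypothetical.

```julia
using LinearAlgebra

# Assumed layout as above; dR/dθ is the rotation by θ + π/2.
rotation_tangent(θ) = [-sin(θ) -cos(θ); cos(θ) -sin(θ)]

# Hypothetical restrict: orthogonal projection of dx onto span(dR/dθ).
# Returns a plain Matrix of the form c * rotation_tangent(θ).
function rotation_restrict(dx::AbstractMatrix, θ::Real)
    T = rotation_tangent(θ)
    return (dot(T, dx) / dot(T, T)) * T
end

rotation_uncollect(dx::AbstractMatrix, θ::Real) = (theta = dot(rotation_tangent(θ), dx),)

dx = [1.0 2.0; 3.0 4.0]
r = rotation_restrict(dx, pi/3)
rotation_uncollect(r, pi/3).theta ≈ rotation_uncollect(dx, pi/3).theta  # true: same θ-cotangent
```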

## Over-parameterised types

These store more numbers than there are dimensions in the matrix subspace. They have unambiguous "structural" gradients:

```julia
julia> gradient(x -> x[1], UnitVector([3,0,4]))
┌ Info: projecting to Tangent{UnitVector}
└   typeof(dx) = OneElement{Float64, 1, Tuple{Int64}, Tuple{Base.OneTo{Int64}}}
[ Info: generic _uncollect
((raw = [0.128, 0.0, -0.096],),)

julia> gradient(x -> x[1], Outer(3, [4 5; 6 7]))
┌ Info: projecting to Tangent{Outer}
└   typeof(dx) = Matrix{Int64} (alias for Array{Int64, 2})
((x = 4, y = [3 0; 0 0], size = nothing),)

julia> gradient(x -> sum(abs2, x), Mask([1,NaN,3], [40,50,60]))  # with default OddArray projection
┌ Info: projecting to Tangent{Mask}
└   typeof(dx) = Vector{Float64} (alias for Array{Float64, 1})
((alpha = [2.0, 0.0, 6.0], beta = [0.0, 100.0, 0.0]),)
```
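The `UnitVector` result can be reproduced by hand, assuming the array entries are `raw ./ norm(raw)` (consistent with the output above, but an assumption about the package). The pullback of normalisation removes the component along `raw`, which is exactly the redundant direction: scaling `raw` does not change the array. `unitvector_uncollect` is a hypothetical name.

```julia
using LinearAlgebra

# Assumed: UnitVector(raw) has entries raw ./ norm(raw); the stored field is `raw`.
# Hypothetical hand-written uncollect: the pullback of raw -> raw / norm(raw).
function unitvector_uncollect(dx::AbstractVector, raw::AbstractVector)
    r = norm(raw)
    u = raw / r
    return (raw = (dx - dot(u, dx) * u) / r,)   # drop the component along u, then scale
end

unitvector_uncollect([1.0, 0.0, 0.0], [3.0, 0.0, 4.0])  # ≈ (raw = [0.128, 0.0, -0.096],)
```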

Do they have "natural" ones? For `Mask`, you can just add `alpha + beta`:

```julia
julia> restrict([1 2 3], Mask)
1×3 Matrix{Int64}:
 1  2  3

julia> naturalise((alpha = [2 0 6], beta = [0 100 0]), Mask)
1×3 Matrix{Int64}:
 2  100  6

julia> x = Mask([1,NaN,3], [40,50,60]); dx = randn(3); dx2 = randn(3);  # for above @testset

julia> subspacetest(Mask, [1,NaN,3], [40,50,60]);
[ Info: naturalise(dw, Mask) seems to live in T*S, as it should

julia> gradient(x -> x.beta[1]^2, x)  # reads a field which doesn't contribute... garbage primal?
([80.0, 0.0, 0.0],)
```
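A sketch of why adding works for `Mask`, assuming `Mask(alpha, beta)` shows `alpha[i]` unless it is `NaN` or `missing`, in which case it shows `beta[i]`. Each position then sends its gradient to exactly one of the two fields, so `alpha + beta` recovers the full array cotangent. The helpers `ishole`, `mask_uncollect` and `mask_naturalise` are hypothetical.

```julia
# Assumed semantics: position i shows alpha[i], unless alpha[i] is NaN or missing.
ishole(a) = a === missing || (a isa AbstractFloat && isnan(a))

# Hypothetical structural cotangent: route each entry of dx to the field it came from
function mask_uncollect(dx::AbstractArray, alpha::AbstractArray)
    hole = ishole.(alpha)
    return (alpha = ifelse.(hole, zero(eltype(dx)), dx),
            beta  = ifelse.(hole, dx, zero(eltype(dx))))
end

# Each position is non-zero in exactly one field, so adding recovers dx.
mask_naturalise(t::NamedTuple) = t.alpha + t.beta

t = mask_uncollect([2.0, 100.0, 6.0], [1, NaN, 3])
# (alpha = [2.0, 0.0, 6.0], beta = [0.0, 100.0, 0.0])
mask_naturalise(t)  # [2.0, 100.0, 6.0]
```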

For `Outer` there is more serious redundancy: `Outer([4], [9,9]) == Outer([6], [6,6]) == Outer([9], [4,4])` all describe the same 1×2 matrix. And the constructor is nonlinear.

You can still make a valid `naturalise`, I think, but it's not trivial and it cannot in general re-use the struct:

```julia
julia> uncollect([3 0 0; 0 0 0], Outer([5,5], [7,7,7]))  # S is 4 dimensional here
(x = [21, 0], y = [15, 0, 0], size = nothing)

julia> naturalise(ans, Outer([5,5], [7,7,7]))  # this cannot be written as Outer
2×3 Matrix{Float64}:
 2.0   0.5   0.5
 1.0  -0.5  -0.5

julia> uncollect(ans, Outer([5,5], [7,7,7]))
(x = [21.0, -8.881784197001252e-16], y = [15.0, -8.881784197001252e-16, 0.0], size = nothing)

julia> subspacetest(Outer, [5,6], [7,8,9]);
[ Info: naturalise(dw, Outer) seems to live in T*S, as it should
```
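The matrix printed by `naturalise` above can be reproduced as an orthogonal projection onto the tangent space of rank-one matrices at `x * y'`, which is `{a * y' + x * b'}`. This is a sketch assuming `Outer(x, y) == x * y'` for vectors, with hypothetical helper names; it computes `restrict`, i.e. `naturalise ∘ uncollect`, directly from the matrix `dx`.

```julia
using LinearAlgebra

# Hypothetical restrict for a rank-one Outer(x, y) == x * y' (vectors only):
# orthogonal projection onto the tangent space {a * y' + x * b'} at x * y'.
function outer_restrict(dx::AbstractMatrix, x::AbstractVector, y::AbstractVector)
    Pu = x * x' / dot(x, x)      # projector onto span(x)
    Pv = y * y' / dot(y, y)      # projector onto span(y)
    return Pu * dx + dx * Pv - Pu * dx * Pv
end

# Structural cotangent of (x, y) -> x * y'
outer_uncollect(dx::AbstractMatrix, x::AbstractVector, y::AbstractVector) = (x = dx * y, y = dx' * x)

x, y = [5.0, 5.0], [7.0, 7.0, 7.0]
dx = [3.0 0.0 0.0; 0.0 0.0 0.0]
nat = outer_restrict(dx, x, y)     # ≈ [2.0 0.5 0.5; 1.0 -0.5 -0.5], rank 2, so not an Outer
outer_uncollect(nat, x, y)         # ≈ (x = [21.0, 0.0], y = [15.0, 0.0, 0.0])
```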

The case `Outer(::Matrix, ::Number)` is simpler:

```julia
julia> uncollect([1 10 100], Outer([4 5 6], 7))
(x = [7 70 700], y = 654, size = nothing)

julia> naturalise(ans, Outer([4 5 6], 7))
1×3 Matrix{Float64}:
 1.0  10.0  100.0

julia> ans == Outer([1 10 100], 1) == Outer([2 20 200], 1/2)  # but no advantage
true
```

Next, `PDiagMat` stores both the diagonal and its inverse. It specialises `*` of two such matrices to produce a third, and opts out of the generic rule:

```julia
julia> gradient(x -> (x * x)[5], PDiagMat([1,2,3]))
[ Info: *(::PDiagMat, ::PDiagMat)
┌ Info: projecting to Tangent{PDiagMat}
└   typeof(dx) = Matrix{Float64} (alias for Array{Float64, 2})
((dim = nothing, diag = [0.0, 4.0, 0.0], inv_diag = nothing),)

julia> gradient(x -> (x * _inv(x))[5], PDiagMat([1,2,3]))  # weird, uncollect could never make this
[ Info: _inv(::PDiagMat)
[ Info: *(::PDiagMat, ::PDiagMat)
┌ Info: projecting to Tangent{PDiagMat}
└   typeof(dx) = Matrix{Float64} (alias for Array{Float64, 2})
((dim = nothing, diag = [0.0, 0.5, 0.0], inv_diag = [0.0, 2.0, 0.0]),)

julia> gradient(x -> (PDiagMat(x) * _inv(PDiagMat(x)))[5], [1,2,3])
[ Info: _inv(::PDiagMat)
[ Info: *(::PDiagMat, ::PDiagMat)
┌ Info: projecting to Tangent{PDiagMat}
└   typeof(dx) = Matrix{Float64} (alias for Array{Float64, 2})
([0.0, 0.0, 0.0],)
```
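A hand check of the last two results, not code from the package: if `inv_diag` is treated as an independent field, the structural gradient of `diag[2] * inv_diag[2]` (entry 5 of the 3×3 product) is `inv_diag[2]` for `diag` and `diag[2]` for `inv_diag`. Pushing the `inv_diag` part through the constraint `inv_diag = 1 ./ diag` gives the total derivative with respect to `diag`, which vanishes, matching the `[0.0, 0.0, 0.0]` above.

```julia
d = [1.0, 2.0, 3.0]
inv_d = 1 ./ d

# Partial derivatives of d[2] * inv_d[2], holding the other field fixed
g_diag = [0.0, inv_d[2], 0.0]      # [0.0, 0.5, 0.0], like `diag` in the second result
g_inv  = [0.0, d[2], 0.0]          # [0.0, 2.0, 0.0], like `inv_diag` in the second result

# d(inv_d[i])/d(d[i]) = -1/d[i]^2, so the total gradient with respect to d is:
total = g_diag .+ g_inv .* (-1 ./ d .^ 2)   # [0.0, 0.0, 0.0], since the entry is always 1
```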

I haven't sorted these ones out.

## Discussed elsewhere

This PR https://github.com/JuliaDiff/ChainRulesCore.jl/pull/449 contains some comparable maps (formatted [notes.md](https://github.com/JuliaDiff/ChainRulesCore.jl/blob/wct/writing-generic-rrules/notes.md) and [examples](https://github.com/JuliaDiff/ChainRulesCore.jl/blob/wct/writing-generic-rrules/examples.jl)).

* Since `destructure == collect`, the useful map from Matrix to Tangent is called `destructure_pullback` there, or else `pullback_of_destructure(x)(dx)`, corresponding to `uncollect(dx, x)` here.

* There is also a "Restructure", and I think `pullback_of_restructure` plays a role like `naturalise` here, but I am not very sure.

* The `ScaledVector` example there is much like `Outer(pi, [0 1 2])` here, but `Outer` allows other things.