https://github.com/emmt/numoptbase.jl
Basic operations on variables for multi-variate numerical optimization methods
https://github.com/emmt/numoptbase.jl
Last synced: 4 months ago
JSON representation
Basic operations on variables for multi-variate numerical optimization methods
- Host: GitHub
- URL: https://github.com/emmt/numoptbase.jl
- Owner: emmt
- License: mit
- Created: 2023-05-25T10:52:26.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2025-04-08T18:45:14.000Z (about 1 year ago)
- Last Synced: 2025-08-17T21:27:51.393Z (10 months ago)
- Language: Julia
- Size: 229 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: NEWS.md
- License: LICENSE.md
Awesome Lists containing this project
README
# Basic operations on variables for numerical optimization in Julia
[](https://github.com/emmt/NumOptBase.jl/actions/workflows/CI.yml?query=branch%3Amain)
[](https://ci.appveyor.com/project/emmt/NumOptBase-jl)
[](https://codecov.io/gh/emmt/NumOptBase.jl)
`NumOptBase` implements efficient basic operations on variables for
multi-variate numerical optimization methods in [Julia](https://julialang.org).
It is similar to the `BLAS` library for linear algebra methods.
By leveraging the methods provided by `NumOptBase`, numerical optimization
methods can be written in a general way that is agnostic to the specific type
of arrays used to store the variables of the problem. Package
[`ConjugateGradient`](https://github.com/emmt/ConjugateGradient.jl) is such an
example. The methods of `NumOptBase` are thus intended to be extended by other
packages to apply numerical optimization methods to their own variables (that
is their own array types). For instance, `NumOptBase` provides optimized
methods for [`CUDA` arrays](https://github.com/JuliaGPU/CUDA.jl) .
## Variables in optimization methods
An optimization problem typically writes:
minₓ f(x) s.t. x ∈ Ω
where `f: Ω → ℝ` is the objective function, `x` are the variables, and `Ω ⊆ ℝⁿ`
is the set of acceptable solutions with `n` the dimension of the problem.
It is assumed by this package that the variables `x` are stored in Julia
arrays. Depending on the problem, these arrays may be multidimensional but are
treated as real-valued *vectors* by the numerical optimization methods. In that
respect, complex numbers are considered as pairs of reals. For efficiency, all
entries of the arrays storing variables shall have the same floating-point type.
For now, quantities with units (such as those provided by the
[`Unitful`](https://github.com/PainterQubits/Unitful.jl) package) are not
supported. If your variables have units, you may consider using `reinterpret`
to remove units before calling the numerical optimization routines.
## Operations on variables
Some Julia methods are already available to deal with the variables of a
numerical optimization problem. For example, `similar` which may be called to
create a new array of variables provided the element type is some
floating-point real.
The `NumOptBase` package provides additional methods (described in what
follows) to operate on *variables* and which either require no additional
significant storage or store their result in an output array provided by the
caller. In that way, the storage requirements can be strictly controlled. All
these *public* methods are exported except `NumOptBase.copy!` which exists in
Julia base but with a slightly different semantic regarding vectors.
As said before, these methods treat the variables as vectors of reals, except
that methods taking multiple array arguments throw a `DimensionMismatch`
exception if these arguments do not have the same axes.
### Norms
`norm1(x)`, `norm2(x)`, and `norminf(x)` respectively yield the ℓ₁, Euclidean,
and infinite norm of the variables `x`. They are similar to the norms in
`LinearAlgebra` except that they treat `x` as if it is has been flattened into
a vector of reals.
### Inner product
`inner(x, y)` yields the inner product (also known as *scalar product*) of the
variables `x` and `y` computed as expected by numerical optimization methods;
that is as if `x` and `y` are real-valued vectors and treating complex values
as pairs of reals in that respect. In other words, if `x` and `y` are
real-valued arrays, their inner product is given by:
Σᵢ xᵢ⋅yᵢ
otherwise, if `x` and `y` are both complex-valued arrays, their inner product
is given by:
Σᵢ (real(xᵢ)⋅real(yᵢ) + imag(xᵢ)⋅imag(yᵢ))
In the above pseudo-codes, index `i` runs over all indices of `x` and `y` which
may be multi-dimensional arrays but must have the same indices.
`inner(w, x, y)` yields:
Σᵢ wᵢ⋅xᵢ⋅yᵢ
that is the *triple inner product* of the variables `w`, `x`, and `y` which
must all be real-valued arrays.
### Scaling, updating, and combining variables
`scale!(dst, α, x)` returns `dst` overwritten with `α⋅x` performed
element-wise. Arguments `dst` and `x` are arrays of the same size while `α` is
a scalar. If `iszero(α)` holds, `dst` is zero-filled whatever the values in
`x`. For in-place scaling of `x` by `α`, just call `scale!(α, x)` or `scale!(x, α)`.
`update!(x, β, y)` returns `x` overwritten with `x + β⋅y` performed
element-wise. Arguments `x` and `y` are arrays of the same size while `β` is a
scalar. If `iszero(β)` holds, `x` is left unchanged whatever the values in `y`.
The `update!` method may be seen as a shortcut for `combine!(x, 1, x, β, y)`.
`update!(x, β, y, z)` returns `x` overwritten with `x + β⋅y⋅z` performed
element-wise. Arguments `x`, `y`, and `z` are arrays of the same size while `β`
is a scalar. If `iszero(β)` holds, `x` is left unchanged whatever the values in
`y` and `z`.
`multiply!(dst, x, y)` returns `dst` overwritten with the element-wise
multiplication (also known as *Hadamard product*) of `x` by `y`. Arguments
`dst`, `x`, and `y` are arrays of the same size.
`combine!(dst, α, x, β, y)` overwrites `dst` with `α⋅x + β⋅y` and returns
`dst`. Arguments `α` and `β` are real scalars while `dst`, `x`, and `y` are
arrays of the same size. If `iszero(α)` holds, the result does not depend on
the values of `x`. Similarly, if `iszero(β)` holds, the result does not depend
on the values of `y`.
### Apply mappings
The method:
``` julia
apply!(dst, f, args...) -> dst
```
overwrites the destination variables `dst` with the result of applying the
mapping `f` to arguments `args...`.
As of now, `apply!` only handles a few types of mappings:
- If `f` is an array, a generalized matrix-vector multiplication is applied to
`args...` which must be a single array of variables.
- If `f` is `NumOptBase.Identity()`, the identity mapping is applied, that is
the values of `src` are copied into `dst` (unless they are the same object).
The constant `NumOptBase.Id` is the singleton object of type
`NumOptBase.Identity`.
- If `f` is an instance of `NumOptBase.Diag`, an element-wise multiplication by
`diag(f)` is applied.
The `NumOptBase.apply!` method shall be specialized in other argument types to
handle other cases.
### Other operations
`NumOptBase.copy!(dst, src)` overwrites the destination array `dst` with the
contents of the source array `src` throwing an error if they do not have the
same axes. If checking that the arguments have the same axes is not necessary,
the end-user may use `copyto!(dst, src)` or `copy!(dst, src)` which are basic
Julia methods.
It is assumed that a few standard Julia methods are implemented in an efficient
way for the type of array storing the variables:
- `similar(x) -> y` to create a new array of variables `y` like `x`;
- `copyto!(dst, src) -> dst` to copy source variables `src` into destination
variables `dst`;
- `fill!(x, α) -> x` to set all variables in `x` to the value `α`.
## Bound constraints
`NumOptBase` provides some support for separable bound constraints on the
variables. With such constraints, the feasible set is defined by:
```
Ω = { x ∈ ℝⁿ | ℓ ≤ x ≤ u }
```
with `ℓ` and `u` the lower and upper bounds and where the comparisons (`≤`) are
taken element-wise. To represent the feasible set for bound constrained
`N`-dimensional variables of element type `T` in Julia is done by:
``` julia
Ω = BoundedSet{T,N}(ℓ, u)
```
where the lower and upper bounds, `ℓ` and `u`, may be specified as:
- `nothing` if the bound is unlimited;
- a scalar if the bound is the same for all variables;
- an array with the same axes as the variables.
For simplicity and type-stability, there are a number of restrictions which may
be alleviated in a high level interface:
- To avoid the complexity of managing all possibilities in the methods
implementing bound constraints, bounds specified as arrays *conformable* with
the variables are not directly supported. The caller may extend the array of
bound values to the same size as the variables.
- Only `nothing` and the scalars `-∞` (for a lower bound) or `+∞` (for an upper
bound) are considered as unlimited bounds even though all values of a lower
(resp. upper) bound specified as an array may be `-∞` (resp. `+∞`).
A number of standard methods are applicable to a bounded set `Ω`:
- `isempty(Ω)` yields whether the `Ω` is empty, that is infeasible.
- `x ∈ Ω` yields whether variables `x` belongs to `Ω`.
### Projection on the feasible set
For any `x ∈ ℝⁿ`, the *projected variables* `xₚ ∈ Ω` are defined by:
```
xₚ = P(x) = argmin ‖y - x‖² s.t. y ∈ Ω
```
where `P` is the projection onto the feasible set `Ω`. In other words, `xₚ` is
the element of `Ω` that is the closest (in the least Euclidean distance sense)
to `x`.
The projected variables are computed by:
``` julia
project_variables!(xₚ, x, Ω)
```
which overwrites the destination `xₚ` with the projection of `x ∈ ℝⁿ` onto the
feasible set `Ω ⊆ ℝⁿ`.
### Projected direction and line-search
A number of numerical optimization methods proceed by iterations where, at the
`k`-th iteration, the next iterate writes:
```
xₖ₊₁ = P(xₖ ± αₖ⋅dₖ) with αₖ ≈ argmin f(P(xₖ ± α⋅dₖ)) s.t. α ≥ 0
```
with `d ∈ ℝⁿ` a well chosen search direction and where, depending on the
numerical implementation, `±` is either `+` or `-` depending on whether the
variables vary as:
``` julia
x = P(xₖ + α⋅dₖ)
```
or:
``` julia
x = P(xₖ - α⋅dₖ)
```
along the path `α ≥ 0`. The `NumOptBase` package provides some methods to help
implementing such line-search methods.
For any feasible `x ∈ Ω` and search direction `d ∈ ℝⁿ`, the *projected
direction* `dₚ ∈ ℝⁿ` is defined by:
```
∀ α ∈ [0,ε], P(x ± α⋅d) = x ± α⋅dₚ
```
for some `ε > 0` and where `P` is the projection onto the feasible set `Ω`
previously defined. In other words, `dₚ` is the effective search direction in
`Ω` for any sufficiently small step size `α`.
The projected direction is computed by:
``` julia
project_direction!(dₚ, x, ±, d, Ω)
```
which overwrites the destination `dₚ` and where `±` is either `+` or `-`.
A closely related function is:
``` julia
changing_variables!(a, x, ±, d, Ω)
```
which overwrites the destination `a` with ones where variables in `x ∈ Ω` will
vary along the direction `±d` while respecting the constraints implemented by
`Ω` and zeros elsewhere. Hence, if `±d = -∇f(x)`, with `∇f(x)` the gradient of
the objective function, the destination is set to zero everywhere the
Karush-Kuhn-Tucker (K.K.T.) conditions are satisfied for the problem:
```
minₓ f(x) s.t. x ∈ Ω
```
In other words, `all(izero, a)` holds for (exact) convergence. Note that the
projected direction `dₚ` and `a` are related by `dₚ = a .* d`.
When line-searching, two specific values of the step length `α ≥ 0` are of
interest:
- `αₘᵢₙ ≥ 0` is the greatest nonnegative step length such that:
```
α ≤ αₘᵢₙ ⟹ P(x ± α⋅d) = x ± α⋅d
```
- `αₘₐₓ ≥ 0` is the least nonnegative step length such that:
```
α ≥ αₘₐₓ ⟹ P(x ± α⋅d) = P(x ± αₘₐₓ⋅d)
```
In other words, no bounds are overcome if `0 ≤ α ≤ αₘᵢₙ` and the projected
variables are all the same for any `α` such that `α ≥ αₘₐₓ`. The values of
`αₘᵢₙ` and/or `αₘₐₓ` can be computed by one of:
``` julia
αₘᵢₙ = linesearch_stepmin(x, ±, d, Ω)
αₘₐₓ = linesearch_stepmax(x, ±, d, Ω)
αₘᵢₙ, αₘₐₓ = linesearch_limits(x, ±, d, Ω)
```
Note that, for efficiency, `project_direction!`, `changing_variables!`,
`linesearch_stepmin`, `linesearch_stepmax`, and `linesearch_limits` assume
without checking that the input variables `x` are feasible, that is that `x ∈
Ω` holds.
## Extension to other array types
To extend the `NumOptBase` to other array types, some understanding of the
implementation of this package is needed. The **public methods** which can be
called by the end-users are summarized in the following table.
| Public method | Description | Remarks |
|:--------------------------|:------------------------|:-----------------------------------|
| `similar(x)` | Yield an array like `x` | |
| `zerofill!(dst)` | Zero-fill `dst` | |
| `NumOptBase.copy!(dst,x)` | Copy `x` into `dst` | See `copyto!` and `copy!` in Julia |
| `scale!(dst,α,x)` | `dst = α*x` | |
| `update!(dst,α,x)` | `dst += α*x` | |
| `update!(dst,α,x,y)` | `dst += α*x.*y` | |
| `combine!(dst,α,x,β,y)` | `dst = α*x + β*y` | |
| `inner(x,y)` | Inner product | |
| `inner(w,x,y)` | Triple inner product | |
| `norm1(x)` | ℓ₁ norm | |
| `norm2(x)` | Euclidean norm | |
| `norminf(x)` | Infinite norm | |
In the above table and hereinafter, `dst`, `w`, `x`, and `y` denote arrays
(considered as *vectors*), `α` and `β` denote scalar reals, and all operations
and function calls are assumed to be done element-wise.
These public methods check their arguments (for having the same axes and thus
the same indices) and call one of the specialized methods listed below
depending on the operation, on the type of the array arguments, and on the
specific values of the multipliers `α` and `β`.
| Operation | Specialized method | Remarks |
|:------------------|:--------------------------------------|:-------------------------|
| `dst = 0` | `zerofill!(dst)` | |
| `dst = x` | `unsafe_copy!(dst,x)` | |
| `dst = f(x)` | `unsafe_map!(f,dst,x)` | |
| `dst = f(x,y)` | `unsafe_map!(f,dst,x,y)` | |
| `dst = α*x` | `unsafe_map!(αx(α,x),dst,x)` | `α` is not 0, nor 1 |
| `dst = α*x + y` | `unsafe_map!(αxpy(α,x),dst,x,y)` | `α` is not 0, nor 1 |
| `dst = x + β*y` | `unsafe_map!(αxpy(β,y),dst,y,x)` | `β` is not 0, nor 1 |
| `dst = α*x + β*y` | `unsafe_map!(αxpβy(α,x,β,y),dst,x,y)` | neither `α` nor `β` is 0 |
| `dst = -x` | `unsafe_map!(-,dst,x)` | |
| `dst = x + y` | `unsafe_map!(+,dst,x,y)` | |
| `dst = x - y` | `unsafe_map!(-,dst,x,y)` | |
| `dst = x * y` | `unsafe_map!(*,dst,x,y)` | |
| `inner(x,y)` | `unsafe_inner(x,y)` | |
| `inner(w,x,y)` | `unsafe_inner(w,x,y)` | |
| `norm1(x)` | `norm1(x))` | |
| `norm2(x)` | `norm2(x)` | |
| `norminf(x)` | `norminf(x)` | |
The prefix `unsafe_` means that the axes of arguments have been checked to be
compatible. Any scalar argument (`α` and `β`) shall never be zero and shall
have been converted to the correct floating-point type (this conversion is
automatically done by the constructors `αx`, `αxpy`, `αxmy`, and `αxpβy`). The
code of the high-level methods shall be simple enough for these methods to be
in-lined. This may lead to some optimizations (when the multipliers have
specific values like 0 or ±1).
Remarks:
- `unsafe_copy!(dst, x)` shall not be called when `dst` and `x` are the same
object and amounts to calling `copyto!(dst, x)` by default but may be
extended.
- `zerofill!(dst)` amounts to calling `fill!(dst, zero(eltype(dst)))` by
default but `memset(pointer(dst), 0, sizeof(dst))` may be used for dense
arrays.
- When `α` (resp. `β`) is zero, it is assumed that expression `α*x` (resp.
`β*y`) is everywhere zero whatever the values of `x` (resp. `y`).
- It can be seen that a great deal of cases are handled by `unsafe_map!`. To
avoid some overheads with closures and to allow for specialization of the
code, `αx`, `αxpy`, `αxmy`, and `αxpβy` build callable objects which have
specific types and which implement simple operation involving multipliers:
```julia
f1 = αx(α,x)
f1(xᵢ) -> α*xᵢ
f2 = αxpy(α,x)
f2(xᵢ,yᵢ) -> α*xᵢ + yᵢ
f3 = αxmy(α,x)
f3(xᵢ,yᵢ) -> α*xᵢ - yᵢ
f4 = αxpβyy(α,x,β,y)
f4(xᵢ,yᵢ) -> α*xᵢ + β*yᵢ
```
where `xᵢ` and `yᵢ` denote an entry of `x` and `y`. These constructors take
care of converting the multipliers `α` and `β`to the correct floating-point
type. This the reason to provide arrays `x` and `y` along with their
respective multiplier to the constructors.
To support specific array types or to optimize the operations for given array
types, it is sufficient to extend the specialized methods (the ones prefixed by
`unsafe_`) and the methods that compute the norms. Specializing the method
`zerofill!` is not mandatory as the default version shall work for all array
types.
You may have a look in the files
[ext/NumOptBaseLoopVectorizationExt.jl](ext/NumOptBaseLoopVectorizationExt.jl)
and [ext/NumOptBaseCUDAExt.jl](ext/NumOptBaseCUDAExt.jl) which respectively
extend `NumOptBase` to use AVX loop vectorization and Cuda GPU arrays.