# NeuralProcesses.jl
NeuralProcesses.jl is a framework for composing
[Neural Processes](https://arxiv.org/abs/1807.01622) built on top of
[Flux.jl](https://github.com/FluxML/Flux.jl).

NeuralProcesses.jl was presented at JuliaCon 2020: [link to video (7:41)](https://www.youtube.com/watch?v=nq6X-w5xgLo).

See the [Neural Process Family](https://github.com/YannDubs/Neural-Process-Family) for code to reproduce the image experiments from [Convolutional Conditional Neural Processes (Gordon et al., 2020)](https://openreview.net/forum?id=Skey4eBYPS).
Contents:
- [Introduction](#introduction)
- [Predefined Experimental Setup: `train.jl`](#predefined-experimental-setup-trainjl)
- [Manual](#manual)
- [Principles](#principles)
- [Available Models for 1D Regression](#available-models-for-1d-regression)
- [Building Blocks](#building-blocks)
- [Data Generators](#data-generators)
- [Training and Evaluation](#training-and-evaluation)
- [Examples](#examples)
- [The Conditional Neural Process](#the-conditional-neural-process)
- [The Neural Process](#the-neural-process)
- [The Attentive Neural Process](#the-attentive-neural-process)
- [The Convolutional Conditional Neural Process](#the-convolutional-conditional-neural-process)
- [State of the Package](#state-of-the-package)
- [Implementation Details](#implementation-details)

## Introduction
The setting of NeuralProcesses.jl is _meta-learning_. Meta-learning is concerned
with learning a map from data sets directly to predictive distributions.
Neural processes are a powerful class of parametrisations of this map based on
an _encoding_ of the data.

### Predefined Experimental Setup: `train.jl`
Eager to get started?!
The file `train.jl` contains a predefined experimental setup that gets
you going immediately!
Example:

```
$ julia --project=. train.jl --model convcnp --data matern52 --loss loglik --epochs 20
```

Here's what it can do:
```
usage: train.jl --data DATA --model MODEL [--num-samples NUM-SAMPLES]
[--batch-size BATCH-SIZE] --loss LOSS
[--starting-epoch STARTING-EPOCH] [--epochs EPOCHS]
[--evaluate] [--evaluate-iw] [--evaluate-no-iw]
[--evaluate-num-samples EVALUATE-NUM-SAMPLES]
[--evaluate-only-within] [--models-dir MODELS-DIR]
[--bson BSON] [-h]

optional arguments:
--data DATA Data set: eq-small, eq, matern52, eq-mixture,
noisy-mixture, weakly-periodic, sawtooth, or
mixture. Append "-noisy" to a data set to make
it noisy.
--model MODEL Model: conv[c]np, corconvcnp, a[c]np, or
[c]np. Append "-global-{mean,sum}" to
introduce a global latent variable. Append
"-amortised-{mean,sum}" to use amortised
observation noise. Append "-het" to use
heterogeneous observation noise.
--num-samples NUM-SAMPLES
Number of samples to estimate the training
loss. Defaults to 20 for "loglik" and 5 for
"elbo". (type: Int64)
--batch-size BATCH-SIZE
Batch size. (type: Int64, default: 16)
--loss LOSS Loss: loglik, loglik-iw, or elbo.
--starting-epoch STARTING-EPOCH
Set to a number greater than one to continue
training. (type: Int64, default: 1)
--epochs EPOCHS Number of epochs to train for. (type:
Int64, default: 20)
--evaluate Evaluate model.
--evaluate-iw Force to use importance weighting for the
evaluation objective.
--evaluate-no-iw Force to NOT use importance weighting for the
evaluation objective.
--evaluate-num-samples EVALUATE-NUM-SAMPLES
Number of samples to estimate the evaluation
loss. (type: Int64, default: 4096)
--evaluate-only-within
Evaluate with only the task of interpolation
within training range.
--models-dir MODELS-DIR
Directory to store models in. (default:
"models")
--bson BSON Directly specify the file to save the model to
and load it from.
-h, --help show this help message and exit
```

## Manual
### Principles
#### Models
In NeuralProcesses.jl, models consist of an _encoder_ and a _decoder_:
```julia
model = Model(encoder, decoder)
```

An encoder takes in the data and produces an abstract representation of the
data.
A decoder then takes in this representation and produces a prediction at target
inputs.

#### Functional Representations and Coding
In the package, the three objects — the data, encoding, and prediction —
have a common representation. In particular, everything is represented as a
_function_: a tuple `(x, y)` that corresponds to the function `f(x[i]) = y[i]`
for all indices `i`.
Encoding and decoding, which we will collectively call _coding_, then become
_transformations of functions_.

Coding is implemented by the function `code`:
```julia
xz, z = code(encoder, xc, yc, xt)
```

Here `encoder` transforms the function `(xc, yc)`, the _context set_, into
another function `(xz, z)`, the abstract representation. The _target inputs_
`xt` express the desire that `encoder` _should_ (not must) output a function
that maps _from_ `xt`.
If indeed `xz == xt`, then the coding operation is called _complete_.
If, on the other hand, `xz != xt`, then the coding operation is called
_partial_.
Encoders and decoders are complete coders.
However, encoders and decoders are often composed from simpler coders, and these
coders could be partial.
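
For example, one can check whether a coding was complete (a minimal sketch, reusing `encoder`, `xc`, `yc`, and `xt` from above):

```julia
xz, z = code(encoder, xc, yc, xt)

# A complete coder outputs a function defined at exactly the target inputs;
# a partial coder may output a function over different inputs.
is_complete = xz == xt
```
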
#### Compositional Coder Design

Coders can be composed using `Chain`:
```julia
encoder = Chain(
coder1,
coder2,
coder3
)
```

Coders can also be put in _parallel_ using `Parallel`.
For example, this is useful if an encoder should output multiple encodings,
e.g. in a multi-headed architecture:

```julia
encoder = Chain(
...,
# Split the output into two parts, which can then be processed by two heads.
Splitter(...),
Parallel(
..., # Head 1
... # Head 2
)
)
```

By default, parallel representations are combined with concatenation along the
channel dimension (see `Materialise` in the section below), but this is
readily extended to additional designs (see `?Materialise`).

#### Coder Likelihoods
A coder should output either a _deterministic_ coding or a _stochastic_
coding, which can be achieved by appending a _likelihood_:

```julia
deterministic_coder = Chain(
...,
DeterministicLikelihood()
)

stochastic_coder = Chain(
...,
HeterogeneousGaussianLikelihood()
)
```

When a model is run, the output of the _encoder_ is _sampled_.
The resulting sample is then fed to the _decoder_.
In scenarios where the encoder outputs multiple encodings in parallel, it may be
desirable to concatenate those encodings into one big tensor which can then
be processed by the decoder.
This is achieved by prepending `Materialise()` to the decoder:

```julia
decoder = Chain(
Materialise(),
...,
HeterogeneousGaussianLikelihood()
)
```

The decoder outputs the prediction for the data.
In the above example, `decoder` produces means and variances at test inputs.

### Available Models for 1D Regression
The package exports constructors for a number of architectures from the
literature.

| Name | Constructor | Reference |
| :- | :- | :- |
| Conditional Neural Process | `cnp_1d` | [Garnelo, Rosenbaum, et al. (2018)](https://arxiv.org/abs/1807.01613) |
| Neural Process | `np_1d` | [Garnelo, Schwarz, et al. (2018)](https://arxiv.org/abs/1807.01622) |
| Attentive Conditional Neural Process | `acnp_1d` | [Kim, Mnih, et al. (2019)](https://openreview.net/forum?id=SkE6PjC9KX) |
| Attentive Neural Process | `anp_1d` | [Kim, Mnih, et al. (2019)](https://openreview.net/forum?id=SkE6PjC9KX) |
| Convolutional Conditional Neural Process | `convcnp_1d` | [Gordon, Bruinsma, et al. (2020)](https://openreview.net/forum?id=Skey4eBYPS) |
| Convolutional Neural Process | `convnp_1d` | [Foong, Bruinsma, et al. (2020)](https://arxiv.org/abs/2007.01332) |
| Gaussian Neural Process | `corconvcnp_1d` | [Bruinsma, Requeima et al. (2021)](https://openreview.net/forum?id=rzsDn7Vzxf) |
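
As an illustration, a ConvCNP for 1D regression could be constructed with the exported `convcnp_1d` constructor. The keyword arguments in the sketch below are assumptions, not the definitive signature; consult the constructor's docstring (`?convcnp_1d`) for the actual options.

```julia
using NeuralProcesses

# NOTE: these keyword arguments are illustrative assumptions; see `?convcnp_1d`
# for the options supported by your version of the package.
convcnp = convcnp_1d(
    receptive_field=16f0,   # Receptive field of the CNN
    num_layers=8,           # Number of CNN layers
    num_channels=64,        # Number of CNN channels
    points_per_unit=64f0    # Density of the internal discretisation
)
```
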
Download links for pretrained models are below.
The instructions for how a pretrained model can be run are as follows:

1. Download some [pretrained models](https://www.dropbox.com/s/ua40uc9ttzq9t18/models.tar.gz?dl=1).
2. Extract the models:
```bash
$ tar -xzvf models.tar.gz
```

3. Open Julia and load the model:
```julia
using NeuralProcesses, NeuralProcesses.Experiment, Flux

convnp = best_model("models/convnp-het/loglik/matern52.bson")
```

4. Run the model:
```julia
means, lowers, uppers, samples = predict(
convnp,
randn(Float32, 10), # Random context inputs
randn(Float32, 10), # Random context outputs
randn(Float32, 10) # Random target inputs
)
```

#### Pretrained Models for [Foong, Bruinsma, et al. (2020)](https://arxiv.org/abs/2007.01332)
[Download link](https://www.dropbox.com/s/ua40uc9ttzq9t18/models.tar.gz?dl=1)
Interpolation results for the pretrained models are as follows:
##### `loglik`
| Model | `eq` | `matern52` | `noisy-mixture` | `weakly-periodic` | `sawtooth` | `mixture` |
|-|:-:|:-:|:-:|:-:|:-:|:-:|
| `cnp` | -1.09 | -1.17 | -1.28 | -1.34 | -0.16 | -1.17 |
| `acnp` | -0.83 | -0.93 | -1.00 | -1.29 | -0.17 | -1.09 |
| `convcnp` | -0.69 | -0.88 | -0.93 | -1.19 | 1.09 | -0.94 |
| `np-het` | -0.76 | -0.90 | -0.94 | -1.23 | -0.16 | -0.88 |
| `anp-het` | -0.53 | -0.74 | -0.66 | -1.17 | -0.10 | -0.63 |
| `convnp-het` | -0.34 | -0.61 | -0.59 | -1.01 | 2.24 | -0.40 |

##### `elbo`
| Model | `eq` | `matern52` | `noisy-mixture` | `weakly-periodic` | `sawtooth` | `mixture` |
|-|:-:|:-:|:-:|:-:|:-:|:-:|
| `np-het` | -0.34 | -0.66 | -0.66 | -1.21 | -0.12 | -0.71 |
| `anp-het` | -0.71 | -0.88 | -0.80 | -1.27 | -0.00 | -0.86 |
| `convnp-het` | -0.61 | -0.59 | -2.19 | -1.07 | 2.40 | -1.27 |

##### `loglik-iw`
| Model | `eq` | `matern52` | `noisy-mixture` | `weakly-periodic` | `sawtooth` | `mixture` |
|-|:-:|:-:|:-:|:-:|:-:|:-:|
| `np-het` | -0.28 | -0.57 | -0.47 | -1.20 | 0.32 | -0.59 |
| `anp-het` | -0.45 | -0.67 | -0.57 | -1.19 | -0.16 | -0.61 |
| `convnp-het` | -0.09 | -0.29 | -0.34 | -0.98 | 2.39 | -0.31 |

#### Pretrained Models for [Bruinsma, Requeima et al. (2021)](https://openreview.net/forum?id=rzsDn7Vzxf)
[Download link](https://www.dropbox.com/s/knlydai66aroorh/models.tar.gz?dl=1)
Interpolation results for the pretrained models are as follows:
##### `loglik`
| Model | `eq-noisy` | `matern52-noisy` | `noisy-mixture` | `weakly-periodic-noisy` | `sawtooth-noisy` | `mixture-noisy` |
|-|:-:|:-:|:-:|:-:|:-:|:-:|
| `convcnp` | -0.80 | -0.95 | -0.95 | -1.20 | 0.55 | -0.93 |
| `corconvcnp` | 0.70 | 0.30 | 0.96 | -0.47 | 0.42 | 0.10 |
| `anp-het` | -0.61 | -0.75 | -0.73 | -1.19 | 0.34 | -0.69 |
| `convnp-het` | -0.46 | -0.67 | -0.53 | -1.02 | 1.20 | -0.50 |

### Building Blocks
The package provides various building blocks that can be used to compose
encoders and decoders.
For some building blocks, there is a constructor function available that can be
used to more easily construct the block.
More information about a block can be obtained by using the built-in help
function, e.g. `?LayerNorm`.

#### Glue
| Type | Constructor | Description |
| :- | :- | :- |
| `Chain` | | Put things in sequence. |
| `Parallel` | | Put things in parallel. |

#### Basic Blocks
| Type | Constructor | Description |
| :- | :- | :- |
| `BatchedMLP` | `batched_mlp` | Batched MLP. |
| `BatchedConv` | `build_conv` | Batched CNN. |
| `Splitter` | | Split off a given number of channels. |
| `LayerNorm` | | Layer normalisation. |
| `MeanPooling` | | Mean pooling. |
| `SumPooling` | | Sum pooling. |

#### Advanced Blocks
| Type | Constructor | Description |
| :- | :- | :- |
| `Attention` | `attention` | Attentive mechanism. |
| `SetConv` | `set_conv` | Set convolution. |
| `SetConvPD` | `set_conv` | Set convolution for kernel functions. |

#### Likelihoods
| Type | Constructor | Description |
| :- | :- | :- |
| `DeterministicLikelihood` | | Deterministic output. |
| `FixedGaussianLikelihood` | | Gaussian likelihood with a fixed variance. |
| `AmortisedGaussianLikelihood` | | Gaussian likelihood with a fixed variance that is calculated from split-off channels. |
| `HeterogeneousGaussianLikelihood` | | Gaussian likelihood with input-dependent variance. |

#### Coders
| Type | Constructor | Description |
| :- | :- | :- |
| `Materialise` | | Materialise a sample. |
| `FunctionalCoder` | | Code into a function space: make the target inputs a discretisation. |
| `UniformDiscretisation1D` | | Discretise uniformly at a given density of points. |
| `InputsCoder` | | Code with the target inputs. |
| `MLPCoder` | | Rho-sum-phi coder. |

### Data Generators
Models can be trained with data generators.
Data generators are callables that take in an integer (number of batches) and give
back an iterator that generates four-tuples: context inputs, context outputs,
target inputs, and target outputs.
All tensors should be of rank three where the first dimension is the data
dimension, the second dimension is the feature dimension, and the third
dimension is the batch dimension.

Data generators can be constructed with `DataGenerator`, which takes in an
underlying _stochastic process_.
[Stheno.jl](https://github.com/willtebbutt/Stheno.jl) can be used to build
Gaussian processes.
In addition, the package exports the following processes:

| Type | Description |
| :- | :- |
| `Sawtooth` | Sawtooth process. |
| `BayesianConvNP` | A Convolutional Neural Process with a prior on the weights. |
| `Mixture` | Mixture of processes. |
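
As a sketch, a data generator can be built from one of the exported processes and then called with a number of batches. The `DataGenerator` keyword argument shown below is an assumption; see `?DataGenerator` for the actual options.

```julia
using NeuralProcesses

# NOTE: the keyword argument below is an illustrative assumption; consult
# `?DataGenerator` for the options supported by your version of the package.
data_gen = DataGenerator(Sawtooth(); batch_size=16)

# A data generator is called with the number of batches and returns an iterator
# over (context inputs, context outputs, target inputs, target outputs) tuples,
# each of rank three: data dimension × feature dimension × batch dimension.
for (xc, yc, xt, yt) in data_gen(100)
    # ... feed the batch to a model or a loss ...
end
```
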
### Training and Evaluation

Experimentation functionality is exported by `NeuralProcesses.Experiment`.
#### Running Models
A model can be run forward by calling it with three arguments:
context inputs, context outputs, and target inputs.
All arguments to models should be tensors of rank three where the first
dimension is the data dimension, the second dimension is the feature dimension,
and the third dimension is the batch dimension.

```julia
convcnp(
# Use a batch size of 16.
randn(Float32, 10, 1, 16), # Random context inputs
randn(Float32, 10, 1, 16), # Random context outputs
randn(Float32, 15, 1, 16) # Random target inputs
)
```

For convenience, the package also exports the function `predict`, which
runs a model from inputs of type `Vector` and produces predictive means,
lower and upper credible bounds, and predictive samples.

```julia
means, lowers, uppers, samples = predict(
convcnp,
randn(Float32, 10), # Random context inputs
randn(Float32, 10), # Random context outputs
randn(Float32, 10) # Random target inputs
)
```

#### Training
To train a model, use `train!`, which, amongst other things, requires a
loss function and optimiser.
Loss functions are described below, and
optimisers can be found in `Flux.Optimise`;
for most applications, `ADAM(5e-4)` probably suffices.
After training, a model can be evaluated with `eval_model`.
See `train.jl` for an example of `train!`.
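
As a rough sketch of how these pieces fit together (the argument names and order of `train!` below are assumptions; `train.jl` shows the actual invocation):

```julia
using Flux, NeuralProcesses, NeuralProcesses.Experiment

# 20-sample log-EL loss; see the Losses section below.
loss(xs...) = loglik(xs..., num_samples=20)

# For most applications, ADAM with a learning rate of 5e-4 suffices.
opt = ADAM(5e-4)

# NOTE: the positional and keyword arguments below are illustrative assumptions,
# not the definitive signature of `train!`; consult `train.jl` for a working call.
train!(model, loss, data_gen, opt, bson="models/my-model.bson", epochs=20)
```
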
#### Losses
The following loss functions are exported:
| Function | Description |
| :- | :- |
| `loglik` | Biased estimate of the log-expected-likelihood. Exact for models with a deterministic encoder. |
| `elbo` | Neural process ELBO-style loss. |

Examples:
```julia
# 1-sample log-EL loss. This is exact for models with a deterministic encoder.
loss(xs...) = loglik(xs..., num_samples=1)

# 20-sample log-EL loss. This is probably what you want if you are training
# a model with a stochastic encoder.
loss(xs...) = loglik(xs..., num_samples=20)

# 20-sample ELBO loss. This is an alternative to `loglik`.
loss(xs...) = elbo(xs..., num_samples=20)
```

See `train.jl` for more examples.
#### Saving and Loading
After every epoch, the current model and top five best models are saved.
The file to which the model is written is determined by the keyword `bson`
of `train!`.
After training, the best model can be loaded with `best_model(path)`.

## Examples
### The Conditional Neural Process
Perhaps the simplest member of the NP family is the
[Conditional Neural Process](http://proceedings.mlr.press/v80/garnelo18a.html)
(CNP).
CNPs employ a deterministic MLP-based encoder, and an MLP-based decoder.
As a first example, we provide an implementation of a simple CNP in the
framework.
In the code below, `dim_x`, `dim_y`, `dim_embedding`, and the `num_*_layers`
variables are integers specifying the relevant dimensionalities and depths.

```julia
# The encoder maps into a finite-dimensional vector space, and produces a
# global (deterministic) representation, which is then concatenated to every
# test point. We use a `Parallel` object to achieve this.
encoder = Parallel(
# The `InputsCoder` simply outputs the target locations. We `Chain` this
# with a `DeterministicLikelihood` to form a complete coder.
Chain(
InputsCoder(),
DeterministicLikelihood()
),
Chain(
# The representation is given by a deep-set network, which is
# implemented with the `MLPCoder` object. This object receives two MLPs
# upon construction, a pre-pooling network and post-pooling network,
# and produces a vector representation for each context set in the
# batch.
MLPCoder(
batched_mlp(
dim_in =dim_x + dim_y,
dim_hidden=dim_embedding,
dim_out =dim_embedding,
num_layers=num_encoder_layers
),
batched_mlp(
dim_in =dim_embedding,
dim_hidden=dim_embedding,
dim_out =dim_embedding,
num_layers=num_encoder_layers
)
),
# The resulting representation is also chained with a
# `DeterministicLikelihood` as we are interested in a conditional model.
DeterministicLikelihood()
)
)

# The CNP decoder is also MLP-based. It first `materialises` the encoder output
# (concatenates the target inputs and context set representation), and then
# passes these through an MLP that outputs a mean and standard deviation at
# every target location.
decoder = Chain(
# First, concatenate target inputs and context set representation. By
# default, `Materialise` uses concatenation to combine the different
# representations in a `Parallel` object, but alternative designs (e.g.,
# summation or multiplicative flows) could also be considered in
# NeuralProcesses.jl.
Materialise(),
# Pass the resulting representations through an MLP-based decoder. The
# input dimensionality is the dimensionality of the target inputs plus
# the dimensionality of the representation. The output dimension is
# twice the output dimensionality, since we require a mean and standard
# deviation for every location.
batched_mlp(
dim_in =dim_x + dim_embedding,
dim_hidden=dim_embedding,
dim_out =2dim_y,
num_layers=num_decoder_layers
),
# The `HeterogeneousGaussianLikelihood` automatically splits its inputs
# in two along the feature dimension, and treats the first half as the
# mean and second half as the standard deviation of a Gaussian
# distribution.
HeterogeneousGaussianLikelihood()
)

cnp = Model(encoder, decoder)
```
Then, after training, we can make predictions as follows:

```julia
means, lowers, uppers, samples = predict(
cnp,
randn(Float32, 10), # Random context inputs
randn(Float32, 10), # Random context outputs
randn(Float32, 10) # Random target inputs
)
```

### The Neural Process
[Neural Processes](https://arxiv.org/abs/1807.01622) (NP) extend CNPs by adding
a latent variable to the model.
This enables NPs to capture joint, non-Gaussian marginal distributions for
target sets, which in turn allows producing coherent samples.
Extending CNPs to NPs in NeuralProcesses.jl is extremely easy: we simply
replace the `DeterministicLikelihood` following the `MLPCoder` with a
`HeterogeneousGaussianLikelihood`, and adjust the output dimension of the encoder to
produce both means and variances!

```julia
# The only change to the encoder is replacing the `DeterministicLikelihood`
# following the `MLPCoder` with a `HeterogeneousGaussianLikelihood`!
encoder = Parallel(
Chain(
InputsCoder(),
DeterministicLikelihood()
),
Chain(
MLPCoder(
batched_mlp(
dim_in =dim_x + dim_y,
dim_hidden=dim_embedding,
dim_out =dim_embedding,
num_layers=num_encoder_layers
),
batched_mlp(
dim_in =dim_embedding,
dim_hidden=dim_embedding,
# Since `HeterogeneousGaussianLikelihood` splits its inputs along the
# channel dimension, we increase the output dimension of the set
# encoder accordingly.
dim_out =2dim_embedding,
num_layers=num_encoder_layers
)
),
# This is the main change required to switch between a CNP and an NP.
HeterogeneousGaussianLikelihood()
)
)

# We can then reuse the previously defined decoder as is!
np = Model(encoder, decoder)
```

Note that typical NPs consider both a deterministic and latent representation.
This is easily achieved in NeuralProcesses.jl by adding an additional encoder to
the `Parallel` object (with a `DeterministicLikelihood`), and increasing the
decoder `dim_in` accordingly.
In this repo, the built-in NP model uses this form.
This example does not include a deterministic path to emphasise the ease of
switching between conditional and latent variable models in NeuralProcesses.jl.
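
To make this concrete, below is a sketch of an NP encoder with both a deterministic and a latent MLP path, mirroring the encoder above (the dimensionalities are example values):

```julia
using NeuralProcesses

# Example dimensionalities; adjust as needed.
dim_x, dim_y, dim_embedding, num_encoder_layers = 1, 1, 128, 3

encoder = Parallel(
    # Target inputs, as before.
    Chain(
        InputsCoder(),
        DeterministicLikelihood()
    ),
    # Deterministic path.
    Chain(
        MLPCoder(
            batched_mlp(
                dim_in    =dim_x + dim_y,
                dim_hidden=dim_embedding,
                dim_out   =dim_embedding,
                num_layers=num_encoder_layers
            ),
            batched_mlp(
                dim_in    =dim_embedding,
                dim_hidden=dim_embedding,
                dim_out   =dim_embedding,
                num_layers=num_encoder_layers
            )
        ),
        DeterministicLikelihood()
    ),
    # Latent path, producing means and variances.
    Chain(
        MLPCoder(
            batched_mlp(
                dim_in    =dim_x + dim_y,
                dim_hidden=dim_embedding,
                dim_out   =dim_embedding,
                num_layers=num_encoder_layers
            ),
            batched_mlp(
                dim_in    =dim_embedding,
                dim_hidden=dim_embedding,
                dim_out   =2dim_embedding,
                num_layers=num_encoder_layers
            )
        ),
        HeterogeneousGaussianLikelihood()
    )
)

# The decoder then needs `dim_in = dim_x + 2dim_embedding` in its `batched_mlp`
# to account for both the deterministic and the latent representation.
```
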
### The Attentive Neural Process

Next, we consider a more complicated model, and demonstrate how easy it is to
implement with NeuralProcesses.jl.
[Attentive Neural Processes](https://openreview.net/forum?id=SkE6PjC9KX) (ANPs)
extend NPs by considering an attentive mechanism for the deterministic
representation.
Attention comes built-in with NeuralProcesses.jl, and so we can deploy it within
a `Chain` or `Parallel` like other building blocks.
Below is an example implementation of an ANP with a deterministic attentive
representation, and a stochastic (Gaussian) global representation.

```julia
# The encoder now aggregates three separate representations:
# (i) the target inputs, like the (C)NP,
# (ii) a deterministic attentive representation, and
# (iii) a stochastic global representation.
encoder = Parallel(
# First, include the `InputsCoder` to represent the target set inputs.
Chain(
InputsCoder(),
DeterministicLikelihood()
),
# NeuralProcesses.jl uses a transformer-style multi-head architecture for
# attention. It first embeds the inputs and outputs into a
# finite-dimensional vector space with an MLP, and applies the attention in
# the embedding space. The constructor requires the dimensionalities of the
# inputs and outputs, the desired dimensionality of the embedding, and the
# number of heads to employ (each head will use a
# `div(dim_embedding, num_heads)`-dimensional embedding), and the number of
# layers in the embedding MLPs. As ANPs employ attention for the
# deterministic representations, this is chained with a
# `DeterministicLikelihood`.
Chain(
attention(
dim_x =dim_x,
dim_y =dim_y,
dim_embedding =dim_embedding,
num_heads =num_encoder_heads,
num_encoder_layers=num_encoder_layers
),
DeterministicLikelihood()
),
# The latent path uses the same form as for the NP.
Chain(
MLPCoder(
batched_mlp(
dim_in =dim_x + dim_y,
dim_hidden=dim_embedding,
dim_out =dim_embedding,
num_layers=num_encoder_layers
),
batched_mlp(
dim_in =dim_embedding,
dim_hidden=dim_embedding,
dim_out =2dim_embedding,
num_layers=num_encoder_layers
)
),
HeterogeneousGaussianLikelihood()
)
)

# The decoder for the ANP is again MLP-based, and so has the same form as the
# (C)NP decoder. The only required change is to account for the dimensionality
# of the latent representation.
decoder = Chain(
Materialise(),
batched_mlp(
dim_in =dim_x + 2dim_embedding,
dim_hidden=dim_embedding,
# `num_noise_channels` is the number of channels required by the output
# likelihood `noise` below (e.g. `2dim_y` for a heterogeneous Gaussian).
dim_out =num_noise_channels,
num_layers=num_decoder_layers
),
# `noise` is the output likelihood, e.g. `HeterogeneousGaussianLikelihood()`.
noise
)

anp = Model(encoder, decoder)
```

### The Convolutional Conditional Neural Process
As a final example, we consider the
[Convolutional Conditional Neural Process](https://openreview.net/forum?id=Skey4eBYPS)
(ConvCNP).
The key difference between the ConvCNP and other NPs in terms of implementation
is that it encodes the data into an infinite-dimensional function space, rather
than a finite-dimensional vector space.
This is handled in NeuralProcesses.jl with `FunctionalCoder`s, which, in
addition to a coder, also expect a discretisation (e.g. `UniformDiscretisation1D`)
upon construction.
Below is an implementation of the
[Convolutional Conditional Neural Process](https://openreview.net/forum?id=Skey4eBYPS):

```julia
# The encoder maps into a function space, which is what `FunctionalCoder`
# indicates.
encoder = FunctionalCoder(
# We cannot exactly represent a function, so we represent a discretisation
# of the function instead. We use a discretisation of 64 points per unit.
# The discretisation will span from the minimal context or target input
# to the maximal context or target input with a margin of 1 on either side.
UniformDiscretisation1D(64f0, 1f0),
Chain(
# The encoder is given by a so-called set convolution, which directly
# maps the data set to the discretised functional representation. The
# data consists of one channel. We also specify a length scale of
# twice the inter-point spacing of the discretisation. The function
# space that we map into is a reproducing kernel Hilbert space (RKHS),
# and the length scale corresponds to the length scale of the kernel of
# the RKHS. We also append a density channel, which ensures that the
# encoder is injective.
set_conv(1, 2 / 64f0; density=true),
# The encoding will be deterministic. We could also use a stochastic
# encoding.
DeterministicLikelihood()
)
)

decoder = Chain(
# The decoder first transforms the functional representation with a CNN.
build_conv(
4f0, # Receptive field size
10, # Number of layers
64, # Number of channels
points_per_unit =64f0, # Density of the discretisation
dimensionality =1, # This is a 1D model.
num_in_channels =2, # Account for density channel.
num_out_channels=2 # Produce a mean and standard deviation.
),
# Use another set convolution to map back from the space of the encoding
# to the space of the data.
set_conv(2, 2 / 64f0),
# Predict means and variances.
HeterogeneousGaussianLikelihood()
)

convcnp = Model(encoder, decoder)
```

## State of the Package
The package is currently mostly a port from academic code.
There are still a number of important things to do:

- **Support for 2D data:**
The package is currently built around 1D tasks.
We plan to add support for 2D data, e.g. images.
This should not require big changes, but it should be implemented carefully.

- **Tests:**
The important components of the package are tested, but test coverage is
nowhere near where it should be.

- **Regression tests:**
For the package, GPU performance is crucial, so regression tests are
necessary.

- **Documentation:**
Documentation needs to be improved.

## Implementation Details
### Automatic Differentiation
[Tracker.jl](https://github.com/FluxML/Tracker.jl) is used to automatically
compute gradients.
A number of custom gradients are implemented in `src/util.jl`.

### Parameter Handling
The package uses [Functors.jl](https://github.com/FluxML/Functors.jl) to handle
parameters, like [Flux.jl](https://github.com/FluxML/Flux.jl), and adheres
to [Flux.jl](https://github.com/FluxML/Flux.jl)'s principles.
This means that _only_ `AbstractArray{<:Number}`s are parameters.
Nothing else will be trained, not even `Float32` or `Float64` scalars.

To not train an array, wrap it with `NeuralProcesses.Fixed(array)`,
and unwrap it with `NeuralProcesses.unwrap(fixed)` at runtime.
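
A minimal sketch of this pattern, assuming a hypothetical custom layer defined in the usual Functors.jl/Flux.jl style:

```julia
using Flux, NeuralProcesses

# A hypothetical layer with one trainable weight and one fixed scale.
struct MyScale{TW, TS}
    w::TW
    scale::TS
end

Flux.@functor MyScale

# Unwrap the fixed array at runtime in the forward pass.
(l::MyScale)(x) = NeuralProcesses.unwrap(l.scale) .* (l.w .* x)

# The weight `w` will be trained; the wrapped `scale` will not.
layer = MyScale(randn(Float32, 1), NeuralProcesses.Fixed([2f0]))
```
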
### GPU Acceleration

CUDA support for depthwise separable convolutions (`DepthwiseConv` from
[Flux.jl](https://github.com/FluxML/Flux.jl)) is implemented in `src/gpu.jl`.

Loop fusion can cause issues on the GPU, so oftentimes computations are
unrolled.