{"id":13688827,"url":"https://github.com/patrick-kidger/torchcde","last_synced_at":"2025-05-16T10:08:14.551Z","repository":{"id":44439655,"uuid":"262190723","full_name":"patrick-kidger/torchcde","owner":"patrick-kidger","description":"Differentiable controlled differential equation solvers for PyTorch with GPU support and memory-efficient adjoint backpropagation.","archived":false,"fork":false,"pushed_at":"2023-07-31T19:51:43.000Z","size":253,"stargazers_count":434,"open_issues_count":13,"forks_count":46,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-05-10T09:38:50.540Z","etag":null,"topics":["controlled-differential-equations","deep-learning","deep-neural-networks","differential-equations","dynamical-systems","machine-learning","neural-differential-equations","neural-networks","pytorch","time-series"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/patrick-kidger.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null},"funding":{"github":["patrick-kidger"]}},"created_at":"2020-05-08T00:45:40.000Z","updated_at":"2025-04-27T03:09:37.000Z","dependencies_parsed_at":"2024-01-12T20:48:43.202Z","dependency_job_id":"d1f526e3-a8ec-4c9e-a5c8-33e7e81f2c66","html_url":"https://github.com/patrick-kidger/torchcde","commit_stats":{"total_commits":113,"total_committers":4,"mean_commits":28.25,"dds":"0.053097345132743334","last_synced_commit":"7c16ad9092d216fcef6e93d99fe325eed700a40e"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patrick-kidger%2Ftorchcde","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patrick-kidger%2Ftorchcde/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patrick-kidger%2Ftorchcde/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patrick-kidger%2Ftorchcde/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/patrick-kidger","download_url":"https://codeload.github.com/patrick-kidger/torchcde/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254509478,"owners_count":22082892,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["controlled-differential-equations","deep-learning","deep-neural-networks","differential-equations","dynamical-systems","machine-learning","neural-differential-equations","neural-networks","pytorch","time-series"],"created_at":"2024-08-02T15:01:24.058Z","updated_at":"2025-05-16T10:08:09.533Z","avatar_url":"https://github.com/patrick-kidger.png","language":"Python","funding_links":["https://github.com/sponsors/patrick-kidger"],"categories":["Python","Software and Libraries","Additional Material"],"sub_categories":["Python","Software and Libraries"],"readme":"\u003ch1 align='center'\u003etorchcde\u003c/h1\u003e\n\u003ch2 align='center'\u003eDifferentiable GPU-capable solvers for CDEs\u003c/h2\u003e\n\n**Update: for any new projects, I would now recommend using [Diffrax](https://github.com/patrick-kidger/diffrax) instead. This is much faster, and producion-quality. torchcde was its prototype as a research project!**\n\nThis library provides differentiable GPU-capable solvers for controlled differential equations (CDEs). Backpropagation through the solver or via the adjoint method is supported; the latter allows for improved memory efficiency.\n\nIn particular this allows for building [Neural Controlled Differential Equation](https://github.com/patrick-kidger/NeuralCDE) models, which are state-of-the-art models for (arbitrarily irregular!) time series. Neural CDEs can be thought of as a \"continuous time RNN\".\n\n---\n\n\u003cp align=\"center\"\u003e\n\u003cimg align=\"middle\" src=\"./imgs/main.png\" width=\"666\" /\u003e\n\u003c/p\u003e\n\n## Installation\n\n```bash\npip install torchcde\n```\n\nRequires PyTorch \u003e=1.7.\n\n## Example\n```python\nimport torch\nimport torchcde\n\n# Create some data\nbatch, length, input_channels = 1, 10, 2\nhidden_channels = 3\nt = torch.linspace(0, 1, length)\nt_ = t.unsqueeze(0).unsqueeze(-1).expand(batch, length, 1)\nx_ = torch.rand(batch, length, input_channels - 1)\nx = torch.cat([t_, x_], dim=2)  # include time as a channel\n\n# Interpolate it\ncoeffs = torchcde.hermite_cubic_coefficients_with_backward_differences(x)\nX = torchcde.CubicSpline(coeffs)\n\n# Create the Neural CDE system\nclass F(torch.nn.Module):\n    def __init__(self):\n        super(F, self).__init__()\n        self.linear = torch.nn.Linear(hidden_channels,\n                                      hidden_channels * input_channels)\n\n    def forward(self, t, z):\n        return self.linear(z).view(batch, hidden_channels, input_channels)\n\nfunc = F()\nz0 = torch.rand(batch, hidden_channels)\n\n# Integrate it\ntorchcde.cdeint(X=X, func=func, z0=z0, t=X.interval)\n```\n\nSee [time_series_classification.py](./example/time_series_classification.py), which demonstrates how to use the library to train a Neural CDE model to predict the chirality of a spiral.\n\nAlso see [irregular_data.py](./example/irregular_data.py), for demonstrations on how to handle variable-length inputs, irregular sampling, or missing data, all of which can be handled easily, without changing the model.\n\n## Citation\nIf you found use this library useful, please consider citing\n\n```bibtex\n@article{kidger2020neuralcde,\n    title={{N}eural {C}ontrolled {D}ifferential {E}quations for {I}rregular {T}ime {S}eries},\n    author={Kidger, Patrick and Morrill, James and Foster, James and Lyons, Terry},\n    journal={Advances in Neural Information Processing Systems},\n    year={2020}\n}\n```\n\n## Documentation\n\nThe library consists of two main components: (1) integrators for solving controlled differential equations, and (2) ways of constructing controls from data.\n\n### Integrators\n\nThe library provides the `cdeint` function, which solves the system of controlled differential equations:\n```\ndz(t) = f(t, z(t))dX(t)     z(t_0) = z0\n```\n\nThe goal is to find the response `z` driven by the control `X`. This can be re-written as the following differential equation:\n```\ndz/dt(t) = f(t, z)dX/dt(t)     z(t_0) = z0\n```\nwhere the right hand side describes a matrix-vector product between `f(t, z)` and `dX/dt(t)`.\n\nThis is solved by\n```python\ncdeint(X, func, z0, t, adjoint, backend, **kwargs)\n```\nwhere letting `...` denote an arbitrary number of batch dimensions:\n* `X` is a `torch.nn.Module` with method `derivative`, such that `X.derivative(t)` is a Tensor of shape `(..., input_channels)`,\n* `func` is a `torch.nn.Module`, such that `func(t, z)` returns a Tensor of shape `(..., hidden_channels, input_channels)`,\n* `z0` is a Tensor of shape `(..., hidden_channels)`,\n* `t` is a one-dimensional Tensor of times to output `z` at.\n* `adjoint` is a boolean (defaulting to `True`).\n* `backend` is a string (defaulting to `\"torchdiffeq\"`).\n\nAdjoint backpropagation (which is slower but more memory efficient) can be toggled with `adjoint=True/False`.\n\nThe `backend` should be either `\"torchdiffeq\"` or `\"torchsde\"`, corresponding to which underlying library to use for the solvers. If using torchsde then the stochastic term is zero -- so the CDE is still reduced to an ODE. This is useful if one library supports a feature that the other doesn't. (For example torchsde supports a reversible solver, the [reversible Heun method](https://arxiv.org/abs/2105.13493); at time of writing torchdiffeq does not support any reversible solvers.)\n\nAny additional `**kwargs` are passed on to `torchdiffeq.odeint[_adjoint]` or `torchsde.sdeint[_adjoint]`, for example to specify the solver.\n\n### Constructing controls\n\n A very common scenario is to construct the continuous control`X` from discrete data (which may be irregularly sampled with missing values). To support this, we provide three main interpolation schemes:\n\n* Hermite cubic splines with backwards differences\n* Linear interpolation\n* Rectilinear interpolation\n\n_Note that if for some reason you already have a continuous control `X` then you won't need an interpolation scheme at all!_\n\nHermite cubic splines are usually the best choice, if possible. Linear and rectilinear interpolations are particularly useful in causal settings -- when at inference time the data is arriving over time. We go into further details in the [Further Documentation](#further-documentation) below.\n\nJust demonstrating Hermite cubic splines for now:\n```python\ncoeffs = hermite_cubic_coefficients_with_backward_differences(x)\n\n# coeffs is a torch.Tensor you can save, load,\n# pass through Datasets and DataLoaders etc.\n\nX = CubicSpline(coeffs)\n```\nwhere:\n* `x` is a Tensor of shape `(..., length, input_channels)`, where `...` is some number of batch dimensions. Missing data should be represented as a `NaN`.\n\nThe interface provided by `CubicSpline` is:\n\n* `.interval`, which gives the time interval the spline is defined over. (Often used as the `t` argument in `cdeint`.) This is determined implicitly from the length of the data, and so does _not_ in general correspond to the time your data was actually observed at. (See the [Further Documentation](#further-documentation) note on reparameterisation invariance.)\n* `.grid_points` is all of the knots in the spline, so that for example `X.evaluate(X.grid_points)` will recover the original data.\n* `.evaluate(t)`, where `t` is an any-dimensional Tensor, to evaluate the spline at any (collection of) time(s).\n* `.derivative(t)`, where `t` is an any-dimensional Tensor, to evaluate the derivative of the spline at any (collection of) time(s).\n\nUsually `hermite_cubic_coefficients_with_backward_differences` should be computed as a preprocessing step, whilst `CubicSpline` should be called inside the forward pass of your model. See [time_series_classification.py](./example/time_series_classification.py) for a worked example.\n\nThen call:\n```python\ncdeint(X=X, func=... z0=..., t=X.interval)\n```\n\n## Further documentation\nThe earlier documentation section should give everything you need to get up and running.\n\nHere we discuss a few more advanced bits of functionality:\n* The reparameterisation invariance property of CDEs.\n* Other interpolation methods, and the differences between them.\n* The use of fixed solvers. (They just work.)\n* Stacking CDEs (i.e. controlling one by the output of another).\n* Computing logsignatures for the log-ODE method.\n\n#### Reparameterisation invariance\nThis is a classical fact about CDEs.\n\nLet \u003cimg src=\"https://render.githubusercontent.com/render/math?math=%5Cpsi%20%5Ccolon%20%5Ba%2C%20b%5D%20%5Cto%20%5Bc%2C%20d%5D\"\u003e be differentiable and increasing, with \u003cimg src=\"https://render.githubusercontent.com/render/math?math=%5Cpsi(a)%20%3D%20c\"\u003e and \u003cimg src=\"https://render.githubusercontent.com/render/math?math=%5Cpsi(b)%20%3D%20d\"\u003e. Let \u003cimg src=\"https://render.githubusercontent.com/render/math?math=T%20%5Cin%20%5Bc%2C%20d%5D\"\u003e, let \u003cimg src=\"https://render.githubusercontent.com/render/math?math=%5Cwidetilde%7Bz%7D%20%3D%20z%20%5Ccirc%20%5Cpsi\"\u003e, let \u003cimg src=\"https://render.githubusercontent.com/render/math?math=%5Cwidetilde%7BX%7D%20%3D%20X%20%5Ccirc%20%5Cpsi\"\u003e, and let \u003cimg src=\"https://render.githubusercontent.com/render/math?math=%5Cmathcal%7BT%7D%20%3D%20%5Cpsi(T)\"\u003e. Then substituting \u003cimg src=\"https://render.githubusercontent.com/render/math?math=t%20%3D%20%5Cpsi(%5Ctau)\"\u003e into a CDE (and just using the standard change of variables formula):\n\n\u003cimg src=\"https://render.githubusercontent.com/render/math?math=%5Cbegin%7Balign*%7D%0A%5Cwidetilde%7Bz%7D(%5Cmathcal%7BT%7D)%20%26%3D%20z(T)%5C%5C%0A%20%26%3D%20z(c)%20%2B%20%5Cint_c%5ET%20f(z(t))%5C%2C%5Cmathrm%7Bd%7DX(t)%5C%5C%0A%26%3D%20z(c)%20%2B%20%5Cint_c%5ET%20f(z(t))%5C%2C%5Cfrac%7B%5Cmathrm%7Bd%7DX%7D%7B%5Cmathrm%7Bd%7Dt%7D(t)%5C%2C%5Cmathrm%7Bd%7Dt%5C%5C%0A%20%26%3D%20z(%5Cpsi(a))%20%2B%20%5Cint_a%5E%7B%5Cpsi%5E%7B-1%7D(T)%7D%20f(z(%5Cpsi(%5Ctau)))%5C%2C%5Cfrac%7B%5Cmathrm%7Bd%7DX%7D%7B%5Cmathrm%7Bd%7Dt%7D(%5Cpsi(%5Ctau))%5C%2C%5Cfrac%7B%5Cmathrm%7Bd%7D%5Cpsi%7D%7B%5Cmathrm%7Bd%7D%5Ctau%7D(%5Ctau)%5C%2C%5Cmathrm%7Bd%7D%5Ctau%5C%5C%0A%20%26%3D%20(z%5Ccirc%5Cpsi)(a)%20%2B%20%5Cint_a%5E%7B%5Cpsi%5E%7B-1%7D(T)%7D%20f((z%5Ccirc%5Cpsi)(%5Ctau))%5C%2C%5Cfrac%7B%5Cmathrm%7Bd%7D(X%5Ccirc%20%5Cpsi)%7D%7B%5Cmathrm%7Bd%7D%5Ctau%7D(%5Ctau)%5C%2C%5Cmathrm%7Bd%7D%5Ctau%5C%5C%0A%20%26%3D%20(z%5Ccirc%5Cpsi)(a)%20%2B%20%5Cint_a%5E%7B%5Cpsi%5E%7B-1%7D(T)%7D%20f((z%5Ccirc%20%5Cpsi)(%5Ctau))%5C%2C%5Cmathrm%7Bd%7D(X%5Ccirc%20%5Cpsi)(%5Ctau)%5C%5C%0A%26%3D%20%5Cwidetilde%7Bz%7D(c)%20%2B%20%5Cint_c%5E%5Cmathcal%7BT%7D%20f(%5Cwidetilde%7Bz%7D(%5Ctau))%20%5C%2C%5Cmathrm%7Bd%7D%5Cwidetilde%7BX%7D(%5Ctau)%0A%5Cend%7Balign*%7D\"\u003e\n\nWe see that \u003cimg src=\"https://render.githubusercontent.com/render/math?math=%5Cwidetilde%7Bz%7D\"\u003e **also** satisfies the neural CDE equation, just with \u003cimg src=\"https://render.githubusercontent.com/render/math?math=%5Cwidetilde%7BX%7D\"\u003e as input instead of \u003cimg src=\"https://render.githubusercontent.com/render/math?math=X\"\u003e. In other words, using \u003cimg src=\"https://render.githubusercontent.com/render/math?math=\\psi\"\u003e changes the speed at which we traverse the input \u003cimg src=\"https://render.githubusercontent.com/render/math?math=X\"\u003e, and correspondingly changes the speed at which we traverse the output \u003cimg src=\"https://render.githubusercontent.com/render/math?math=z\"\u003e -- and that's it! In particular the CDE itself doesn't need any adjusting.\n\nThis ends up being a really useful fact for writing neater software. We can handle things like messy data (e.g. variable length time series) just during data preprocessing, without it complicating the model code. In [time_series_classification.py](/example/time_series_classification.py), the region we integrate over is given by `X.interval` as a standardised region to integrate over. In the example [irregular_data.py](/example/irregular_data.py), we use this to handle variable-length data.\n\n#### Different interpolation methods\nFor a full breakdown into the interpolation schemes, see [Neural Controlled Differential Equations for Online Prediction Tasks](https://arxiv.org/pdf/2106.11028.pdf) where each interpolation scheme is scrutinised, and best practices are presented.\n\nIn brief:\n* Will your data: (a) be arriving in an online fashion at inference time; and (b) be multivariate; and (c) potentially have missing values?\n  * Yes: rectilinear interpolation.\n  * No: Are you using an adaptive step size solver (e.g. the default `dopri5`)?\n     * Yes: Hermite cubic splines with backwards differences.\n     * No: linear interpolation.\n     * Not sure / both: Hermite cubic splines with backwards differences.\n\nIn more detail:\n\n* Linear interpolation: these are \"kind-of\" causal.\n\nDuring inference we can simply wait at each time point for the next data point to arrive, and then interpolate towards the next data point when it arrives, and solve the CDE over that interval.\n\nIf there is missing data, however, then this isn't possible. (As some of the channels might not have observations you can interpolate to.) In this case use rectilinear interpolation, below.\n\nExample:\n```python\ncoeffs = linear_interpolation_coeffs(x)\nX = LinearInterpolation(coeffs)\ncdeint(X=X, ...)\n```\n\nLinear interpolation has kinks. If using adaptive step size solvers then it should be told about the kinks. (Rather than expensively finding them for itself -- slowing down to resolve the kink, and then speeding up again afterwards.) This is done with the `jump_t` option when using the `torchdiffeq` backend:\n```python\ncdeint(...,\n       backend='torchdiffeq',\n       method='dopri5',\n       options=dict(jump_t=X.grid_points))\n```\nAlthough adaptive step size solvers will probably find it easier to resolve Hermite cubic splines with backward differences, below.\n\n* Hermite cubic splines with backwards differences: these are \"kind-of\" causal in the same way as linear interpolation, but dont have kinks, which makes them faster with adaptive step size solvers. (But is simply an unnecessary overhead when working with fixed step size solvers, which is why we recommend linear interpolation is you know you're only going to be using fixed step size solvers.)\n\nExample:\n```python\ncoeffs = hermite_cubic_coefficients_with_backward_differences(x)\nX = CubicSpline(coeffs)\ncdeint(X=X, ...)\n```\n\n* Rectilinear interpolation: This is appropriate if there is multivariate missing data, and you need causality.\n\nWhat is done is to linearly interpolate forward in time (keeping the observations constant), and then linearly interpolate the values (keeping the time constant). This is possible because time is a channel (and doesn't need to line up with the \"time\" used in the differential equation solver, as per the reparameterisation invariance of the previous section).\n\nThis can be a bit unintuitive at first. We suggest firing up matplotlib and plotting things to get a feel for what's going on. As a fun sidenote, using rectilinear interpolation makes neural CDEs generalise [ODE-RNNs](https://arxiv.org/abs/1907.03907).\n\nExample:\n```python\n# standard setup for a neural CDE: include time as a channel\nt = torch.linspace(0, 1, 10)\nx = torch.rand(2, 10, 3)\nt_ = t.unsqueeze(0).unsqueeze(-1).expand(2, 10, 1)\nx = torch.cat([t_, x], dim=-1)\ndel t, t_  # won't need these again!\n# The `rectilinear` argument is the channel index corresponding to time\ncoeffs = linear_interpolation_coeffs(x, rectilinear=0)\nX = LinearInterpolation(coeffs)\ncdeint(X=X, ...)\n```\n\nAs before, if using an adaptive step size solver, it should be informed about the kinks.\n```python\ncdeint(...,\n       backend='torchdiffeq',\n       method='dopri5',\n       options=dict(jump_t=X.grid_points))\n```\n\n#### Fixed solvers\nSolving CDEs (regardless of the choice of interpolation scheme in a Neural CDE) with fixed solvers like `euler`, `midpoint`, `rk4` etc. is pretty much exactly the same as solving an ODE with a fixed solver. Just make sure to set the `step_size` option to something sensible; for example the smallest gap between times:\n```python\nX = LinearInterpolation(coeffs)\nstep_size = (X.grid_points[1:] - X.grid_points[:-1]).min()\ncdeint(\n    X=X, t=X.interval, func=..., method='rk4',\n    options=dict(step_size=step_size)\n)\n```\n\n#### Stacking CDEs\nYou may wish to use the output of one CDE to control another. That is, to solve the coupled CDEs:\n```\ndu(t) = g(t, u(t))dz(t)     u(t_0) = u0\ndz(t) = f(t, z(t))dX(t)     z(t_0) = z0\n```\n\nThere are two ways to do this. The first way is to put everything inside a single `cdeint` call, by solving the system\n```\nv = [u, z]\nv0 = [u0, z0]\nh(t, v) = [g(t, u)f(t, z), f(t, z)]\n\ndv(t) = h(t, v(t))dX(t)      v(t_0) = v0\n```\nand using `cdeint` as normal. This is usually the best way to do it! It's simpler and usually faster. (But forces you to use the same solver for the whole system, for example.)\n\nThe second way is to have `cdeint` output `z(t)` at multiple times `t`, interpolate the discrete output into a continuous path, and then call `cdeint` again. This is probably less memory efficient, but allows for different choices of solver for each call to `cdeint`.\n\n_For example, this could be used to create multi-layer Neural CDEs, just like multi-layer RNNs. Although as of writing this, no-one seems to have tried this yet!_\n\n#### The log-ODE method\nThis is a way of reducing the length of data by using extra channels. (For example, this may help train Neural CDE models faster, as the extra channels can be parallelised, but extra length cannot.)\n\nThis is done by splitting the control `X` up into windows, and computing the _logsignature_ of the control over each window. The logsignature is a transform known to extract the information that is most important to describing how `X` controls a CDE.\n\nThis is supported by the `logsig_windows` function, which takes in data, and produces a transformed path, that now exists in logsignature space:\n```python\nbatch, length, channels = 1, 100, 2\nx = torch.rand(batch, length, channels)\ndepth, window = 3, 10.0\nx = torchcde.logsig_windows(x, depth, window)\n# use x as you would normally: interpolate, etc.\n```\n\nSee the paper [Neural Rough Differential Equations for Long Time Series](https://arxiv.org/abs/2009.08295) for more information. See [logsignature_example.py](./example/logsignature_example.py) for a worked example.\n\n_Note that this requires installing the [Signatory](https://github.com/patrick-kidger/signatory) package._\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpatrick-kidger%2Ftorchcde","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpatrick-kidger%2Ftorchcde","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpatrick-kidger%2Ftorchcde/lists"}