https://github.com/anassinator/ilqr
Iterative Linear Quadratic Regulator with auto-differentiatiable dynamics models
https://github.com/anassinator/ilqr
auto-differentiation cartpole control-systems ddp differential-dynamic-programming dynamics-models ilqg ilqr model-predictive-control model-predictive-controller mpc mpc-control non-linear-optimization optimal-control pendulum theano trajectory-optimization trajectory-tracking
Last synced: 26 days ago
JSON representation
Iterative Linear Quadratic Regulator with auto-differentiatiable dynamics models
- Host: GitHub
- URL: https://github.com/anassinator/ilqr
- Owner: anassinator
- License: gpl-3.0
- Created: 2017-11-17T12:40:03.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2022-06-21T21:07:51.000Z (almost 3 years ago)
- Last Synced: 2024-08-01T05:15:30.845Z (9 months ago)
- Topics: auto-differentiation, cartpole, control-systems, ddp, differential-dynamic-programming, dynamics-models, ilqg, ilqr, model-predictive-control, model-predictive-controller, mpc, mpc-control, non-linear-optimization, optimal-control, pendulum, theano, trajectory-optimization, trajectory-tracking
- Language: Python
- Homepage:
- Size: 1.33 MB
- Stars: 358
- Watchers: 11
- Forks: 78
- Open Issues: 7
-
Metadata Files:
- Readme: README.rst
- License: LICENSE
Awesome Lists containing this project
- awesome-robotic-tooling - ilqr - Iterative Linear Quadratic Regulator with auto-differentiatiable dynamics models. (Planning and Control / Vector Map)
README
Iterative Linear Quadratic Regulator
====================================.. image:: https://travis-ci.org/anassinator/ilqr.svg?branch=master
:target: https://travis-ci.org/anassinator/ilqrThis is an implementation of the Iterative Linear Quadratic Regulator (iLQR)
for non-linear trajectory optimization based on Yuval Tassa's
`paper `_.It is compatible with both Python 2 and 3 and has built-in support for
auto-differentiating both the dynamics model and the cost function using
`Theano `_.Install
-------To install, clone and run:
.. code-block:: bash
python setup.py install
You may also install the dependencies with `pipenv` as follows:
.. code-block:: bash
pipenv install
Usage
-----After installing, :code:`import` as follows:
.. code-block:: python
from ilqr import iLQR
You can see the `examples `_ directory for
`Jupyter `_ notebooks to see how common control problems
can be solved through iLQR.Dynamics model
^^^^^^^^^^^^^^You can set up your own dynamics model by either extending the :code:`Dynamics`
class and hard-coding it and its partial derivatives. Alternatively, you can
write it up as a `Theano` expression and use the :code:`AutoDiffDynamics` class
for it to be auto-differentiated. Finally, if all you have is a function, you
can use the :code:`FiniteDiffDynamics` class to approximate the derivatives
with finite difference approximation.This section demonstrates how to implement the following dynamics model:
.. math::
m \dot{v} = F - \alpha v
where :math:`m` is the object's mass in :math:`kg`, :math:`alpha` is the
friction coefficient, :math:`v` is the object's velocity in :math:`m/s`,
:math:`\dot{v}` is the object's acceleration in :math:`m/s^2`, and :math:`F` is
the control (or force) you're applying to the object in :math:`N`.Automatic differentiation
""""""""""""""""""""""""".. code-block:: python
import theano.tensor as T
from ilqr.dynamics import AutoDiffDynamicsx = T.dscalar("x") # Position.
x_dot = T.dscalar("x_dot") # Velocity.
F = T.dscalar("F") # Force.dt = 0.01 # Discrete time-step in seconds.
m = 1.0 # Mass in kg.
alpha = 0.1 # Friction coefficient.# Acceleration.
x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m# Discrete dynamics model definition.
f = T.stack([
x + x_dot * dt,
x_dot + x_dot_dot * dt,
])x_inputs = [x, x_dot] # State vector.
u_inputs = [F] # Control vector.# Compile the dynamics.
# NOTE: This can be slow as it's computing and compiling the derivatives.
# But that's okay since it's only a one-time cost on startup.
dynamics = AutoDiffDynamics(f, x_inputs, u_inputs)*Note*: If you want to be able to use the Hessians (:code:`f_xx`, :code:`f_ux`,
and :code:`f_uu`), you need to pass the :code:`hessians=True` argument to the
constructor. This will increase compilation time. Note that :code:`iLQR` does
not require second-order derivatives to function.Batch automatic differentiation
""""""""""""""""""""""""""""""".. code-block:: python
import theano.tensor as T
from ilqr.dynamics import BatchAutoDiffDynamicsstate_size = 2 # [position, velocity]
action_size = 1 # [force]dt = 0.01 # Discrete time-step in seconds.
m = 1.0 # Mass in kg.
alpha = 0.1 # Friction coefficient.def f(x, u, i):
"""Batched implementation of the dynamics model.Args:
x: State vector [*, state_size].
u: Control vector [*, action_size].
i: Current time step [*, 1].Returns:
Next state vector [*, state_size].
"""
x_ = x[..., 0]
x_dot = x[..., 1]
F = u[..., 0]# Acceleration.
x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m# Discrete dynamics model definition.
return T.stack([
x_ + x_dot * dt,
x_dot + x_dot_dot * dt,
]).T# Compile the dynamics.
# NOTE: This can be slow as it's computing and compiling the derivatives.
# But that's okay since it's only a one-time cost on startup.
dynamics = BatchAutoDiffDynamics(f, state_size, action_size)*Note*: This is a faster version of :code:`AutoDiffDynamics` that doesn't
support Hessians.Finite difference approximation
""""""""""""""""""""""""""""""".. code-block:: python
from ilqr.dynamics import FiniteDiffDynamics
state_size = 2 # [position, velocity]
action_size = 1 # [force]dt = 0.01 # Discrete time-step in seconds.
m = 1.0 # Mass in kg.
alpha = 0.1 # Friction coefficient.def f(x, u, i):
"""Dynamics model function.Args:
x: State vector [state_size].
u: Control vector [action_size].
i: Current time step.Returns:
Next state vector [state_size].
"""
[x, x_dot] = x
[F] = u# Acceleration.
x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / mreturn np.array([
x + x_dot * dt,
x_dot + x_dot_dot * dt,
])# NOTE: Unlike with AutoDiffDynamics, this is instantaneous, but will not be
# as accurate.
dynamics = FiniteDiffDynamics(f, state_size, action_size)*Note*: It is possible you might need to play with the epsilon values
(:code:`x_eps` and :code:`u_eps`) used when computing the approximation if you
run into numerical instability issues.Usage
"""""Regardless of the method used for constructing your dynamics model, you can use
them as follows:.. code-block:: python
curr_x = np.array([1.0, 2.0])
curr_u = np.array([0.0])
i = 0 # This dynamics model is not time-varying, so this doesn't matter.>>> dynamics.f(curr_x, curr_u, i)
... array([ 1.02 , 2.01998])
>>> dynamics.f_x(curr_x, curr_u, i)
... array([[ 1. , 0.01 ],
[ 0. , 1.00999]])
>>> dynamics.f_u(curr_x, curr_u, i)
... array([[ 0. ],
[ 0.0001]])Comparing the output of the :code:`AutoDiffDynamics` and the
:code:`FiniteDiffDynamics` models should generally yield consistent results,
but the auto-differentiated method will always be more accurate. Generally, the
finite difference approximation will be faster unless you're also computing the
Hessians: in which case, Theano's compiled derivatives are more optimized.Cost function
^^^^^^^^^^^^^Similarly, you can set up your own cost function by either extending the
:code:`Cost` class and hard-coding it and its partial derivatives.
Alternatively, you can write it up as a `Theano` expression and use the
:code:`AutoDiffCost` class for it to be auto-differentiated. Finally, if all
you have are a loss functions, you can use the :code:`FiniteDiffCost` class to
approximate the derivatives with finite difference approximation.The most common cost function is the quadratic format used by Linear Quadratic
Regulators:.. math::
(x - x_{goal})^T Q (x - x_{goal}) + (u - u_{goal})^T R (u - u_{goal})
where :math:`Q` and :math:`R` are matrices defining your quadratic state error
and quadratic control errors and :math:`x_{goal}` is your target state. For
convenience, an implementation of this cost function is made available as the
:code:`QRCost` class.:code:`QRCost` class
"""""""""""""""""""".. code-block:: python
import numpy as np
from ilqr.cost import QRCoststate_size = 2 # [position, velocity]
action_size = 1 # [force]# The coefficients weigh how much your state error is worth to you vs
# the size of your controls. You can favor a solution that uses smaller
# controls by increasing R's coefficient.
Q = 100 * np.eye(state_size)
R = 0.01 * np.eye(action_size)# This is optional if you want your cost to be computed differently at a
# terminal state.
Q_terminal = np.array([[100.0, 0.0], [0.0, 0.1]])# State goal is set to a position of 1 m with no velocity.
x_goal = np.array([1.0, 0.0])# NOTE: This is instantaneous and completely accurate.
cost = QRCost(Q, R, Q_terminal=Q_terminal, x_goal=x_goal)Automatic differentiation
""""""""""""""""""""""""".. code-block:: python
import theano.tensor as T
from ilqr.cost import AutoDiffCostx_inputs = [T.dscalar("x"), T.dscalar("x_dot")]
u_inputs = [T.dscalar("F")]x = T.stack(x_inputs)
u = T.stack(u_inputs)x_diff = x - x_goal
l = x_diff.T.dot(Q).dot(x_diff) + u.T.dot(R).dot(u)
l_terminal = x_diff.T.dot(Q_terminal).dot(x_diff)# Compile the cost.
# NOTE: This can be slow as it's computing and compiling the derivatives.
# But that's okay since it's only a one-time cost on startup.
cost = AutoDiffCost(l, l_terminal, x_inputs, u_inputs)Batch automatic differentiation
""""""""""""""""""""""""""""""".. code-block:: python
import theano.tensor as T
from ilqr.cost import BatchAutoDiffCostdef cost_function(x, u, i, terminal):
"""Batched implementation of the quadratic cost function.Args:
x: State vector [*, state_size].
u: Control vector [*, action_size].
i: Current time step [*, 1].
terminal: Whether to compute the terminal cost.Returns:
Instantaneous cost [*].
"""
Q_ = Q_terminal if terminal else Q
l = x.dot(Q_).dot(x.T)
if l.ndim == 2:
l = T.diag(l)if not terminal:
l_u = u.dot(R).dot(u.T)
if l_u.ndim == 2:
l_u = T.diag(l_u)
l += l_ureturn l
# Compile the cost.
# NOTE: This can be slow as it's computing and compiling the derivatives.
# But that's okay since it's only a one-time cost on startup.
cost = BatchAutoDiffCost(cost_function, state_size, action_size)Finite difference approximation
""""""""""""""""""""""""""""""".. code-block:: python
from ilqr.cost import FiniteDiffCost
def l(x, u, i):
"""Instantaneous cost function.Args:
x: State vector [state_size].
u: Control vector [action_size].
i: Current time step.Returns:
Instantaneous cost [scalar].
"""
x_diff = x - x_goal
return x_diff.T.dot(Q).dot(x_diff) + u.T.dot(R).dot(u)def l_terminal(x, i):
"""Terminal cost function.Args:
x: State vector [state_size].
i: Current time step.Returns:
Terminal cost [scalar].
"""
x_diff = x - x_goal
return x_diff.T.dot(Q_terminal).dot(x_diff)# NOTE: Unlike with AutoDiffCost, this is instantaneous, but will not be as
# accurate.
cost = FiniteDiffCost(l, l_terminal, state_size, action_size)*Note*: It is possible you might need to play with the epsilon values
(:code:`x_eps` and :code:`u_eps`) used when computing the approximation if you
run into numerical instability issues.Usage
"""""Regardless of the method used for constructing your cost function, you can use
them as follows:.. code-block:: python
>>> cost.l(curr_x, curr_u, i)
... 400.0
>>> cost.l_x(curr_x, curr_u, i)
... array([ 0., 400.])
>>> cost.l_u(curr_x, curr_u, i)
... array([ 0.])
>>> cost.l_xx(curr_x, curr_u, i)
... array([[ 200., 0.],
[ 0., 200.]])
>>> cost.l_ux(curr_x, curr_u, i)
... array([[ 0., 0.]])
>>> cost.l_uu(curr_x, curr_u, i)
... array([[ 0.02]])Putting it all together
^^^^^^^^^^^^^^^^^^^^^^^.. code-block:: python
N = 1000 # Number of time-steps in trajectory.
x0 = np.array([0.0, -0.1]) # Initial state.
us_init = np.random.uniform(-1, 1, (N, 1)) # Random initial action path.ilqr = iLQR(dynamics, cost, N)
xs, us = ilqr.fit(x0, us_init):code:`xs` and :code:`us` now hold the optimal state and control trajectory
that reaches the desired goal state with minimum cost.Finally, a :code:`RecedingHorizonController` is also bundled with this package
to use the :code:`iLQR` controller in Model Predictive Control.Important notes
^^^^^^^^^^^^^^^To quote from Tassa's paper: "Two important parameters which have a direct
impact on performance are the simulation time-step :code:`dt` and the horizon
length :code:`N`. Since speed is of the essence, the goal is to choose those
values which minimize the number of steps in the trajectory, i.e. the largest
possible time-step and the shortest possible horizon. The size of :code:`dt`
is limited by our use of Euler integration; beyond some value the simulation
becomes unstable. The minimum length of the horizon :code:`N` is a
problem-dependent quantity which must be found by trial-and-error."Contributing
------------Contributions are welcome. Simply open an issue or pull request on the matter.
Linting
-------We use `YAPF `_ for all Python formatting
needs. You can auto-format your changes with the following command:.. code-block:: bash
yapf --recursive --in-place --parallel .
You may install the linter as follows:
.. code-block:: bash
pipenv install --dev
License
-------See `LICENSE `_.
Credits
-------This implementation was partially based on Yuval Tassa's :code:`MATLAB`
`implementation `_,
and `navigator8972 `_'s
`implementation `_.