https://github.com/anassinator/ilqr

Iterative Linear Quadratic Regulator with auto-differentiatiable dynamics models
https://github.com/anassinator/ilqr

auto-differentiation cartpole control-systems ddp differential-dynamic-programming dynamics-models ilqg ilqr model-predictive-control model-predictive-controller mpc mpc-control non-linear-optimization optimal-control pendulum theano trajectory-optimization trajectory-tracking

Last synced: 4 months ago
JSON representation

Iterative Linear Quadratic Regulator with auto-differentiatiable dynamics models

Host: GitHub
URL: https://github.com/anassinator/ilqr
Owner: anassinator
License: gpl-3.0
Created: 2017-11-17T12:40:03.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2022-06-21T21:07:51.000Z (about 3 years ago)
Last Synced: 2024-08-01T05:15:30.845Z (11 months ago)
Topics: auto-differentiation, cartpole, control-systems, ddp, differential-dynamic-programming, dynamics-models, ilqg, ilqr, model-predictive-control, model-predictive-controller, mpc, mpc-control, non-linear-optimization, optimal-control, pendulum, theano, trajectory-optimization, trajectory-tracking
Language: Python
Homepage:
Size: 1.33 MB
Stars: 358
Watchers: 11
Forks: 78
Open Issues: 7
Metadata Files:
- Readme: README.rst
- License: LICENSE

Awesome Lists containing this project

awesome-robotic-tooling - ilqr - Iterative Linear Quadratic Regulator with auto-differentiatiable dynamics models. (Planning and Control / Vector Map)

README

        Iterative Linear Quadratic Regulator

====================================

.. image:: https://travis-ci.org/anassinator/ilqr.svg?branch=master

  :target: https://travis-ci.org/anassinator/ilqr

This is an implementation of the Iterative Linear Quadratic Regulator (iLQR)

for non-linear trajectory optimization based on Yuval Tassa's

`paper `_.

It is compatible with both Python 2 and 3 and has built-in support for

auto-differentiating both the dynamics model and the cost function using

`Theano `_.

Install

-------

To install, clone and run:

.. code-block:: bash

  python setup.py install

You may also install the dependencies with `pipenv` as follows:

.. code-block:: bash

  pipenv install

Usage

-----

After installing, :code:`import` as follows:

.. code-block:: python

  from ilqr import iLQR

You can see the `examples `_ directory for

`Jupyter `_ notebooks to see how common control problems

can be solved through iLQR.

Dynamics model

^^^^^^^^^^^^^^

You can set up your own dynamics model by either extending the :code:`Dynamics`

class and hard-coding it and its partial derivatives. Alternatively, you can

write it up as a `Theano` expression and use the :code:`AutoDiffDynamics` class

for it to be auto-differentiated. Finally, if all you have is a function, you

can use the :code:`FiniteDiffDynamics` class to approximate the derivatives

with finite difference approximation.

This section demonstrates how to implement the following dynamics model:

.. math::

  m \dot{v} = F - \alpha v

where :math:`m` is the object's mass in :math:`kg`, :math:`alpha` is the

friction coefficient, :math:`v` is the object's velocity in :math:`m/s`,

:math:`\dot{v}` is the object's acceleration in :math:`m/s^2`, and :math:`F` is

the control (or force) you're applying to the object in :math:`N`.

Automatic differentiation

"""""""""""""""""""""""""

.. code-block:: python

  import theano.tensor as T

  from ilqr.dynamics import AutoDiffDynamics

  x = T.dscalar("x")  # Position.

  x_dot = T.dscalar("x_dot")  # Velocity.

  F = T.dscalar("F")  # Force.

  dt = 0.01  # Discrete time-step in seconds.

  m = 1.0  # Mass in kg.

  alpha = 0.1  # Friction coefficient.

  # Acceleration.

  x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m

  # Discrete dynamics model definition.

  f = T.stack([

      x + x_dot * dt,

      x_dot + x_dot_dot * dt,

  ])

  x_inputs = [x, x_dot]  # State vector.

  u_inputs = [F]  # Control vector.

  # Compile the dynamics.

  # NOTE: This can be slow as it's computing and compiling the derivatives.

  # But that's okay since it's only a one-time cost on startup.

  dynamics = AutoDiffDynamics(f, x_inputs, u_inputs)

*Note*: If you want to be able to use the Hessians (:code:`f_xx`, :code:`f_ux`,

and :code:`f_uu`), you need to pass the :code:`hessians=True` argument to the

constructor. This will increase compilation time. Note that :code:`iLQR` does

not require second-order derivatives to function.

Batch automatic differentiation

"""""""""""""""""""""""""""""""

.. code-block:: python

  import theano.tensor as T

  from ilqr.dynamics import BatchAutoDiffDynamics

  state_size = 2  # [position, velocity]

  action_size = 1  # [force]

  dt = 0.01  # Discrete time-step in seconds.

  m = 1.0  # Mass in kg.

  alpha = 0.1  # Friction coefficient.

  def f(x, u, i):

      """Batched implementation of the dynamics model.

      Args:

          x: State vector [*, state_size].

          u: Control vector [*, action_size].

          i: Current time step [*, 1].

      Returns:

          Next state vector [*, state_size].

      """

      x_ = x[..., 0]

      x_dot = x[..., 1]

      F = u[..., 0]

      # Acceleration.

      x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m

      # Discrete dynamics model definition.

      return T.stack([

          x_ + x_dot * dt,

          x_dot + x_dot_dot * dt,

      ]).T

  # Compile the dynamics.

  # NOTE: This can be slow as it's computing and compiling the derivatives.

  # But that's okay since it's only a one-time cost on startup.

  dynamics = BatchAutoDiffDynamics(f, state_size, action_size)

*Note*: This is a faster version of :code:`AutoDiffDynamics` that doesn't

support Hessians.

Finite difference approximation

"""""""""""""""""""""""""""""""

.. code-block:: python

  from ilqr.dynamics import FiniteDiffDynamics

  state_size = 2  # [position, velocity]

  action_size = 1  # [force]

  dt = 0.01  # Discrete time-step in seconds.

  m = 1.0  # Mass in kg.

  alpha = 0.1  # Friction coefficient.

  def f(x, u, i):

      """Dynamics model function.

      Args:

          x: State vector [state_size].

          u: Control vector [action_size].

          i: Current time step.

      Returns:

          Next state vector [state_size].

      """

      [x, x_dot] = x

      [F] = u

      # Acceleration.

      x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m

      return np.array([

          x + x_dot * dt,

          x_dot + x_dot_dot * dt,

      ])

  # NOTE: Unlike with AutoDiffDynamics, this is instantaneous, but will not be

  # as accurate.

  dynamics = FiniteDiffDynamics(f, state_size, action_size)

*Note*: It is possible you might need to play with the epsilon values

(:code:`x_eps` and :code:`u_eps`) used when computing the approximation if you

run into numerical instability issues.

Usage

"""""

Regardless of the method used for constructing your dynamics model, you can use

them as follows:

.. code-block:: python

  curr_x = np.array([1.0, 2.0])

  curr_u = np.array([0.0])

  i = 0  # This dynamics model is not time-varying, so this doesn't matter.

  >>> dynamics.f(curr_x, curr_u, i)

  ... array([ 1.02   ,  2.01998])

  >>> dynamics.f_x(curr_x, curr_u, i)

  ... array([[ 1.     ,  0.01   ],

             [ 0.     ,  1.00999]])

  >>> dynamics.f_u(curr_x, curr_u, i)

  ... array([[ 0.    ],

             [ 0.0001]])

Comparing the output of the :code:`AutoDiffDynamics` and the

:code:`FiniteDiffDynamics` models should generally yield consistent results,

but the auto-differentiated method will always be more accurate. Generally, the

finite difference approximation will be faster unless you're also computing the

Hessians: in which case, Theano's compiled derivatives are more optimized.

Cost function

^^^^^^^^^^^^^

Similarly, you can set up your own cost function by either extending the

:code:`Cost` class and hard-coding it and its partial derivatives.

Alternatively, you can write it up as a `Theano` expression and use the

:code:`AutoDiffCost` class for it to be auto-differentiated. Finally, if all

you have are a loss functions, you can use the :code:`FiniteDiffCost` class to

approximate the derivatives with finite difference approximation.

The most common cost function is the quadratic format used by Linear Quadratic

Regulators:

.. math::

  (x - x_{goal})^T Q (x - x_{goal}) + (u - u_{goal})^T R (u - u_{goal})

where :math:`Q` and :math:`R` are matrices defining your quadratic state error

and quadratic control errors and :math:`x_{goal}` is your target state. For

convenience, an implementation of this cost function is made available as the

:code:`QRCost` class.

:code:`QRCost` class

""""""""""""""""""""

.. code-block:: python

  import numpy as np

  from ilqr.cost import QRCost

  state_size = 2  # [position, velocity]

  action_size = 1  # [force]

  # The coefficients weigh how much your state error is worth to you vs

  # the size of your controls. You can favor a solution that uses smaller

  # controls by increasing R's coefficient.

  Q = 100 * np.eye(state_size)

  R = 0.01 * np.eye(action_size)

  # This is optional if you want your cost to be computed differently at a

  # terminal state.

  Q_terminal = np.array([[100.0, 0.0], [0.0, 0.1]])

  # State goal is set to a position of 1 m with no velocity.

  x_goal = np.array([1.0, 0.0])

  # NOTE: This is instantaneous and completely accurate.

  cost = QRCost(Q, R, Q_terminal=Q_terminal, x_goal=x_goal)

Automatic differentiation

"""""""""""""""""""""""""

.. code-block:: python

  import theano.tensor as T

  from ilqr.cost import AutoDiffCost

  x_inputs = [T.dscalar("x"), T.dscalar("x_dot")]

  u_inputs = [T.dscalar("F")]

  x = T.stack(x_inputs)

  u = T.stack(u_inputs)

  x_diff = x - x_goal

  l = x_diff.T.dot(Q).dot(x_diff) + u.T.dot(R).dot(u)

  l_terminal = x_diff.T.dot(Q_terminal).dot(x_diff)

  # Compile the cost.

  # NOTE: This can be slow as it's computing and compiling the derivatives.

  # But that's okay since it's only a one-time cost on startup.

  cost = AutoDiffCost(l, l_terminal, x_inputs, u_inputs)

Batch automatic differentiation

"""""""""""""""""""""""""""""""

.. code-block:: python

  import theano.tensor as T

  from ilqr.cost import BatchAutoDiffCost

  def cost_function(x, u, i, terminal):

      """Batched implementation of the quadratic cost function.

      Args:

          x: State vector [*, state_size].

          u: Control vector [*, action_size].

          i: Current time step [*, 1].

          terminal: Whether to compute the terminal cost.

      Returns:

          Instantaneous cost [*].

      """

      Q_ = Q_terminal if terminal else Q

      l = x.dot(Q_).dot(x.T)

      if l.ndim == 2:

          l = T.diag(l)

      if not terminal:

          l_u = u.dot(R).dot(u.T)

          if l_u.ndim == 2:

              l_u = T.diag(l_u)

          l += l_u

      return l

  # Compile the cost.

  # NOTE: This can be slow as it's computing and compiling the derivatives.

  # But that's okay since it's only a one-time cost on startup.

  cost = BatchAutoDiffCost(cost_function, state_size, action_size)

Finite difference approximation

"""""""""""""""""""""""""""""""

.. code-block:: python

  from ilqr.cost import FiniteDiffCost

  def l(x, u, i):

      """Instantaneous cost function.

      Args:

          x: State vector [state_size].

          u: Control vector [action_size].

          i: Current time step.

      Returns:

          Instantaneous cost [scalar].

      """

      x_diff = x - x_goal

      return x_diff.T.dot(Q).dot(x_diff) + u.T.dot(R).dot(u)

  def l_terminal(x, i):

      """Terminal cost function.

      Args:

          x: State vector [state_size].

          i: Current time step.

      Returns:

          Terminal cost [scalar].

      """

      x_diff = x - x_goal

      return x_diff.T.dot(Q_terminal).dot(x_diff)

  # NOTE: Unlike with AutoDiffCost, this is instantaneous, but will not be as

  # accurate.

  cost = FiniteDiffCost(l, l_terminal, state_size, action_size)

*Note*: It is possible you might need to play with the epsilon values

(:code:`x_eps` and :code:`u_eps`) used when computing the approximation if you

run into numerical instability issues.

Usage

"""""

Regardless of the method used for constructing your cost function, you can use

them as follows:

.. code-block:: python

  >>> cost.l(curr_x, curr_u, i)

  ... 400.0

  >>> cost.l_x(curr_x, curr_u, i)

  ... array([   0.,  400.])

  >>> cost.l_u(curr_x, curr_u, i)

  ... array([ 0.])

  >>> cost.l_xx(curr_x, curr_u, i)

  ... array([[ 200.,    0.],

             [   0.,  200.]])

  >>> cost.l_ux(curr_x, curr_u, i)

  ... array([[ 0.,  0.]])

  >>> cost.l_uu(curr_x, curr_u, i)

  ... array([[ 0.02]])

Putting it all together

^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

  N = 1000  # Number of time-steps in trajectory.

  x0 = np.array([0.0, -0.1])  # Initial state.

  us_init = np.random.uniform(-1, 1, (N, 1)) # Random initial action path.

  ilqr = iLQR(dynamics, cost, N)

  xs, us = ilqr.fit(x0, us_init)

:code:`xs` and :code:`us` now hold the optimal state and control trajectory

that reaches the desired goal state with minimum cost.

Finally, a :code:`RecedingHorizonController` is also bundled with this package

to use the :code:`iLQR` controller in Model Predictive Control.

Important notes

^^^^^^^^^^^^^^^

To quote from Tassa's paper: "Two important parameters which have a direct

impact on performance are the simulation time-step :code:`dt` and the horizon

length :code:`N`. Since speed is of the essence, the goal is to choose those

values which minimize the number of steps in the trajectory, i.e. the largest

possible time-step and the shortest possible horizon. The size of :code:`dt`

is limited by our use of Euler integration; beyond some value the simulation

becomes unstable. The minimum length of the horizon :code:`N` is a

problem-dependent quantity which must be found by trial-and-error."

Contributing

------------

Contributions are welcome. Simply open an issue or pull request on the matter.

Linting

-------

We use `YAPF `_ for all Python formatting

needs. You can auto-format your changes with the following command:

.. code-block:: bash

  yapf --recursive --in-place --parallel .

You may install the linter as follows:

.. code-block:: bash

  pipenv install --dev

License

-------

See `LICENSE `_.

Credits

-------

This implementation was partially based on Yuval Tassa's :code:`MATLAB`

`implementation `_,

and `navigator8972 `_'s

`implementation `_.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/anassinator/ilqr

Awesome Lists containing this project

README