{"id":13445565,"url":"https://github.com/anassinator/ilqr","last_synced_at":"2025-03-20T21:30:57.512Z","repository":{"id":47985822,"uuid":"111102117","full_name":"anassinator/ilqr","owner":"anassinator","description":"Iterative Linear Quadratic Regulator with auto-differentiatiable dynamics models","archived":false,"fork":false,"pushed_at":"2022-06-21T21:07:51.000Z","size":1392,"stargazers_count":358,"open_issues_count":7,"forks_count":78,"subscribers_count":11,"default_branch":"master","last_synced_at":"2024-08-01T05:15:30.845Z","etag":null,"topics":["auto-differentiation","cartpole","control-systems","ddp","differential-dynamic-programming","dynamics-models","ilqg","ilqr","model-predictive-control","model-predictive-controller","mpc","mpc-control","non-linear-optimization","optimal-control","pendulum","theano","trajectory-optimization","trajectory-tracking"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/anassinator.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-11-17T12:40:03.000Z","updated_at":"2024-07-25T13:23:03.000Z","dependencies_parsed_at":"2022-08-21T08:40:18.413Z","dependency_job_id":null,"html_url":"https://github.com/anassinator/ilqr","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anassinator%2Filqr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anassinator%2Filqr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anassinator%2Filqr/releases","manifests_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anassinator%2Filqr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/anassinator","download_url":"https://codeload.github.com/anassinator/ilqr/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221807736,"owners_count":16883639,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["auto-differentiation","cartpole","control-systems","ddp","differential-dynamic-programming","dynamics-models","ilqg","ilqr","model-predictive-control","model-predictive-controller","mpc","mpc-control","non-linear-optimization","optimal-control","pendulum","theano","trajectory-optimization","trajectory-tracking"],"created_at":"2024-07-31T05:00:36.027Z","updated_at":"2025-03-20T21:30:57.506Z","avatar_url":"https://github.com/anassinator.png","language":"Python","readme":"Iterative Linear Quadratic Regulator\n====================================\n\n.. image:: https://travis-ci.org/anassinator/ilqr.svg?branch=master\n  :target: https://travis-ci.org/anassinator/ilqr\n\nThis is an implementation of the Iterative Linear Quadratic Regulator (iLQR)\nfor non-linear trajectory optimization based on Yuval Tassa's\n`paper \u003chttps://homes.cs.washington.edu/~todorov/papers/TassaIROS12.pdf\u003e`_.\n\nIt is compatible with both Python 2 and 3 and has built-in support for\nauto-differentiating both the dynamics model and the cost function using\n`Theano \u003chttp://deeplearning.net/software/theano/\u003e`_.\n\nInstall\n-------\n\nTo install, clone and run:\n\n.. 
code-block:: bash\n\n  python setup.py install\n\nYou may also install the dependencies with `pipenv` as follows:\n\n.. code-block:: bash\n\n  pipenv install\n\nUsage\n-----\n\nAfter installing, :code:`import` as follows:\n\n.. code-block:: python\n\n  from ilqr import iLQR\n\nSee the `examples \u003cexamples/\u003e`_ directory for\n`Jupyter \u003chttps://jupyter.org\u003e`_ notebooks showing how common control problems\ncan be solved through iLQR.\n\nDynamics model\n^^^^^^^^^^^^^^\n\nYou can set up your own dynamics model by extending the :code:`Dynamics`\nclass and hard-coding it and its partial derivatives. Alternatively, you can\nwrite it up as a `Theano` expression and use the :code:`AutoDiffDynamics` class\nto have it auto-differentiated. Finally, if all you have is a function, you\ncan use the :code:`FiniteDiffDynamics` class to approximate the derivatives\nby finite differences.\n\nThis section demonstrates how to implement the following dynamics model:\n\n.. math::\n\n  m \\dot{v} = F - \\alpha v\n\nwhere :math:`m` is the object's mass in :math:`kg`, :math:`\\alpha` is the\nfriction coefficient, :math:`v` is the object's velocity in :math:`m/s`,\n:math:`\\dot{v}` is the object's acceleration in :math:`m/s^2`, and :math:`F` is\nthe control (or force) you're applying to the object in :math:`N`.\n\nAutomatic differentiation\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. 
code-block:: python\n\n  import theano.tensor as T\n  from ilqr.dynamics import AutoDiffDynamics\n\n  x = T.dscalar(\"x\")  # Position.\n  x_dot = T.dscalar(\"x_dot\")  # Velocity.\n  F = T.dscalar(\"F\")  # Force.\n\n  dt = 0.01  # Discrete time-step in seconds.\n  m = 1.0  # Mass in kg.\n  alpha = 0.1  # Friction coefficient.\n\n  # Acceleration from m * dv/dt = F - alpha * v.\n  x_dot_dot = (F - alpha * x_dot) / m\n\n  # Discrete dynamics model definition (Euler integration).\n  f = T.stack([\n      x + x_dot * dt,\n      x_dot + x_dot_dot * dt,\n  ])\n\n  x_inputs = [x, x_dot]  # State vector.\n  u_inputs = [F]  # Control vector.\n\n  # Compile the dynamics.\n  # NOTE: This can be slow as it's computing and compiling the derivatives.\n  # But that's okay since it's only a one-time cost on startup.\n  dynamics = AutoDiffDynamics(f, x_inputs, u_inputs)\n\n*Note*: If you want to be able to use the Hessians (:code:`f_xx`, :code:`f_ux`,\nand :code:`f_uu`), you need to pass the :code:`hessians=True` argument to the\nconstructor. This will increase compilation time. Note that :code:`iLQR` does\nnot require second-order derivatives to function.\n\nBatch automatic differentiation\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. 
code-block:: python\n\n  import theano.tensor as T\n  from ilqr.dynamics import BatchAutoDiffDynamics\n\n  state_size = 2  # [position, velocity]\n  action_size = 1  # [force]\n\n  dt = 0.01  # Discrete time-step in seconds.\n  m = 1.0  # Mass in kg.\n  alpha = 0.1  # Friction coefficient.\n\n  def f(x, u, i):\n      \"\"\"Batched implementation of the dynamics model.\n\n      Args:\n          x: State vector [*, state_size].\n          u: Control vector [*, action_size].\n          i: Current time step [*, 1].\n\n      Returns:\n          Next state vector [*, state_size].\n      \"\"\"\n      x_ = x[..., 0]\n      x_dot = x[..., 1]\n      F = u[..., 0]\n\n      # Acceleration from m * dv/dt = F - alpha * v.\n      x_dot_dot = (F - alpha * x_dot) / m\n\n      # Discrete dynamics model definition (Euler integration).\n      return T.stack([\n          x_ + x_dot * dt,\n          x_dot + x_dot_dot * dt,\n      ]).T\n\n  # Compile the dynamics.\n  # NOTE: This can be slow as it's computing and compiling the derivatives.\n  # But that's okay since it's only a one-time cost on startup.\n  dynamics = BatchAutoDiffDynamics(f, state_size, action_size)\n\n*Note*: This is a faster version of :code:`AutoDiffDynamics` that doesn't\nsupport Hessians.\n\nFinite difference approximation\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. 
code-block:: python\n\n  import numpy as np\n\n  from ilqr.dynamics import FiniteDiffDynamics\n\n  state_size = 2  # [position, velocity]\n  action_size = 1  # [force]\n\n  dt = 0.01  # Discrete time-step in seconds.\n  m = 1.0  # Mass in kg.\n  alpha = 0.1  # Friction coefficient.\n\n  def f(x, u, i):\n      \"\"\"Dynamics model function.\n\n      Args:\n          x: State vector [state_size].\n          u: Control vector [action_size].\n          i: Current time step.\n\n      Returns:\n          Next state vector [state_size].\n      \"\"\"\n      [x, x_dot] = x\n      [F] = u\n\n      # Acceleration from m * dv/dt = F - alpha * v.\n      x_dot_dot = (F - alpha * x_dot) / m\n\n      return np.array([\n          x + x_dot * dt,\n          x_dot + x_dot_dot * dt,\n      ])\n\n  # NOTE: Unlike with AutoDiffDynamics, this is instantaneous to construct, but\n  # the derivatives will not be as accurate.\n  dynamics = FiniteDiffDynamics(f, state_size, action_size)\n\n*Note*: You might need to tune the epsilon values (:code:`x_eps` and\n:code:`u_eps`) used when computing the approximation if you run into numerical\ninstability issues.\n\nUsage\n\"\"\"\"\"\n\nRegardless of the method used for constructing your dynamics model, you can use\nit as follows:\n\n.. code-block:: python\n\n  curr_x = np.array([1.0, 2.0])\n  curr_u = np.array([0.0])\n  i = 0  # This dynamics model is not time-varying, so this doesn't matter.\n\n  \u003e\u003e\u003e dynamics.f(curr_x, curr_u, i)\n  ... array([ 1.02 ,  1.998])\n  \u003e\u003e\u003e dynamics.f_x(curr_x, curr_u, i)\n  ... array([[ 1.   ,  0.01 ],\n             [ 0.   ,  0.999]])\n  \u003e\u003e\u003e dynamics.f_u(curr_x, curr_u, i)\n  ... array([[ 0.  ],\n             [ 0.01]])\n\nComparing the output of the :code:`AutoDiffDynamics` and the\n:code:`FiniteDiffDynamics` models should generally yield consistent results,\nbut the auto-differentiated method will always be more accurate. 
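\n\nTo make the comparison concrete, here is a standalone sketch in plain NumPy (no\n:code:`ilqr` required; the helper :code:`approx_f_x` is illustrative, not part\nof the library) that cross-checks a central-difference Jacobian of this model\nagainst its analytic Jacobian:\n\n.. code-block:: python\n\n  import numpy as np\n\n  dt = 0.01  # Discrete time-step in seconds.\n  m = 1.0  # Mass in kg.\n  alpha = 0.1  # Friction coefficient.\n\n  def f(x, u, i):\n      # Euler discretization of m * dv/dt = F - alpha * v.\n      x_, x_dot = x\n      F = u[0]\n      return np.array([x_ + x_dot * dt, x_dot + (F - alpha * x_dot) / m * dt])\n\n  def approx_f_x(f, x, u, i, eps=1e-6):\n      # Central-difference approximation of the state Jacobian f_x.\n      J = np.zeros((x.size, x.size))\n      for k in range(x.size):\n          d = np.zeros_like(x)\n          d[k] = eps\n          J[:, k] = (f(x + d, u, i) - f(x - d, u, i)) / (2 * eps)\n      return J\n\n  # Analytic Jacobian of the same (linear) model.\n  f_x_exact = np.array([[1.0, dt], [0.0, 1.0 - alpha * dt / m]])\n\n  assert np.allclose(approx_f_x(f, np.array([1.0, 2.0]), np.array([0.0]), 0),\n                     f_x_exact)\n\nSince this model is linear, the central difference recovers the Jacobian almost\nexactly; for non-linear models, the size of :code:`x_eps` and :code:`u_eps`\ntrades truncation error against numerical round-off.\n\n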
Generally, the\nfinite difference approximation will be faster unless you're also computing the\nHessians, in which case Theano's compiled derivatives are more optimized.\n\nCost function\n^^^^^^^^^^^^^\n\nSimilarly, you can set up your own cost function by extending the\n:code:`Cost` class and hard-coding it and its partial derivatives.\nAlternatively, you can write it up as a `Theano` expression and use the\n:code:`AutoDiffCost` class to have it auto-differentiated. Finally, if all\nyou have is a loss function, you can use the :code:`FiniteDiffCost` class to\napproximate the derivatives by finite differences.\n\nThe most common cost function is the quadratic form used by Linear Quadratic\nRegulators:\n\n.. math::\n\n  (x - x_{goal})^T Q (x - x_{goal}) + (u - u_{goal})^T R (u - u_{goal})\n\nwhere :math:`Q` and :math:`R` are matrices weighting the quadratic state error\nand the quadratic control error, and :math:`x_{goal}` is your target state. For\nconvenience, an implementation of this cost function is made available as the\n:code:`QRCost` class.\n\n:code:`QRCost` class\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. code-block:: python\n\n  import numpy as np\n  from ilqr.cost import QRCost\n\n  state_size = 2  # [position, velocity]\n  action_size = 1  # [force]\n\n  # The coefficients weigh how much your state error is worth to you vs\n  # the size of your controls. 
You can favor a solution that uses smaller\n  # controls by increasing R's coefficient.\n  Q = 100 * np.eye(state_size)\n  R = 0.01 * np.eye(action_size)\n\n  # This is optional if you want your cost to be computed differently at a\n  # terminal state.\n  Q_terminal = np.array([[100.0, 0.0], [0.0, 0.1]])\n\n  # State goal is set to a position of 1 m with no velocity.\n  x_goal = np.array([1.0, 0.0])\n\n  # NOTE: This is instantaneous and completely accurate.\n  cost = QRCost(Q, R, Q_terminal=Q_terminal, x_goal=x_goal)\n\nAutomatic differentiation\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. code-block:: python\n\n  import theano.tensor as T\n  from ilqr.cost import AutoDiffCost\n\n  x_inputs = [T.dscalar(\"x\"), T.dscalar(\"x_dot\")]\n  u_inputs = [T.dscalar(\"F\")]\n\n  x = T.stack(x_inputs)\n  u = T.stack(u_inputs)\n\n  x_diff = x - x_goal\n  l = x_diff.T.dot(Q).dot(x_diff) + u.T.dot(R).dot(u)\n  l_terminal = x_diff.T.dot(Q_terminal).dot(x_diff)\n\n  # Compile the cost.\n  # NOTE: This can be slow as it's computing and compiling the derivatives.\n  # But that's okay since it's only a one-time cost on startup.\n  cost = AutoDiffCost(l, l_terminal, x_inputs, u_inputs)\n\nBatch automatic differentiation\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. 
code-block:: python\n\n  import theano.tensor as T\n  from ilqr.cost import BatchAutoDiffCost\n\n  def cost_function(x, u, i, terminal):\n      \"\"\"Batched implementation of the quadratic cost function.\n\n      Args:\n          x: State vector [*, state_size].\n          u: Control vector [*, action_size].\n          i: Current time step [*, 1].\n          terminal: Whether to compute the terminal cost.\n\n      Returns:\n          Instantaneous cost [*].\n      \"\"\"\n      Q_ = Q_terminal if terminal else Q\n      x_diff = x - x_goal\n      l = x_diff.dot(Q_).dot(x_diff.T)\n      if l.ndim == 2:\n          l = T.diag(l)\n\n      if not terminal:\n          l_u = u.dot(R).dot(u.T)\n          if l_u.ndim == 2:\n              l_u = T.diag(l_u)\n          l += l_u\n\n      return l\n\n  # Compile the cost.\n  # NOTE: This can be slow as it's computing and compiling the derivatives.\n  # But that's okay since it's only a one-time cost on startup.\n  cost = BatchAutoDiffCost(cost_function, state_size, action_size)\n\nFinite difference approximation\n\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n\n.. 
code-block:: python\n\n  from ilqr.cost import FiniteDiffCost\n\n\n  def l(x, u, i):\n      \"\"\"Instantaneous cost function.\n\n      Args:\n          x: State vector [state_size].\n          u: Control vector [action_size].\n          i: Current time step.\n\n      Returns:\n          Instantaneous cost [scalar].\n      \"\"\"\n      x_diff = x - x_goal\n      return x_diff.T.dot(Q).dot(x_diff) + u.T.dot(R).dot(u)\n\n\n  def l_terminal(x, i):\n      \"\"\"Terminal cost function.\n\n      Args:\n          x: State vector [state_size].\n          i: Current time step.\n\n      Returns:\n          Terminal cost [scalar].\n      \"\"\"\n      x_diff = x - x_goal\n      return x_diff.T.dot(Q_terminal).dot(x_diff)\n\n\n  # NOTE: Unlike with AutoDiffCost, this is instantaneous to construct, but the\n  # derivatives will not be as accurate.\n  cost = FiniteDiffCost(l, l_terminal, state_size, action_size)\n\n*Note*: You might need to tune the epsilon values (:code:`x_eps` and\n:code:`u_eps`) used when computing the approximation if you run into numerical\ninstability issues.\n\nUsage\n\"\"\"\"\"\n\nRegardless of the method used for constructing your cost function, you can use\nit as follows:\n\n.. code-block:: python\n\n  \u003e\u003e\u003e cost.l(curr_x, curr_u, i)\n  ... 400.0\n  \u003e\u003e\u003e cost.l_x(curr_x, curr_u, i)\n  ... array([   0.,  400.])\n  \u003e\u003e\u003e cost.l_u(curr_x, curr_u, i)\n  ... array([ 0.])\n  \u003e\u003e\u003e cost.l_xx(curr_x, curr_u, i)\n  ... array([[ 200.,    0.],\n             [   0.,  200.]])\n  \u003e\u003e\u003e cost.l_ux(curr_x, curr_u, i)\n  ... array([[ 0.,  0.]])\n  \u003e\u003e\u003e cost.l_uu(curr_x, curr_u, i)\n  ... array([[ 0.02]])\n\nPutting it all together\n^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
code-block:: python\n\n  N = 1000  # Number of time-steps in trajectory.\n  x0 = np.array([0.0, -0.1])  # Initial state.\n  us_init = np.random.uniform(-1, 1, (N, 1))  # Random initial action path.\n\n  ilqr = iLQR(dynamics, cost, N)\n  xs, us = ilqr.fit(x0, us_init)\n\n:code:`xs` and :code:`us` now hold the optimized state and control trajectories\nthat reach the desired goal state with minimum cost.\n\nFinally, a :code:`RecedingHorizonController` is also bundled with this package\nto use the :code:`iLQR` controller in Model Predictive Control.\n\nImportant notes\n^^^^^^^^^^^^^^^\n\nTo quote from Tassa's paper: \"Two important parameters which have a direct\nimpact on performance are the simulation time-step :code:`dt` and the horizon\nlength :code:`N`. Since speed is of the essence, the goal is to choose those\nvalues which minimize the number of steps in the trajectory, i.e. the largest\npossible time-step and the shortest possible horizon. The size of :code:`dt`\nis limited by our use of Euler integration; beyond some value the simulation\nbecomes unstable. The minimum length of the horizon :code:`N` is a\nproblem-dependent quantity which must be found by trial-and-error.\"\n\nContributing\n------------\n\nContributions are welcome. Simply open an issue or pull request on the matter.\n\nLinting\n-------\n\nWe use `YAPF \u003chttps://github.com/google/yapf\u003e`_ for all Python formatting\nneeds. You can auto-format your changes with the following command:\n\n.. code-block:: bash\n\n  yapf --recursive --in-place --parallel .\n\nYou may install the linter as follows:\n\n.. 
code-block:: bash\n\n  pipenv install --dev\n\nLicense\n-------\n\nSee `LICENSE \u003cLICENSE\u003e`_.\n\nCredits\n-------\n\nThis implementation was partially based on Yuval Tassa's :code:`MATLAB`\n`implementation \u003chttps://www.mathworks.com/matlabcentral/fileexchange/52069\u003e`_,\nand `navigator8972 \u003chttps://github.com/navigator8972\u003e`_'s\n`implementation \u003chttps://github.com/navigator8972/pylqr\u003e`_.\n","funding_links":[],"categories":["6. Planning","Planning and Control"],"sub_categories":["3.4 High Performance Inference","Vector Map"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanassinator%2Filqr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanassinator%2Filqr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanassinator%2Filqr/lists"}