{"id":26123471,"url":"https://github.com/cvxgrp/vgi","last_synced_at":"2025-04-13T14:34:55.875Z","repository":{"id":181327502,"uuid":"665288036","full_name":"cvxgrp/vgi","owner":"cvxgrp","description":"Value-gradient iteration for convex stochastic control","archived":false,"fork":false,"pushed_at":"2023-11-02T20:12:01.000Z","size":691,"stargazers_count":9,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-27T05:34:45.526Z","etag":null,"topics":["approximate-dynamic-programming","convex-optimization","mpc","python"],"latest_commit_sha":null,"homepage":"https://stanford.edu/~boyd/papers/vgi.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cvxgrp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-11T21:47:49.000Z","updated_at":"2025-02-04T18:53:12.000Z","dependencies_parsed_at":"2023-07-15T00:46:41.068Z","dependency_job_id":"d0a0a41f-a5bd-4ecd-a5a1-67b1fe0c1dcb","html_url":"https://github.com/cvxgrp/vgi","commit_stats":null,"previous_names":["cvxgrp/vgi"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cvxgrp%2Fvgi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cvxgrp%2Fvgi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cvxgrp%2Fvgi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cvxgrp%2Fvgi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/owners/cvxgrp","download_url":"https://codeload.github.com/cvxgrp/vgi/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248728732,"owners_count":21152285,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["approximate-dynamic-programming","convex-optimization","mpc","python"],"created_at":"2025-03-10T15:53:19.679Z","updated_at":"2025-04-13T14:34:55.855Z","avatar_url":"https://github.com/cvxgrp.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# VGI - A method for convex stochastic control\n[![Main Test](https://github.com/cvxgrp/vgi/actions/workflows/test.yml/badge.svg?branch=main)](https://github.com/cvxgrp/vgi/actions/workflows/test.yml)\n\nValue-gradient iteration (VGI) is a method for designing policies for convex stochastic control problems\ncharacterized by random linear dynamics and convex stage cost. We consider policies\nthat employ quadratic approximate value functions as a substitute for the true value\nfunction. Evaluating the associated control policy involves solving a convex problem,\ntypically a quadratic program, which can be carried out reliably in real-time. VGI fits \nthe gradient of the value function with regularization that can include\nconstraints reflecting known bounds on the true value function. 
Our value-gradient\niteration method can yield a good approximate value function with few samples, and\nlittle hyperparameter tuning.\n\nFor more details, see our [manuscript](https://stanford.edu/~boyd/papers/pdf/vgi.pdf).\n\nTo install locally, clone the repository, and run\n```pip install -e .```\nin the repo directory. Optionally, create a pyenv or conda environment first. Note that the examples require additional dependencies, [```torch```](https://pytorch.org/) and [```cvxpylayers```](https://github.com/cvxgrp/cvxpylayers).\n\n## Convex stochastic control\nWe consider convex stochastic control problems, which have dynamics\n$$x_{t+1} = A_tx_t + B_tu_t + c_t,$$\nwhere $x_t$ is the state, $u_t$ is the input, and $(A_t,B_t,c_t)$ may be random (but independent in time).\n\nThe goal is to minimize the average cost\n$$J = \\lim_{T\\to\\infty}\\frac 1 T \\sum_{t=0}^{T-1} g(x_t, u_t),$$\nwhere $g$ is a convex stage cost. The stage cost can take on infinite values, to represent constraints on $(x_t, u_t)$.\n\nWe consider approximate dynamic programming (ADP) control policies of the form\n$$\\phi(x_t) = \\text{argmin}_u \\left(g(x_t, u) + \\mathbf{E} \\hat V(A_t x_t + B_t u + c_t)\\right),$$\nwhere $\\hat V$ is a quadratic approximate value function of the form $\\hat V(x) = (1/2)x^TPx + p^Tx$. If $\\hat V$ is an optimal value function, then the ADP policy is also optimal.\n\n## Example\n\nIn this example, we have a box-constrained linear quadratic regulator (LQR) problem, with dynamics\n$$x_{t+1} = Ax_t + Bu_t + c_t,$$\nwhere $A$ and $B$ are fixed and $c_t$ is a zero-mean Gaussian random variable. 
The stage cost is\n$$g(x_t,u_t) = x_t^TQx_t + u_t^TR u_t + I(\\|u_t\\|_{\\infty} \\le u_{\\max}),$$\nwhere $Q$ and $R$ are positive semidefinite matrices and the last term is an indicator function that encodes the constraint that the entries of $u_t$ all have magnitude at most $u_{\\max}$.\n\nWe can initialize a ```ControlProblem``` instance with state $x_t\\in\\mathbf{R}^{3}$ and input $u_t\\in\\mathbf{R}^{2}$ with\n```python\nn = 3\nm = 2\nproblem = BoxLQRProblem.create_problem_instance(n, m, seed=0, processes=5)\n```\nAdding the extra argument ```processes=5``` lets us run simulations in parallel using 5 processes. By default, the cost is evaluated by simulating ```eval_trajectories=5``` trajectories, each for ```eval_horizon=2*10**4``` steps.\n\nWe can get a quadratic lower bound on the optimal value function with\n```python\nV_lb = problem.V_lb()\n```\n\nTo create an ADP policy and an MPC policy with 30-step lookahead, we call\n```python\npolicy = problem.create_policy(compile=True, name=\"box_lqr_policy\", V=V_lb)\nmpc = problem.create_policy(lookahead=30, compile=True, name=\"box_lqr_policy\")\n```\nSetting the argument ```compile=True``` generates a custom solver implementation in C using [CVXPYgen](https://github.com/cvxgrp/cvxpygen).\n\nTo find an ADP policy using VGI, we run\n```python\nimport vgi\n\n# initialize VGI method\nvgi_method = vgi.VGI(\n    problem,\n    policy,\n    vgi.QuadGradReg(),\n    trajectory_len=50,\n    num_trajectories=1,\n    damping=0.5,\n)\n# find ADP policy by running VGI for 20 iterations\nadp_policy = vgi_method(20)\n```\n\nTo simulate the policy for 100 steps and plot the state trajectories, we can run\n```python\nimport matplotlib.pyplot as plt\n\nsimulation = problem.simulate(adp_policy, 100)\nplt.plot(simulation.states_matrix)\nplt.show()\n```\n\nTo evaluate the average cost of the policy via simulation, we can run\n```python\nadp_cost = problem.cost(adp_policy)\n```\n\n## Defining your own control problems\n\nExamples of control 
problems can be found in [the examples folder](examples/). To set up a new control problem, we can inherit from the ```ControlProblem``` class. For example, to create a linear quadratic regulator (LQR) problem, we might write\n```python\nimport numpy as np\nfrom vgi import ControlProblem\n\nclass LQRProblem(ControlProblem):\n\n    def __init__(self, A, B, Q, R, **kwargs):\n        \"\"\"Constructor for LQR problem. Extra keyword arguments (e.g. eval_horizon) are passed on to ControlProblem\"\"\"\n        self.A = A\n        self.B = B\n        self.Q = Q\n        self.R = R\n        n, m = B.shape\n        super().__init__(n, m, **kwargs)\n\n    def step(self, x, u):\n        \"\"\"Dynamics for simulation. Returns next state, noise/observation/measurements, and stage cost\"\"\"\n        c = np.random.randn(self.n)\n        x_next = self.A @ x + self.B @ u + c\n        stage_cost = x.T @ self.Q @ x + u.T @ self.R @ u\n        return x_next, c, stage_cost\n\n    def sample_initial_condition(self):\n        return np.random.randn(self.n)\n```\nTo create a corresponding policy for the ```LQRProblem```, we can create an ```LQRPolicy```, which inherits from ```COCP```, the class for convex optimization control policies (COCPs):\n```python\nimport cvxpy as cp\nfrom vgi import COCP\n\nclass LQRPolicy(COCP):\n    def stage_cost(self, x, u):\n        constraints = []\n        return cp.quad_form(x, self.Q) + cp.quad_form(u, self.R), constraints\n```\nThe stage cost function takes in CVXPY variables ```x``` and ```u```, and returns an expression for the stage cost, and any constraints on ```x``` and ```u```. 
The COCP constructor takes the state and control dimensions ```n``` and ```m``` as arguments, as well as any additional named parameters, such as the positive semidefinite cost matrices ```Q``` and ```R``` and the dynamics matrices ```A``` and ```B```.\n\nFor example, suppose we have an LQR problem with state dimension 3, input dimension 2, and randomly generated dynamics:\n```python\n# problem dimensions\nimport numpy as np\nn = 3\nm = 2\n\n# generate random dynamics matrices\nnp.random.seed(0)\nA = np.random.randn(n, n)\nA /= np.max(np.abs(np.linalg.eigvals(A)))\nB = np.random.randn(n, m)\n\n# mean of c\nc = np.zeros(n)\n\n# cost parameters\nQ = np.eye(n)\nR = np.eye(m)\n\ncontrol_problem = LQRProblem(A, B, Q, R)\n```\nTo create an ADP policy with a randomly generated quadratic approximate value function, run\n```python\nfrom vgi import QuadForm\nV_hat = QuadForm.random(n)\nadp_policy = LQRPolicy(n, m, Q=Q, R=R, A=A, B=B, c=c, V=V_hat)\n```\nTo compile the policy to a custom solver implementation in C using CVXPYgen, add the argument ```compile=True``` as well as a directory name for the generated code, e.g. ```name=\"lqr_policy\"```.\n\nTo simulate the policy for ```T``` steps, run\n```python\nT = 100\nsim = control_problem.simulate(adp_policy, T, seed=0)\n```\nThis yields a ```Simulation``` object. Accessing ```sim.states_matrix``` gives a ```(T, n)``` matrix of the visited states.\n\nTo evaluate the average cost of the policy via simulation, we can run\n```python\nadp_cost = control_problem.cost(adp_policy, seed=0)\n```\nThis runs ```eval_trajectories``` simulations starting from different randomly sampled initial conditions, each for ```eval_horizon``` steps, and returns the average cost. The simulations may optionally be run in parallel.\n\nThose parameters may be set explicitly in the constructor for the control problem. 
For example, if we construct the ```LQRProblem``` as\n```python\ncontrol_problem = LQRProblem(A, B, Q, R, eval_horizon=1000, eval_trajectories=5, processes=5)\n```\nthen running\n```python\nadp_cost = control_problem.cost(adp_policy, seed=0)\n```\nwill run 5 simulations in parallel, each for 1000 steps, and return the average cost on those trajectories.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcvxgrp%2Fvgi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcvxgrp%2Fvgi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcvxgrp%2Fvgi/lists"}