{"id":13442767,"url":"https://github.com/UM-ARM-Lab/pytorch_mppi","last_synced_at":"2025-03-20T15:30:48.470Z","repository":{"id":43704620,"uuid":"230365910","full_name":"UM-ARM-Lab/pytorch_mppi","owner":"UM-ARM-Lab","description":"Model Predictive Path Integral (MPPI) with approximate dynamics implemented in pytorch","archived":false,"fork":false,"pushed_at":"2024-08-22T02:30:01.000Z","size":206,"stargazers_count":395,"open_issues_count":2,"forks_count":58,"subscribers_count":10,"default_branch":"master","last_synced_at":"2024-09-22T01:24:23.225Z","etag":null,"topics":["approximate-dynamics","controls","model-predictive-control","mppi","pytorch"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/UM-ARM-Lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-12-27T03:23:24.000Z","updated_at":"2024-09-20T14:36:30.000Z","dependencies_parsed_at":"2023-07-26T10:48:53.126Z","dependency_job_id":"3fc9b7ac-1ad6-46ac-a02d-056465ea1d52","html_url":"https://github.com/UM-ARM-Lab/pytorch_mppi","commit_stats":{"total_commits":81,"total_committers":6,"mean_commits":13.5,"dds":"0.13580246913580252","last_synced_commit":"c1f3869102b61b4b0ac952aba25c2204fed52926"},"previous_names":["lemonpi/pytorch_mppi"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UM-ARM-Lab%2Fpytorch_mppi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UM-ARM-Lab%2Fpytorch_mppi/tags","releases_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UM-ARM-Lab%2Fpytorch_mppi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UM-ARM-Lab%2Fpytorch_mppi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/UM-ARM-Lab","download_url":"https://codeload.github.com/UM-ARM-Lab/pytorch_mppi/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221772552,"owners_count":16878127,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["approximate-dynamics","controls","model-predictive-control","mppi","pytorch"],"created_at":"2024-07-31T03:01:50.381Z","updated_at":"2024-10-28T03:31:00.259Z","avatar_url":"https://github.com/UM-ARM-Lab.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# PyTorch MPPI Implementation\nThis repository implements Model Predictive Path Integral (MPPI) \nwith approximate dynamics in pytorch. 
MPPI typically requires actual\ntrajectory samples, but [this paper](https://ieeexplore.ieee.org/document/7989202/)\nshowed that it could be done with approximate dynamics (such as with a neural network)\nusing importance sampling.\n\nThus it can be used in place of other trajectory optimization methods\nsuch as the Cross Entropy Method (CEM) or random shooting.\n\n---\nNew since Aug 2024: smoothing methods, including our own KMPPI; see the [Smoothing](#smoothing) section below.\n\n# Installation\n```shell\npip install pytorch-mppi\n```\nFor autotuning hyperparameters, install with\n```shell\npip install pytorch-mppi[tune]\n```\n\nFor running tests, install with\n```shell\npip install pytorch-mppi[test]\n```\nFor development, clone the repository then install in editable mode\n```shell\npip install -e .\n```\n\n# Usage\nSee `tests/pendulum_approximate.py` for usage with a neural network approximating\nthe pendulum dynamics. See the `not_batch` branch for an easier-to-read\nalgorithm. The basic use case is shown below:\n\n```python\nimport torch\nfrom pytorch_mppi import MPPI\n\n# create controller with chosen parameters\nctrl = MPPI(dynamics, running_cost, nx, noise_sigma, num_samples=N_SAMPLES, horizon=TIMESTEPS,\n            lambda_=lambda_, device=d,\n            u_min=torch.tensor(ACTION_LOW, dtype=torch.double, device=d),\n            u_max=torch.tensor(ACTION_HIGH, dtype=torch.double, device=d))\n\n# assuming you have a gym-like env\nobs = env.reset()\nfor i in range(100):\n    action = ctrl.command(obs)\n    obs, reward, done, _ = env.step(action.cpu().numpy())\n```\n\n# Requirements\n- pytorch (\u003e= 1.0)\n- `next state \u003c- dynamics(state, action)` function (doesn't have to be true dynamics)\n    - `state` is `K x nx`, `action` is `K x nu`\n- `cost \u003c- running_cost(state, action)` function\n    - `cost` is `K x 1`, `state` is `K x nx`, `action` is `K x nu`\n\n# Features\n- Approximate dynamics MPPI with importance sampling\n- Parallel/batch pytorch implementation 
for accelerated sampling\n- Control bounds via sampling control noise from a rectified Gaussian\n- Handle stochastic dynamic models (assuming each call is a sample) by sampling multiple state trajectories for the same\naction trajectory with `rollout_samples`\n\n# Parameter tuning and hints\n`terminal_state_cost` - function(state (K x T x nx)) -\u003e cost (K x 1). By default there is no terminal\ncost, but if your trajectory gets close to but never quite reaches the goal, then\nhaving a terminal cost can help. The function should scale with the horizon (T) to keep up with the\nscaling of the running cost.\n\n`lambda_` - higher values increase the cost of control noise, so you end up with more\nsamples around the mean; generally lower values work better (try `1e-2`)\n\n`num_samples` - number of trajectories to sample; generally the more the better.\nRuntime performance scales much better with `num_samples` than `horizon`, especially\nif you're using a GPU device (remember to pass that in!)\n\n`noise_mu` - the default is 0 for all control dimensions, which may work\npoorly if you have control bounds and the allowed range is not 0-centered.\nRemember to change this to an appropriate value for non-symmetric control dimensions.\n\n## Smoothing\nFrom version 0.8.0 onwards, you can use MPPI variants that smooth the control signal. We've implemented\n[SMPPI](https://arxiv.org/pdf/2112.09988) as well as our own kernel interpolation MPPI (KMPPI). In the base algorithm,\nyou can achieve somewhat smoother trajectories by increasing `lambda_`; however, that comes at the cost of\noptimality. Explicit smoothing algorithms can achieve smoothness without sacrificing optimality.\n\nWe used and described KMPPI in our recent paper ([arxiv](https://arxiv.org/abs/2408.10450)), which you can cite\nuntil we release a work dedicated to KMPPI. 
Below we show the difference between MPPI, SMPPI, and KMPPI on a toy\n2D navigation problem where the control is a constrained delta position. You can check it out in `tests/smooth_mppi.py`.\n\nThe API is mostly the same, with some additional constructor options:\n```python\nimport pytorch_mppi as mppi\nctrl = mppi.KMPPI(*args,\n                 kernel=mppi.RBFKernel(sigma=2), # kernel in trajectory time space (1 dimensional)\n                 num_support_pts=5,              # number of control points to sample, \u003c= horizon\n                 **kwargs)\n```\nThe kernel can be any subclass of `mppi.TimeKernel`. It is a kernel in the trajectory time space (1 dimensional).\nNote that B-spline smoothing can be achieved by using a B-spline kernel. The number of support points is the number\nof control points to sample. Any trajectory points in between are interpolated using the kernel. For example, if a\ntrajectory horizon is 20 and `num_support_pts` is 5, then 5 control points evenly spaced throughout the horizon\n(with the first and last corresponding to the actual start and end of the trajectory) are sampled. The rest of the\ntrajectory is interpolated using the kernel. The kernel is applied to the control signal, not the state signal.\n\nMPPI without smoothing\n\n![MPPI](https://imgur.com/9wEcT2s.gif) \n\n[SMPPI](https://arxiv.org/pdf/2112.09988) smoothing by sampling noise in the action derivative space doesn't work well on this problem\n\n![SMPPI](https://imgur.com/xwYy3aj.gif)\n\nKMPPI smoothing with an RBF kernel works well\n\n![KMPPI](https://imgur.com/IG1Zrtd.gif)\n\n\n## Autotune\nFrom version 0.5.0 onwards, you can automatically tune the hyperparameters.\nA convenient tuner compatible with the popular [ray tune](https://docs.ray.io/en/latest/tune/index.html) library\nis implemented. 
You can select from a variety of cutting-edge black-box optimizers such as\n[CMA-ES](https://github.com/CMA-ES/pycma), [HyperOpt](http://hyperopt.github.io/hyperopt/),\n[fmfn/BayesianOptimization](https://github.com/fmfn/BayesianOptimization), and so on.\nSee `tests/auto_tune_parameters.py` for an example. A tutorial based on it follows.\n\nThe tuner can be used for other controllers as well, but you will need to define the appropriate\n`TunableParameter` subclasses.\n\nFirst we create a toy 2D environment to do controls on and create the controller with some\ndefault parameters.\n```python\nimport torch\nfrom pytorch_mppi import MPPI\n\ndevice = \"cpu\"\ndtype = torch.double\n\n# create toy environment to do control on (default start and goal)\nenv = Toy2DEnvironment(visualize=True, terminal_scale=10)\n\n# create MPPI with some initial parameters\nmppi = MPPI(env.dynamics, env.running_cost, 2,\n            terminal_state_cost=env.terminal_cost,\n            noise_sigma=torch.diag(torch.tensor([5., 5.], dtype=dtype, device=device)),\n            num_samples=500,\n            horizon=20, device=device,\n            u_max=torch.tensor([2., 2.], dtype=dtype, device=device),\n            lambda_=1)\n```\n\nWe then need to create an evaluation function for the tuner to tune on. 
\nIt should take no arguments and output a `EvaluationResult` populated at least by costs.\nIf you don't need rollouts for the cost evaluation, then you can set it to None in the return.\nTips for creating the evaluation function are described in comments below:\n\n```python\nfrom pytorch_mppi import autotune\n# use the same nominal trajectory to start with for all the evaluations for fairness\nnominal_trajectory = mppi.U.clone()\n# parameters for our sample evaluation function - lots of choices for the evaluation function\nevaluate_running_cost = True\nnum_refinement_steps = 10\nnum_trajectories = 5\n\ndef evaluate():\n    costs = []\n    rollouts = []\n    # we sample multiple trajectories for the same start to goal problem, but in your case you should consider\n    # evaluating over a diverse dataset of trajectories\n    for j in range(num_trajectories):\n        mppi.U = nominal_trajectory.clone()\n        # the nominal trajectory at the start will be different if the horizon's changed\n        mppi.change_horizon(mppi.T)\n        # usually MPPI will have its nominal trajectory warm-started from the previous iteration\n        # for a fair test of tuning we will reset its nominal trajectory to the same random one each time\n        # we manually warm it by refining it for some steps\n        for k in range(num_refinement_steps):\n            mppi.command(env.start, shift_nominal_trajectory=False)\n\n        rollout = mppi.get_rollouts(env.start)\n\n        this_cost = 0\n        rollout = rollout[0]\n        # here we evaluate on the rollout MPPI cost of the resulting trajectories\n        # alternative costs for tuning the parameters are possible, such as just considering terminal cost\n        if evaluate_running_cost:\n            for t in range(len(rollout) - 1):\n                this_cost = this_cost + env.running_cost(rollout[t], mppi.U[t])\n        this_cost = this_cost + env.terminal_cost(rollout, mppi.U)\n\n        rollouts.append(rollout)\n        
costs.append(this_cost)\n    # can return None for rollouts if they do not need to be calculated\n    return autotune.EvaluationResult(torch.stack(costs), torch.stack(rollouts))\n```\n\nWith this we have enough to start tuning. For example, we can tune iteratively with the CMA-ES optimizer\n\n```python\n# these are subclasses of TunableParameter (specifically MPPIParameter) that we want to tune\nparams_to_tune = [autotune.SigmaParameter(mppi), autotune.HorizonParameter(mppi), autotune.LambdaParameter(mppi)]\n# create a tuner with a CMA-ES optimizer\ntuner = autotune.Autotune(params_to_tune, evaluate_fn=evaluate, optimizer=autotune.CMAESOpt(sigma=1.0))\n# tune parameters for a number of iterations\niterations = 30\nfor i in range(iterations):\n  # results of this optimization step are returned\n  res = tuner.optimize_step()\n  # we can render the rollouts in the environment\n  env.draw_rollouts(res.rollouts)\n# get the best result and apply it to the controller\n# (by default the controller will take on the latest tuned parameter, which may not be best)\nres = tuner.get_best_result()\ntuner.apply_parameters(res.param_values)\n```\nThis is a local search method that optimizes starting from the initially defined parameters.\nFor global search, we use Ray Tune compatible search algorithms. 
Note that you can modify the\nsearch space of each parameter, but default reasonable ones are provided.\n\n```python\n# can also use a Ray Tune optimizer, see\n# https://docs.ray.io/en/latest/tune/api_docs/suggestion.html#search-algorithms-tune-search\n# rather than adapting the current parameters, these optimizers allow you to define a search space for each\n# and will search on that space\nfrom pytorch_mppi import autotune_global\nfrom ray.tune.search.hyperopt import HyperOptSearch\nfrom ray.tune.search.bayesopt import BayesOptSearch\n\n# the global version of the parameters define a reasonable search space for each parameter\nparams_to_tune = [autotune_global.SigmaGlobalParameter(mppi),\n                  autotune_global.HorizonGlobalParameter(mppi),\n                  autotune_global.LambdaGlobalParameter(mppi)]\n\n# be sure to close any figures before ray tune optimization or they will be duplicated\nenv.visualize = False\nplt.close('all')\ntuner = autotune_global.AutotuneGlobal(params_to_tune, evaluate_fn=evaluate,\n                                       optimizer=autotune_global.RayOptimizer(HyperOptSearch))\n# ray tuners cannot be tuned iteratively, but you can specify how many iterations to tune for\nres = tuner.optimize_all(100)\nres = tuner.get_best_result()\ntuner.apply_parameters(res.params)\n```\n\nFor example tuning hyperparameters (with CMA-ES) only on the toy problem (the nominal trajectory is reset each time so they are sampling from noise):\n\n![toy tuning](https://i.imgur.com/2qtYMwu.gif)\n\nIf you want more than just the best solution found, such as if you want diversity\nacross hyperparameter values, or if your evaluation function has large uncertainty,\nthen you can directly query past results by\n```python\nfor res in tuner.optim.all_res:\n    # the cost\n    print(res.metrics['cost'])\n    # extract the parameters\n    params = tuner.config_to_params(res.config)\n    print(params)\n    # apply the parameters to the controller\n    
tuner.apply_parameters(params)\n```\n\nAlternatively you can try Quality Diversity optimization using the\n[CMA-ME optimizer](https://github.com/icaros-usc/pyribs). This optimizer will\ntry to optimize for high quality parameters while ensuring there is diversity across\nthem. However, it is very slow and you might be better off using a `RayOptimizer` and selecting\nfor top results while checking for diversity.\nTo use it, you need to install\n```shell\npip install ribs\n```\n\nYou then use it as\n\n```python\nimport pytorch_mppi.autotune_qd\n\noptim = pytorch_mppi.autotune_qd.CMAMEOpt()\ntuner = autotune_global.AutotuneGlobal(params_to_tune, evaluate_fn=evaluate,\n                                       optimizer=optim)\n\niterations = 10\nfor i in range(iterations):\n  # results of this optimization step are returned\n  res = tuner.optimize_step()\n  # query the top 5 diverse parameter sets found so far\n  best_params = optim.get_diverse_top_parameters(5)\n  for res in best_params:\n    print(res)\n```\n\n# Tests\nUnder `tests` you can find the `MPPI` method applied to known pendulum dynamics\nand approximate pendulum dynamics (with a 2 layer feedforward net\nestimating the state residual). Using a continuous angle representation\n(feeding `cos(\\theta), sin(\\theta)` instead of `\\theta` directly) makes\na huge difference. Although both work, the continuous representation\nis much more robust to controller parameters and random seed. 
In addition,\nthe problem of continuing to spin after over-swinging does not appear.\n\nSample result on approximate dynamics with 100 steps of random policy data\nto initialize the dynamics:\n\n![pendulum results](https://i.imgur.com/euYQJ25.gif)\n\n# Related projects\n- [pytorch CEM](https://github.com/LemonPi/pytorch_cem) - an alternative MPC shooting method with a similar API to this\nproject\n- [pytorch iCEM](https://github.com/UM-ARM-Lab/pytorch_icem) - alternative sampling based MPC\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FUM-ARM-Lab%2Fpytorch_mppi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FUM-ARM-Lab%2Fpytorch_mppi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FUM-ARM-Lab%2Fpytorch_mppi/lists"}