{"id":31021368,"url":"https://github.com/sisl/mpopis","last_synced_at":"2025-09-13T11:20:56.568Z","repository":{"id":65558052,"uuid":"425359819","full_name":"sisl/MPOPIS","owner":"sisl","description":"Adaptive importance sampling modification to MPPI","archived":false,"fork":false,"pushed_at":"2024-04-01T22:03:48.000Z","size":50582,"stargazers_count":109,"open_issues_count":2,"forks_count":14,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-09-07T16:53:46.598Z","etag":null,"topics":["model-predictive-control","mpc","mppi","nonlinear-control","optimal-control","sampling-based-control","sampling-based-planning"],"latest_commit_sha":null,"homepage":"","language":"Julia","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sisl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2021-11-06T22:14:05.000Z","updated_at":"2025-09-05T08:12:22.000Z","dependencies_parsed_at":"2024-04-01T22:52:13.447Z","dependency_job_id":null,"html_url":"https://github.com/sisl/MPOPIS","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/sisl/MPOPIS","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sisl%2FMPOPIS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sisl%2FMPOPIS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sisl%2FMPOPIS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sisl%2FMPOPIS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sisl","download_url":"https:
//codeload.github.com/sisl/MPOPIS/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sisl%2FMPOPIS/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274955746,"owners_count":25380669,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-13T02:00:10.085Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["model-predictive-control","mpc","mppi","nonlinear-control","optimal-control","sampling-based-control","sampling-based-planning"],"created_at":"2025-09-13T11:20:54.325Z","updated_at":"2025-09-13T11:20:56.556Z","avatar_url":"https://github.com/sisl.png","language":"Julia","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MPOPIS (Model Predictive Optimized Path Integral Strategies)\n\n[Short YouTube video talking about MPOPIS](https://youtu.be/-7jHJP37Nio)\n\n[arXiv Paper](https://arxiv.org/abs/2203.16633)\n\nA version of model predictive path integral control (MPPI) that allows adaptive importance sampling (AIS) algorithms to be incorporated into the original importance sampling step. Model predictive optimized path integral control (MPOPI) is more sample efficient than MPPI, achieving better performance with fewer samples. A side-by-side video of MPPI and MPOPI controlling 3 cars can be seen [here](https://youtu.be/dDifSfxtuls). 
More details can be found in the [wiki](../../wiki/MPOPIS-Details).\n\nThe addition of AIS enables the algorithm to use a better set of samples for the calculation of the control. A depiction of how the samples evolve over iterations can be seen in the following gif.\n#### MPOPI (CE) 150 Samples, 10 Iterations\n\u003cimg src=\"https://github.com/sisl/MPOPIS/blob/main/gifs/CE%20150-10%20AIS%20Iteration.gif\" width=\"600\" height=\"337\" /\u003e\n\n## Policy Options\nVersions of MPPI and MPOPI implemented\n - MPPI and GMPPI\n   - [MPPI](https://github.com/sisl/MPOPIS/blob/157f2d8dc94d71e39208eba8c8bc4f8061e31d83/src/mppi_mpopi_policies.jl#L104) (`:mppi`): Model Predictive Path Integral Control[^1][^2]\n   - [GMPPI](https://github.com/sisl/MPOPIS/blob/157f2d8dc94d71e39208eba8c8bc4f8061e31d83/src/mppi_mpopi_policies.jl#L280) (`:gmppi`): generalized version of MPPI, treating the control sequence as one control vector with a combined covariance matrix\n - MPOPI\n   - [i-MPPI](https://github.com/sisl/MPOPIS/blob/157f2d8dc94d71e39208eba8c8bc4f8061e31d83/src/mppi_mpopi_policies.jl#L317) (`:imppi`): iterative version of MPOPI similar to μ-AIS but without the decoupled inverse temperature parameter. 
μ-AIS is equivalent to IMPPI when λ_ais = λ.\n   - [PMC](https://github.com/sisl/MPOPIS/blob/157f2d8dc94d71e39208eba8c8bc4f8061e31d83/src/mppi_mpopi_policies.jl#L744) (`:pmcmppi`): population Monte Carlo algorithm with one distribution[^3]\n   - [μ-AIS](https://github.com/sisl/MPOPIS/blob/157f2d8dc94d71e39208eba8c8bc4f8061e31d83/src/mppi_mpopi_policies.jl#L608) (`:μaismppi`): mean only moment matching AIS algorithm\n   - [μΣ-AIS](https://github.com/sisl/MPOPIS/blob/157f2d8dc94d71e39208eba8c8bc4f8061e31d83/src/mppi_mpopi_policies.jl#L673) (`:μΣaismppi`): mean and covariance moment matching AIS algorithm similar to Mixture-PMC[^4]\n   - [CE](https://github.com/sisl/MPOPIS/blob/157f2d8dc94d71e39208eba8c8bc4f8061e31d83/src/mppi_mpopi_policies.jl#L375) (`:cemppi`): cross-entropy method[^5][^6]\n   - [CMA](https://github.com/sisl/MPOPIS/blob/157f2d8dc94d71e39208eba8c8bc4f8061e31d83/src/mppi_mpopi_policies.jl#L474) (`:cmamppi`): covariance matrix adaptation evolutionary strategy[^5][^7]\n\n**For implementation details reference the source code. 
For simulation parameters used, reference the [wiki](../../wiki/MPOPIS-Details).**\n\n## Getting Started\nUse the julia package manager to add the MPOPIS module:\n```julia\n] add https://github.com/sisl/MPOPIS\nusing MPOPIS\n```\nIf you want to use the MuJoCo environments, ensure you have `envpool` installed in your `PyCall` distribution:\n```julia\ninstall_mujoco_requirements()\n```\n\nNow, we can use the built-in example to simulate the MountainCar environment:\n```julia\nsimulate_mountaincar(policy_type=:cemppi, num_trials=5)\n```\n\nSimulate the Car Racing environment and save a gif:\n```julia\nsimulate_car_racing(save_gif=true)\n```\n\n\u003cimg src=\"https://github.com/sisl/MPOPIS/blob/main/gifs/cr-1-cemppi-150-50-10.0-1.0-10-0.8-ss-1-2.gif\" width=\"450\" height=\"450\" /\u003e\n\nAlso plotting the trajectories and simulating multiple cars\n```julia\nsimulate_car_racing(num_cars=3, plot_traj=true, save_gif=true)\n```\n\u003cimg src=\"https://github.com/sisl/MPOPIS/blob/main/gifs/mcr-3-cemppi-150-50-10.0-1.0-10-0.8-ss-1-2.gif\" width=\"450\" height=\"450\" /\u003e\n\nRun a MuJoCo environment:\n```julia\nsimulate_envpool_env(\n    \"HalfCheetah-v4\";\n    frame_skip = 5,\n    num_trials = 2,\n    policy_type = :cemppi,\n    num_steps = 50,\n    num_samples = 100,\n    ais_its = 5,\n    λ = 1.0,\n    ce_Σ_est = :ss,\n    seed = 1,\n    output_acts_file = true,\n)\n```\nThe output should be something similar to:\n```\nEnv Name:                     HalfCheetah-v4\nNum Trails:                   2\nNum Steps:                    50\nPolicy Type:                  cemppi\nNum samples                   100\nHorizon                       50\nλ (inverse temp):             1.00\nα (control cost param):       1.00\n# AIS Iterations:             5\nCE Elite Threshold:           0.80\nCE Σ Est Method:              ss\nU₀                            [0.0000, ..., 0.0000]\nΣ                             0.2500 0.2500 0.2500 0.2500 0.2500 0.2500 \nSeed:                         
1\n\nTrial    #:       Reward :   Steps:  Reward/Step : Ex Time\nTrial    1:       115.46 :      50:         2.31 :   19.55\nTrial    2:       126.08 :      50:         2.52 :   19.53\n-----------------------------------\nTrials AVE:       120.77 :   50.00:         2.42 :   19.54\nTrials STD:         7.51 :    0.00:         0.15 :    0.02\nTrials MED:       120.77 :   50.00:         2.42 :   19.54\nTrials L95:       115.46 :   50.00:         2.31 :   19.53\nTrials U95:       126.08 :   50.00:         2.52 :   19.55\nTrials MIN:       115.46 :   50.00:         2.31 :   19.53\nTrials MAX:       126.08 :   50.00:         2.52 :   19.55\n```\n\nThe `output_acts_file` option outputs a CSV with the actions for the given environment. If you have the required Python libraries installed (i.e., gym, numpy, imageio, and argparse), you can use the provided Python script to generate a gif. By default, the `simulate_envpool_env` function outputs the action CSV into the `./acts` directory. The parameters to `make_mujoco_gif.py` are:\n - `-env`: environment name (e.g. 'Ant-v4')\n - `-af`: action csv file\n - `-o`: output gif file name without the extension (e.g. 'output_fname')\n\nUsing one of the above action files:\n```\npython ./src/envs/make_mujoco_gif.py -env HalfCheetah-v4 -af ./acts/HalfCheetah-v4_5_cemppi_50_2_1_50_1.0_1.0_0.0_0.25_100_5_0.8_sstrial-2.csv -o HalfCheetah-v4_output_gif\n```\n\u003cimg src=\"https://github.com/sisl/MPOPIS/blob/main/gifs/HalfCheetah-v4_output_gif.gif\" width=\"450\" height=\"450\" /\u003e\n\n\n# Citation\n```\n@inproceedings{Asmar2023,\ntitle = {Model Predictive Optimized Path Integral Strategies},\nauthor = {Dylan M. Asmar and Ransalu Senanayake and Shawn Manuel and Mykel J. 
Kochenderfer},\nbooktitle = {IEEE International Conference on Robotics and Automation (ICRA)},\nyear = {2023}\n}\n```\n\n# Questions\n### In the paper, during the control cost computation (Algorithm 1, line 9) the noise is sampled from Σ′, but the given Σ is utilized for the inversion, is this a typo?\nThe control cost is computed using the covariance matrix Σ. As we adjust our distribution, we calculate the control cost based on the original distribution and account for the appropriate change in control amount as we change our proposal distribution.\n\n### The algorithm does not use the updated covariance from one MPC iteration to the next, why is this the case?\nUsing the updated covariance matrix in subsequent iterations of the MPC algorithm could result in faster convergence during the next AIS iterations. However, it would likely decrease robustness (without robustness considerations in the AIS step). We considered showing results that used the updated covariance matrix, but wanted to focus on the core contributions of the paper and left that for future work.\n\n### In the algorithm, the trajectory costs utilized to update the control parameters are not centered or normalized, is this intentional? \nThis was intentional to align with previous versions of MPPI in the literature. There are other approaches that adjust the costs for numerical stability and to ease tuning across different environments. We do not anticipate a major change in performance if a step to adjust the costs is added. \n\n### How does this compare to a version where MPPI is allowed to run a few iterations before using the control output? This approach is similar to previous work[^8] and other versions of MPC approaches.\nAn iterative version of MPPI is similar to the approach we take in the paper. The main differences are the decoupling of the inverse temperature parameter and the ability to sample from a joint distribution versus separate distributions at each control step. 
The performance of μ-AIS is similar to the iterative version and outperformed a pure iterative MPPI version in our experiments.\n\n\n# References\n\n[^1]: G. Williams, N. Wagener, B. Goldfain, P. Drews, J. M. Rehg, B. Boots, and E. A. Theodorou. Information theoretic MPC for model-based reinforcement learning. Proceedings - IEEE International Conference on Robotics and Automation, 2017.\n[^2]: G. R. Williams. Model predictive path integral control: Theoretical foundations and applications to autonomous driving. PhD thesis, Georgia Institute of Technology, 2019.\n[^3]: O. Cappé, A. Guillin, J. M. Marin, and C. P. Robert. Population Monte Carlo. Journal of Computational and Graphical Statistics, 13:907–929, 2004.\n[^4]: O. Cappé, R. Douc, A. Guillin, J. M. Marin, and C. P. Robert. Adaptive importance sampling in general mixture classes. Statistics and Computing, 18, 2008.\n[^5]: M. J. Kochenderfer and T. A. Wheeler. Algorithms for Optimization. MIT Press, 2019.\n[^6]: R. Y. Rubinstein and D. P. Kroese. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning. Vol. 133. New York: Springer, 2004.\n[^7]: Y. El-Laham, V. Elvira, and M. F. Bugallo. Robust covariance adaptation in adaptive importance sampling. IEEE Signal Processing Letters, 25, 2018.\n[^8]: J. Pravitra, E. A. Theodorou, and E. N. Johnson. Flying complex maneuvers with model predictive path integral control. AIAA SciTech Forum, 2021.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsisl%2Fmpopis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsisl%2Fmpopis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsisl%2Fmpopis/lists"}