{"id":17606407,"url":"https://github.com/atcold/pytorch-ppuu","last_synced_at":"2025-04-09T13:09:44.616Z","repository":{"id":43026339,"uuid":"122864667","full_name":"Atcold/pytorch-PPUU","owner":"Atcold","description":"Code for Prediction and Planning Under Uncertainty (PPUU)","archived":false,"fork":false,"pushed_at":"2022-10-14T03:29:00.000Z","size":106392,"stargazers_count":209,"open_issues_count":6,"forks_count":55,"subscribers_count":24,"default_branch":"master","last_synced_at":"2025-04-02T12:07:03.137Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://youtu.be/X2s7gy3wIYw","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Atcold.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-02-25T18:41:55.000Z","updated_at":"2025-03-20T09:12:59.000Z","dependencies_parsed_at":"2023-01-20T01:31:42.780Z","dependency_job_id":null,"html_url":"https://github.com/Atcold/pytorch-PPUU","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Atcold%2Fpytorch-PPUU","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Atcold%2Fpytorch-PPUU/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Atcold%2Fpytorch-PPUU/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Atcold%2Fpytorch-PPUU/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Atcold","download_url":"https://codeload.github.com/Atcold/pytorch-PPUU/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248045245,"owners_count":21038554,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-22T15:44:25.162Z","updated_at":"2025-04-09T13:09:44.597Z","avatar_url":"https://github.com/Atcold.png","language":"Jupyter Notebook","readme":"# Prediction and Policy-learning Under Uncertainty (PPUU)\n[Gitter chatroom](http://gitter.im/PPUU), [video summary](http://youtu.be/X2s7gy3wIYw), [slides](http://bit.ly/PPUU-slides), [poster](http://bit.ly/PPUU-poster), [website](http://bit.ly/PPUU-web).  \nImplementing [Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic](http://bit.ly/PPUU-article) in [PyTorch](https://pytorch.org).\n\n![planning](doc/planning.png)\n\nThe objective is to train an *agent* (pink brain drawing) who's going to plan its own trajectory in a densely (stochastic) traffic highway.\nTo do so, it minimises a few costs over trajectories unrolled while interacting with a *world model* (blue world drawing).\nWe need to start, then, by training the *world model* with observational data from the real world (Earth's photo), which needs to be downloaded from the Internet.\n\n## Getting the real data\n\nTo get started, you need to fetch the real world data.\nGo to this [address](http://bit.ly/PPUU-data), and download the TGZ file (330 MB) on your machine.\nOpen a terminal, go to the location where you've downloaded the file, and type:\n\n```bash\ntar xf xy-trajectories.tgz\n```\n\nThis will expand the NGSIM (Next Generation Simulation) data set compressed archive, consisting of all cars trajectories for the 4 available maps (now 1.6 GB).\nIts content is the following:\n\n```bash\nxy-trajectories\n├── i80\n│   ├── trajectories-0400-0415.txt\n│   ├── trajectories-0500-0515.txt\n│   ├── trajectories-0515-0530.txt\n│   └── trajectory-data-dictionary.htm\n├── lanker\n│   ├── trajectories-0830am-0845am.txt\n│   ├── trajectories-0845am-0900am.txt\n│   └── trajectory-data-dictionary.htm\n├── peach\n│   ├── trajectories-0400pm-0415pm.txt\n│   ├── trajectories-1245pm-0100pm.txt\n│   └── trajectory-data-dictionary.htm\n└── us101\n    ├── trajectories-0750am-0805am.txt\n    ├── trajectories-0805am-0820am.txt\n    ├── trajectories-0820am-0835am.txt\n    └── trajectory-data-dictionary.htm\n\n4 directories, 14 files\n```\n\nFinally, move the `xy-trajectories` directory inside a folder named `traffic-data`.\n\n## Setting up the environment\n\nIn this section we will fetch the repo, install the dependencies, and view the data we just downloaded, so that we can see if everything runs fine.\nSo, open up your terminal, and type:\n\n```bash\ngit clone git@github.com:Atcold/pytorch-PPUU.git\n# or with the https protocol\n# git clone https://github.com/Atcold/pytorch-PPUU\n```\n\nNow move (or symlink) the `traffic-data` folder inside the repo:\n\n```bash\ncd pytorch-PPUU\nmv \u003ctraffic-data_folder_path\u003e .\n# or\n# ln -s \u003ctraffic-data_folder_path\u003e\n```\n\nNow install the `PPUU` environment (this expects you have `conda` on your system, go [here](https://conda.io/docs/user-guide/install/) if this is not the case):\n\n```bash\nconda env create -f environment.yaml\n#\n# To activate this environment, use:\n# \u003e source activate PPUU\n#\n# To deactivate an active environment, use:\n# \u003e source deactivate\n#\n```\n\nAs prescribed, activate it by typing:\n\n```bash\nsource activate PPUU  # or\nconda activate PPUU\n```\n\nFinally, have a look at the four maps available in the NGSIM data set, namely: *I-80*, *US-101*, *Lankershim*, and *Peachtree*.\nThere is a \"bonus\" map, called *AI*, where I've hard coded a policy for the vehicles, which are using a PID controller.\nType the following command:\n\n```bash\npython play_maps.py -map \u003cmap\u003e\n# where \u003cmap\u003e can be one of {i80,us101,peach,lanker,ai}\n# add -h to see the full list of options available\n```\n\nThe frame rate should be greater than 20 Hz.\nOften it will be larger than 60 Hz.\nTo be noted, here the vehicles are performing the actions extracted from the trajectories, and not simply following the original spatial coordinates.\n\n## Dumping the \"state, action, cost\" triple\n\nIn order to train both the *world* and *agent models*, we need to create the observations, starting from the NGSIM trajectories and the simulator.\nThis can be done with the following command:\n\n```bash\nfor t in 0 1 2; do python generate_trajectories.py -map i80 -time_slot $t; done\n# to dump the triple for the i80 map, otherwise replace i80 with the map you want\n```\n\nUpon the script termination, we will find a folder named `state-action-cost` within our `traffic-data`.\nThe content of the latter is now the following:\n\n```bash\ntraffic-data/\n├── state-action-cost\n│   └── data_i80_v0\n│       ├── trajectories-0400-0415\n│       │   ├── car1.pkl\n│       │   └── ...\n│       ├── trajectories-0500-0515\n│       │   └── ...\n│       └── trajectories-0515-0530\n│           └── ...\n└── xy-trajectories\n    └── ...\n```\n\n\u003e ### Additional info\n\u003e Each pickled vehicle observation is stored as `car{idx}.pkl`.\n\u003e Its content is a `dict` which includes the items and corresponding sizes (shapes):\n\u003e\n\u003e ```python\n\u003e images               (309, 3, 117, 24)\n\u003e actions              (309, 2)\n\u003e lane_cost            (309,)\n\u003e pixel_proximity_cost (309,)\n\u003e states               (309, 7, 4)\n\u003e frames               (309,)\n\u003e ```\n\u003e For example, this vehicle was alive for 309 frames (time steps).\n\u003e The `images` represent the occupancy grid, which is as large as 4 lanes width (24 pixels, here).\n\u003e\n\u003e  - The *R* channel represents the lane markings.\n\u003e  - The *G* channel encodes the position and shape of the neighbouring vehicles.\n\u003e  - The *B* channel depits our own vehicle.\n\u003e\n\u003e The `actions` is a collection of 2D vectors, encoding the positive and negative acceleration in both *x* and *y* directions.\n\u003e The `lane_cost` and `pixel_proximity_cost` are the task specific costs (see [slides](http://bit.ly/alf-PPUU) for details).\n\u003e The `states` encode position and velocity of the current vehicle and the most closest 6 ones: left/current/right lanes, front/back.\n\u003e Finally, `frames` tells us the snapshot time stamp, so that we can go back to the simulator, and inspect strange situations present in the observations.\n\nFinally (this will likely be automated soon, and made avaiable for every map), extract the car sizes for the *I-80* map with:\n\n```python\npython extract_car_size.py\n```\n\n## Training the world model\n\nAs we have stated above, we need to start by learning how the real world evolve.\nTo do so, we train a neural net, which tries to predict what happens next, given that we start in a given *state*, and a specific *action* is performed.\nMore precisely, we are going to train an *action conditional variational predictive net*, which resembles much a variational autoencoder (VAE) that has three inputs (concatenated sequence of `states`, `images`, `action`) and its output is set to be the next item in the sequence (`states`, `images`).  \n\nIn the code, the world model is shortened as `fm`, which stands for *forward dynamics model*.\nSo, let's train the forward dynamics model (`fm`) on the observational dataset.\nThis can be done by running:\n\n```bash\npython train_fm.py -model_dir \u003cfm_save_path\u003e\n```\n\n## Training the cost model\nAlong with the dynamics model, we have a separate model to predict the costs of state and action pairs, which can be trained by running:\n\n```bash\n python train_cost.py\n ```\n\n## Training the agent\n\n![agent training](doc/agent_train.png)\n\n![uncertainty computation](doc/uncertainty.png)\n\nOnce the dynamics model is trained, it can be used to train the policy network, using *MPUR*, *MPER*, or *IL*.\nThese corresponds to:\n\n- *MPUR*: Model-based Policy learning with Uncertainty Regularisation (shown in the figure above)\n- *MPER*: Model-based Policy learning with Expert Regularisation (model-based IL)\n- *IL*: Imitation Learning (copying the expert actions given the past observations)\n\nThis is done by running:\n\n```bash\npython train_{MPUR,MPER,IL}.py -model_dir \u003cfm_load_path\u003e -mfile \u003cfm_filename\u003e\n```\n\n## Evaluating the agent\n\nTo evaluate a trained policy, run the script `eval_policy.py` in one of the three following modes.\nType `-h` to see other options and details.\n\n```bash\npython eval_policy.py -model_dir \u003cload_path\u003e -policy_model \u003cpolicy_filename\u003e -method policy-{MPUR,MPER,IL}\n```\n\nYou can also specify `-method bprop` to perform \"brute force\" planning, which will be computationally expensive.\n\n### Parallel evaluation\nEvaluation happens in parallel. By default, evaluator script uses min(10, #cores_available) processes. It doesn't go above 10 because then it hits GPU memory limits.\nTo change the number of processes, you can pass `-num-processes` argument to `eval_policy.py` script. Also, for this to work, you need to request cpu cores using `--cpus-per-task=X` argument for slurm.\nThe slurm limits cpu usage to 64 cores per user, and gpus to 18 per user, therefore 3 is a reasonable limit to enable us to use all the gpus without hitting the gpu limit when running multiple evaluations. The CPU limit can be extended, but you need to email the IT helpdesk.\n\n## Pre-trained models\n\n[Here](https://drive.google.com/file/d/1XahspfgFlBVF6ne479LCJgBr0luZGQt7/) you can download the predictive model and the policy we've trained on our servers (they are bundled together in the `model` field of this *Python* dictionary). The agent achieves 82.0% of success rate.  \n[Here](https://drive.google.com/file/d/1di7hGnyzUiCADfxOhq6zGnRX0AwhEdLo/), instead, you can download only the predictive models (one for the state and one for the cost), and try to train the policy by your own.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fatcold%2Fpytorch-ppuu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fatcold%2Fpytorch-ppuu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fatcold%2Fpytorch-ppuu/lists"}