{"id":19401001,"url":"https://github.com/google-research/ibc","last_synced_at":"2025-04-05T21:11:23.314Z","repository":{"id":38294319,"uuid":"406512045","full_name":"google-research/ibc","owner":"google-research","description":"Official implementation of Implicit Behavioral Cloning, as described in our CoRL 2021 paper, see more at https://implicitbc.github.io/","archived":false,"fork":false,"pushed_at":"2024-01-25T16:50:18.000Z","size":29828,"stargazers_count":340,"open_issues_count":15,"forks_count":34,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-29T20:09:11.866Z","etag":null,"topics":["deep-learning","imitation-learning","reinforcement-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-09-14T20:27:54.000Z","updated_at":"2025-03-26T03:31:04.000Z","dependencies_parsed_at":"2024-01-25T18:05:07.962Z","dependency_job_id":null,"html_url":"https://github.com/google-research/ibc","commit_stats":{"total_commits":24,"total_committers":9,"mean_commits":"2.6666666666666665","dds":0.75,"last_synced_commit":"db89ddbb852603fe9b64cf0f502b1fd3d6037d33"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fibc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fibc/tags","releases_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/google-research%2Fibc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Fibc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/google-research","download_url":"https://codeload.github.com/google-research/ibc/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247399885,"owners_count":20932880,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","imitation-learning","reinforcement-learning"],"created_at":"2024-11-10T11:16:35.169Z","updated_at":"2025-04-05T21:11:23.293Z","avatar_url":"https://github.com/google-research.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Implicit Behavioral Cloning\n\nThis codebase contains the official implementation of the *Implicit Behavioral Cloning (IBC)* algorithm from our paper:\n\n**Implicit Behavioral Cloning [(website link)](https://implicitbc.github.io/)  [(arXiv link)](https://arxiv.org/abs/2109.00137)** \u003cbr\u003e\n*Pete Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian Wong, Johnny Lee, Igor Mordatch, Jonathan Tompson* \u003cbr\u003e\nConference on Robot Learning (CoRL) 2021\n\n![](./docs/insert.gif)  |  ![](./docs/sort.gif)\n:-------------------------:|:-------------------------:|\n\n\u003cimg src=\"docs/energy_pop_teaser.png\"/\u003e\n\n## Abstract\n\nWe find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an 
implicit model generally performs better, on average, than commonly used explicit models. We present extensive experiments on this finding, and we provide both intuitive insight and theoretical arguments distinguishing the properties of implicit models compared to their explicit counterparts, particularly with respect to approximating complex, potentially discontinuous and multi-valued (set-valued) functions. On robotic policy learning tasks we show that implicit behavioral cloning policies with energy-based models (EBM) often outperform common explicit (Mean Square Error, or Mixture Density) behavioral cloning policies, including on tasks with high-dimensional action spaces and visual image inputs. We find these policies provide competitive results or outperform state-of-the-art offline reinforcement learning methods on the challenging human-expert tasks from the D4RL benchmark suite, despite using no reward information. In the real world, robots with implicit policies can learn complex and remarkably subtle behaviors on contact-rich tasks from human demonstrations, including tasks with high combinatorial complexity and tasks requiring 1mm precision.\n\n## Prerequisites\n\nThe code for this project uses Python 3.7+ and the following pip packages:\n\n```bash\npython3 -m pip install --upgrade pip\npython3 -m pip install \\\n  absl-py==0.12.0 \\\n  gin-config==0.4.0 \\\n  matplotlib==3.4.3 \\\n  mediapy==1.0.3 \\\n  opencv-python==4.5.3.56 \\\n  pybullet==3.1.6 \\\n  scipy==1.7.1 \\\n  tensorflow==2.6.0 \\\n  keras==2.6.0 \\\n  tf-agents==0.11.0rc0 \\\n  tqdm==4.62.2\n```\n\n(Optional) For MuJoCo support, see [`docs/mujoco_setup.md`](docs/mujoco_setup.md).  
We recommend skipping it unless you specifically want to run the Adroit and Kitchen environments.\n\n## Quickstart: from 0 to a trained IBC policy in 10 minutes\n\n**Step 1**: Install the Python packages listed above in [Prerequisites](#prerequisites).\n\n**Step 2**: Run the unit tests (this should take less than a minute) from the directory *just above the top-level `ibc` directory*:\n\n```bash\n./ibc/run_tests.sh\n```\n\n**Step 3**: Check that TensorFlow has GPU access:\n\n```bash\npython3 -c \"import tensorflow as tf; print(tf.test.is_gpu_available())\"\n```\n\nIf the above prints `False`, check TensorFlow's GPU software requirements, notably CUDA 11.2 and cuDNN 8.1.0: https://www.tensorflow.org/install/gpu#software_requirements.\n\n**Step 4**: We'll use the Block Pushing task as an example, so first **download the oracle data** (or see [Tasks](#tasks) for how to generate it):\n\n```bash\ncd ibc/data\nwget https://storage.googleapis.com/brain-reach-public/ibc_data/block_push_states_location.zip\nunzip block_push_states_location.zip \u0026\u0026 rm block_push_states_location.zip\ncd ../..\n```\n\n**Step 5**: Set `PYTHONPATH` to include the directory *just above the top-level `ibc` directory*. If you've been following the commands above, that is the current directory:\n\n```bash\nexport PYTHONPATH=$PYTHONPATH:${PWD}\n```\n\n**Step 6**: Next, run **training and evaluation** with Implicit BC on the Block Pushing task:\n\n```bash\n./ibc/ibc/configs/pushing_states/run_mlp_ebm.sh\n```\n\n*Some notes*:\n\n- On an example single-GPU machine (RTX 2080 Ti), the above trains at about 18 steps/sec and should reach high success rates within 5,000 to 10,000 steps (roughly 5-10 minutes of training).\n- `mlp_ebm.gin` is just one config: it is meant to be reasonably fast to train, runs only 20 evaluation episodes at each interval, and is not suitable for all tasks. See [Tasks](#tasks) for more configs.\n- Because the run script passes the `--video` flag, you can watch a video of the learned policy in action at: `/tmp/ibc_logs/mlp_ebm/ibc_dfo/`... 
navigate to the `videos/ttl=7d` subfolder. By default, one example `.mp4` video is saved at every evaluation interval.\n\n**(Optional) Step 7**: For the PyBullet-based tasks, we also provide real-time interactive visualization through a visualization server. In one terminal, run:\n\n```bash\ncd \u003cpath_to\u003e/ibc/..\nexport PYTHONPATH=$PYTHONPATH:${PWD}\npython3 -m pybullet_utils.runServer\n```\n\nThen, in a different terminal, run the oracle for a few episodes with the `--shared_memory` flag:\n\n```bash\ncd \u003cpath_to\u003e/ibc/..\nexport PYTHONPATH=$PYTHONPATH:${PWD}\npython3 ibc/data/policy_eval.py -- \\\n  --alsologtostderr \\\n  --shared_memory \\\n  --num_episodes=3 \\\n  --policy=oracle_push \\\n  --task=PUSH\n```\n\n**You're done with Quickstart!** See below for more [Tasks](#tasks), and see [`docs/codebase_overview.md`](docs/codebase_overview.md) and [`docs/workflow.md`](docs/workflow.md) for additional information.\n\n## Tasks\n\n### Task: Particle\n\nIn this task, the goal is for the agent (black dot) to go first to the green dot, then to the blue dot.\n\nExample IBC policy  | Example MSE policy\n:-------------------------:|:-------------------------:\n![](./docs/particle_langevin_10000.gif)  |  ![](./docs/particle_mse_10000.gif) |\n\n#### Get Data\n\nWe can either generate data from scratch, for example for the 2D environment (this takes about 15 seconds):\n\n```bash\n./ibc/ibc/configs/particle/collect_data.sh\n```\n\nOr download the data for all dimensions: \u003ca name=\"particle-data\"\u003e\u003c/a\u003e\n\n```bash\ncd ibc/data/\nwget https://storage.googleapis.com/brain-reach-public/ibc_data/particle.zip\nunzip particle.zip \u0026\u0026 rm particle.zip\ncd ../..\n```\n\n#### Train and Evaluate\n\nLet's start with some small networks on just the 2D version, since it's the easiest to visualize, and compare MSE and IBC.  
Here's a small-network (256x2) IBC-with-Langevin config, where `2` is the argument for the environment dimensionality.\n\n\u003c!--  partial verified: 96% success, 10k steps, 50 episodes evaluated, 13.3 steps/sec  --\u003e\n```bash\n./ibc/ibc/configs/particle/run_mlp_ebm_langevin.sh 2\n```\n\nAnd here's an identically sized network (256x2) with the MSE config:\n\n\u003c!--  partial verified: 5% success, 10k steps, 20 episodes evaluated, 21.7 steps/sec  --\u003e\n```bash\n./ibc/ibc/configs/particle/run_mlp_mse.sh 2\n```\n\nFor the above configurations, we suggest comparing the rollout videos, which you can find at `/tmp/ibc_logs/...corresponding_directory../videos/`. The top of this section shows a comparison of these two configs at 10,000 training steps.\n\nAnd here are the **best configs** for **IBC** (with Langevin) and **MSE**, respectively, in this case run on the 16-dimensional environment: \u003ca name=\"particle-train\"\u003e\u003c/a\u003e\n\n```bash\n./ibc/ibc/configs/particle/run_mlp_ebm_langevin_best.sh 16\n./ibc/ibc/configs/particle/run_mlp_mse_best.sh 16\n```\n\nNote: the *`_best`* Langevin config is fairly slow to train, but even the smaller-network `./ibc/ibc/configs/particle/run_mlp_ebm_langevin.sh 16` seems to solve the 16-D environment well and is much faster to train.\n\n### Task: Block Pushing (from state observations)\n\n#### Get Data\n\nWe can either generate data from scratch (~2 minutes for 2,000 episodes: 200 each across 10 replicas):\n\n```bash\n./ibc/ibc/configs/pushing_states/collect_data.sh\n```\n\nOr we can download the data from the web: \u003ca name=\"pushing-states-data\"\u003e\u003c/a\u003e\n\n```bash\ncd ibc/data/\nwget https://storage.googleapis.com/brain-reach-public/ibc_data/block_push_states_location.zip\nunzip 'block_push_states_location.zip' \u0026\u0026 rm block_push_states_location.zip\ncd ../..\n```\n\n#### Train and Evaluate\n\nHere's a reasonably fast-to-train config for *IBC with 
DFO*:\n\n\u003c!--  partial verified: 100% in 10k steps, 18 steps/sec --\u003e\n```bash\n./ibc/ibc/configs/pushing_states/run_mlp_ebm.sh\n```\n\nOr here's a config for *IBC with Langevin*:\n\n\u003c!--  partial verified: 95% in 5k steps, 6.5 steps/sec --\u003e\n```bash\n./ibc/ibc/configs/pushing_states/run_mlp_ebm_langevin.sh\n```\n\nOr here's a comparable, reasonably fast-to-train config for *MSE*:\n\n\u003c!--  partial verified: 85% in 10k steps, 18 steps/sec --\u003e\n```bash\n./ibc/ibc/configs/pushing_states/run_mlp_mse.sh\n```\n\nOr use the **best configs** for **IBC, MSE, and MDN**, respectively (some of these may be slower to train than the configs above): \u003ca name=\"pushing-states-train\"\u003e\u003c/a\u003e\n\n\u003c!--  partial verified: 100% at 15k steps, 18 steps/sec --\u003e\n\u003c!--  partial verified: 87% at 15k steps, 18 steps/sec --\u003e\n\u003c!--  partial verified: 75% at 5k steps, 18 steps/sec --\u003e\n```bash\n./ibc/ibc/configs/pushing_states/run_mlp_ebm_best.sh\n./ibc/ibc/configs/pushing_states/run_mlp_mse_best.sh\n./ibc/ibc/configs/pushing_states/run_mlp_mdn_best.sh\n```\n\n### Task: Block Pushing (from image observations)\n\n#### Get Data\n\nDownload the data from the web: \u003ca name=\"pushing-pixels-data\"\u003e\u003c/a\u003e\n\n```bash\ncd ibc/data/\nwget https://storage.googleapis.com/brain-reach-public/ibc_data/block_push_visual_location.zip\nunzip 'block_push_visual_location.zip' \u0026\u0026 rm block_push_visual_location.zip\ncd ../..\n```\n\n#### Train and Evaluate\n\nHere is an *IBC with Langevin* configuration that should actually converge faster than the IBC-with-DFO configuration we reported in the paper:\n\n\u003c!--  partial verified: 100% at 10k steps, 6.5 steps/sec, at 90x120 w/ 128 batch--\u003e\n\u003c!--  partial verified: 100% at 5k steps, 4.1 steps/sec, at 180x240 w/ 128 batch--\u003e\n```bash\n./ibc/ibc/configs/pushing_pixels/run_pixel_ebm_langevin.sh\n```\n\nAnd here are the **best configs** for **IBC** (with 
DFO), **MSE**, and **MDN**: \u003ca name=\"pushing-pixels-train\"\u003e\u003c/a\u003e\n\n\u003c!-- partial verified: 94% at 10k steps, 8.0 steps/sec, 180x240 w/ 128 batch--\u003e\n\u003c!-- partial verified: 68% at 10k steps, 9.0 steps/sec, 180x240 w/ 128 batch --\u003e\n\u003c!-- partial verified: 94% at 15k steps, 9.0 steps/sec, 90x120 w/ 128 batch --\u003e\n```bash\n./ibc/ibc/configs/pushing_pixels/run_pixel_ebm_best.sh\n./ibc/ibc/configs/pushing_pixels/run_pixel_mse_best.sh\n./ibc/ibc/configs/pushing_pixels/run_pixel_mdn_best.sh\n```\n\n### Task: D4RL Adroit and Kitchen\n\n#### Get Data\n\nThe D4RL human demonstration training data used for the paper submission can be downloaded with the commands below. This data has been converted from the original D4RL format into `.tfrecord` files: \u003ca name=\"d4rl-data\"\u003e\u003c/a\u003e\n\n```bash\ncd ibc/data \u0026\u0026 mkdir -p d4rl_trajectories \u0026\u0026 cd d4rl_trajectories\nwget https://storage.googleapis.com/brain-reach-public/ibc_data/door-human-v0.zip \\\n     https://storage.googleapis.com/brain-reach-public/ibc_data/hammer-human-v0.zip \\\n     https://storage.googleapis.com/brain-reach-public/ibc_data/kitchen-complete-v0.zip \\\n     https://storage.googleapis.com/brain-reach-public/ibc_data/kitchen-mixed-v0.zip \\\n     https://storage.googleapis.com/brain-reach-public/ibc_data/kitchen-partial-v0.zip \\\n     https://storage.googleapis.com/brain-reach-public/ibc_data/pen-human-v0.zip \\\n     https://storage.googleapis.com/brain-reach-public/ibc_data/relocate-human-v0.zip\nunzip '*.zip' \u0026\u0026 rm *.zip\ncd ../../..\n```\n\n#### Train and Evaluate\n\nHere are the **best configs** for **IBC** (with Langevin) and **MSE**, respectively: \u003ca name=\"d4rl-train\"\u003e\u003c/a\u003e\nIn our tests on an RTX 2080 Ti GPU, this IBC config trains at only 1.7 steps/sec, but it is about 10x faster on a TPUv3.\n\n\u003c!--  partial verified: 2704.5 avg return on pen, 10k steps, 100 episodes evaluated, 1.7 
steps/sec  --\u003e\n\u003c!--  partial verified: 1660.4 avg return on pen, 10k steps, 100 episodes evaluated, 25.5 steps/sec  --\u003e\n\n```bash\n./ibc/ibc/configs/d4rl/run_mlp_ebm_langevin_best.sh pen-human-v0\n./ibc/ibc/configs/d4rl/run_mlp_mse_best.sh pen-human-v0\n```\n\nThe commands above run on the `pen-human-v0` environment, but you can replace this argument with any of the other Adroit/Kitchen environments provided in the download command above.\n\nHere is also an MDN config you can try. Its network is tiny; if you increase the network size substantially, training seems to produce NaNs. MDNs can be finicky in general, though a solution should be possible.\n\n```bash\n./ibc/ibc/configs/d4rl/run_mlp_mdn.sh pen-human-v0\n```\n\n## Summary for Reproducing Results\n\nFor the tasks that we've been able to open-source, results from the paper should be reproducible using the linked data and commands below.\n\n| Task  | Figure/Table in paper | Data | Train + Eval commands |\n| --- | --- | --- | --- |\n| Coordinate regression  | Figure 4  | See Colab | See Colab |\n| D4RL Adroit + Kitchen  | Table 2 | [Link](#d4rl-data) | [Link](#d4rl-train) |\n| N-D particle  | Figure 6 | [Link](#particle-data) | [Link](#particle-train) |\n| Simulated pushing, single target, states  | Table 3 | [Link](#pushing-states-data) | [Link](#pushing-states-train) |\n| Simulated pushing, single target, pixels | Table 3 | [Link](#pushing-pixels-data) | [Link](#pushing-pixels-train) |\n\n## Citation\n\nIf you found our paper/code useful in your research, please consider citing:\n\n```\n@article{florence2021implicit,\n    title={Implicit Behavioral Cloning},\n    author={Florence, Pete and Lynch, Corey and Zeng, Andy and Ramirez, Oscar and Wahid, Ayzaan and Downs, Laura and Wong, Adrian and Lee, Johnny and Mordatch, Igor and Tompson, Jonathan},\n    journal={Conference on Robot Learning (CoRL)},\n    month = {November},\n    
year={2021}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2Fibc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogle-research%2Fibc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2Fibc/lists"}