{"id":43516422,"url":"https://github.com/dunnolab/laom","last_synced_at":"2026-02-03T13:41:09.977Z","repository":{"id":295075023,"uuid":"988381199","full_name":"dunnolab/laom","owner":"dunnolab","description":"Official implementation for \"Latent Action Learning Requires Supervision in the Presence of Distractors\", ICML 2025","archived":false,"fork":false,"pushed_at":"2025-05-23T13:27:28.000Z","size":21971,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-23T15:02:23.949Z","etag":null,"topics":["icml","imitation-learning","latent-action-learning","learning-from-observation","reinforcement-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dunnolab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-22T13:14:52.000Z","updated_at":"2025-05-23T13:27:31.000Z","dependencies_parsed_at":"2025-05-23T15:12:37.284Z","dependency_job_id":null,"html_url":"https://github.com/dunnolab/laom","commit_stats":null,"previous_names":["dunnolab/laom"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dunnolab/laom","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dunnolab%2Flaom","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dunnolab%2Flaom/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dunnolab%2Flaom/releases","manifests_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/dunnolab%2Flaom/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dunnolab","download_url":"https://codeload.github.com/dunnolab/laom/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dunnolab%2Flaom/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29046683,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-03T10:09:22.136Z","status":"ssl_error","status_checked_at":"2026-02-03T10:09:16.814Z","response_time":96,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["icml","imitation-learning","latent-action-learning","learning-from-observation","reinforcement-learning"],"created_at":"2026-02-03T13:41:09.314Z","updated_at":"2026-02-03T13:41:09.945Z","avatar_url":"https://github.com/dunnolab.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Latent Action Learning Requires Supervision in the Presence of Distractors\n\n[[Project]](https://laom.dunnolab.ai/)\n[[Paper]](https://arxiv.org/abs/2502.00379)\n[[Twitter]](https://x.com/how_uhh/status/1927487077345841576)\n\nOfficial implementation of the [**Latent Action Learning Requires Supervision in the Presence of Distractors**](https://arxiv.org/abs/2502.00379). 
Through empirical investigation, we demonstrate that supervision is necessary for good performance in latent action learning, highlighting a major limitation of current methods.\n\n\u003cimg src=\"images/final_result_comb.jpg\" alt=\"Environments\" width=\"1000\"\u003e\n\n## Setup instructions\n\nTo set up a Python environment (with the dev tools of your choice; we used conda and Python 3.11), install the requirements:\n```bash\npip install -r requirements.txt\n```\nFor convenience, we also provide the Dockerfile used in the experiments.\n\n### Distracting Control Suite\n\nWe use a slightly modified version of the original [Distracting Control Suite](https://arxiv.org/pdf/2101.02722), which we provide in `src/dcs` for reproducibility. We changed the difficulty settings and removed TensorFlow from the dependencies, rewriting the necessary parts with NumPy or PIL.\n\nYou also need to get the DAVIS dataset, which is used for distracting backgrounds in the original DCS and in our work. See the instructions in the [original repo](https://github.com/google-research/google-research/tree/master/distracting_control). You can put it wherever you like – all our scripts just need a path to it.\n\n## Data\n\n\u003cimg src=\"images/envs-vis.png\" alt=\"Environments\" width=\"600\"\u003e\n\n### Downloading\n\nWe provide the exact datasets we used for the experiments. Each dataset without `labeled` in its name is around 60GB and consists of 5k trajectories of 1000 steps each (5M transitions in total). All datasets combined (for four envs, with and without distractors, and for ablations) are around 1.1TB, so be careful. Download links for the datasets in our S3 bucket are in `data-links.txt`.\n\nFor convenience, we provide a small sample in `data/example-data.hdf5` to demonstrate the format.\n\n### Collecting from scratch\n\nWe provide the scripts (and checkpoints) used for dataset collection in `scripts/data_collection`. 
\n\nWe pre-trained expert policies with PPO for `cheetah-run`, `walker-run` and `hopper-hop`. PPO was adapted from the beautiful [CleanRL](https://github.com/vwxyzjn/cleanrl) library. See `scripts/data_collection/collection/cleanrl_ppo.py`. We used almost default hyperparameters and trained for 1_000_000_000 transitions. Example wandb runs: [cheetah-run logs](https://wandb.ai/state-machine/lapo/runs/2a1dfdha), [walker-run logs](https://wandb.ai/state-machine/lapo/runs/t6xlpt7v), [hopper-hop logs](https://wandb.ai/state-machine/lapo/runs/6ejbglhv). You can find the exact hyperparameters for each run in Overview-\u003eConfig. Unfortunately, we were unable to get satisfactory performance on `humanoid-walk` with PPO, so we instead used SAC from the [stable-baselines3](https://github.com/DLR-RM/stable-baselines3) library with default hyperparameters and trained for 2_000_000 transitions. See `scripts/data_collection/sb3_sac.py`.\n\nAll experts were pre-trained on proprioceptive observations. We render images only during data collection. 
Returns of the experts we provide (in the checkpoints and the datasets):\n\n| Dataset        | Average Return |\n|----------------|----------------|\n| cheetah-run    | 837.70         |\n| walker-run     | 739.79         |\n| hopper-hop     | 306.63         |\n| humanoid-walk  | 617.22         |\n\nWith the checkpoints available, data in the required format can be collected with the following scripts:\n```bash\n# example for ppo checkpoints\n# use dcs_difficulty=vanilla to collect data without distractors\npython -m scripts.data_collection.collect_data \\\n    --checkpoint_path=\"scripts/data_collection/checkpoints/hopper-hop-expert\" \\\n    --checkpoint_name=\"checkpoint.pt\" \\\n    --dcs_backgrounds_path=\"DAVIS/JPEGImages/480p\" \\\n    --save_path=\"data/hopper-hop-test.hdf5\" \\\n    --num_trajectories=5 \\\n    --dcs_difficulty=\"scale_easy_video_hard\" \\\n    --dcs_backgrounds_split=\"train\" \\\n    --dcs_img_hw=64 \\\n    --seed=0 \\\n    --cuda=False\n\n# example for sac checkpoints\npython -m scripts.data_collection.collect_data_sb3 \\\n    --checkpoint_path=\"scripts/data_collection/checkpoints/sac-humanoid-walk\" \\\n    --dcs_backgrounds_path=\"DAVIS/JPEGImages/480p\" \\\n    --save_path=\"data/humanoid-walk-test.hdf5\" \\\n    --num_trajectories=10 \\\n    --dcs_difficulty=\"scale_easy_video_hard\" \\\n    --dcs_backgrounds_split=\"train\" \\\n    --dcs_img_hw=64 \\\n    --seed=0 \\\n    --cuda=False\n```\n\nTo simulate access to small datasets with ground-truth action labels, we used a simple script that samples trajectories from the full dataset; see `scripts/sample_labeled_data.py`. 
However, you can also collect these with the collection scripts above; there is no real difference.\n```bash\npython -m scripts.sample_labeled_data \\\n    --data_path=\"path/to/full/dataset\" \\\n    --save_path=\"path/to/full/dataset-labeled-1000x$num_traj.hdf5\" \\\n    --chunk_size=1000 \\\n    --num_trajectories=$num_traj\n```\n\n## Running experiments\n\nWe provide training scripts for all methods from the paper: IDM, LAPO, LAOM, LAOM+supervision. For clarity and educational purposes, all scripts are single-file and implement all stages of the LAM pipeline at once: latent action model pre-training, behavioral cloning, and action decoder fine-tuning.\n\n\u003cimg src=\"images/lapo-pipeline.jpg\" alt=\"Environments\" width=\"600\"\u003e\n\n\u003e [!NOTE]\n\u003e **WARN**: This is not the most efficient implementation if you need to run a lot of experiments with different hyperparameters, as it wastes time re-training duplicate parts of the pipeline from scratch. In such a case, it would be better to split these scripts into several modular ones (one for each stage).\n\nWe provide the configs used in the experiments in `configs`. 
You only need to provide the paths to the required datasets:\n```bash\npython -m train_laom_labels \\\n    --config_path=\"configs/laom-labels.yaml\" \\\n    --lapo.data_path=\"data/example-data.hdf5\" \\\n    --lapo.labeled_data_path=\"data/example-data.hdf5\" \\\n    --lapo.eval_data_path=\"data/example-data.hdf5\" \\\n    --bc.data_path=\"data/example-data.hdf5\" \\\n    --bc.dcs_backgrounds_path=\"DAVIS/JPEGImages/480p\" \\\n    --decoder.dcs_backgrounds_path=\"DAVIS/JPEGImages/480p\"\n```\n\n## Reproducing figures\n\nFor reproducibility, we provide a Jupyter notebook that reproduces all main figures from the paper based on our wandb logs (which are public).\n\nSee `scripts/reproducing_figures.ipynb`.\n\n## Citing\n\n```\n@article{nikulin2025latent,\n  title={Latent Action Learning Requires Supervision in the Presence of Distractors},\n  author={Nikulin, Alexander and Zisman, Ilya and Tarasov, Denis and Lyubaykin, Nikita and Polubarov, Andrei and Kiselev, Igor and Kurenkov, Vladislav},\n  journal={arXiv preprint arXiv:2502.00379},\n  year={2025}\n}\n```\n\n## Acknowledgments\n\nThis work was supported by the [Artificial Intelligence Research Institute](https://airi.net/?force=en) (AIRI).\n\n\u003cimg src=\"images/logo.png\" align=\"center\" width=\"20%\" style=\"margin:15px;\"\u003e\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdunnolab%2Flaom","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdunnolab%2Flaom","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdunnolab%2Flaom/lists"}