{"id":19682768,"url":"https://github.com/ml-jku/l2m","last_synced_at":"2025-10-24T06:41:07.498Z","repository":{"id":176544775,"uuid":"624868542","full_name":"ml-jku/L2M","owner":"ml-jku","description":"Learning to Modulate pre-trained Models in RL (Decision Transformer, LoRA, Fine-tuning)","archived":false,"fork":false,"pushed_at":"2024-10-06T09:24:45.000Z","size":754,"stargazers_count":56,"open_issues_count":0,"forks_count":6,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-29T05:35:10.984Z","etag":null,"topics":["continual-learning","decision-transformers","fine-tuning","lora","multitask-learning","reinforcement-learning","robotics"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ml-jku.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-04-07T13:08:03.000Z","updated_at":"2025-04-02T15:05:55.000Z","dependencies_parsed_at":"2023-12-19T16:10:38.529Z","dependency_job_id":"ddbc2643-dae8-4a5f-bb2b-66de70f8e6fb","html_url":"https://github.com/ml-jku/L2M","commit_stats":null,"previous_names":["ml-jku/l2m"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ml-jku/L2M","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-jku%2FL2M","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-jku%2FL2M/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-jku%2FL2M/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-jku%2FL2M/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ml-jku","download_url":"https://codeload.github.com/ml-jku/L2M/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ml-jku%2FL2M/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265322347,"owners_count":23746604,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["continual-learning","decision-transformers","fine-tuning","lora","multitask-learning","reinforcement-learning","robotics"],"created_at":"2024-11-11T18:12:08.515Z","updated_at":"2025-10-24T06:41:07.426Z","avatar_url":"https://github.com/ml-jku.png","language":"Python","readme":"# Learning to Modulate pre-trained Models in RL\n[![arXiv](https://img.shields.io/badge/arXiv-2306.14884-b31b1b.svg)](https://arxiv.org/abs/2306.14884)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\nThomas Schmied\u003csup\u003e**1**\u003c/sup\u003e, Markus Hofmarcher\u003csup\u003e**2**\u003c/sup\u003e, Fabian Paischer\u003csup\u003e**1**\u003c/sup\u003e, Razvan 
<sup>**1**</sup>ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria\
<sup>**2**</sup>JKU LIT SAL eSPML Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria\
<sup>**3**</sup>Google DeepMind\
<sup>**4**</sup>UCL\
<sup>**5**</sup>Institute of Advanced Research in Artificial Intelligence (IARAI), Vienna, Austria

This repository contains the source code for **"Learning to Modulate pre-trained Models in RL"**, accepted at NeurIPS 2023.
The paper is available [here](https://arxiv.org/abs/2306.14884).

![Multi-domain Decision Transformer (MDDT)](./figures/mddt.png)

## Overview
This codebase supports training [Decision Transformer (DT)](https://arxiv.org/abs/2106.01345) models online or from offline datasets on the following domains:
- [Meta-World](https://github.com/Farama-Foundation/Metaworld) / [Continual-World](https://github.com/awarelab/continual_world)
- [Atari](https://github.com/openai/gym)
- [Gym-MuJoCo](https://github.com/openai/gym)
- [ProcGen](https://github.com/openai/procgen)
- [DMControl](https://github.com/deepmind/dm_control)

This codebase relies on open-source frameworks, including:
- [PyTorch](https://github.com/pytorch/pytorch)
- [Huggingface transformers](https://github.com/huggingface/transformers)
- [stable-baselines3](https://github.com/DLR-RM/stable-baselines3)
- [wandb](https://github.com/wandb/wandb)
- [Hydra](https://github.com/facebookresearch/hydra)

What is in this repository?
```
.
├── configs                    # Contains all .yaml config files for Hydra to configure agents, envs, etc.
│   ├── agent_params
│   ├── wandb_callback_params
│   ├── env_params
│   ├── eval_params
│   ├── run_params
│   └── config.yaml            # Main config file for Hydra - specifies log/data/model directories.
├── continual_world            # Submodule for Continual-World.
├── dmc2gym_custom             # Custom wrapper for DMControl.
├── figures
├── scripts                    # Scripts for running experiments on Slurm/PBS in multi-gpu/node setups.
├── src                        # Main source directory.
│   ├── algos                  # Contains agent/model/prompt classes.
│   ├── augmentations          # Image augmentations.
│   ├── buffers                # Contains replay trajectory buffers.
│   ├── callbacks              # Contains callbacks for training (e.g., WandB, evaluation, etc.).
│   ├── data                   # Contains data utilities (e.g., for downloading Atari).
│   ├── envs                   # Contains functionality for creating environments.
│   ├── exploration            # Contains exploration strategies.
│   ├── optimizers             # Contains (custom) optimizers.
│   ├── schedulers             # Contains learning rate schedulers.
│   ├── tokenizers_custom      # Contains custom tokenizers for discretizing states/actions.
│   ├── utils
│   └── __init__.py
├── LICENSE
├── README.md
├── environment.yaml
├── requirements.txt
└── main.py                     # Main entry point for training/evaluating agents.
```
## Installation
Environment configuration and dependencies are available in `environment.yaml` and `requirements.txt`.

First, create the conda environment:
```
conda env create -f environment.yaml
conda activate mddt
```

Then install the remaining requirements (with MuJoCo already downloaded; if not, see [here](#MuJoCo-installation)):
```
pip install -r requirements.txt
```

<!-- It may be necessary to install PyTorch again, in case GPU is not detected: 
```
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
``` -->

Initialize the `continualworld` submodule and install it:
```
git submodule init
git submodule update
cd continualworld
pip install .
```
Install `meta-world`:
```
pip install git+https://github.com/rlworkgroup/metaworld.git@18118a28c06893da0f363786696cc792457b062b
```

Install our custom version of [dmc2gym](https://github.com/denisyarats/dmc2gym). Our version makes `flatten_obs` optional and thus allows us to construct the full observation space of all DMControl envs.
```
cd dmc2gym_custom
pip install -e .
```
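For illustration, here is a minimal sketch of how the customized wrapper might be used. This snippet is not part of the codebase: `flatten_obs` is the option mentioned above, the remaining arguments follow the upstream `dmc2gym` API, and the exact call signature of our fork may differ.
```
# Illustrative sketch (not from this repo): constructing a DMControl env via
# the customized dmc2gym wrapper installed above.
import dmc2gym

env = dmc2gym.make(
    domain_name="walker",
    task_name="walk",
    seed=42,
    flatten_obs=False,  # custom option (assumed keyword): keep the dict observation
)
obs = env.reset()
print(type(obs))  # with flatten_obs=False, the named observation fields are preserved
```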
### MuJoCo installation
Download MuJoCo:
```
mkdir ~/.mujoco
cd ~/.mujoco
wget https://www.roboti.us/download/mujoco200_linux.zip
unzip mujoco200_linux.zip
mv mujoco200_linux mujoco200
wget https://www.roboti.us/file/mjkey.txt
```
Then add the following line to `.bashrc`:
```
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.mujoco/mujoco200/bin
```

#### Troubleshooting on cluster (without root access)
The following issues were helpful:
- https://github.com/openai/mujoco-py/issues/96#issuecomment-678429159
- https://github.com/openai/mujoco-py/issues/627#issuecomment-1383054926
- https://github.com/openai/mujoco-py/issues/323#issuecomment-618365770

First, install the following packages:
```
conda install -c conda-forge glew mesalib
conda install -c menpo glfw3 osmesa
pip install patchelf
```
Create the symlink manually (see https://github.com/openai/mujoco-py/issues/763#issuecomment-1519090452):
```
cp /usr/lib64/libGL.so.1 $CONDA_PREFIX/lib
ln -s $CONDA_PREFIX/lib/libGL.so.1 $CONDA_PREFIX/lib/libGL.so
```
Then do:
```
mkdir ~/rpm
cd ~/rpm
curl -o libgcrypt11.rpm ftp://ftp.pbone.net/mirror/ftp5.gwdg.de/pub/opensuse/repositories/home:/bosconovic:/branches:/home:/elimat:/lsi/openSUSE_Leap_15.1/x86_64/libgcrypt11-1.5.4-lp151.23.29.x86_64.rpm
rpm2cpio libgcrypt11.rpm | cpio -id
```
Finally, export the path to the `rpm` dir (add to `~/.bashrc`):
```
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/rpm/usr/lib64
export LDFLAGS="-L$HOME/rpm/usr/lib64"
```

## Setup

### Experiment configuration
This codebase relies on [Hydra](https://github.com/facebookresearch/hydra), which configures experiments via `.yaml` files.
Hydra automatically creates the log folder structure for a given run, as specified in the respective `config.yaml` file.

The `config.yaml` is the main configuration entry point and contains the default parameters. The file references the respective default parameter files under the `defaults` block. In addition, `config.yaml` contains four important constants that configure the directory paths:
```
LOG_DIR: ../logs
DATA_DIR: ../data
SSD_DATA_DIR: ../data
MODELS_DIR: ../models
```
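For orientation, the sketch below shows how a Hydra entry point like `main.py` typically consumes this file. It is illustrative only, not the repository's actual `main.py`:
```
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="configs", config_name="config")
def main(cfg: DictConfig) -> None:
    # cfg merges config.yaml with the parameter groups selected under `defaults`.
    # Command-line overrides such as `env_params=multi_domain_mtdmc` swap whole
    # groups; dotted overrides such as `agent_params.kind=MDDT` set single keys.
    print(OmegaConf.to_yaml(cfg))
    print(cfg.LOG_DIR, cfg.DATA_DIR)  # the directory constants listed above

if __name__ == "__main__":
    main()
```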
### Datasets
The generated datasets are currently hosted on our web server. Download the Meta-World and DMControl datasets to the specified `DATA_DIR`:
```
# Meta-World
wget --recursive --no-parent --no-host-directories --cut-dirs=2 -R "index.html*" https://ml.jku.at/research/l2m/metaworld
# DMControl
wget --recursive --no-parent --no-host-directories --cut-dirs=2 -R "index.html*" https://ml.jku.at/research/l2m/dm_control_1M
```
The datasets are also available on the Huggingface hub. Download them using the `huggingface-cli`:
```
# Meta-World
huggingface-cli download ml-jku/meta-world --local-dir=./meta-world --repo-type dataset
# DMControl
huggingface-cli download ml-jku/dm_control --local-dir=./dm_control --repo-type dataset
```
The framework also supports Atari, D4RL, and visual DMControl datasets.
For [Atari](src/data/atari/README.md) and [visual DMControl](src/data/dm_control/README.md), we refer to the respective READMEs.
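Once downloaded, the per-task files can be inspected before training. The snippet below is a rough sketch under the assumption that each task is stored as a pickled collection of episode dicts; the path and structure shown are illustrative, so print the keys to confirm the actual layout:
```
import pickle

# Illustrative path (assumed layout): a Meta-World task file under DATA_DIR.
with open("../data/metaworld/hammer-v2.pkl", "rb") as f:
    data = pickle.load(f)

print(type(data))
# Trajectory datasets of this kind commonly store per-episode dicts
# (observations, actions, rewards); inspect before wiring into a config.
if isinstance(data, list) and data and isinstance(data[0], dict):
    print(sorted(data[0].keys()))
```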
## Running experiments
In the following, we provide illustrative examples of how to run the experiments in the paper.

### Pre-training runs
To train a 40M multi-domain Decision Transformer (MDDT) model on MT40 + DMC10 with 3 seeds on a single GPU, run:
```
python main.py -m experiment_name=pretrain seed=42,43,44 env_params=multi_domain_mtdmc run_params=pretrain eval_params=pretrain_disc agent_params=cdt_pretrain_disc agent_params.kind=MDDT agent_params/model_kwargs=multi_domain_mtdmc agent_params/data_paths=mt40v2_dmc10 +agent_params/replay_buffer_kwargs=multi_domain_mtdmc +agent_params.accumulation_steps=2
```

### Single-task fine-tuning
To fine-tune the pre-trained model using LoRA on a single CW10 task with 3 seeds, run:
```
python main.py -m experiment_name=cw10_lora seed=42,43,44 env_params=mt50_pretrain run_params=finetune eval_params=finetune agent_params=cdt_mpdt_disc agent_params/model_kwargs=mdmpdt_mtdmc agent_params/data_paths=cw10_v2_cwnet_2M +agent_params/replay_buffer_kwargs=mtdmc_ft agent_params/model_kwargs/prompt_kwargs=lora env_params.envid=hammer-v2 agent_params.data_paths.names='${env_params.envid}.pkl' env_params.eval_env_names=
```

### Continual fine-tuning
To fine-tune the pre-trained model using L2M on all CW10 tasks sequentially with 3 seeds, run:
```
python main.py -m experiment_name=cw10_cl_l2m seed=42,43,44 env_params=multi_domain_ft env_params.eval_env_names=cw10_v2 run_params=finetune_coff eval_params=finetune_md_cl agent_params=cdt_mpdt_disc +agent_params.steps_per_task=100000 agent_params/model_kwargs=mdmpdt_mtdmc agent_params/data_paths=cw10_v2_cwnet_2M +agent_params/replay_buffer_kwargs=mtdmc_ft +agent_params.replay_buffer_kwargs.kind=continual agent_params/model_kwargs/prompt_kwargs=l2m_lora
```

### Multi-GPU training
For multi-GPU training, we use `torchrun`. However, `torchrun` conflicts with `hydra`; therefore, we rely on the launcher plugin [hydra_torchrun_launcher](https://github.com/facebookresearch/hydra/tree/main/contrib/hydra_torchrun_launcher).

To enable the plugin, clone the `hydra` repo, cd to `contrib/hydra_torchrun_launcher`, and pip install the plugin:
```
git clone https://github.com/facebookresearch/hydra.git
cd hydra/contrib/hydra_torchrun_launcher
pip install -e .
```
The plugin can then be used from the command line:
```
python main.py -m hydra/launcher=torchrun hydra.launcher.nproc_per_node=4 [...]
```
On a local cluster, use `CUDA_VISIBLE_DEVICES` to specify which GPUs a single-node run should use:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py -m hydra/launcher=torchrun hydra.launcher.nproc_per_node=4 [...]
```

On Slurm, executing `torchrun` on a single node works the same way. E.g., to run on 2 GPUs on a single node:
```
#!/bin/bash
#SBATCH --account=X
#SBATCH --qos=X
#SBATCH --partition=X
#SBATCH --nodes=1
#SBATCH --gpus=2
#SBATCH --cpus-per-task=32

source activate mddt
python main.py -m hydra/launcher=torchrun hydra.launcher.nproc_per_node=2 [...]
```
Example scripts for multi-GPU training on Slurm or PBS are available in `scripts`.

### Multi-node training
Running on Slurm/PBS in a multi-node setup requires a little more care. Example scripts are provided in `scripts`.

## Citation
If you find this useful, please consider citing our work:
```
@article{schmied2024learning,
  title={Learning to Modulate pre-trained Models in RL},
  author={Schmied, Thomas and Hofmarcher, Markus and Paischer, Fabian and Pascanu, Razvan and Hochreiter, Sepp},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}
```