{"id":17838479,"url":"https://github.com/haoliuhl/instructrl","last_synced_at":"2025-07-31T18:33:42.499Z","repository":{"id":112162982,"uuid":"556154926","full_name":"haoliuhl/instructrl","owner":"haoliuhl","description":"Instruction Following Agents with Multimodal Transforemrs","archived":false,"fork":false,"pushed_at":"2022-11-03T06:52:32.000Z","size":196,"stargazers_count":52,"open_issues_count":4,"forks_count":5,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-28T21:40:49.896Z","etag":null,"topics":["flax","instruction-following","instructions","jax","machine-learning","reinforcement-learning","transformer","vision-language-transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/haoliuhl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-23T07:15:40.000Z","updated_at":"2024-11-24T15:11:02.000Z","dependencies_parsed_at":"2023-05-10T18:00:37.369Z","dependency_job_id":null,"html_url":"https://github.com/haoliuhl/instructrl","commit_stats":null,"previous_names":["forhaoliu/instructrl","haoliuhl/instructrl"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haoliuhl%2Finstructrl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haoliuhl%2Finstructrl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/haoliuhl%2Finstructrl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
haoliuhl%2Finstructrl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/haoliuhl","download_url":"https://codeload.github.com/haoliuhl/instructrl/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244031024,"owners_count":20386534,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["flax","instruction-following","instructions","jax","machine-learning","reinforcement-learning","transformer","vision-language-transformer"],"created_at":"2024-10-27T20:57:05.165Z","updated_at":"2025-03-19T23:30:54.368Z","avatar_url":"https://github.com/haoliuhl.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# InstructRL\n\nThis is a JAX implementation of the *Instruct*RL method.\n\nPaper: [Instruction-Following Agents with Jointly Pre-Trained Vision-Language Models](https://arxiv.org/abs/2210.13431).\n\n![model architecture](./pictures/model.jpg)\n\nThis implementation has been tested on GPU and Google Cloud TPU and supports multi-host training with TPU Pods.\n\nThe code supports the following methods and baselines:\n- From scratch: Training a transformer policy from scratch, with and without instructions.\n- CLIP-RL: Training a transformer policy with a pretrained OpenAI CLIP ViT, with and without instructions.\n- *Instruct*RL: Training a transformer policy with a pretrained multimodal MAE encoding, with and without instructions.\n\nThe code also supports training and evaluating with both continuous and discretized robot actions.\n\n## Installation\nOn GPU, install CoppeliaSim with the 
[coppeliasim script](./scripts/coppeliasim.sh), then install the dependencies with pip.\n```\ncat gpu_requirements.txt | xargs -n 1 -L 1 pip install\n```\n\nOn TPU, install the dependencies using the [TPU setup script](./scripts/tpu_vm_setup.sh).\nAfter installing the dependencies, add this repo directory to your `PYTHONPATH` environment variable:\n```\nexport PYTHONPATH=\"$PYTHONPATH:$(pwd)\"\n```\n\n## Usage\nExperiments can be launched via the following commands.\n\nTraining a policy transformer using pretrained multimodal MAE encoding\n```\nexport PYTHONPATH=\"$PYTHONPATH:$PROJECT_DIR\"\nexport PYTHONPATH=\"$PYTHONPATH:$PROJECT_DIR/instructrl/models\"\necho $PYTHONPATH\nexport WANDB_API_KEY=''\n\nexport bucket_name='instruct-rl'\n\nexport experiment_name='instructrl'\n\nONLINE=True\nDATASET=\"reach_target\"\nMODEL_TYPE=\"vit_base\"\nTRANSFER_TYPE=\"m3ae_vit_b16\"\nBATCH_SIZE=2048\nINSTRUCTION=\"moving to one of the spheres\"\nNOTE=\"pt: $TRANSFER_TYPE inst: $INSTRUCTION batch size: $BATCH_SIZE policy: $MODEL_TYPE dataset: $DATASET\"\n\npython3 -m instructrl.instructrl_main \\\n    --is_tpu=True \\\n    --dataset_name=\"$DATASET\" \\\n    --model.model_type=\"$MODEL_TYPE\" \\\n    --model.transfer_type=\"$TRANSFER_TYPE\" \\\n    --window_size=4 \\\n    --val_every_epochs=1 \\\n    --test_every_epochs=1 \\\n    --instruct=\"$INSTRUCTION\" \\\n    --batch_size=\"$BATCH_SIZE\" \\\n    --weight_decay=0.0 \\\n    --lr=3e-4 \\\n    --auto_scale_lr=False \\\n    --lr_schedule=cos \\\n    --warmup_epochs=5 \\\n    --momentum=0.9 \\\n    --clip_gradient=10.0 \\\n    --epochs=200 \\\n    --dataloader_n_workers=16 \\\n    --dataloader_shuffle=False \\\n    --log_all_worker=False \\\n    --logging.online=\"$ONLINE\" \\\n    --logging.prefix='' \\\n    --logging.project=\"$experiment_name\" \\\n    --logging.gcs_output_dir=\"gs://$bucket_name/instructrl/experiment_output/$experiment_name\" \\\n    --logging.output_dir=\"$HOME/experiment_output/$experiment_name\" \\\n   
 --logging.random_delay=0.0 \\\n    --logging.notes=\"$NOTE\"\n```\n\nThe *model.transfer_type* argument selects the pretrained transformer. The following options are supported:\n- ViT trained from scratch: \"None\"\n- M3AE pretrained model: \"m3ae_vit_b16\" (with larger models coming soon)\n- CLIP pretrained models: \"clip_vit_b32\" and \"clip_vit_b16\"\n\nThe *model.model_type* argument sets the type of the policy transformer that is trained from scratch; it can be one of vit_small, vit_base, or vit_large.\n\nFor large-scale training (e.g. training on nearly 100 tasks), we recommend using a large TPU pod.\nWe provide a job script for launching large-scale training in [jobs](./jobs/tpu_control.sh).\n\nEvaluating a trained model\n```\nexport PYTHONPATH=\"$PYTHONPATH:$PROJECT_DIR\"\nexport PYTHONPATH=\"$PYTHONPATH:$PROJECT_DIR/instructrl/models\"\necho $PYTHONPATH\nexport WANDB_API_KEY=''\n\nexport bucket_name='instruct-rl'\n\nexport experiment_name='instructrl'\n\nONLINE=True\nDATASET=\"reach_target\"\nMODEL_TYPE=\"vit_base\"\nTRANSFER_TYPE=\"m3ae_vit_b16\"\nINSTRUCTION=\"moving to one of the spheres\"\nCKPT=\"\"\nNOTE=\"Local rollout. 
pt: $TRANSFER_TYPE inst: $INSTRUCTION policy: $MODEL_TYPE dataset: $DATASET\"\n\npython3 -m instructrl.local_run \\\n    --load_checkpoint \"$CKPT\" \\\n    --dataset_name=\"$DATASET\" \\\n    --model.model_type=\"$MODEL_TYPE\" \\\n    --model.transfer_type=\"$TRANSFER_TYPE\" \\\n    --window_size=1 \\\n    --instruct=\"$INSTRUCTION\" \\\n    --log_all_worker=False \\\n    --data.path=\"$PROJECT_DIR/data/variation\" \\\n    --logging.online=\"$ONLINE\" \\\n    --logging.prefix='' \\\n    --logging.project=\"$experiment_name\" \\\n    --logging.gcs_output_dir=\"gs://$bucket_name/instructrl/experiment_output/$experiment_name\" \\\n    --logging.output_dir=\"$HOME/experiment_output/$experiment_name\" \\\n    --logging.random_delay=0.0 \\\n    --logging.notes=\"$NOTE\"\n```\n\n## Data generation\nOur preprocessed single-task and multi-task data will be released soon.\nTo facilitate large-scale training in the cloud, we store all datasets\nas HDF5 files and read them from cloud storage buckets.\nThe HDF5 data contains images, states, and actions.\nAn example of data generation can be found in the [collect_data script](./data/collect_data.py).\n\n## Acknowledgement\nThe Multimodal MAE implementation is largely based on [m3ae](https://github.com/young-geng/m3ae_public) and the CLIP implementation is largely based on [scenic](https://github.com/google-research/scenic/tree/main/scenic/projects/baselines/clip).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhaoliuhl%2Finstructrl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhaoliuhl%2Finstructrl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhaoliuhl%2Finstructrl/lists"}