{"id":50510415,"url":"https://github.com/ZJLAB-AMMI/LLM4Teach","last_synced_at":"2026-06-19T14:00:37.217Z","repository":{"id":209796758,"uuid":"724958191","full_name":"ZJLAB-AMMI/LLM4Teach","owner":"ZJLAB-AMMI","description":"Python code to implement LLM4Teach, a policy distillation approach for teaching reinforcement learning agents with Large Language Model  ","archived":false,"fork":false,"pushed_at":"2024-04-19T08:41:24.000Z","size":41,"stargazers_count":6,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-04-19T09:42:20.615Z","etag":null,"topics":["agents","decison-making","distillation","embodied-agent","llm","minigrid","reinforcement-learning"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2311.13373","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ZJLAB-AMMI.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-11-29T06:19:58.000Z","updated_at":"2024-04-05T02:58:53.000Z","dependencies_parsed_at":"2023-12-06T04:25:50.915Z","dependency_job_id":"3525be2e-85dc-433d-8ff6-010f8b3acd08","html_url":"https://github.com/ZJLAB-AMMI/LLM4Teach","commit_stats":{"total_commits":8,"total_committers":1,"mean_commits":8.0,"dds":0.0,"last_synced_commit":"49e7ece40b321953269138da572e946288acfe4d"},"previous_names":["zjlab-ammi/llm4teach"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ZJLAB-AMMI/LLM4Teach","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZJLAB-AMMI%2FLLM4Teach","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/ZJLAB-AMMI%2FLLM4Teach/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZJLAB-AMMI%2FLLM4Teach/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZJLAB-AMMI%2FLLM4Teach/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ZJLAB-AMMI","download_url":"https://codeload.github.com/ZJLAB-AMMI/LLM4Teach/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZJLAB-AMMI%2FLLM4Teach/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34534278,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-19T02:00:06.005Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","decison-making","distillation","embodied-agent","llm","minigrid","reinforcement-learning"],"created_at":"2026-06-02T20:00:26.252Z","updated_at":"2026-06-19T14:00:37.205Z","avatar_url":"https://github.com/ZJLAB-AMMI.png","language":"Python","funding_links":[],"categories":["🤖 Agent \u0026 Embodied OPD (by application)"],"sub_categories":["🔁 Iterative Self-Bootstrapping"],"readme":"# [Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents](https://arxiv.org/abs/2311.13373)\n\n## Abstract \nRecent studies have uncovered the potential of Large Language Models (LLMs) in addressing 
complex sequential decision-making tasks through the provision of high-level instructions. However, LLM-based agents lack specialization in tackling specific target problems, particularly in real-time dynamic environments. Additionally, deploying an LLM-based agent in practical scenarios can be both costly and time-consuming. On the other hand, reinforcement learning (RL) approaches train agents that specialize in the target task but often suffer from low sampling efficiency and high exploration costs. In this paper, we introduce a novel framework that addresses these challenges by training a smaller, specialized student RL agent using instructions from an LLM-based teacher agent. By incorporating the guidance from the teacher agent, the student agent can distill the prior knowledge of the LLM into its own model. Consequently, the student agent can be trained with significantly less data. Moreover, through further training with environment feedback, the student agent surpasses the capabilities of its teacher for completing the target task. We conducted experiments on challenging MiniGrid and Habitat environments, specifically designed for embodied AI research, to evaluate the effectiveness of our framework. The results clearly demonstrate that our approach achieves superior performance compared to strong baseline methods. Our code is available at https://github.com/ZJLAB-AMMI/LLM4Teach.\n\n## Purpose\nThis repo is intended to serve as a foundation with which you can reproduce the results of the experiments detailed in our paper, [Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents](https://arxiv.org/abs/2311.13373).\n\n\n## Running experiments\n### Setup the LLMs\n\n1. For ChatGLM models, please use your own api_key and run the following code to launch the API\n```bash\npython3 -m utils.chatglm_api --host \u003cAPI_host\u003e --port \u003cAPI_port\u003e\n```\n\n2. 
For Vicuna models, please follow the instructions from [FastChat](https://github.com/lm-sys/FastChat) to install the Vicuna model on a local server. Here are the commands to launch the API in a terminal: \n\n```bash\npython3 -m fastchat.serve.controller --host localhost --port \u003ccontroller_port\u003e        ### Launch the controller\npython3 -m fastchat.serve.model_worker --model-name '\u003cmodel_name\u003e' --model-path \u003cVicuna_path\u003e --controller http://localhost:\u003ccontroller_port\u003e --port \u003cmodel_port\u003e --worker_address http://localhost:\u003cmodel_port\u003e        ### Launch the model worker\npython3 -m fastchat.serve.api --host \u003cAPI_host\u003e --port \u003cAPI_port\u003e        ### Launch the API\n```\n\n\n### Train and evaluate the models\nAny algorithm can be run from the main.py entry point.\n\nTo train on a SimpleDoorKey environment,\n\n```bash\npython main.py train --task SimpleDoorKey --savedir train\n```\n\n\u003c!--to train with given query result from LLM as teacher,\n\n```bash\npython main.py train --task SimpleDoorKey --savedir train --offline_planner\n```--\u003e\n\nTo evaluate the trained model,\n\n```bash\npython main.py eval --task SimpleDoorKey --loaddir train --savedir eval\n```\n\nTo evaluate the LLM-based teacher baseline,\n```bash\npython main.py eval --task SimpleDoorKey --loaddir train --savedir eval --eval_teacher\n```\n\n## Logging details \nTensorboard logging is enabled by default for all algorithms. 
The logger expects that you supply an argument named ```logdir```, containing the root directory where you want to store your log files.\n\nThe resulting directory tree would look something like this:\n```\nlog/                         # directory with all of the saved models and tensorboard \n└── ppo                                 # algorithm name\n    └── simpledoorkey                   # environment name\n        └── save_name                   # unique save name \n            ├── acmodel.pt              # actor and critic network for algo\n            ├── events.out.tfevents     # tensorboard binary file\n            └── config.json             # readable hyperparameters for this run\n```\n\nUsing tensorboard makes it easy to compare experiments and resume training later on.\n\nTo see live training progress, run ```$ tensorboard --logdir=log``` and then navigate to ```http://localhost:6006/``` in your browser.\n\n## Citation\nIf you find [our work](https://arxiv.org/abs/2311.13373) useful, please kindly cite: \n```bibtex\n@inproceedings{zhou2024large,\n  title={Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents},\n  author={Zhou, Zihao and Hu, Bin and Zhao, Chenyang and Zhang, Pu and Liu, Bin},\n  booktitle={The 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)},\n  year={2024}\n}\n```\n\n## Acknowledgements\nThis work is supported by Exploratory Research Project (No.2022RC0AN02) of Zhejiang Lab.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FZJLAB-AMMI%2FLLM4Teach","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FZJLAB-AMMI%2FLLM4Teach","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FZJLAB-AMMI%2FLLM4Teach/lists"}