{"id":28676538,"url":"https://github.com/zjunlp/wkm","last_synced_at":"2025-06-13T23:05:04.308Z","repository":{"id":247345392,"uuid":"795790529","full_name":"zjunlp/WKM","owner":"zjunlp","description":"[NeurIPS 2024] Agent Planning with World Knowledge Model","archived":false,"fork":false,"pushed_at":"2024-12-17T07:43:40.000Z","size":16097,"stargazers_count":133,"open_issues_count":0,"forks_count":10,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-05-07T20:34:16.620Z","etag":null,"topics":["agent","agent-planning","artificial-intelligence","knowledge","large-language-models","natural-language-processing","world-knowledge-model","world-model"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zjunlp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-04T04:21:08.000Z","updated_at":"2025-05-06T14:25:51.000Z","dependencies_parsed_at":"2024-07-08T08:55:07.861Z","dependency_job_id":null,"html_url":"https://github.com/zjunlp/WKM","commit_stats":null,"previous_names":["zjunlp/wkm"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zjunlp/WKM","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FWKM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FWKM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FWKM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FWKM/manifests","owner_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zjunlp","download_url":"https://codeload.github.com/zjunlp/WKM/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FWKM/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259732771,"owners_count":22903087,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","agent-planning","artificial-intelligence","knowledge","large-language-models","natural-language-processing","world-knowledge-model","world-model"],"created_at":"2025-06-13T23:05:03.725Z","updated_at":"2025-06-13T23:05:04.278Z","avatar_url":"https://github.com/zjunlp.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e WKM \u003c/h1\u003e\n\u003ch3 align=\"center\"\u003e Agent Planning with World Knowledge Model \u003c/h3\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://arxiv.org/abs/2405.14205\"\u003e📄arXiv\u003c/a\u003e •\n  \u003ca href=\"https://www.zjukg.org/project/WKM/\"\u003e🌐Web\u003c/a\u003e •\n    \u003ca href=\"https://x.com/omarsar0/status/1793851075411296761\"\u003e𝕏 Blog\u003c/a\u003e\n    •\n    \u003ca href=\"https://huggingface.co/collections/zjunlp/wkm-6684c611102213b6d8104f84\"\u003e🤗 HF\u003c/a\u003e •\n    \u003ca href=\"https://notebooklm.google.com/notebook/a3f13ad1-1bc9-4ab2-ace6-9ae4276bc970/audio\"\u003e🎧NotebookLM Audio\u003c/a\u003e\n\n\n  \n\u003c/p\u003e\n\n[![Awesome](https://awesome.re/badge.svg)](https://github.com/zjunlp/WKM) \n[![License: 
MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)\n![](https://img.shields.io/github/last-commit/zjunlp/WKM?color=green) \n\n## Table of Contents\n\n- 🌻[Acknowledgement](#acknowledgement)\n- 🌟[Overview](#overview)\n- 🔧[Installation](#installation)\n- 📚[World Knowledge Build](#world-knowledge-build)\n- 📉[Model Training](#model-training)\n- 🧐[Evaluation](#evaluation)\n- 🚩[Citation](#citation)\n- 🎉[Contributors](#contributors)\n\n---\n\n\n\n## 🌻Acknowledgement\n\nThe code of our training module is adapted from [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), and the code of our inference module is built on [ETO](https://github.com/Yifan-Song793/ETO). The baseline implementations are sourced from [ReAct](https://github.com/ysymyth/ReAct), [Reflexion](https://github.com/noahshinn/reflexion), [NAT](https://github.com/reason-wang/nat), and [ETO](https://github.com/Yifan-Song793/ETO). We use LangChain with open models via [FastChat](https://github.com/lm-sys/FastChat/blob/main/docs/langchain_integration.md). Thanks for their great contributions!\n\n\n\n![WKM overview](model_pic.png)\n\n## 🌟Overview\n\nRecent efforts to directly use large language models (LLMs) as agent models for interactive planning tasks have shown commendable results. Despite these achievements, such agents still struggle with brainless trial-and-error in global planning and hallucinatory actions in local planning, owing to their poor understanding of the \"real\" physical world. Imitating the human world knowledge model, which provides global prior knowledge before a task and maintains local dynamic knowledge during it, we introduce a parametric World Knowledge Model (***WKM***) to facilitate agent planning. Concretely, we steer the agent model to self-synthesize knowledge from both expert and sampled trajectories. 
Then we develop ***WKM***, which provides prior task knowledge to guide global planning and dynamic state knowledge to assist local planning. Experimental results on three complex real-world simulated datasets with three state-of-the-art open-source LLMs (Mistral-7B, Gemma-7B, and Llama-3-8B) demonstrate that our method achieves superior performance compared to various strong baselines. Further analysis shows that WKM effectively alleviates the blind trial-and-error and hallucinatory action issues, providing strong support for the agent's understanding of the world.\n\nOther interesting findings include:\n1) our instance-level task knowledge generalizes better to unseen tasks;\n2) a weak WKM can guide the planning of a strong agent model;\n3) unified WKM training has promising potential for further development.\n\n\n\n## 🔧Installation\n\n```bash\ngit clone https://github.com/zjunlp/WKM\ncd WKM\npip install -r requirements.txt\n```\n\n## 📚World Knowledge Build\n\nTo build the task knowledge:\n```sh\npython world_knowledge_build.py \\\n    --dataset_path your/rejected_and_chosen/data/pair \\\n    --task your/task \\\n    --gen task_knowledge \\\n    --model_name your/model/name \\\n    --output_path your/output/path\n```\n\nTo build the state knowledge:\n```sh\npython world_knowledge_build.py \\\n    --dataset_path your/rejected_and_chosen/data/pair \\\n    --task your/task \\\n    --gen state_knowledge \\\n    --model_name your/model/name \\\n    --output_path your/output/path\n```\n\nAfter you obtain the task knowledge and state knowledge, convert the data into the training format:\n```sh\npython train_data_process.py \\\n    --task alfworld \\\n    --file_path your/path \\\n    --mode model_type \\\n    --output_path your/output/path\n```\n\nThen use the state knowledge training data to build the state knowledge cache base:\n```sh\npython state_base_build.py \\\n    --state_file_path your/state/knowledge/path \\\n    --state_action_pair_path path/to/store/state_action/pair \\\n    
--vector_cache_path path/to/store/vector/cache\n```\n\nOur training data has been uploaded to [Hugging Face](https://huggingface.co/datasets/zjunlp/WKM-train-data).\n\n## 📉Model Training\n\nUse LLaMA-Factory to train the agent model and the world knowledge model (the example below trains the agent model on `train_data_for_agent`):\n```sh\nCUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch \\\n    --config_file ./examples/accelerate/single_config.yaml \\\n    src/train_bash.py \\\n    --ddp_timeout 180000000 \\\n    --stage sft \\\n    --do_train \\\n    --model_name_or_path /base/model/path \\\n    --dataset_dir ./data \\\n    --dataset train_data_for_agent \\\n    --template model_template \\\n    --finetuning_type lora \\\n    --lora_target q_proj,v_proj \\\n    --output_dir ../lora/peft_model_name \\\n    --overwrite_cache \\\n    --per_device_train_batch_size 4 \\\n    --gradient_accumulation_steps 2 \\\n    --lr_scheduler_type cosine \\\n    --logging_steps 1 \\\n    --save_steps 1000 \\\n    --learning_rate 1e-4 \\\n    --num_train_epochs 3 \\\n    --plot_loss \\\n    --fp16 \\\n    --cutoff_len 2048 \\\n    --save_safetensors False \\\n    --overwrite_output_dir \\\n    --train_on_prompt False\n```\n\n## 🧐Evaluation\n\n\nTo evaluate the task, first launch local API servers for the agent model and the world knowledge model with FastChat. 
Our LoRA adapter weights can be downloaded from [Hugging Face](https://huggingface.co/collections/zjunlp/wkm-6684c611102213b6d8104f84).\n```sh\ncd ./src/eval\n# agent_model API server (runs in the foreground; start each worker in a separate terminal)\npython -u -m fastchat.serve.model_worker \\\n    --model-path /path/peft/agent_model \\\n    --port 21020 \\\n    --worker-address http://localhost:21020 \\\n    --max-gpu-memory 31GiB \\\n    --dtype float16\n# world_knowledge_model API server\npython -u -m fastchat.serve.model_worker \\\n    --model-path /path/peft/world_model \\\n    --port 21021 \\\n    --worker-address http://localhost:21021 \\\n    --max-gpu-memory 31GiB \\\n    --dtype float16\n```\n\nThen evaluate the task:\n```sh\npython -m eval_agent.eto_multi_main_probs \\\n    --agent_config fastchat \\\n    --agent_model_name agent_model \\\n    --world_model_name world_model \\\n    --exp_config alfworld \\\n    --exp_name eval \\\n    --split test\n```\n\n## 🚩Citation\n\nPlease cite our repository if you use WKM in your work. Thanks!\n\n```bibtex\n@article{DBLP:journals/corr/abs-2405-14205,\n  author       = {Shuofei Qiao and\n                  Runnan Fang and\n                  Ningyu Zhang and\n                  Yuqi Zhu and\n                  Xiang Chen and\n                  Shumin Deng and\n                  Yong Jiang and\n                  Pengjun Xie and\n                  Fei Huang and\n                  Huajun Chen},\n  title        = {Agent Planning with World Knowledge Model},\n  journal      = {CoRR},\n  volume       = {abs/2405.14205},\n  year         = {2024},\n  url          = {https://doi.org/10.48550/arXiv.2405.14205},\n  doi          = {10.48550/ARXIV.2405.14205},\n  eprinttype   = {arXiv},\n  eprint       = {2405.14205},\n  timestamp    = {Wed, 19 Jun 2024 08:52:49 +0200},\n  biburl       = {https://dblp.org/rec/journals/corr/abs-2405-14205.bib},\n  bibsource    = {dblp computer science bibliography, https://dblp.org}\n}\n```\n\n\n\n## 🎉Contributors\n\n\u003ca 
href=\"https://github.com/zjunlp/WKM/graphs/contributors\"\u003e\n  \u003cimg src=\"https://contrib.rocks/image?repo=zjunlp/WKM\" /\u003e\u003c/a\u003e\n\nWe will provide long-term maintenance to fix bugs and resolve issues. If you run into any problems, please open an issue.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Fwkm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzjunlp%2Fwkm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Fwkm/lists"}