{"id":23587647,"url":"https://github.com/OSU-NLP-Group/TravelPlanner","last_synced_at":"2025-08-30T04:31:24.646Z","repository":{"id":218903297,"uuid":"737668213","full_name":"OSU-NLP-Group/TravelPlanner","owner":"OSU-NLP-Group","description":"[ICML'24 Spotlight] \"TravelPlanner: A Benchmark for Real-World Planning with Language Agents\"","archived":false,"fork":false,"pushed_at":"2025-06-14T17:08:35.000Z","size":85735,"stargazers_count":374,"open_issues_count":1,"forks_count":54,"subscribers_count":14,"default_branch":"main","last_synced_at":"2025-06-14T18:23:16.126Z","etag":null,"topics":["autonomous-agents","language-agent","large-language-models","planning"],"latest_commit_sha":null,"homepage":"https://osu-nlp-group.github.io/TravelPlanner/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OSU-NLP-Group.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-01T02:08:53.000Z","updated_at":"2025-06-14T17:08:38.000Z","dependencies_parsed_at":"2025-01-20T12:11:55.453Z","dependency_job_id":null,"html_url":"https://github.com/OSU-NLP-Group/TravelPlanner","commit_stats":null,"previous_names":["osu-nlp-group/travelplanner"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/OSU-NLP-Group/TravelPlanner","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OSU-NLP-Group%2FTravelPlanner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OSU-NLP-Group%2FTravelPlanner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OSU-
NLP-Group%2FTravelPlanner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OSU-NLP-Group%2FTravelPlanner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OSU-NLP-Group","download_url":"https://codeload.github.com/OSU-NLP-Group/TravelPlanner/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OSU-NLP-Group%2FTravelPlanner/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272805296,"owners_count":24995909,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-30T02:00:09.474Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autonomous-agents","language-agent","large-language-models","planning"],"created_at":"2024-12-27T05:01:25.769Z","updated_at":"2025-08-30T04:31:24.633Z","avatar_url":"https://github.com/OSU-NLP-Group.png","language":"Python","funding_links":[],"categories":["A01_文本生成_文本对话","Benchmarks"],"sub_categories":["大语言对话模型及数据"],"readme":"\u003ch1 align=\"center\"\u003eTravelPlanner\u003cbr\u003e A Benchmark for Real-World Planning\u003cbr\u003e with Language Agents \u003c/h1\u003e\n\n![Travel Planner](https://img.shields.io/badge/Task-Planning-blue)\n![Travel Planner](https://img.shields.io/badge/Task-Tool_Use-blue) \n![Travel Planner](https://img.shields.io/badge/Task-Language_Agents-blue)  
\n![GPT-4](https://img.shields.io/badge/Model-GPT--4-green) \n![LLMs](https://img.shields.io/badge/Model-LLMs-green)\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"images/icon.png\" width=\"10%\"\u003e \u003cbr\u003e\n\u003c/p\u003e\n\nCode for the paper \"[TravelPlanner: A Benchmark for Real-World Planning with Language Agents](http://arxiv.org/abs/2402.01622)\".\n\n![Demo Video GIF](images/TravelPlanner.gif)\n\n\u003cp align=\"center\"\u003e\n[\u003ca href=\"https://osu-nlp-group.github.io/TravelPlanner/\"\u003eWebsite\u003c/a\u003e] •\n[\u003ca href=\"http://arxiv.org/abs/2402.01622\"\u003ePaper\u003c/a\u003e] •\n[\u003ca href=\"https://huggingface.co/datasets/osunlp/TravelPlanner\"\u003eDataset\u003c/a\u003e] •\n[\u003ca href=\"https://github.com/OSU-NLP-Group/TravelPlanner/blob/main/README.md#model-release\"\u003eModels\u003c/a\u003e] •\n[\u003ca href=\"https://huggingface.co/spaces/osunlp/TravelPlannerLeaderboard\"\u003eLeaderboard\u003c/a\u003e] •\n[\u003ca href=\"https://huggingface.co/spaces/osunlp/TravelPlannerEnvironment\"\u003eEnvironment\u003c/a\u003e] •\n[\u003ca href=\"https://twitter.com/ysu_nlp/status/1754365367294562680\"\u003eTwitter\u003c/a\u003e]\n\u003c/p\u003e\n\n## Updates\n\n- 2024/10/23: Release the [models](https://github.com/OSU-NLP-Group/TravelPlanner/blob/main/README.md#model-release) fine-tuned on TravelPlanner. The data can be found [here](./finetuning_data). We use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/train_lora/llama3_lora_sft.sh) for fine-tuning.\n- 2024/7/14: Support [reference information](./database) in JSON format.\n- 2024/4/28: Update the [warnings](https://github.com/OSU-NLP-Group/TravelPlanner/tree/main?tab=readme-ov-file#%EF%B8%8Fwarnings); please note that we strictly prohibit any form of cheating.\n- 2024/4/21: Provide a [format check tool](./postprocess/format_check.py) for test set submission files. 
You can run it to check whether your file has any format errors.\n\n# TravelPlanner\n\nTravelPlanner is a benchmark crafted for evaluating language agents in tool-use and complex planning within multiple constraints.\n\nFor a given query, language agents are expected to formulate a comprehensive plan that includes transportation, daily meals, attractions, and accommodation for each day.\n\nFrom the perspective of real-world applications, TravelPlanner includes three types of constraints: Environment Constraint, Commonsense Constraint, and Hard Constraint. \n\n\n## Setup Environment\n\n1. Create a conda environment and install dependencies:\n```bash\nconda create -n travelplanner python=3.9\nconda activate travelplanner\npip install -r requirements.txt\n```\n\n2. Download the [database](https://drive.google.com/file/d/1pF1Sw6pBmq2sFkJvm-LzJOqrmfWoQgxE/view?usp=drive_link) and unzip it to the `TravelPlanner` directory (i.e., `your/path/TravelPlanner`).\n\n## Running\n### Two-stage Mode\n\nIn the two-stage mode, language agents are tasked with employing various search tools to gather information.\nBased on the collected information, language agents are expected to deliver a plan that not only meets the user’s needs specified in the query but also adheres to commonsense constraints.\n\n```bash\nexport OUTPUT_DIR=path/to/your/output/file\n# We support MODEL in ['gpt-3.5-turbo-X','gpt-4-1106-preview','gemini','mistral-7B-32K','mixtral']\nexport MODEL_NAME=MODEL_NAME\nexport OPENAI_API_KEY=YOUR_OPENAI_KEY\n# If you do not want to test Google models such as Gemini, just set this to \"1\".\nexport GOOGLE_API_KEY=YOUR_GOOGLE_KEY\n# SET_TYPE in ['validation', 'test']\nexport SET_TYPE=validation\ncd agents\npython tool_agents.py --set_type $SET_TYPE --output_dir $OUTPUT_DIR --model_name $MODEL_NAME\n```\nThe generated plan will be stored in OUTPUT_DIR/SET_TYPE.\n\n### Sole-Planning Mode\n\nTravelPlanner also provides an easier mode solely focused on testing agents' 
planning ability.\nThe sole-planning mode ensures that no crucial information is missed, thereby enabling agents to focus on planning itself.\n\nPlease refer to the paper for more details.\n\n```bash\nexport OUTPUT_DIR=path/to/your/output/file\n# We support MODEL in ['gpt-3.5-turbo-X','gpt-4-1106-preview','gemini','mistral-7B-32K','mixtral']\nexport MODEL_NAME=MODEL_NAME\nexport OPENAI_API_KEY=YOUR_OPENAI_KEY\n# If you do not want to test Google models such as Gemini, just set this to \"1\".\nexport GOOGLE_API_KEY=YOUR_GOOGLE_KEY\n# SET_TYPE in ['validation', 'test']\nexport SET_TYPE=validation\n# STRATEGY in ['direct','cot','react','reflexion']\nexport STRATEGY=direct\n\ncd tools/planner\npython sole_planning.py --set_type $SET_TYPE --output_dir $OUTPUT_DIR --model_name $MODEL_NAME --strategy $STRATEGY\n```\n\n## Postprocess\n\nTo parse natural language plans, we use GPT-4 to convert them into JSON format. We encourage developers to try different parsing prompts to obtain better-formatted plans.\n\n```bash\nexport OUTPUT_DIR=path/to/your/output/file\nexport MODEL_NAME=MODEL_NAME\nexport OPENAI_API_KEY=YOUR_OPENAI_KEY\nexport SET_TYPE=validation\nexport STRATEGY=direct\n# MODE in ['two-stage','sole-planning']\nexport MODE=two-stage\nexport TMP_DIR=path/to/tmp/parsed/plan/file\nexport SUBMISSION_DIR=path/to/your/evaluation/file\n\ncd postprocess\npython parsing.py --set_type $SET_TYPE --output_dir $OUTPUT_DIR --model_name $MODEL_NAME --strategy $STRATEGY --mode $MODE --tmp_dir $TMP_DIR\n\n# Then extract the parsed plans and store them as real JSON.\npython element_extraction.py --set_type $SET_TYPE --output_dir $OUTPUT_DIR --model_name $MODEL_NAME --strategy $STRATEGY --mode $MODE --tmp_dir $TMP_DIR\n\n# Finally, combine these plan files for evaluation. 
We also provide an evaluation example file \"example_evaluation.jsonl\" in the postprocess folder.\npython combination.py --set_type $SET_TYPE --output_dir $OUTPUT_DIR --model_name $MODEL_NAME --strategy $STRATEGY --mode $MODE --submission_file_dir $SUBMISSION_DIR\n```\n\n## Evaluation\n\nWe support offline evaluation on the validation set using the provided evaluation script. To avoid data contamination, please use our official [leaderboard](https://huggingface.co/spaces/osunlp/TravelPlannerLeaderboard) for test set evaluation.\n\n```bash\nexport SET_TYPE=validation\nexport EVALUATION_FILE_PATH=your/evaluation/file/path\n\ncd evaluation\npython eval.py --set_type $SET_TYPE --evaluation_file_path $EVALUATION_FILE_PATH\n```\n\n## ⚠️Warnings\n\nWe release our evaluation scripts to foster innovation and aid the development of new methods. We encourage the use of evaluation feedback on the training set, such as implementing reinforcement learning techniques, to enhance learning. However, we strictly prohibit any form of cheating on the validation and test sets to uphold the fairness and reliability of the benchmark's evaluation process. We reserve the right to disqualify results if we find any of the following violations:\n\n1. Reverse engineering of our dataset, which includes, but is not limited to:\n   - Converting our natural language queries in the test set to structured formats (e.g., JSON) for optimization and unauthorized evaluation.\n   - Deriving data point entries using the hard rules from our data construction process, without accessing the actual database.\n   - Other similar manipulations.\n2. Hard coding or explicitly writing evaluation cues into prompts by hand, such as direct hints of common sense, which contradicts our goals as it lacks generalizability and is limited to this specific benchmark.\n3. 
Any other human interference strategies that are tailored specifically to this benchmark but lack generalization capabilities.\n\n(The content above is intended solely for use within the TravelPlanner evaluation framework. Extending and editing our database to create new tasks or benchmarks is permitted, provided that you adhere to the licensing terms.)\n\n## Load Datasets\n\n```python\nfrom datasets import load_dataset\n# \"test\" can be substituted by \"train\" or \"validation\".\ndata = load_dataset('osunlp/TravelPlanner','test')['test']\n```\n\n## Model Release\n\nWe fine-tune **Llama3.1-8B-Instruct** and **Qwen2-7B-Instruct** on TravelPlanner ('sole-planning' mode). The fine-tuned model weights are available on Hugging Face 🤗.\n\n- **[Llama-3.1-8B-Instruct-travelplanner-SFT](https://huggingface.co/hsaest/Llama-3.1-8B-Instruct-travelplanner-SFT)**\n- **[Qwen2-7B-Instruct-travelplanner-SFT](https://huggingface.co/hsaest/Qwen2-7B-Instruct-travelplanner-SFT)**\n\n|                    | Commonsense (Micro) | Commonsense (Macro) | Hard (Micro) | Hard (Macro) | Final Pass Rate |\n|--------------------|:-------------------:|:-------------------:|:------------:|:------------:|:---------------:|\n| **Direct Prompting**|                     |                     |              |              |                 |\n| Llama3.1-8B        |        60.1          |         0.0          |      7.9     |      2.8     |       0.0       |\n| Qwen2-7B           |        49.9          |         1.1          |      2.1     |      0.0     |       0.0       |\n| **Fine-tuning** |           |                     |              |              |                 |\n| Llama3.1-8B        |        78.3          |        17.8          |     19.3     |      6.1     |       3.8       |\n| Qwen2-7B           |        59.0          |         0.6          |      0.2     |      0.0     |       0.0       |\n\n\n## Contact\n\nIf you have any problems, please contact \n[Jian 
Xie](mailto:jianx0321@gmail.com),\n[Kai Zhang](mailto:zhang.13253@osu.edu),\n[Yu Su](mailto:su.809@osu.edu)\n\n## Citation Information\n\nIf our paper or related resources prove valuable to your research, we kindly ask that you cite our work. \n\n\u003ca href=\"https://github.com/OSU-NLP-Group/TravelPlanner\"\u003e\u003cimg src=\"https://img.shields.io/github/stars/OSU-NLP-Group/TravelPlanner?style=social\u0026label=TravelPanner\" alt=\"GitHub Stars\"\u003e\u003c/a\u003e\n\n```\n@inproceedings{xie2024travelplanner,\n  title={TravelPlanner: A Benchmark for Real-World Planning with Language Agents},\n  author={Xie, Jian and Zhang, Kai and Chen, Jiangjie and Zhu, Tinghui and Lou, Renze and Tian, Yuandong and Xiao, Yanghua and Su, Yu},\n  booktitle={Forty-first International Conference on Machine Learning},\n  year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOSU-NLP-Group%2FTravelPlanner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FOSU-NLP-Group%2FTravelPlanner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOSU-NLP-Group%2FTravelPlanner/lists"}