{"id":24782617,"url":"https://github.com/oaklight/mango","last_synced_at":"2025-12-14T06:07:18.146Z","repository":{"id":231983791,"uuid":"631306149","full_name":"Oaklight/mango","owner":"Oaklight","description":"repo for paper: MANGO: A Benchmark for Evaluating Mapping and Navigation Abilities of Large Language Models","archived":false,"fork":false,"pushed_at":"2024-06-03T00:23:48.000Z","size":4022,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":4,"default_branch":"camera-ready","last_synced_at":"2024-06-05T09:13:12.689Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://mango.ttic.edu","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Oaklight.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-22T15:48:32.000Z","updated_at":"2024-06-05T09:13:21.463Z","dependencies_parsed_at":"2024-06-05T09:13:18.964Z","dependency_job_id":"d94f271c-4774-44e9-8282-092f161eeb13","html_url":"https://github.com/Oaklight/mango","commit_stats":null,"previous_names":["oaklight/mango","oaklight/graphutils"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Oaklight%2Fmango","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Oaklight%2Fmango/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Oaklight%2Fmango/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Oaklight%2Fmango/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Oaklight","download_url":"https://codeload.github.com/Oaklight/mango/tar.gz/refs/heads/camera-ready","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245215778,"owners_count":20579043,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-29T11:18:19.655Z","updated_at":"2025-12-14T06:07:13.082Z","avatar_url":"https://github.com/Oaklight.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MANGO\n\nRepository for the paper: *[MANGO: A Benchmark for Evaluating \u003cu\u003eMa\u003c/u\u003epping and \u003cu\u003eN\u003c/u\u003eavi\u003cu\u003eg\u003c/u\u003eati\u003cu\u003eo\u003c/u\u003en Abilities of Large Language Models](https://arxiv.org/abs/2403.19913)*\n\nMore details can be found on our [official website](https://mango.ttic.edu).\n\nFor questions or issues, please open an [issue on GitHub](https://github.com/Oaklight/mango/issues).\n\n## Abstract\n\nLarge language models (LLMs), such as ChatGPT and GPT-4, have shown remarkable performance in various natural language processing tasks. In this paper, we introduce **MANGO**, a benchmark to assess the ability of LLMs to perform text-based mapping and navigation.\n\nMANGO comprises 53 mazes from a suite of text-based games. Each maze is paired with a walkthrough that covers key locations but not all paths. 
The benchmark involves question-answering tasks where the LLM reads the walkthrough and answers hundreds of mapping and navigation questions, such as:\n\n- *\"How should you go to the Attic from West of House?\"*\n- *\"Where would you be if you go north and east from Cellar?\"*\n\nWhile these questions are simple for humans, even the state-of-the-art model GPT-4 struggles with them. Our findings indicate that strong mapping and navigation capabilities are crucial for LLMs to perform downstream tasks, such as playing text-based games.\n\nWe host the **leaderboard**, **data**, **code**, and **evaluation** tools for MANGO [here](https://mango.ttic.edu), facilitating future research in this area.\n\n## Setup\n\nTo set up the environment for MANGO, follow these steps:\n\n```bash\ngit clone https://github.com/Oaklight/mango.git\ncd mango\n\nconda create -n mango python=3.11 -y\nconda activate mango\n\n# For evaluation\npip install -e .\n\n# For evaluation and inference (quoted so shells such as zsh do not treat the brackets as a glob)\npip install -e '.[infer]'\n```\n\n## Dataset\n\nOur data is hosted on [Hugging Face](https://huggingface.co/mango-ttic). More information is available [here](https://oaklight.github.io/mgwb/data/).\n\nTo download the dataset for the first 70 moves of each game:\n\n```bash\ncd mango\nwget https://huggingface.co/datasets/mango-ttic/data/resolve/main/data-70steps.tar.zst\nzstd -d -c data-70steps.tar.zst | tar -xvf -\nrm data-70steps.tar.zst\nmv data-70steps data\n```\n\nAlternatively, the dataset is available in the `data` folder within this repository.\n\n## Inference\n\nThe inference code is located in the `mango/inference/` directory. 
You can find additional details in the README file in that folder.\n\nTo query the `claude-instant-1` model for inference:\n\n```bash\nexport ANTHROPIC_API_KEY=\u003cYOUR KEY\u003e\n\npython mango/inference/main.py --exp_tag debug --data_folder ./data --save_folder ./results --game_name '905' --task_type 'route_finding' --model_name 'claude-instant-1'\n```\n\n## Evaluation\n\nThe evaluation script currently supports both the 70-step data and the full data, except for the game `curse` (evaluating it would be a curse on your compute).\n\nEvaluation can be performed using the script located at `mango/evaluation/scripts/evaluate.py`.\n\nFor the output format required by destination-finding evaluation, refer to the following sample:\n\n```\n/mango/examples/llm_output_example/claude-instant-1_desti_finding_debug/905/result_sample_id_1f51a779e76851bcc0bd9a9ce26ab9145349ea63f0810d7e5357b46b45c01f82.json\n```\n\nFor route-finding evaluation, refer to:\n\n```\n/mango/examples/llm_output_example/claude-instant-1_route_finding_debug/905/result_sample_id_4ac913314591fb251c6b13678324b508e5cd383638938482322bd02be1718de0.json\n```\n\nMake sure the `response` field is a list of dictionaries with the required keys, such as:\n\n```json\n[{\"location_before\": \"driveway\", \"action\": \"north\", \"location_after\": \"living room\"}, ...]\n```\n\nYou can customize the key names in `mango/evaluation/config.py`. 
For example, to use `prev_location` instead of `location_before` as the key name, change the entry as follows:\n\n```python\n# Before\n\"location_before\": \"location_before\",\n# After\n\"location_before\": \"prev_location\",\n```\n\n### Evaluation Examples\n\nTo list all available arguments:\n\n```bash\nmango-eval --help\n```\n\nFor destination-finding:\n\n```bash\nmango-eval --mode df --rst-dir ./examples/llm_output_example/claude-instant-1_desti_finding_debug --map-dir ./data\n```\n\nFor route-finding:\n\n```bash\nmango-eval --mode rf --rst-dir ./examples/llm_output_example/claude-instant-1_route_finding_debug --map-dir ./data\n```\n\n## Citation\n\nIf you use MANGO in your research, please cite our paper:\n\n```bibtex\n@misc{ding2024mango,\n      title={MANGO: A Benchmark for Evaluating Mapping and Navigation Abilities of Large Language Models}, \n      author={Peng Ding and Jiading Fang and Peng Li and Kangrui Wang and Xiaochen Zhou and Mo Yu and Jing Li and Matthew R. Walter and Hongyuan Mei},\n      year={2024},\n      eprint={2403.19913},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foaklight%2Fmango","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foaklight%2Fmango","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foaklight%2Fmango/lists"}