{"id":13754129,"url":"https://github.com/Abbey4799/CELLO","last_synced_at":"2025-05-09T22:31:10.431Z","repository":{"id":195287913,"uuid":"692435939","full_name":"Abbey4799/CELLO","owner":"Abbey4799","description":"Code and data for the paper \"Can Large Language Models Understand Real-World Complex Instructions?\"(AAAI2024)","archived":false,"fork":false,"pushed_at":"2024-04-19T02:57:49.000Z","size":6599,"stargazers_count":37,"open_issues_count":1,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-08-03T09:06:51.540Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Abbey4799.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-16T13:23:31.000Z","updated_at":"2024-07-30T06:55:36.000Z","dependencies_parsed_at":null,"dependency_job_id":"7566d136-3d6b-4e0f-a054-6946f4d89bcb","html_url":"https://github.com/Abbey4799/CELLO","commit_stats":null,"previous_names":["abbey4799/cello"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Abbey4799%2FCELLO","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Abbey4799%2FCELLO/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Abbey4799%2FCELLO/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Abbey4799%2FCELLO/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Abbey4799","download_url":"https
://codeload.github.com/Abbey4799/CELLO/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224884612,"owners_count":17386121,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T09:01:41.323Z","updated_at":"2025-05-09T22:31:10.399Z","avatar_url":"https://github.com/Abbey4799.png","language":"Python","funding_links":[],"categories":["A01_文本生成_文本对话","Uncategorized"],"sub_categories":["大语言对话模型及数据","Uncategorized"],"readme":"# CELLO\n\nCELLO is a benchmark for systematically evaluating the **C**ompl**E**x instruction understanding ability of **L**arge **L**anguage M**O**dels (AAAI 2024).\n\n- We design **eight features** for complex instructions and construct **a comprehensive evaluation dataset** from real-world scenarios.\n- We establish **four criteria** and develop **corresponding metrics**, as current ones are inadequate, biased, or too strict and coarse-grained.\n- We compare the performance of representative **Chinese-oriented and English-oriented models** in following complex instructions through extensive experiments.\n\n\u003cp align=\"center\"\u003e\n    \u003cbr\u003e\n    \u003cimg src=\"framework.png\" width=\"900\"/\u003e\n    \u003cbr\u003e\n\u003c/p\u003e\n\n## Install Dependencies\n\n```\nconda create -n cello python=3.10.9\nconda activate cello\nconda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia\npip install -r requirements.txt\n```\n\n## Evaluate Models\n\nYou can evaluate any desired model via the following script 
`eval.sh`:\n\n```\ncd CELLO/\nCUDA_VISIBLE_DEVICES=0 python code/eval.py --model_name chatglm --save_name chatglm\n```\n\nAll the models are implemented in the folder [code/evaluators](code/evaluators/).\nAll the model results are in the folder [results/](results/).\n\n## Scoring System\n\nThe metrics for our four criteria can be calculated using the following script `score.sh`:\n\n```\ncd CELLO/\npython code/score.py\n```\n\nAll the scorers are implemented in the folder [code/scorers](code/scorers/).\nAll the scoring results are in the folder [scores/](scores/).\n\n## Data\n\nThe collected data can be found in the folder [data/](data/). All samples have been anonymized.\n\n## Citation\n\n```\n@inproceedings{he2024can,\n  title={Can Large Language Models Understand Real-World Complex Instructions?},\n  author={He, Qianyu and Zeng, Jie and Huang, Wenhao and Chen, Lina and Xiao, Jin and He, Qianxi and Zhou, Xunzhe and Liang, Jiaqing and Xiao, Yanghua},\n  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},\n  volume={38},\n  number={16},\n  pages={18188--18196},\n  year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAbbey4799%2FCELLO","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAbbey4799%2FCELLO","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAbbey4799%2FCELLO/lists"}