{"id":20298081,"url":"https://github.com/LAION-AI/AIW","last_synced_at":"2025-05-07T20:34:10.895Z","repository":{"id":242576955,"uuid":"804807048","full_name":"LAION-AI/AIW","owner":"LAION-AI","description":"Alice in Wonderland code base for experiments and raw experiments data","archived":false,"fork":false,"pushed_at":"2025-02-13T06:46:34.000Z","size":18237,"stargazers_count":131,"open_issues_count":3,"forks_count":10,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-05-07T18:14:04.440Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LAION-AI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-05-23T10:01:09.000Z","updated_at":"2025-04-23T19:19:15.000Z","dependencies_parsed_at":"2025-02-13T07:38:15.237Z","dependency_job_id":null,"html_url":"https://github.com/LAION-AI/AIW","commit_stats":null,"previous_names":["laion-ai/aiw"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LAION-AI%2FAIW","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LAION-AI%2FAIW/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LAION-AI%2FAIW/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LAION-AI%2FAIW/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LAION-AI","download_url":"https://codeload.github.com/LAION-AI/AIW/tar.gz/refs/
heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252953716,"owners_count":21830890,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T16:02:06.517Z","updated_at":"2025-05-07T20:34:10.878Z","avatar_url":"https://github.com/LAION-AI.png","language":"Python","funding_links":[],"categories":["A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"readme":"\u003ch1 align=\"center\"\u003e\n        🎩🐇 Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models\n    \u003c/h1\u003e\n \u003cp align=\"center\"\u003eAlice in Wonderland code base for experiments and raw experiments data\u003c/p\u003e\n\u003ch4 align=\"center\"\u003e\u003ca href=\"https://marianna13.github.io/aiw/\" target=\"_blank\"\u003eHomepage\u003c/a\u003e | \u003ca href=\"https://arxiv.org/pdf/2406.02061\" target=\"_blank\"\u003e Paper\u003c/a\u003e | \u003ca href=\"https://arxiv.org/abs/2406.02061\"target=\"_blank\"\u003eArXiv\u003c/a\u003e | \u003ca href=\"https://huggingface.co/spaces/marianna13/AIW-responses\"target=\"_blank\"\u003eDataset explorer\u003c/a\u003e\u003c/h4\u003e\n\n\n## Usage\n\nInstall requirements:\n`pip install -r requirements.txt`\n\n### Collect experiments data\n\n**Collect using [LiteLLM](https://github.com/BerriAI/litellm):**\nRefer to the [LiteLLM Docs](https://docs.litellm.ai/docs/) on how to set up your account and API keys.\n\nWorkflow init:\n\n```bash\nexport SHARED_MINICONDA=/path/to/miniconda_install\nexport CONDA_ENV=/path/to/conda_env\nexport 
AIW_REPO_PATH=/path/to_local_cloned_AIW_repo\n\nsource ${SHARED_MINICONDA}/bin/activate ${CONDA_ENV}\nexport PYTHONPATH=$PYTHONPATH:$AIW_REPO_PATH\n\n# export your API keys\nexport TOGETHERAI_API_KEY=\nexport OPENAI_API_KEY=\nexport ANTHROPIC_API_KEY=\nexport MISTRAL_API_KEY=\nexport GEMINI_API_KEY=\nexport COHERE_API_KEY=\n\ncd $AIW_REPO_PATH\n\n```\n\n\n### Execution example for a single selected prompt ID:\n\n```bash\n\n# LiteLLM based experiments; 30 trials for STANDARD prompt type, AIW Variation 1 (Prompt ID 55 in prompts.json)\npython examples/example_litellm.py --prompt_id=55 --n_trials=30 --n_sessions=1 --prompts_json=lmsys_tools/prompts.json --models_json=lmsys_tools/models_plot_set.json --exp_name=model_set_STANDARD_run-1\n\n# 30 trials for THINKING prompt type, AIW Variation 2 (Prompt ID 58 in prompts.json)\npython examples/example_litellm.py --prompt_id=58 --n_trials=30 --n_sessions=1 --prompts_json=lmsys_tools/prompts.json --models_json=lmsys_tools/models_plot_set.json --exp_name=model_set_THINKING_run-1\n\n# 30 trials for RESTRICTED prompt type, AIW Variation 1 (Prompt ID 53 in prompts.json)\npython examples/example_litellm.py --prompt_id=53 --n_trials=30 --n_sessions=1 --prompts_json=lmsys_tools/prompts.json --models_json=lmsys_tools/models_plot_set.json --exp_name=model_set_RESTRICTED_run-1\n\n# Same for LMSys based experiments\npython examples/example_lmsys.py --prompt_id=53 --n_trials=30 --n_sessions=1 --prompts_json=lmsys_tools/prompts.json --models_json=lmsys_tools/models_plot_set.json --exp_name=model_set_RESTRICTED_run-1\n\n\n```\n\nHint: `n_sessions` is now a dummy parameter and can be set to 1; only the number of trials matters.\n\n**Execution example for a whole range of IDs:**\n\nHint: the script file names referenced inside the script files follow the authors' local naming and have to be adapted to your setup.\n\n\n```bash\n\n# Execute experiments over LiteLLM: 30 trials, start with run counter set to 1, perform 2 rounds; for AIW Variations 1-2, 
prompt types STANDARD, THINKING, RESTRICTED (as defined in prompt ID)\nsource execute_litellm_data_gathering.sh 30 1 2 models_plot_set.json \"55 56 57 58 53 54\" \"model_set_STANDARD model_set_STANDARD model_set_THINKING model_set_THINKING model_set_RESTRICTED model_set_RESTRICTED\"\n\n# Do the same for AIW Variation 3, prompt types STANDARD, THINKING, RESTRICTED (as defined in prompt ID)\nsource execute_litellm_data_gathering.sh 30 1 2 models_plot_set.json \"63 64 65\" \"model_set_AIW-VAR-3_STANDARD model_set_AIW-VAR-3_THINKING model_set_AIW-VAR-3_RESTRICTED\"\n\n# Execute experiments over lmsys: 7 trials, start with run counter set to 1, perform 2 rounds\nsource execute_lmsys_data_gathering.sh 7 1 2 models_plot_set.json \"55 56 57 58 53 54\" \"model_set_STANDARD model_set_STANDARD model_set_THINKING model_set_THINKING model_set_RESTRICTED model_set_RESTRICTED\"\n\nsource execute_lmsys_data_gathering.sh 7 1 2 models_plot_set.json \"63 64 65\" \"model_set_EASY_STANDARD model_set_EASY_THINKING model_set_EASY_RESTRICTED\"\n```\n\n- Usage for the script call:\n\n```bash\nsource execute_litellm_data_gathering.sh NUM_TRIALS RUN_ID_START NUM_ROUNDS models_plot_set.json \"PROMPT_ID_1 PROMPT_ID_2 PROMPT_ID_3\" \"EXP_NAME_1 EXP_NAME_2 EXP_NAME_3\"\n\nsource execute_lmsys_data_gathering.sh NUM_TRIALS RUN_ID_START NUM_ROUNDS models_plot_set.json \"PROMPT_ID_1 PROMPT_ID_2 PROMPT_ID_3\" \"EXP_NAME_1 EXP_NAME_2 EXP_NAME_3\"\n```\n\nwhere\n\n- NUM_TRIALS: number of trials to conduct in each round\n- RUN_ID_START: run ID to start from (used in the file name as run-ID)\n- NUM_ROUNDS: number of rounds to perform; each round runs NUM_TRIALS trials and gets its own incremented run ID\n- \"PROMPT_ID_X ...\" : list of IDs pointing to the corresponding entries defined in the prompts.json file\n- \"EXP_NAME_X ...\" : list of experiment names, chosen freely for each corresponding prompt ID, to be appended to the name of the file with the saved data\n\n- Example for collecting data for a full plot (Fig. 
1) in the paper\n\n```bash\n\n# Reading models from models_plot_set_reference.json and prompt IDs from prompts.json; full experiment set over all main AIW variations 1-4 and prompt types STANDARD, THINKING, RESTRICTED; doing 30 trials starting with run counter 1, for 2 rounds, aiming at 60 trials in total for each model and prompt ID (that is, a given combination of a prompt type and AIW variation)\nsource execute_litellm_data_gathering.sh 30 1 2 models_plot_set_reference.json \"55 56 57 58 53 54 63 64 65 69 70 71\" \"model_set_reference_AIW-VAR-1_STANDARD model_set_reference_AIW-VAR-2_STANDARD model_set_reference_AIW-VAR-1_THINKING model_set_reference_AIW-VAR-2_THINKING model_set_reference_AIW-VAR-1_RESTRICTED model_set_reference_AIW-VAR-2_RESTRICTED model_set_reference_AIW-VAR-3_STANDARD model_set_reference_AIW-VAR-3_THINKING model_set_reference_AIW-VAR-3_RESTRICTED model_set_reference_AIW-VAR-4_STANDARD model_set_reference_AIW-VAR-4_THINKING model_set_reference_AIW-VAR-4_RESTRICTED\"\n\n```\n\n\nRefer to [this bash script](scripts/execute_litellm_data_gathering.sh) to see how to use LiteLLM to gather model responses.\n\n**Collect using [TogetherAI](https://www.together.ai/):**\n\nRefer to the [TogetherAI Docs](https://docs.together.ai/docs/quickstart) on how to set up your account and API keys.\n\nRefer to [this Python script](data_collection/examples/example_together.py) to see how to use TogetherAI to gather model responses.\n\n**Collect by scraping [LMSYS Chatbot Arena](https://chat.lmsys.org/):**\n\n*Note:* This method is not recommended, since it is of limited use for automated model evaluation. 
The platform is gated by Cloudflare.\n\nRefer to [this bash script](scripts/execute_lmsys_data_gathering.sh) to see how to gather model responses from LMSYS Chatbot Arena.\n\n\n## Plot the data\n\nRun the script to generate the plots from the paper (by default, plots are saved in the working directory):\n`bash scripts/plot.sh`\n\n## Acknowledgments\n\nWe would like to express gratitude to all the people who are working on making code, models and data publicly available, advancing community based research and making research more reproducible. Specifically, we would like to thank all the members of the [LAION Discord server](https://discord.gg/BZqhreFazY) community and [Open-Ψ (Open-Sci) Collective](https://discord.gg/GsKh4mBVcv) for providing fruitful ground for scientific exchange and open-source development.\n\nMarianna Nezhurina acknowledges funding by the Federal Ministry of Education and Research of Germany under grant no. 01IS22094B WestAI - AI Service Center West.\n\nLucia Cipolina-Kun acknowledges the Helmholtz Information \u0026 Data Science Academy (HIDA) for providing financial support enabling a short-term research stay at Juelich Supercomputing Center (JSC), Research Center Juelich (FZJ) to conduct research on foundation models.\n\n## Citation\nIf you like this work, please cite:\n\n```\n@article{nezhurina2024alice,\n        title={Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models},\n        author={Marianna Nezhurina and Lucia Cipolina-Kun and Mehdi Cherti and Jenia Jitsev},\n        year={2024},\n        journal={arXiv preprint arXiv:2406.02061},\n        eprint={2406.02061},\n        archivePrefix={arXiv},\n        primaryClass={cs.LG}\n}\n```\n\nLicense\n=======\n    Copyright 2024 Marianna Nezhurina, Lucia Cipolina-Kun, Mehdi Cherti, Jenia Jitsev\n\n    Licensed under the Apache License, Version 2.0 (the \"License\");\n    you may not use this file except in compliance with the License.\n    You may 
obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n    Unless required by applicable law or agreed to in writing, software\n    distributed under the License is distributed on an \"AS IS\" BASIS,\n    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n    See the License for the specific language governing permissions and\n    limitations under the License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLAION-AI%2FAIW","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FLAION-AI%2FAIW","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLAION-AI%2FAIW/lists"}