{"id":13624400,"url":"https://github.com/SqueezeAILab/LLMCompiler","last_synced_at":"2025-04-16T00:32:19.953Z","repository":{"id":211635177,"uuid":"728395410","full_name":"SqueezeAILab/LLMCompiler","owner":"SqueezeAILab","description":"[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling","archived":false,"fork":false,"pushed_at":"2024-07-10T04:39:34.000Z","size":384,"stargazers_count":1489,"open_issues_count":4,"forks_count":109,"subscribers_count":24,"default_branch":"main","last_synced_at":"2024-10-29T17:12:19.942Z","etag":null,"topics":["efficient-inference","function-calling","large-language-models","llama","llama2","llm","llm-agent","llm-agents","llm-framework","llms","natural-language-processing","nlp","parallel-function-call","transformer"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2312.04511","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SqueezeAILab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-06T21:12:54.000Z","updated_at":"2024-10-28T12:12:36.000Z","dependencies_parsed_at":"2024-02-13T21:28:07.565Z","dependency_job_id":"de270be8-d3c3-4e01-b0bd-5ec93297f11c","html_url":"https://github.com/SqueezeAILab/LLMCompiler","commit_stats":{"total_commits":32,"total_committers":4,"mean_commits":8.0,"dds":0.09375,"last_synced_commit":"cf78be7dacd409b19424beb0f41ef6d61d2b220b"},"previous_names":["squeezeailab/llmcompiler"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SqueezeAILa
b%2FLLMCompiler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SqueezeAILab%2FLLMCompiler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SqueezeAILab%2FLLMCompiler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SqueezeAILab%2FLLMCompiler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SqueezeAILab","download_url":"https://codeload.github.com/SqueezeAILab/LLMCompiler/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223094226,"owners_count":17086554,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["efficient-inference","function-calling","large-language-models","llama","llama2","llm","llm-agent","llm-agents","llm-framework","llms","natural-language-processing","nlp","parallel-function-call","transformer"],"created_at":"2024-08-01T21:01:42.129Z","updated_at":"2024-11-08T13:30:25.054Z","avatar_url":"https://github.com/SqueezeAILab.png","language":"Python","readme":"# LLMCompiler: An LLM Compiler for Parallel Function Calling [[Paper](https://arxiv.org/abs/2312.04511)]\n\n![Thumbnail](figs/thumbnail.png)\n\n**LLMCompiler** is a framework that enables an _efficient and effective orchestration of parallel function calling_ with LLMs, including both open-source and closed-source models, by automatically identifying which tasks can be performed in parallel and which ones are interdependent.\n\n\n**TL;DR:**\nThe reasoning capabilities of LLMs enable them to execute multiple function calls, using 
user-provided functions to overcome\ntheir inherent limitations (e.g. knowledge cutoffs, poor arithmetic skills, or lack of access to private data).\nWhile multi-function calling allows them to tackle more complex problems,\ncurrent methods often require sequential reasoning and acting for each function, which can result\nin high latency, cost, and sometimes inaccurate behavior.\nLLMCompiler addresses this by decomposing problems into multiple tasks\nthat can be executed in parallel, thereby efficiently orchestrating multi-function calling.\nWith LLMCompiler, the user specifies the tools\nalong with optional in-context examples, and **LLMCompiler automatically computes an optimized orchestration for\nthe function calls**.\nLLMCompiler can be used with open-source models such as LLaMA, as well as OpenAI’s GPT models.\nAcross a range of tasks that exhibit different patterns of parallel function calling, LLMCompiler\nconsistently demonstrated **latency speedup, cost saving, and accuracy improvement**.\nFor more details, please check out our [paper](https://arxiv.org/abs/2312.04511).\n\n## News\n* 📌 [7/9] Friendli endpoints are supported for popular open-source models.\n* 🦜 [2/13] LLMCompiler is available within the [LangGraph](https://github.com/langchain-ai/langgraph/blob/main/examples/llm-compiler/LLMCompiler.ipynb) framework of [LangChain](https://github.com/langchain-ai).\n* 📌 [1/17] Running custom models using vLLM is supported\n* 🦙 [12/29] LLMCompiler is available on [LlamaIndex](https://llamahub.ai/l/llama_packs-agents-llm_compiler?from=llama_packs)\n\n---\n## Installation\n\n1. Create and activate a conda environment\n```\nconda create --name llmcompiler python=3.10 -y\nconda activate llmcompiler\n```\n\n2. 
Clone the repository and install the dependencies\n```\ngit clone https://github.com/SqueezeAILab/LLMCompiler\ncd LLMCompiler\npip install -r requirements.txt\n```\n\n---\n## Basic Runs\nTo reproduce the evaluation results in the paper, run the following command.\nYou first need to register your OpenAI API key in the environment: `export OPENAI_API_KEY=\"sk-xxx\"`\n```\npython run_llm_compiler.py --benchmark {benchmark-name} --store {store-path} [--logging] [--stream]\n```\n\nTo run a custom model served using the vLLM framework, run the following command.\nDetailed instructions for serving custom models with the vLLM framework can be found in the [vLLM documentation](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server).\nNote that the pre-defined prompts in the default configuration files are tailored for (non-chat) LLaMA-2 70B and might need adjustments for different models.\n```\npython run_llm_compiler.py --model_type vllm --benchmark {benchmark-name} --store {store-path} --model_name {vllm-model-name} --vllm_port {vllm-port} [--logging]\n```\n\n* `--benchmark`: Benchmark name. Use `hotpotqa`, `movie`, and `parallelqa` to evaluate LLMCompiler on the HotpotQA, Movie Recommendation, and ParallelQA benchmarks, respectively.\n* `--store`: Path to save the results. The question, true label, prediction, and latency per example will be stored in JSON format.\n* `--logging`: (Optional) Enables logging. Not yet supported for vLLM.\n* `--do_benchmark`: (Optional) Performs additional benchmarking with detailed run-time statistics.\n* `--stream`: (Optional, Recommended) Enables streaming. 
It improves latency by streaming out tasks from the Planner to the Task Fetching Unit and Executor immediately after their generation, rather than blocking the Executor until all the tasks are generated by the Planner.\n* `--react`: (Optional) Use ReAct instead of LLMCompiler for baseline evaluation.\n\n### Azure Endpoint\nYou can optionally use your Azure endpoint instead of the OpenAI endpoint with `--model_type azure`. In this case, you need to provide the associated Azure configuration as the following fields in your environment: `AZURE_ENDPOINT`, `AZURE_OPENAI_API_VERSION`, `AZURE_DEPLOYMENT_NAME`, and `AZURE_OPENAI_API_KEY`.\n\n### Friendli Endpoint\nYou can use the [Friendli](https://friendli.ai/) endpoint with `--model_type friendli`. In this case, you need to provide your Friendli API key in your environment: `FRIENDLI_TOKEN`. Additionally, you need to install the Friendli client:\n```\npip install friendli-client\n```\n\nAfter the run is over, you can get a summary of the results by running the following command:\n```\npython evaluate_results.py --file {store-path}\n```\n\n---\n## Adding Your Custom Benchmark\nTo use LLMCompiler on your custom benchmarks or use cases,\nyou only need to provide the functions and their descriptions, as well as example prompts.\nPlease refer to `configs/hotpotqa`, `configs/movie`, and `configs/parallelqa` as examples.\n\n* `gpt_prompts.py`: Defines in-context example prompts\n* `tools.py`: Defines functions (i.e. tools) to use, and their descriptions (i.e. instructions and arguments)\n\n\n---\n## Roadmap\nWe are planning to add the following features soon:\n* The Tree-of-Thoughts evaluation we used in the paper\n\n---\n## Citation\n\nLLMCompiler has been developed as part of the following paper. 
We would appreciate it if you cite the following paper if you find the library useful for your work:\n\n```\n@article{kim2023llmcompiler,\n  title={An LLM Compiler for Parallel Function Calling},\n  author={Kim, Sehoon and Moon, Suhong and Tabrizi, Ryan and Lee, Nicholas and Mahoney, Michael and Keutzer, Kurt and Gholami, Amir},\n  journal={arXiv},\n  year={2023}\n}\n```\n\n","funding_links":[],"categories":["Python","A01_文本生成_文本对话","AI Agent Frameworks \u0026 SDKs"],"sub_categories":["大语言对话模型及数据","Orchestration Frameworks"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSqueezeAILab%2FLLMCompiler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSqueezeAILab%2FLLMCompiler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSqueezeAILab%2FLLMCompiler/lists"}