{"id":25945514,"url":"https://github.com/jplane/llm-function-call-eval","last_synced_at":"2025-03-04T09:17:52.581Z","repository":{"id":279150519,"uuid":"937867100","full_name":"jplane/llm-function-call-eval","owner":"jplane","description":"Demonstrates a workflow for LLM function calling evaluation. Uses GitHub Copilot to generate synthetic eval data and Azure AI Foundry for handling results.","archived":false,"fork":false,"pushed_at":"2025-02-24T03:22:04.000Z","size":537,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-24T04:25:47.713Z","etag":null,"topics":["azure-ai-foundry","evaluation-framework","function-calling","llm","synthetic-dataset-generation","tool-use","vscode"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jplane.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-24T03:18:21.000Z","updated_at":"2025-02-24T03:38:01.000Z","dependencies_parsed_at":"2025-02-24T04:26:00.711Z","dependency_job_id":"5affc08b-61a5-4a37-9042-5d00ecb2078a","html_url":"https://github.com/jplane/llm-function-call-eval","commit_stats":null,"previous_names":["jplane/llm-function-call-eval"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jplane%2Fllm-function-call-eval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jplane%2Fllm-function-call-eval/tags","releases_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/jplane%2Fllm-function-call-eval/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jplane%2Fllm-function-call-eval/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jplane","download_url":"https://codeload.github.com/jplane/llm-function-call-eval/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241818899,"owners_count":20025212,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure-ai-foundry","evaluation-framework","function-calling","llm","synthetic-dataset-generation","tool-use","vscode"],"created_at":"2025-03-04T09:17:52.064Z","updated_at":"2025-03-04T09:17:52.571Z","avatar_url":"https://github.com/jplane.png","language":"Jupyter Notebook","readme":"# LLM Function Call Performance Evaluation With Synthetic Eval Data and Azure AI Foundry\n\nThis repo demonstrates the following evaluation workflow:\n\n- From a small, representative set of [natural language intents](./intents.txt), generate an [OpenAPI spec](https://spec.openapis.org/oas/latest.html) to describe the desired behaviors\n\n- Using the spec and intents, generate more intent examples to cover a wider range of scenarios\n\n- From the OpenAPI spec, generate function call metadata that describes the necessary function call(s) and arguments needed to fulfill each intent\n\n- Deploy [Azure AI Foundry](https://learn.microsoft.com/en-us/azure/ai-studio/what-is-ai-studio) and supporting services in an Azure subscription\n\n- Use the [Azure AI Foundry 
Evaluation SDK](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/evaluate-sdk) to eval [function calling](https://platform.openai.com/docs/guides/function-calling) (sometimes called \"tool use\") performance of various [LLM + config + prompt] combinations\n\nThis evaluation workflow produces the core artifacts for reliable LLM function calling in a production solution:\n\n- Choice of appropriate model\n- Optimal model [configuration](https://www.promptingguide.ai/introduction/settings) (temperature, top_p, etc.)\n- An optimal system prompt\n- An OpenAPI spec that fully describes the desired behaviors and maximizes LLM performance\n- A set of intents to document targeted capabilities\n\n## Prerequisites\n\n- An Azure subscription\n- [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli)\n- [Python 3.8+](https://www.python.org/downloads/)\n- [Visual Studio Code](https://code.visualstudio.com/)\n- [GitHub Copilot extension](https://marketplace.visualstudio.com/items?itemName=GitHub.copilot)\n\n## Walkthrough\n\n### 1. Generate a spec\n\nOpen GitHub Copilot in [edit mode](https://code.visualstudio.com/docs/copilot/copilot-edits#_use-edit-mode). Add [intents.txt](./intents.txt) to the working set and ask Copilot to generate an OpenAPI spec:\n\n```\nUse the existing intents as a guide and generate an OpenAPI 3.0 spec that implements the behaviors. Write the spec to 'api.json'.\n```\n\nThe resulting spec will be a good start, but will likely require some fine-tuning (edit manually, or ask Copilot to help!). For example, you might want to parameterize things like room, light or thermostat state, etc.\n\nAdjust these prompts below based on the exact contents of `api.json`... 
and don't forget to add it to the Copilot working set.\n\n(_if you prefer, you can see final versions of these artifacts in [./resulting_artifacts](./resulting_artifacts)_)\n\n```\nupdate the spec to parameterize choice of room, for all endpoints where room is specified\n```\n\n```\nin the spec, parameterize /lights/{room} to allow turning lights on or off\nadd on/off to the endpoint path and convert it to a PUT\n```\n\n```\nmodify /thermostat endpoint to add temp and unit to the endpoint path. convert from post to put.\n```\n\n```\nconvert /lights/{room} to /lights/{room}/level/{level}. change from post to put\n```\n\n```\nParameterize /alarm-system with on/off support... add it to the endpoint path. Convert from POST to PUT\n```\n\n```\nUpdate /garage-door to parameterize up/down state and integer id of door. Use path parameters. Change from POST to PUT.\n```\n\n### 2. Add more intents\n\nOnce you have the spec in a good place, ask Copilot to add more intents to [intents.txt](./intents.txt) to cover a wider range of scenarios:\n\n```\nAdd 10 more intents to intents.txt. The new intents should match the spec. Don't repeat intents that already exist. Be creative with rooms and other argument values.\n```\n\n### 3. Create eval dataset\n\nNow we're ready to create our eval dataset. For Foundry, this is a single jsonl file, with each line containing a) an intent and b) the expected function call metadata for that intent, in JSON format.\n\nTry this:\n\n```\nCreate a file 'dataset.jsonl'.\n\nFor each line in 'intents.txt' add JSON to 'dataset.jsonl', the added content should conform to 'dataset_template.json' and contain the original intent text and the endpoint details (names, argument values, etc.) needed to fulfill the intent against the spec in 'api.json'.\n```\n\n### 4. Deploy Azure AI Foundry\n\nDeploy a new [Azure AI Foundry](https://learn.microsoft.com/en-us/azure/ai-studio/concepts/architecture) hub and project. 
This will require [Owner or Contributor role assignment](https://learn.microsoft.com/en-us/azure/ai-studio/concepts/rbac-ai-studio) in your subscription.\n\n### 5. Deploy a model from Foundry Model Catalog\n\nFollow the guidance found [here](https://learn.microsoft.com/en-us/azure/ai-studio/concepts/deployments-overview).\n\n![](./assets/foundry_model_catalog.png)\n\n### 6. Set up local environment\n\nFirst, create a new Python virtual environment:\n\n```bash\npython -m venv .venv\nsource .venv/bin/activate\n```\n\nThen install the requirements:\n\n```bash\npip install -r requirements.txt\n```\n\nNext, log in to the Azure CLI. These credentials will be used to access the Azure AI Foundry hub and project:\n\n```bash\naz login\n```\n\nFinally, copy [.env.template](./.env.template) to `.env` and fill in the required values:\n\n```\nFOUNDRY_CONNECTION_STRING=\"\"\nMODEL=\"\"\nSWAGGER_PATH=\"./api.json\"\nTEMPERATURE=\"0.2\"\nTOP_P=\"0.1\"\nAZURE_SUB_ID=\"\"\nAZURE_RESOURCE_GROUP=\"\"\nAZURE_AI_PROJECT_NAME=\"\"\n```\n\n### 7. Run eval in notebook\n\nNow it's time to do an eval run. Open [eval.ipynb](./eval.ipynb) in VS Code and run all cells.\n\nThe notebook configures an AI Foundry eval run using [function_call_generator.py](./function_call_generator.py) to generate function call metadata for each intent in `dataset.jsonl`... these 'actual' results are compared to the 'expected' results using [function_call_evaluator.py](./function_call_evaluator.py).\n\n### 8. Review results\n\nResults are logged to the Azure AI Foundry project. 
You can view the results in the [Foundry portal](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/evaluate-results).\n\n![](./assets/foundry_eval_results.png)","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjplane%2Fllm-function-call-eval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjplane%2Fllm-function-call-eval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjplane%2Fllm-function-call-eval/lists"}