Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ts-azure-services/stress-testing-oai-endpoints
A repo to test scaling OAI endpoints
https://github.com/ts-azure-services/stress-testing-oai-endpoints
apim azure-openai bash load-balancing
Last synced: about 1 month ago
JSON representation
A repo to test scaling OAI endpoints
- Host: GitHub
- URL: https://github.com/ts-azure-services/stress-testing-oai-endpoints
- Owner: ts-azure-services
- License: mit
- Created: 2024-11-24T01:47:37.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-12-01T08:09:25.000Z (about 2 months ago)
- Last Synced: 2024-12-01T08:35:45.686Z (about 2 months ago)
- Topics: apim, azure-openai, bash, load-balancing
- Language: Bicep
- Homepage:
- Size: 15.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
This repo contains a number of workflows and scripts to stress test Azure OpenAI endpoints, both individually and through an APIM
instance. Most of the workflow and steps are captured in the `Makefile` and should be followed sequentially. This leverages
`locust` as a testing framework to simulate concurrent users, but also leverages batch approaches to hit the endpoints with load.
All this was tested on a Mac, though would work equally on any Unix-based machine. This also relies on several command-line tools like
`parallel`, `jq`, `grep` and `time`. For reference, most of these tools are available natively in the Azure Cloud shell providing an easy
deployment option (except for the `time` utility).### Random Notes
- For the Azure OpenAI endpoints, this has leveraged the Global Standard deployment for gpt-4o-mini. This can be customized as needed.
- This does not include setup of Application Insights or Log Analytics as part of the workflow.
- For most of the "batch" workflows, there is no logic to implement a backoff period. Limits should be understood considering the
capacity of the endpoint and/or the logic in place (e.g. with APIM) to handle throttling.
- While on a Mac, to monitor CPU and memory usage, consider using `btop` (available through Homebrew). To determine the number of CPUs, run: `sysctl -n hw.ncpu`.
- Future build:
- Inclusion of text embedding endpoints to test Azure Search workflows.### References
- For the APIM tooling, leveraged this great [repo](https://github.com/Azure-Samples/AI-Gateway) to support setup and the custom APIM policy.