https://github.com/ts-azure-services/stress-testing-oai-endpoints

A repo to test scaling OAI endpoints

apim azure-openai bash load-balancing

Host: GitHub
URL: https://github.com/ts-azure-services/stress-testing-oai-endpoints
Owner: ts-azure-services
License: mit
Created: 2024-11-24T01:47:37.000Z (2 months ago)
Default Branch: main
Last Pushed: 2024-12-01T08:09:25.000Z (about 2 months ago)
Last Synced: 2024-12-01T08:35:45.686Z (about 2 months ago)
Topics: apim, azure-openai, bash, load-balancing
Language: Bicep
Homepage:
Size: 15.6 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

This repo contains workflows and scripts to stress test Azure OpenAI endpoints, both individually and through an Azure API Management (APIM)
instance. Most of the workflow steps are captured in the `Makefile` and should be followed sequentially. The repo uses
`locust` as a testing framework to simulate concurrent users, and also uses batch approaches to hit the endpoints with load.
All of this was tested on a Mac, though it should work equally well on any Unix-like machine. It also relies on several command-line tools such as
`parallel`, `jq`, `grep` and `time`. For reference, most of these tools are available natively in Azure Cloud Shell, which provides an easy
deployment option (the `time` utility is the exception).
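To make the "batch" idea concrete, here is a minimal Python sketch of firing a fixed number of requests concurrently and summarizing the returned status codes. It is not the repo's implementation, which relies on shell tools such as `parallel`. The names `run_batch` and `fake_send` are hypothetical, and `fake_send` only simulates an endpoint so that the example runs without credentials or network access.

```python
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def run_batch(send: Callable[[int], int], total: int, workers: int) -> Dict[str, object]:
    """Call `send` `total` times across `workers` threads and summarize the results."""

    def timed(i: int) -> Tuple[int, float]:
        start = time.perf_counter()
        status = send(i)  # `send` is expected to return an HTTP status code
        return status, time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results: List[Tuple[int, float]] = list(pool.map(timed, range(total)))
    wall_seconds = time.perf_counter() - wall_start

    latencies = [latency for _, latency in results]
    return {
        "statuses": Counter(status for status, _ in results),
        "wall_seconds": wall_seconds,
        "max_latency": max(latencies) if latencies else 0.0,
    }


def fake_send(i: int) -> int:
    """Hypothetical stand-in for a real HTTP call: every fifth request is throttled."""
    time.sleep(0.01)
    return 429 if i % 5 == 0 else 200


if __name__ == "__main__":
    summary = run_batch(fake_send, total=20, workers=4)
    print(summary["statuses"], f"{summary['wall_seconds']:.2f}s")
```

To use this against a real endpoint, `fake_send` would be replaced with a function that posts a chat completion request and returns the response status code. Counting 429 responses separately from 200 responses is one simple way to see when an endpoint starts throttling.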

### Random Notes
- For the Azure OpenAI endpoints, the workflows use a Global Standard deployment of `gpt-4o-mini`. This can be customized as needed.
- This does not include setup of Application Insights or Log Analytics as part of the workflow.
- Most of the "batch" workflows have no logic to implement a backoff period. Size the load with the capacity of the endpoint in mind,
along with any logic in place (e.g. with APIM) to handle throttling.
- On a Mac, consider using `btop` (available through Homebrew) to monitor CPU and memory usage. To determine the number of CPUs, run: `sysctl -n hw.ncpu`.
- Future build:
  - Inclusion of text embedding endpoints to test Azure Search workflows.
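
The notes above mention that the batch workflows have no backoff period. For reference, a client-side backoff could look roughly like the Python sketch below. This is an illustration only and not something the repo implements. It assumes the endpoint signals throttling with HTTP 429 and may supply a retry delay in seconds. The function `call_with_backoff`, its parameters, and the `(status, retry_after)` return shape are all hypothetical.

```python
import random
import time
from typing import Callable, Optional, Tuple

# (HTTP status code, server-suggested retry delay in seconds, if any)
Response = Tuple[int, Optional[float]]


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry `send` while it reports throttling (429), waiting longer each time."""
    for attempt in range(max_attempts):
        status, retry_after = send()
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break  # out of attempts, do not sleep again
        if retry_after is not None:
            delay = retry_after  # prefer the server's hint when one is given
        else:
            delay = min(max_delay, base_delay * 2 ** attempt)  # exponential growth, capped
        # A little random jitter keeps parallel workers from retrying in lockstep.
        sleep(delay + random.uniform(0, 0.1 * delay))
    return 429


if __name__ == "__main__":
    # Simulated endpoint: throttled twice, then succeeds.
    responses = iter([(429, None), (429, None), (200, None)])
    print(call_with_backoff(lambda: next(responses), base_delay=0.01))
```

Whether to retry on the client at all depends on the test: when the goal is to measure how APIM or the endpoint itself handles throttling, leaving backoff out (as the batch workflows do) keeps the raw 429 counts visible.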

### References
- For the APIM tooling, this repo leveraged the great [AI-Gateway](https://github.com/Azure-Samples/AI-Gateway) repo to support setup and the custom APIM policy.