https://github.com/aleph-alpha/locust-sse
Locust plugin for SSE (useful for loadtesting LLMs)
- Host: GitHub
- URL: https://github.com/aleph-alpha/locust-sse
- Owner: Aleph-Alpha
- License: MIT
- Created: 2025-12-09T11:33:23.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-12-12T09:34:02.000Z (4 months ago)
- Last Synced: 2025-12-22T09:38:45.821Z (3 months ago)
- Topics: llm, llm-loadtesting, loadtesting, locust, locust-plugin, plugin, sse, sse-loadtesting
- Language: Python
- Homepage:
- Size: 154 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# Locust SSE User
A Locust plugin for testing Server-Sent Events (SSE) endpoints, specifically designed for LLM streaming response benchmarking.
## Installation
You can install this package using `uv` (recommended) or `pip`.
### Using uv
```bash
uv add locust-sse
```
### Using pip
```bash
pip install locust-sse
```
## Usage
Inherit from `SSEUser` in your `locustfile.py` and use the `handle_sse_request` method to make SSE requests.
```python
from locust import task

from locust_sse import SSEUser


class MyLLMUser(SSEUser):
    # Set the host for the user
    host = "http://localhost:8080"

    @task
    def chat(self):
        # Example payload for a chat completion endpoint
        payload = {
            "model": "gpt-4",
            "messages": [
                {"role": "user", "content": "Tell me a joke."}
            ],
            "stream": True,
        }

        # Make the SSE request
        self.handle_sse_request(
            url="/chat/completions",
            params={"json": payload},
            prompt="Tell me a joke.",
            request_name="chat_completion",
        )
```
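For context, an OpenAI-style streaming endpoint delivers its response as `data:`-prefixed SSE lines, terminated by a `[DONE]` sentinel. The following is a minimal, illustrative parser for that wire format (the payloads shown are examples, not the plugin's internals):

```python
import json


def parse_sse_lines(lines):
    """Extract JSON payloads from the `data:` lines of an SSE stream."""
    events = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # OpenAI-style end-of-stream sentinel
            break
        events.append(json.loads(data))
    return events


# Example stream as it would arrive over the wire
stream = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    "data: [DONE]",
]
tokens = [e["choices"][0]["delta"].get("content", "") for e in parse_sse_lines(stream)]
print("".join(tokens))  # -> Hello world
```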
## Metrics
This plugin automatically tracks specific metrics relevant to LLM streaming performance and reports them to Locust.
| Metric | Description |
| :--- | :--- |
| **TTFT** | **Time To First Token**. Measures the latency from the start of the request until the first "append" event is received. |
| **Prompt Tokens** | Number of tokens in the input prompt (estimated). |
| **Completion Tokens** | Number of tokens in the generated response (estimated). |
| **Processing Time** | Total time taken for the entire generation process. |
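TTFT is straightforward to compute from timestamps around the event stream. A hedged sketch of the measurement idea (not the plugin's actual implementation):

```python
import time


def measure_ttft(event_iter):
    """Consume a stream of events, returning (ttft_ms, events).

    ttft_ms is the time from calling this function until the first
    event arrives, in milliseconds.
    """
    start = time.perf_counter()
    ttft_ms = None
    events = []
    for event in event_iter:
        if ttft_ms is None:
            # First token observed: record time-to-first-token
            ttft_ms = (time.perf_counter() - start) * 1000.0
        events.append(event)
    return ttft_ms, events


def slow_stream():
    """Stand-in for an SSE response: each token takes ~10 ms to generate."""
    for token in ["Hello", " world"]:
        time.sleep(0.01)
        yield token


ttft, events = measure_ttft(slow_stream())
print(f"TTFT: {ttft:.1f} ms for {len(events)} tokens")
```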
### How Metrics Appear in Locust
These metrics are reported as separate entries in the Locust statistics table:
- `{request_name}_ttft`: Latency statistics for the first token.
- `{request_name}_prompt_tokens`: "Response Length" column shows token count.
- `{request_name}_completion_tokens`: "Response Length" column shows token count.
- `{request_name}`: The main request entry showing total duration.
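Concretely, the usage example above (`request_name="chat_completion"`) would yield the entries `chat_completion_ttft`, `chat_completion_prompt_tokens`, `chat_completion_completion_tokens`, and `chat_completion`. A tiny helper illustrating the naming convention (for dashboards or post-processing; not part of the plugin's API):

```python
def metric_entries(request_name: str) -> list[str]:
    """Stats-table entry names produced for one SSE request."""
    return [
        f"{request_name}_ttft",
        f"{request_name}_prompt_tokens",
        f"{request_name}_completion_tokens",
        request_name,
    ]


print(metric_entries("chat_completion"))
```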
## Development
This project uses `uv` for dependency management.
```bash
# Install dependencies
uv sync
# Run tests
uv run pytest
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.