Load Testing Temporal with Grafana
https://github.com/bitovi/temporal-load-grafana
- Host: GitHub
- URL: https://github.com/bitovi/temporal-load-grafana
- Owner: bitovi
- Created: 2023-10-31T18:38:43.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-01T20:15:41.000Z (about 1 year ago)
- Last Synced: 2024-04-14T22:37:36.157Z (7 months ago)
- Homepage:
- Size: 41 KB
- Stars: 0
- Watchers: 11
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
README
# Temporal Load Testing
Let's do some load testing on Temporal. This will help us configure our cluster for efficiency, robustness, and cost control.
## Load Test Deployment
Using:
### Prerequisites
- A running Temporal cluster
- `kubectl` configured to access the cluster
- Access to the Grafana and Temporal UIs

_NOTE: this document uses the alias `k` for `kubectl` (and you should too!)_
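For reference, the alias is just a one-line shell shortcut (optional, but it matches the commands in this document):
```shell
# Optional convenience: shorten kubectl to k, as used throughout this README
alias k=kubectl
```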
### Clone this repo
```shell
cd /path/to/your/projects
git clone https://github.com/bitovi/temporal-load-grafana.git
```
### Import the Temporal dashboards
In the Grafana UI, paste the content of `./server-general.json` into the Dashboard import window.
- pulled from
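If you prefer to script the import instead of using the UI, a sketch like the following should work against Grafana's dashboard API (it assumes `jq` is available and that `<grafana-host>` and `<api-token>` are replaced with your own values; the UI import above is the straightforward path):
```shell
# Wrap the dashboard JSON in the envelope expected by POST /api/dashboards/db,
# clearing any exported id so Grafana treats it as a new dashboard.
jq '{dashboard: (. + {id: null}), overwrite: true}' ./server-general.json \
  | curl -X POST "http://<grafana-host>/api/dashboards/db" \
      -H "Authorization: Bearer <api-token>" \
      -H "Content-Type: application/json" \
      -d @-
```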
### Deploy the load test harness
```shell
k apply -f deployment.yaml [-n <namespace>]
# you should see:
# deployment.apps/benchmark-workers created
# deployment.apps/benchmark-soak-test created
```
### Confirm the activity in Temporal UI
The provided deployment file is configured to start the load test immediately. You should see activity in the Temporal UI within a few seconds.
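You can also confirm from the terminal that the benchmark pods came up (add `-n <namespace>` if you deployed to a non-default namespace):
```shell
# Both benchmark deployments should have pods in the Running state
k get pods [-n <namespace>]
```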
Click the refresh button to see the latest activity: ↻
You should see new workflows being created with each refresh.
### Observe the load in Grafana
In the Grafana UI, select the `Temporal Server` dashboard that you imported above.
You should see the load increasing in the various graphs. The peaks are increasing load, and the valleys are decreasing/no load.
### Scale up
You can increase the load by scaling up the deployment:
```shell
k scale deployment benchmark-soak-test --replicas=10 [-n <namespace>]
```
### Scale down
You can decrease the load by scaling down the deployment:
```shell
k scale deployment benchmark-soak-test --replicas=5 [-n <namespace>]
```
### Stop the test
The quick-and-dirty way to stop the test is to just delete the loading deployment:
```shell
k delete -f deployment.yaml
```
Alternatively, you can scale the runner deployment to 0:
```shell
k scale deployment benchmark-soak-test --replicas=0
```
This keeps the deployment alive for easy up-scaling when ready.
> Of course, you can use your k8s interface of choice instead of `kubectl` to do these operations: [k9s](https://k9scli.io/), [openlens](https://github.com/MuhammedKalkan/OpenLens), etc.
## What to look for
This benchmark creates a stable load test. What are the key metrics to look for?
The most critical metric is `state_transition_count_count`. This is the throughput of your Temporal system. As you increase and decrease the load, you'll see `state_transition_count_count` react.
Specifically, the metric is defined as `sum(rate(state_transition_count_count[1m]))`
> That is not a typo, the metric name is `...count_count`.
At a certain load, you will see the `state_transition_count_count` plateau, or even start to drop. This is a sign that you've found a bottleneck in the system. Where is it? You can look at the metrics suggested above. You can also inspect the resource utilization of your cluster. Are any worker pods at CPU or Memory limits? Is throttling happening on the node?
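One quick way to check utilization from the terminal (this assumes the metrics-server is installed in your cluster; the Grafana dashboards or your k8s UI of choice work just as well):
```shell
# Per-pod CPU/memory usage -- look for pods pinned at their limits
k top pods [-n <namespace>]

# Per-node usage -- look for saturated nodes, which can lead to throttling
k top nodes
```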
To scale the service you've identified as a potential bottleneck, scale the deployment:
```shell
k scale deployment <deployment-name> --replicas=<count> [-n <namespace>]
```
As always, `kubectl` is just one way to manage your cluster. k9s and OpenLens are highly recommended.
## Appendix
### Run with `tctl`
As an alternative to the above options, you can run benchmark tests directly with `tctl`.
Anywhere you have `tctl` available:
1. Run `export TEMPORAL_CLI_ADDRESS=<temporal-frontend-address>`
1. Execute:
```shell
tctl workflow start --taskqueue benchmark \
--workflow_type ExecuteActivity \
--execution_timeout 60 \
-i '{"Count":1,"Activity":"Sleep","Input":{"SleepTimeInSeconds":3}}'
```
This will start a workflow that executes a three-second `Sleep` activity once.
To execute the activity multiple times, change the `Count` value.
To change the sleep time, change the `SleepTimeInSeconds` value.
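For example, this variation of the command above runs a one-second sleep ten times per workflow:
```shell
tctl workflow start --taskqueue benchmark \
  --workflow_type ExecuteActivity \
  --execution_timeout 60 \
  -i '{"Count":10,"Activity":"Sleep","Input":{"SleepTimeInSeconds":1}}'
```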
### Change the load test
In most cases, the default `Sleep`-based benchmark shown above is adequate for load testing your Temporal cluster.
If necessary, the `soak-test` runner configuration can be adjusted with environment variables or command-line flags. There aren't many options, but see the runner's official documentation for details.
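When you do need to change an option, one convenient way to adjust environment variables on the running deployment is `kubectl set env`. The option name and value below are placeholders, not real settings; check the benchmark runner's documentation for the variables it actually supports:
```shell
# <OPTION_NAME>/<value> are placeholders for a variable documented by the benchmark runner;
# changing an env var triggers a rolling restart of the soak-test pods with the new setting.
k set env deployment/benchmark-soak-test <OPTION_NAME>=<value> [-n <namespace>]
```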