# mlperf

Ansible automation to run MLPerf benchmarks
https://github.com/ccamacho/mlperf
This repository aims to guide the deployment and
testing of MLPerf submissions.

## Container builds

Container images are published in
[quay.io](https://quay.io/repository/psap/mlperf-inference?tab=tags).

- Inference test harness: `quay.io/psap/mlperf-inference:latest`.
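The published image can be pulled directly with podman (a minimal example using the tag listed above):

```
podman pull quay.io/psap/mlperf-inference:latest
```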
### Building vLLM container
From the repository root folder:
```
cd containers
podman build -f Containerfile.inference -t quay.io/psap/mlperf-inference:latest .
podman push quay.io/psap/mlperf-inference:latest
```
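The freshly built image can be verified locally before pushing (an optional sanity check):

```
podman images quay.io/psap/mlperf-inference
```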
## Deploying

To deploy the containers in an OCP cluster, run the following from the repository root folder:
- Go into the containers folder `cd containers`.
- Run initial steps `./00_pre.sh`.
- Deploy a vLLM pod with `./01_pod.vllm.sh`.
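Once the scripts finish, a quick status check confirms the pod came up (a hedged sketch; `my-whisper-runtime` and `vllm-standalone` are the namespace and pod names used in the testing commands below):

```
oc get pods -n my-whisper-runtime
oc wait --for=condition=Ready pod/vllm-standalone -n my-whisper-runtime --timeout=300s
```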
## Testing

### vLLM
Connect to the vLLM container and run the evaluation script with `python /workspace/scripts/run_vllm.py`:
```
oc exec -n my-whisper-runtime -it vllm-standalone -- /bin/bash
```

Or run the script directly:
```
oc exec -n my-whisper-runtime -it vllm-standalone -- /bin/bash -c "python /workspace/scripts/run_vllm.py"
```

The current output should look like:
```
.
.
Elapsed time: 789.2372903823853
Total audio seconds processed: 49507.556
Seconds transcribed / sec: 62.72835382121085
Requests per second: 4.217996337181559 for 3329
.
.
.
```
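The throughput figures are plain ratios of the totals above; a quick way to reproduce them from this sample run:

```
# Seconds transcribed per second = total audio seconds / elapsed time
python3 -c "print(49507.556 / 789.2372903823853)"   # ~62.73
# Requests per second = number of requests (3329) / elapsed time
python3 -c "print(3329 / 789.2372903823853)"        # ~4.22
```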
## Ansible collection

From the root of the repository run:
- Install the collection:
```
cd psap/mlperf
```

```
ansible-galaxy collection build --force --output-path releases/
VERSION=$(grep '^version: ' ./galaxy.yml | awk '{print $2}')
ansible-galaxy collection install releases/psap-mlperf-$VERSION.tar.gz --force
```
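The install can be verified before moving on (an optional check; the fully qualified collection name follows the `psap-mlperf` artifact naming above):

```
ansible-galaxy collection list | grep psap.mlperf
```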
- Run the playbook:
```
ansible-playbook playbook_deploy_kserve.yml
ansible-playbook playbook_run_mlperf_inference.yml
```

#### Publishing a new MLPerf release
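Before publishing, the `version:` field in `galaxy.yml` should be bumped and the collection rebuilt (a hypothetical example; the actual version number is up to the release):

```
# Hypothetical version bump; adjust the number as appropriate
sed -i 's/^version: .*/version: 1.0.1/' ./galaxy.yml
ansible-galaxy collection build --force --output-path releases/
```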
```
MY_GALAXY_API_KEY="this_is_a_very_secure_api_key_lol"
ansible-galaxy collection publish \
releases/psap-mlperf-$VERSION.tar.gz \
--server https://galaxy.ansible.com \
--ignore-certs \
--verbose \
--api-key $MY_GALAXY_API_KEY
```

### Extending the collection default variables
```
# Create a file with the extra vars
VARS_FILE="./vars.yml"# Use small or large-v3
# TODO: The variables passed to the CLI should be fetched from the env vars if configured
# TODO: The run_trt.py should be called from a bash script so it can fetch the values from
# the environment variables.

cat > $VARS_FILE <<EOF
whisper_image: quay.io/psap/whisper-poc:latest-trt
whisper_commands_to_run:
- mkdir -p /tmp/output/
- nvidia-smi > /tmp/output/gpu_status.txt
- bash /home/trt/scripts/trt-build-whisper.sh -m small > /tmp/trt-build-whisper.log 2>&1
- python3 /home/trt/scripts/run_trt.py --engine_dir ~/tensorrtllm_backend/tensorrt_llm/examples/whisper/trt_engines/large_v3_max_batch_64 --dataset hf-internal-testing/librispeech_asr_dummy --enable_warmup --name librispeech_dummy_large_v3 --assets_dir ~/assets --num_beams 4 > /tmp/run_trt.log 2>&1
- python3 /home/trt/scripts/run_vllm_plot.py
EOF

# Running from the Ansible CLI
ansible-playbook playbook_whisper.yml -e @$VARS_FILE
```
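Variables can also be overridden inline instead of through a vars file (equivalent `-e` usage; `whisper_image` is one of the variables shown above):

```
ansible-playbook playbook_whisper.yml -e whisper_image=quay.io/psap/whisper-poc:latest-trt
```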
## Logging

```
# Update the ansible configuration (ansible.cfg) accordingly.
# python -m ara.setup.ansible
# [defaults]
# callback_plugins=/usr/local/lib/python3.10/dist-packages/ara/plugins/callback
# action_plugins=/usr/local/lib/python3.10/dist-packages/ara/plugins/action
# lookup_plugins=/usr/local/lib/python3.10/dist-packages/ara/plugins/lookup

# Let's make sure the local DB is clean
ara-manage prune --confirm
ansible-playbook playbook_plotter.yml
ara-manage generate ./ara-output
```
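The same plugin paths can also be set through environment variables instead of editing `ansible.cfg` (one pattern shown in ara's documentation):

```
export ANSIBLE_CALLBACK_PLUGINS="$(python3 -m ara.setup.callback_plugins)"
```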
## Running MLPerf for the first time?

The following steps will allow you to test this collection.
The only external requirement is the OpenShift (oc)
command line interface.
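A quick way to confirm the client is available (optional sanity check):

```
oc version --client
```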
### 1. Clone the repository
```
git clone https://github.com/ccamacho/mlperf
```

### 2. Install the Python dependencies
```
cd mlperf/psap/mlperf
python3 -m pip install -r requirements.txt
```

### 3. Install the collection dependencies
```
ansible-galaxy install -r requirements.yml
```

### 4. Install MLPerf as an Ansible collection
```
ansible-galaxy collection build --force --output-path releases/
VERSION=$(grep '^version: ' ./galaxy.yml | awk '{print $2}')
ansible-galaxy collection install releases/psap-mlperf-$VERSION.tar.gz --force
```

### 5. Export your kubeconfig file
```
export KUBECONFIG=<path-to-your-kubeconfig>
```
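With the kubeconfig exported, cluster access can be verified (optional check):

```
oc whoami
oc get nodes
```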
### 6. Run the playbook

This will run the whisper PoC with fewer resources on an NVIDIA T4:
```
# Create a file with the extra vars
VARS_FILE="./vars.yml"# Use small or large-v3
# TODO: the variables passed to the CLI should be fetched from the env vars if configuredcat < $VARS_FILE
whisper_image: quay.io/psap/whisper-poc:latest-vllm
whisper_commands_to_run:
- mkdir -p /tmp/output/
- nvidia-smi > /tmp/output/gpu_status.txt
- python /workspace/scripts/run_vllm.py --model small --range 100 --batch_sizes 1 2 4 8 16 32 > /tmp/run_vllm.log 2>&1
- python /workspace/scripts/run_vllm_plot.py
EOF

# Running from the Ansible CLI
ansible-playbook playbook_deploy_kserve.yml -e @$VARS_FILE
```

### 7. Verifying results
The results are stored by default in the `./whisper_bench-output` folder.
This structure and the results files are subject to change.

```
user@machine:~/dev/whisper-poc/psap/topsail/whisper_bench-output$ tree
.
├── gpu_metrics.csv
├── gpu_status.txt
├── images
│ ├── gpu_utilization_plot.png
│ ├── memory_utilization_plot.png
│ ├── power_draw_plot.png
│ ├── vllm-latency.png
│ ├── vllm-seconds_transcribed_per_sec.png
│ └── vllm-total_time.png
├── output-vllm-001.json
├── output-vllm-002.json
├── output-vllm-004.json
├── output-vllm-008.json
├── output-vllm-016.json
└── output-vllm-032.json

1 directory, 14 files
```
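The per-batch-size JSON files can be pretty-printed for a quick look (their schema is not documented here, so this only dumps the raw content):

```
python3 -m json.tool ./whisper_bench-output/output-vllm-001.json | head -n 20
```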
### 8. HTML output

If you need an HTML report of the execution, ara is installed by default as a callback plugin.
```
ara-manage generate ./ara-output
```

Then inspect the contents of the `ara-output` folder (open the `index.html` file).
For more information about how to configure this callback plugin, read the
[Logging section](https://github.com/openshift-psap/whisper-poc/blob/main/README.md#logging).