# mlperf

Ansible automation to run MLPerf benchmarks.

This repository aims to guide the deployment and
testing of MLPerf submissions.

## Container builds

Container images are published in
[quay.io](https://quay.io/repository/psap/mlperf-inference?tab=tags).

- Inference test harness `quay.io/psap/mlperf-inference:latest`.
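
To pull the published image locally, for example:

```
podman pull quay.io/psap/mlperf-inference:latest
```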

### Building the inference container

From the repository root folder:

```
cd containers
podman build -f Containerfile.inference -t quay.io/psap/mlperf-inference:latest .
podman push quay.io/psap/mlperf-inference:latest
```
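
Before pushing, you can sanity-check the freshly built image locally (a minimal smoke test, assuming the image ships a shell):

```
podman run --rm -it quay.io/psap/mlperf-inference:latest /bin/bash
```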

## Deploying

To deploy the containers in an OCP cluster, run the following from the repository root folder:

- Go into the containers folder `cd containers`.
- Run initial steps `./00_pre.sh`.
- Deploy a vLLM pod with `./01_pod.vllm.sh`.
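
Once the scripts finish, you can check that the pod came up (assuming the `my-whisper-runtime` namespace used in the testing examples below):

```
oc get pods -n my-whisper-runtime
```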

## Testing

### vLLM

Connect to the vLLM container and run the evaluation script with `python /workspace/scripts/run_vllm.py`:

```
oc exec -n my-whisper-runtime -it vllm-standalone -- /bin/bash
```

Or run the script directly in one step:

```
oc exec -n my-whisper-runtime -it vllm-standalone -- /bin/bash -c "python /workspace/scripts/run_vllm.py"
```

The output should look similar to this:

```
.
.

Elapsed time: 789.2372903823853
Total audio seconds processed: 49507.556
Seconds transcribed / sec: 62.72835382121085
Requests per second: 4.217996337181559 for 3329
.
.
.
```
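
The reported rates follow directly from the totals above:

```
# Seconds transcribed / sec = total audio seconds / elapsed time
#                           = 49507.556 / 789.237 ≈ 62.73
# Requests per second       = number of requests / elapsed time
#                           = 3329 / 789.237 ≈ 4.22
```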

## Ansible collection

From the root of the repository run:

- Change into the collection folder:

```
cd psap/mlperf
```

- Build and install the collection:

```
ansible-galaxy collection build --force --output-path releases/
VERSION=$(grep '^version: ' ./galaxy.yml | awk '{print $2}')
ansible-galaxy collection install releases/psap-mlperf-$VERSION.tar.gz --force
```
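
To confirm the collection was installed (an optional check):

```
ansible-galaxy collection list | grep psap.mlperf
```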

- Run the playbook:

```
ansible-playbook playbook_deploy_kserve.yml
ansible-playbook playbook_run_mlperf_inference.yml
```

### Publishing a new MLPerf release

```
MY_GALAXY_API_KEY="this_is_a_very_secure_api_key_lol"
ansible-galaxy collection publish \
releases/psap-mlperf-$VERSION.tar.gz \
--server https://galaxy.ansible.com \
--ignore-certs \
--verbose \
--api-key $MY_GALAXY_API_KEY
```
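
After publishing, the release can be installed straight from Galaxy (assuming the `psap.mlperf` namespace and name from `galaxy.yml`):

```
ansible-galaxy collection install psap.mlperf --force
```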

### Extending the collection default variables

```
# Create a file with the extra vars
VARS_FILE="./vars.yml"

# Use small or large-v3
# TODO: the variables passed to the CLI should be fetched from the env vars if configured
# TODO: run_trt.py should be called from a bash script so it can fetch the values from
# the environment variables.

cat > $VARS_FILE << EOF
whisper_image: quay.io/psap/whisper-poc:latest-trt
whisper_commands_to_run:
- mkdir -p /tmp/output/
- nvidia-smi > /tmp/output/gpu_status.txt
- bash /home/trt/scripts/trt-build-whisper.sh -m small > /tmp/trt-build-whisper.log 2>&1
- python3 /home/trt/scripts/run_trt.py --engine_dir ~/tensorrtllm_backend/tensorrt_llm/examples/whisper/trt_engines/large_v3_max_batch_64 --dataset hf-internal-testing/librispeech_asr_dummy --enable_warmup --name librispeech_dummy_large_v3 --assets_dir ~/assets --num_beams 4 > /tmp/run_trt.log 2>&1
- python3 /home/trt/scripts/run_vllm_plot.py
EOF

# Running from the Ansible CLI
ansible-playbook playbook_whisper.yml -e @$VARS_FILE
```

## Logging

```
# Update the Ansible configuration (ansible.cfg) accordingly.
# python -m ara.setup.ansible
# [defaults]
# callback_plugins=/usr/local/lib/python3.10/dist-packages/ara/plugins/callback
# action_plugins=/usr/local/lib/python3.10/dist-packages/ara/plugins/action
# lookup_plugins=/usr/local/lib/python3.10/dist-packages/ara/plugins/lookup

# Let's make sure the local DB is clean
ara-manage prune --confirm
ansible-playbook playbook_plotter.yml
ara-manage generate ./ara-output
```
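
To browse the generated report locally, one option is Python's built-in HTTP server (Python 3.7+), then open http://localhost:8080 in a browser:

```
python3 -m http.server --directory ./ara-output 8080
```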

## Running MLPerf for the first time?

The following steps walk you through testing this collection.
The only external requirement is the OpenShift command line
interface (`oc`).

### 1. Clone the repository

```
git clone https://github.com/ccamacho/mlperf
```

### 2. Install the python dependencies

```
cd mlperf/psap/mlperf
python3 -m pip install -r requirements.txt
```

### 3. Install the collection dependencies

```
ansible-galaxy install -r requirements.yml
```

### 4. Install MLPerf as an Ansible collection

```
ansible-galaxy collection build --force --output-path releases/
VERSION=$(grep '^version: ' ./galaxy.yml | awk '{print $2}')
ansible-galaxy collection install releases/psap-mlperf-$VERSION.tar.gz --force
```

### 5. Export your kubeconfig file

```
export KUBECONFIG=
```
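
A quick sanity check that the kubeconfig points at a reachable cluster:

```
oc whoami
oc get nodes
```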

### 6. Run the playbook

This will run the whisper PoC with reduced resources on an NVIDIA T4.

```
# Create a file with the extra vars
VARS_FILE="./vars.yml"

# Use small or large-v3
# TODO: the variables passed to the CLI should be fetched from the env vars if configured

cat > $VARS_FILE << EOF
whisper_image: quay.io/psap/whisper-poc:latest-vllm
whisper_commands_to_run:
- mkdir -p /tmp/output/
- nvidia-smi > /tmp/output/gpu_status.txt
- python /workspace/scripts/run_vllm.py --model small --range 100 --batch_sizes 1 2 4 8 16 32 > /tmp/run_vllm.log 2>&1
- python /workspace/scripts/run_vllm_plot.py
EOF

# Running from the Ansible CLI
ansible-playbook playbook_deploy_kserve.yml -e @$VARS_FILE
```

### 7. Verifying results

The results are stored by default in the `./whisper_bench-output` folder.
This structure and the result files are subject to change.

```
user@machine:~/dev/whisper-poc/psap/topsail/whisper_bench-output$ tree
.
├── gpu_metrics.csv
├── gpu_status.txt
├── images
│   ├── gpu_utilization_plot.png
│   ├── memory_utilization_plot.png
│   ├── power_draw_plot.png
│   ├── vllm-latency.png
│   ├── vllm-seconds_transcribed_per_sec.png
│   └── vllm-total_time.png
├── output-vllm-001.json
├── output-vllm-002.json
├── output-vllm-004.json
├── output-vllm-008.json
├── output-vllm-016.json
└── output-vllm-032.json

1 directory, 14 files
```
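
For a quick look at one of the result files (the schema is subject to change, as noted above):

```
python3 -m json.tool whisper_bench-output/output-vllm-001.json | head
```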

### 8. HTML output

If you need an HTML report of the execution, ara is installed by default as a callback plugin.

```
ara-manage generate ./ara-output
```

Then inspect the `ara-output` folder (open the `index.html` file).
For more information about how to configure this callback plugin, read the
[Logging section](https://github.com/openshift-psap/whisper-poc/blob/main/README.md#logging).