https://github.com/digitalocean/vmtop

Real-time monitoring of KVM/Qemu VMs

Host: GitHub
URL: https://github.com/digitalocean/vmtop
Owner: digitalocean
License: apache-2.0
Created: 2020-03-17T20:18:54.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2024-04-20T04:48:36.000Z (over 2 years ago)
Last Synced: 2025-06-11T01:09:28.110Z (about 1 year ago)
Topics: bcc, ebpf, kvm, monitoring, performance, prometheus, qemu, virtualization
Language: Python
Homepage:
Size: 164 KB
Stars: 66
Watchers: 10
Forks: 21
Open Issues: 8
Metadata Files:
- Readme: readme.md
- License: LICENSE

# vmtop

Monitor various metrics for KVM/qemu guests and output in a top-like fashion.
Also aggregate the usage/allocation metrics at the NUMA node and host-level.

## Main features

It can output as text on the console and write CSV files. The `graph-vmtop.py`
script generates graphs from those CSV files (record with `--csv `).

The script can also be run with a Prometheus exporter enabled, like so:
```
sudo ./vmtop.py --prometheus [host:port]
```

The host and port are optional, and will default to localhost:8000 if not
specified.
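
As a rough illustration of how an optional `[host:port]` argument with those defaults can be handled, here is a minimal sketch. The helper name `parse_listen_addr` is hypothetical and this is not vmtop's actual parsing code; in particular, accepting a bare host or a bare `:port` is an assumption, and IPv6 literals are not handled.

```python
def parse_listen_addr(value=None, default_host="localhost", default_port=8000):
    """Split an optional 'host:port' string, filling in defaults for missing parts."""
    if not value:
        # Nothing given: use both defaults.
        return default_host, default_port
    host, sep, port = value.rpartition(":")
    if not sep:
        # No colon at all: treat the whole value as a host name.
        return value, default_port
    return host or default_host, int(port) if port else default_port


print(parse_listen_addr())                # ('localhost', 8000)
print(parse_listen_addr("0.0.0.0:9100"))  # ('0.0.0.0', 9100)
```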

## Example output
Example output for the top 10 VMs experiencing the most vcpu steal per node:

```
$ sudo ./vmtop.py --vm --limit 10 --sort vcpu_steal
[...]
Node 0:
Name       PID   vcpu util  vcpu steal  vhost util  vhost steal   emu util  emu steal   disk read  disk write          rx          tx
Guest01  48642    141.73 %     32.86 %      0.30 %       0.06 %     0.57 %     0.48 %   0.00 MB/s   0.00 MB/s   0.00 MB/s   0.00 MB/s
Guest02  16830    219.07 %     29.74 %     28.45 %      22.13 %     0.54 %     1.30 %   0.00 MB/s   0.01 MB/s   2.49 MB/s   2.43 MB/s
Guest03  17152    119.61 %     27.87 %     13.89 %      16.31 %     0.48 %     0.12 %   0.00 MB/s   0.00 MB/s   1.45 MB/s   1.39 MB/s
Guest04  60782     52.90 %     16.70 %      0.79 %       0.75 %     0.61 %     0.19 %   0.00 MB/s   0.01 MB/s   0.02 MB/s   0.03 MB/s
Guest05  33077     46.47 %     12.69 %      2.02 %       1.82 %     3.11 %     0.97 %   0.00 MB/s   2.06 MB/s   0.24 MB/s   0.14 MB/s
Guest06  39435     41.70 %     10.01 %     17.36 %      12.85 %     0.98 %     0.08 %   0.00 MB/s   0.00 MB/s   0.27 MB/s   0.30 MB/s
Guest07   5196     26.49 %      9.28 %      0.03 %       0.00 %     0.51 %     0.09 %   0.00 MB/s   0.00 MB/s   0.00 MB/s   0.00 MB/s
Guest08  56822     55.63 %      8.77 %      9.33 %       5.61 %     0.53 %     0.17 %   0.00 MB/s   0.00 MB/s   0.30 MB/s   0.30 MB/s
Guest09  65751     36.59 %      8.54 %      6.26 %       3.42 %     0.76 %     0.06 %   0.08 MB/s   0.04 MB/s   0.18 MB/s   0.18 MB/s
Guest10  25320     26.61 %      8.44 %      0.06 %       0.00 %     5.30 %     2.72 %   0.00 MB/s   0.06 MB/s   0.00 MB/s   0.00 MB/s
Node 0: vcpu util: 80.85%, vcpu steal: 6.49%, emulators util: 1.71%, emulators steal: 0.89%
Node 0: 79 VMs (168 vcpus, 306.00 GB mem allocated, 231.98 GB mem used)
Node 1:
Name       PID   vcpu util  vcpu steal  vhost util  vhost steal   emu util  emu steal   disk read  disk write          rx          tx
Guest11  14506     29.49 %      0.70 %      0.00 %       0.00 %     0.45 %     0.01 %   0.00 MB/s   0.01 MB/s   0.00 MB/s   0.00 MB/s
Guest12  52580     28.69 %      0.63 %      4.55 %       0.26 %     0.60 %     0.08 %   0.12 MB/s   0.00 MB/s   0.16 MB/s   0.15 MB/s
Guest13  60864     36.45 %      0.56 %      0.06 %       0.00 %     4.25 %     0.06 %   0.00 MB/s   0.03 MB/s   0.00 MB/s   0.00 MB/s
Guest14  69426     64.98 %      0.54 %     11.06 %       0.79 %     0.09 %     0.00 %   0.00 MB/s   0.02 MB/s   4.20 MB/s   3.87 MB/s
Guest15  52138     52.45 %      0.39 %     10.66 %       0.64 %     0.75 %     0.02 %   0.22 MB/s   0.00 MB/s   0.43 MB/s   0.38 MB/s
Guest16  38308     57.05 %      0.25 %     12.03 %       0.93 %     1.18 %     0.03 %   0.51 MB/s   0.00 MB/s   0.47 MB/s   0.56 MB/s
Guest17   7700     11.87 %      0.25 %      0.08 %       0.00 %     4.16 %     0.05 %   0.00 MB/s   0.03 MB/s   0.00 MB/s   0.00 MB/s
Guest18  39542     39.46 %      0.24 %      7.61 %       0.51 %     0.77 %     0.02 %   0.16 MB/s   0.00 MB/s   0.27 MB/s   0.27 MB/s
Guest19   1381     49.03 %      0.24 %      9.01 %       0.72 %     0.76 %     0.05 %   0.25 MB/s   0.00 MB/s   0.35 MB/s   0.43 MB/s
Guest20   6945     36.15 %      0.24 %      6.42 %       0.45 %     0.77 %     0.04 %   0.21 MB/s   0.00 MB/s   0.23 MB/s   0.36 MB/s
Node 1: vcpu util: 41.88%, vcpu steal: 0.20%, emulators util: 1.87%, emulators steal: 0.08%
Node 1: 131 VMs (161 vcpus, 210.00 GB mem allocated, 193.42 GB mem used)
```
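
To make the per-node selection concrete, the following sketch groups a handful of rows from the sample output by NUMA node, sorts each group by vcpu steal in descending order, and keeps the first `limit` entries. It only illustrates the idea behind `--limit` and `--sort vcpu_steal`; the function name `top_by_steal` and the tuple layout are hypothetical and do not reflect vmtop's internals.

```python
from collections import defaultdict

# (name, node, vcpu util %, vcpu steal %), values copied from the sample output
samples = [
    ("Guest04", 0, 52.90, 16.70),
    ("Guest01", 0, 141.73, 32.86),
    ("Guest12", 1, 28.69, 0.63),
    ("Guest02", 0, 219.07, 29.74),
    ("Guest11", 1, 29.49, 0.70),
]


def top_by_steal(rows, limit):
    """Return {node: [(name, util, steal), ...]} with the highest-steal VMs first."""
    by_node = defaultdict(list)
    for name, node, util, steal in rows:
        by_node[node].append((name, util, steal))
    return {
        node: sorted(vms, key=lambda vm: vm[2], reverse=True)[:limit]
        for node, vms in sorted(by_node.items())
    }


for node, vms in top_by_steal(samples, limit=2).items():
    print(f"Node {node}:")
    for name, util, steal in vms:
        print(f"  {name}  util={util:.2f} %  steal={steal:.2f} %")
```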

## Dependencies

The tool currently depends on `numastat` being in the `$PATH`.

Besides that dependency, one design rule of this tool is that it should be easy
to just download it and run without any additional setup. We use very common
Python modules and the advanced features such as Prometheus and eBPF are
optional.

This tool has been mostly tested on Ubuntu Bionic 18.04, but should be easy to
run on other platforms.

The optional dependencies are:
* `python3-daemon` for the `--daemon` option
* `python3-bcc` for the `--vmexit` option
* `python3-prometheus-client` for the `--prometheus` option
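
A quick way to see which of these dependencies are present on a host is sketched below. It is not part of vmtop. The import names (`daemon`, `bcc`, `prometheus_client`) are assumptions based on the usual module names shipped by those packages, and `check_dependencies` is a hypothetical helper.

```python
import importlib.util
import shutil

# Option -> Python module it needs. The module names are assumed from the
# usual import names of python3-daemon, python3-bcc and
# python3-prometheus-client.
OPTIONAL_MODULES = {
    "--daemon": "daemon",
    "--vmexit": "bcc",
    "--prometheus": "prometheus_client",
}


def check_dependencies():
    """Report whether numastat and each optional module can be found."""
    report = {"numastat": shutil.which("numastat") is not None}
    for option, module in OPTIONAL_MODULES.items():
        # find_spec returns None when a top-level module is not installed.
        report[option] = importlib.util.find_spec(module) is not None
    return report


for name, available in check_dependencies().items():
    print(f"{name:14} {'ok' if available else 'missing'}")
```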

## Assumptions

The node-level information assumes that a VM mostly runs on the NUMA node where
most of its memory is allocated. If a VM can float between NUMA nodes, the
node-level information may not be accurate (and a warning will be shown). The
VM-level data is accurate regardless of the pinning.
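
The assumption above can be sketched as follows: attribute a VM to the node holding the largest share of its memory, and flag the result as unreliable when that share is not dominant. The function name `home_node`, the input layout, and the 90 % threshold are all hypothetical choices for illustration, not vmtop's actual logic.

```python
def home_node(mem_by_node_mb, threshold=0.9):
    """Return (node, reliable) for a VM given its memory per NUMA node in MB.

    `reliable` is False when the busiest node holds less than `threshold`
    of the VM's memory, i.e. the VM may be floating between nodes.
    """
    total = sum(mem_by_node_mb.values())
    if total == 0:
        return None, False
    node = max(mem_by_node_mb, key=mem_by_node_mb.get)
    share = mem_by_node_mb[node] / total
    return node, share >= threshold


print(home_node({0: 7900.0, 1: 100.0}))   # (0, True)
print(home_node({0: 4200.0, 1: 3800.0}))  # (0, False)
```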

## Feedback

This tool is a quick script to solve the issue of live monitoring resource
allocation and usage by VMs. We plan to keep improving it over time.

## Extras

In the `extras` folder, there are other ad-hoc tools for monitoring various
performance aspects of KVM. The tools are in various states of maintenance, so
feel free to reach out if you have questions or suggestions for improvements.
The features from those scripts may end up in vmtop at some point, but for now
they are tested separately.

### guesttime.bt

`bpftrace` tool to check statistically how long a vCPU spends inside the guest
when it is scheduled in.

### Core-scheduling and KVM

The `core-sched-stats.py` script ensures that the core scheduling
feature works properly and accounts for the time spent by a vCPU in various
scheduling modes (co-scheduled with idle, with a compatible task, or a foreign
task).

This script works on a perf trace recorded like so:
```
sudo perf record -e kvm:kvm_entry -e kvm:kvm_exit -e sched:sched_switch -e sched:sched_waking -o perf.data -a sleep 60
```

The trace can then be converted to CTF like so (requires perf compiled with CTF support):
```
perf data convert --to-ctf=./ctf -i perf.data
```

If you see this message and the number of lost chunks is greater than 1 or 2, consider writing your `perf.data` to a ramdisk instead of your local disk:
```
Processed 7378132 events and lost 1 chunks!
Check IO/CPU overload!
```

The script depends on Babeltrace compiled with the Python library and on perf compiled with CTF support. On Ubuntu Bionic:

```
apt-get install libbabeltrace-dev libbabeltrace-ctf-dev python3-babeltrace babeltrace
```

Then rebuild `perf` so it will detect the new library.

To run `kvm_co_sched_stats.py`, provide the siblings list in a text file with the `--topology` flag.
To collect this data, run this on the target hypervisor:
```
for i in /sys/devices/system/cpu/cpu*/topology/thread_siblings_list; do cat $i; done > out.txt
```
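
Each line of the resulting file uses the kernel's CPU list format, for example `0,24` or `0-1`, and every sibling group appears once per CPU it contains. The sketch below shows one way to read such a file into a deduplicated list of sibling groups. The helper names are hypothetical and this is not the parser used by the script itself.

```python
def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0,24' or '0-1' into a frozenset of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        elif part:
            cpus.add(int(part))
    return frozenset(cpus)


def load_siblings(lines):
    """Return the unique sibling groups, ordered by their lowest CPU number."""
    groups = {parse_cpu_list(line) for line in lines if line.strip()}
    return sorted(groups, key=min)


# Example content of out.txt on a hypothetical 4-thread, 2-core host.
example = ["0,2", "1,3", "0,2", "1,3"]
print([sorted(group) for group in load_siblings(example)])  # [[0, 2], [1, 3]]
```

With a real file, `load_siblings(open("out.txt"))` would return the same structure.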
