Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/filiph/benchmarkhor

Benchmark comparison tool.
https://github.com/filiph/benchmarkhor

Last synced: 6 days ago
JSON representation

Benchmark comparison tool.

Host: GitHub
URL: https://github.com/filiph/benchmarkhor
Owner: filiph
License: mit
Created: 2021-06-17T02:22:58.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2024-11-29T12:34:12.000Z (about 1 month ago)
Last Synced: 2024-12-30T05:35:37.618Z (8 days ago)
Language: Dart
Size: 162 KB
Stars: 12
Watchers: 4
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

Awesome Lists containing this project

README

An illustration of a markhor, a mountain goat

![Build status](https://github.com/filiph/benchmarkhor/actions/workflows/test.yml/badge.svg)

**Benchmarkhor** is a benchmark comparison tool.
It provides ways to compare benchmark data as a whole, as opposed to just
as a handful of summary numbers.

(Markhor is a species of
[mountain goat](https://www.google.com/search?q=markhor).)

### Features

* **Visualizes performance changes with a diff histogram**.
This allows for a much more insightful comparison.

* **Saves each run in a tiny file without
losing detail.**
Each `.benchmark` file takes only a few kilobytes
(about the size of this `README.md` text file).
Compared to the
[timeline JSON files](https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview),
which can easily take hundreds of megabytes,
this format takes about 0.003% on disk.

* **Summarizes performance improvements with a carefully selected
set of metrics.**
This makes it easier to see if a particular performance
optimization makes a significant difference,
and it makes it harder to be fooled by false indications of progress.

## Install

Currently, the tool requires that you have the
[Dart SDK](https://dart.dev/get-dart)
installed and in your path.

Then, install this tool by running

```text
$ dart pub global activate benchmarkhor
```

## Use

**Note:** Benchmarkhor currently only supports comparison of Flutter
benchmarks. It may support more use cases in the future, but here we'll
assume you want to compare two versions of the same Flutter app.

### Creating the baseline ("before") benchmark

1. Exercise your app via an automated benchmarking approach, using
[`flutter_driver`](https://api.flutter.dev/flutter/flutter_driver/flutter_driver-library.html).
(See
[instructions](https://medium.com/flutter/performance-testing-of-flutter-apps-df7669bb7df7) on how to make the benchmarks as stable as possible.)
[Save the timeline](https://api.flutter.dev/flutter/flutter_driver/FlutterDriver/stopTracingAndDownloadTimeline.html)
to a file, such as `baseline.json`.

2. Run the `benchextract` tool on this file:

$ benchextract baseline.json

This generates a `baseline.benchmark` file in the same directory. It is much smaller yet contains all the salient data.

3. (Optional.) You can delete `baseline.json` now.

4. Keep `baseline.benchmark` for later, possibly even adding it to your source version control.

### Creating the candidate ("after") benchmark

After you've made your performance optimization work, create a new `.benchmark` file by following the instructions above. (Excercise your app using `flutter_driver`, save the timeline to a `.json` file, run `benchextract` on that file, remove the `.json` file.)

### Comparing two benchmarks

Simply run the `benchcompare` tool on any two benchmark files:

```text
$ benchcompare baseline.benchmark new.benchmark
```

This will give a result like this:

```text
<-- (improvement) UI thread (deterioration) -->

█
█
█
█
█
█
█
█
█
█
█
█
█
█
█
█
█
██
██
.......███................ ..
───────────────────────────────────────────────────────────────────────────────
-8.0ms ^ 8.0ms

<-- (improvement) Raster thread (deterioration) -->

█
█
█
█
█
█
█
█
█
█ █
█ █
█ █
█ █
█ █
█ █
█ █
▄█ █
██ █
▄ ██████ █
▄.█▄▄▄▄▄▄▄▄▄▄▄.▄▄▄▄▄▄▄▄▄▄██████████....█ . . . .. .
───────────────────────────────────────────────────────────────────────────────
-8.0ms ^ 8.0ms

UI Median Average
Before: 215 904.8
After: 216 953.5
* statistically significant difference (95% confidence)
Raster Median Average
Before: 6542 6240.2
After: 5506 4807.3
* statistically significant difference (95% confidence)

UI thread:
* 2.3% (10785ms) worsening of total execution time
* No significant change in jank risk (5946 -> 5894)
(That's a 0 ppt decrease in ratio of jank-to-normal frames.)
* 0.0% of individual measurements improved by 1ms+
* 0.1% of individual measurements worsened by 1ms+

Raster thread:
* 23.0% (-165496ms) improvement of total execution time
* -66% to -67% less potential jank (25507 -> 8567)
(That's a 15 ppt decrease in ratio of jank-to-normal frames.)
* 77.8% of individual measurements improved by 1ms+
* 0.0% of individual measurements worsened by 1ms+
```

In the above example, we can see a massive improvement on the raster thread. The histogram tells us this improvement is consistent: most of the graph is to the left of center, which means that most of the measurements in `new.benchmark` were shorter (faster) than in `baseline.benchmark`.

We can also see some deterioration on the UI thread, but only in total execution time (which roughly translates to battery usage). We can now decide whether this deterioration on the UI thread is a fair price for the improvement on the raster thread. (For what it's worth: _it definitely is, in this case._)

## Contribute

Please file an issue first, or feel free to fork this project if you can't wait. This is a personal project, so there are no guarantees.