raptor-compare [![Build Status][travisimage]][travislink]
============================================================

[travisimage]: https://travis-ci.org/stasm/raptor-compare.png?branch=master
[travislink]: https://travis-ci.org/stasm/raptor-compare

Compare sets of Raptor results and test the observed differences for
statistical significance ([t-test][] with a 0.05 alpha).

[t-test]: https://en.wikipedia.org/wiki/Student%27s_t-test

```
$ raptor-compare my_test.ldjson

music.gaiamobile.org   base: mean  1: mean  1: delta  1: p-value
---------------------  ----------  -------  --------  ----------
navigationLoaded              711      726        14        0.06
navigationInteractive         737      748        12        0.10
visuallyLoaded               1322     1217      -105      * 0.00
contentInteractive           1323     1217      -105      * 0.00
fullyLoaded                  1462     1442       -20        0.14
uss                        19.881   20.370     0.489      * 0.00
pss                        23.468   23.981     0.513      * 0.00
rss                        39.640   40.152     0.512      * 0.00
```

In the example above, the Raptor measurements for the Music app show a
statistically significant improvement for the `visuallyLoaded` and
`contentInteractive` events, as indicated by the asterisks next to the
p-values. At the same time, we can see that the memory footprint has
regressed: the mean `uss` usage is higher than the base measurement and the
difference is statistically significant as well.

For all measurements marked with an asterisk (`*`), it is valid to assume that
the means are indeed significantly different between the base and the try runs.

The remaining results, e.g. the 20 ms `fullyLoaded` speed-up, are not
statistically significant and might be caused by random instability in the
data. Try increasing the sample size (via Raptor's `--runs` option; see below)
and run Raptor again.

What is a p-value?
------------------

The p-value is a concept from statistical testing which represents our
willingness to be wrong about the data. A low p-value means that there is only
a small risk of making a mistake when we conclude that the means are truly
different and that the observed difference is not due to poor sampling or
randomness.

For the data above, a p-value of 0.14 for `fullyLoaded` means that the risk of
being wrong is 14% when we conclude that the 20 ms difference between the means
is due to an actual code change and not to randomness.

Good p-values are below 0.05.
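
To illustrate what the test behind the asterisks does, below is a rough sketch
of a two-sample t-test written in JavaScript. The helper names and the 1.96
cutoff are illustrative assumptions for large sample sizes; _raptor-compare_
performs the actual test and reports exact p-values.

```javascript
// A sketch of a two-sample (Welch's) t-test, assuming `base` and `trial`
// are arrays of per-run measurements. The 1.96 critical value roughly
// corresponds to a 0.05 alpha for large samples (e.g. --runs 30).

function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

function sampleVariance(xs) {
  const m = mean(xs);
  return xs.reduce((sum, x) => sum + Math.pow(x - m, 2), 0) / (xs.length - 1);
}

// Welch's t statistic for two independent samples.
function tStatistic(base, trial) {
  const standardError = Math.sqrt(
    sampleVariance(base) / base.length +
    sampleVariance(trial) / trial.length);
  return (mean(trial) - mean(base)) / standardError;
}

// Treat the difference in means as significant when |t| exceeds the
// critical value, i.e. roughly when p < 0.05.
function isSignificant(base, trial, critical) {
  return Math.abs(tStatistic(base, trial)) > (critical || 1.96);
}
```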

Installation
------------

```
npm install -g raptor-compare
```

Running Raptor tests
--------------------

(For best results, follow the [Raptor][] guide on MDN.)

Install Raptor with:

```
$ sudo npm install -g @mozilla/raptor
```

Connect your device to the computer, go into your Gaia directory and build Gaia:

```
$ make raptor
```

Then, run the desired perf test:

```
$ raptor test coldlaunch --runs 30 --app music --metrics my_test.ldjson
```

Raptor will print the output to `stdout`. The raw data will be saved in the
`ldjson` file specified in the `--metrics` option. The data is appended, so you
can run multiple tests for different revisions and apps, and `raptor-compare`
will figure out how to handle it. All testing is conducted relative to the
first result set for the given app.
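
For example, a base-versus-try comparison of a single app could look like the
sketch below; the `git checkout` step and the branch name are hypothetical and
stand for however you switch to the revision under test:

```
$ raptor test coldlaunch --runs 30 --app music --metrics my_test.ldjson
$ git checkout my-feature && make raptor
$ raptor test coldlaunch --runs 30 --app music --metrics my_test.ldjson
$ raptor-compare my_test.ldjson
```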

[Raptor]: https://developer.mozilla.org/en-US/Firefox_OS/Automated_testing/Raptor

API
---

You can also use _raptor-compare_ programmatically. It exposes three functions
for working with Raptor data: `read` reads in an LDJSON stream with the raw
metrics data, `parse` aggregates the data into a Map, and `build` creates the
comparison tables with p-values for significance testing.

```javascript
// Needed for Node.js 0.10 and 0.12.
require('babel/polyfill');

const fs = require('fs');
const compare = require('raptor-compare');

compare.read(fs.createReadStream(filename))
  .then(compare.parse)
  .then(compare.build)
  .then(tables => tables.forEach(
    table => console.log(table.toString())))
  .catch(console.error);
```