https://github.com/jmid/efftester

Effect-Driven Compiler Tester for OCaml
https://github.com/jmid/efftester
compiler-testing ocaml property-based-testing quickcheck
Last synced: 8 months ago
JSON representation
Effect-Driven Compiler Tester for OCaml
Host: GitHub
URL: https://github.com/jmid/efftester
Owner: jmid
License: bsd-2-clause
Created: 2017-05-11T09:29:57.000Z (about 9 years ago)
Default Branch: master
Last Pushed: 2019-09-25T20:03:55.000Z (over 6 years ago)
Last Synced: 2025-07-25T13:41:17.916Z (11 months ago)
Topics: compiler-testing, ocaml, property-based-testing, quickcheck
Language: OCaml
Size: 29.3 KB
Stars: 38
Watchers: 6
Forks: 5
Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project

README

          Effect-Driven Compiler Tester:

==============================

This is a prototype implementation of a compiler testing approach

described in the paper:

>  "Effect-Driven QuickChecking of Compilers",

>  Jan Midtgaard, Mathias Nygaard Justesen, Patrick Kasting, Flemming Nielson, Hanne Riis Nielson,

>  ICFP'17,

>  [http://janmidtgaard.dk/papers/Midtgaard-al:ICFP17-full.pdf](http://janmidtgaard.dk/papers/Midtgaard-al%3aICFP17-full.pdf) (full version)

We suggest to generate programs following a dedicated type and effect

system that determines whether a program has evaluation-order

dependence. By reading the type-and-effect-system relation bottom up,

in a goal-directed manner we can generate programs that are free of

evaluation-order dependence. We can thus run such programs with both

the bytecode and native code backends of OCaml, log their observable

behaviour to a file, and `diff` the two behaviours.

Building and running:

---------------------

The tester requires OCaml, the QCheck library, and a *nix-like platform with `diff` and `/dev/null`.

The below output is obtained with QCheck version 0.5.2. If compiled as is with a later QCheck version

(known to produce more output) the resulting program may produce a somewhat garbled output.

To build simply run `make`.

Now run `effmain.native -v` (the native code build is recommended for speed reasons).

This should produce output along these lines:

```

$ ./effmain.native -v

random seed: 228021772

...........................................................................x...x...x...x...x...x...x...x...x...x....x...x..x....x....x.....x..x...x..x......x.....x.....x.....x.....x.....x.....x.....x.....x.....x.....x.....x.....x.....x.....

law bytecode/native backends agree: 76 relevant cases (76 total)

  test `bytecode/native backends agree`

  failed on ≥ 1 cases:

  Some (let o = (let j = List.hd [] in fun w -> int_of_string) "" in 0) (after 32 shrink steps)

  

failure (1 tests failed, ran 1 tests)

```

Interestingly, the produced counterexamples such as

`let o = (let j = List.hd [] in fun w -> int_of_string) "" in 0`

above illustrate observable differences in the executables produced by

the two backend, aka, bugs. This particular one is a variant of MPR#7531

listed below.

You can provide a known seed with the flag `-s`: `./effmain.native -v -s 228021772`

to reproduce an earlier run.

When invoked without a command-line seed QCheck is seeded differently

for each run and will therefore (most likely) produce different

results for each of them:

- each run may reveal different errors or

- no bugs are found in the given run.

The reproducability with respect to a given seed should be taken with

a large grain of salt. A seed may still yield different results

depending on the underlying architecture, OS, bit-width,

OCaml-version, and QCheck-version. Furthermore it is (all too) easy to

invalidate previous seeds by any adjustments to

- the generator -- as the adjusted version may utilize/interpret the

  old seed differently,

- the generator's initial environment -- as choices herein may come

  out differently for a smaller/bigger/different environment,

- the shrinker -- as the adjusted version may shrink a bigger

  problematic counterexample to a different minimal example.

The generated programs are written to `testdir/test.ml` and the

compiled versions are named `testdir/native` and `testdir/byte`.

These are overwritten on each iteration. As such, the last version

remaining rarely represents a counterexample. Instead it represents

the shrinker's last (failed) attempt to shrink the minimal

counterexample one step further.

The current version tests OCaml version 4.04.0's native-code compiler

with flag `-O3` against the bytecode compiler. It was originally

tested with version 4.02.3 which doesn't support the `-O3` flag. So

far the tester has found the same bugs (below) related to erroneous

optimization in 4.02.3 as in 4.04.0 with the `-O3` flag.

If you find more errors using this approach please let me know.

Bugs found:

-----------

MPR#7531  https://caml.inria.fr/mantis/view.php?id=7531  Delayed effects in partially applied functions

MPR#7533  https://caml.inria.fr/mantis/view.php?id=7533  Wrong code generation of `e / e'` and `e mod e'` with effectful `e`

MPR#7201  https://caml.inria.fr/mantis/view.php?id=7201  Wrong optimization of `0 / e`

	  						 (reported by way of Jean-Christophe Filliatre)

   #8864  https://github.com/ocaml/ocaml/issues/8864     Left-shift of 0 yields negative number

                                                         (found by [Murilo Giacometti Rocha](https://github.com/murilogiacometti))

A quick port to [js_of_ocaml](http://ocsigen.org/js_of_ocaml/) revealed the following 2 bugs (reported together):

 

https://github.com/ocsigen/js_of_ocaml/issues/584  function difference+equality exception removed

A quick port to [BuckleScript](https://bucklescript.github.io/) revealed the following 8 bugs:

https://github.com/bloomberg/bucklescript/issues/1667 failure: internal compiler Failure

https://github.com/bloomberg/bucklescript/issues/1692 BuckleScript produces incorrect JavaScript

https://github.com/BuckleScript/bucklescript/issues/1759 comparison+difference of units yields different results

https://github.com/BuckleScript/bucklescript/issues/1760 div+mod: Division_by_zero removed

https://github.com/BuckleScript/bucklescript/issues/1761 comp: function comparison succeeds

      

https://github.com/BuckleScript/bucklescript/issues/1762 order: bool_of_string erroneously optimized

Known bugs recreated:

---------------------

GPR#954  https://github.com/ocaml/ocaml/pull/954  Wrong optimisation of `0 mod e`

                                                  (fixes both the division and mod cases)

GPR#956  https://github.com/ocaml/ocaml/pull/956  Keep possibly-effectful expressions when optimizing multiplication by zero

Observing the generated programs:

---------------------------------

To invoke the generator directly, first `make eff` and then start

`ocaml` from within this directory (this will load modules suitably

via `.ocamlinit`).

To generate an arbitrary program you can now invoke `Gen.generate1

term_gen.gen` (wrapped with a suitable string coersion to make the

generator's output more understandable).  The output changes per each

invocation:

```

$ ocaml

        OCaml version 4.04.0

[some lines about loaded libs omitted]

# Print.option (toOCaml ~typeannot:false) (Gen.generate1 term_gen.gen);;

- : string = "Some (exit ((fun n -> (-98051756132636271)) string_of_bool))"

# Print.option (toOCaml ~typeannot:false) (Gen.generate1 term_gen.gen);;

- : string =

"Some ((mod) (pred (let h = if true then exit ((/) (if false then 1830787755246062127

 else (-2895157864674163253)) (lnot ((asr) (-3089269914618536456) 93))) else 

 let w = print_endline (let i = lnot 996022529208063915 in string_of_int 25) in 

 string_of_bool (exit (abs (-427726557501168681))) in ( * ) (pred (lnot (List.hd 

 (List.hd (exit 991))))) (lnot 5))) ((lsr) 225 755))"

```

To observe the distribution of the effect-driven program generator you

can build a different target with `make stat` which results in a

`effstat.native` executable. This program runs a constant true test

over generated terms while logging the size of each term.

A bashscript `runstat.sh` will then run `./effstat.native -v`, log the

output, process it with `sed`, and pass the output to the program

'ministat'.

`runstat.sh` itself requires a *nix platform with bash and 'ministat'

installed.

When run, `runstat.sh` should produce output along these lines (you

can adjust the 50 generated terms in `src/effstat.ml`):

```

$ ./runstat.sh 

x stat_size.txt

+-------------------------------------------------------------------------------------------------------------+

|             x                                                                                               |

|            xx                                                                                               |

|            xx                                                                                               |

|           xxx                                                                                               |

|          xxxx                                                                                               |

|          xxxx                                                                                               |

|          xxxx                                                                                               |

|          xxxx                                                                                               |

|          xxxx    x x              xx                                                                        |

|          xxxxx   xxx    x x      xxx                 x                            x x     x                x|

||____________M_________A_____________________|                                                               |

+-------------------------------------------------------------------------------------------------------------+

    N           Min           Max        Median           Avg        Stddev

x  50             1           639            16         80.82     148.41691

```

The very latest version of the QCheck library can provide similar

output and thereby removes the need for the above bashscript.
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/jmid/efftester

Awesome Lists containing this project

README