https://github.com/joom/edit-distance
Verifying edit distance properties in Rocq.
- Host: GitHub
- URL: https://github.com/joom/edit-distance
- Owner: joom
- License: mit
- Created: 2019-02-19T20:17:03.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2026-05-31T11:53:54.000Z (about 1 month ago)
- Last Synced: 2026-05-31T13:19:46.803Z (about 1 month ago)
- Topics: c-verification, edit-distance, levenshtein-distance, rocq-prover, vst
- Language: Rocq Prover
- Homepage:
- Size: 61.5 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# edit-distance
Formal verification of the Levenshtein (edit) distance in [Rocq](https://rocq-prover.org/),
including a proof that a C implementation refines a verified functional model using the
[Verified Software Toolchain](http://vst.cs.princeton.edu/) (VST).
The development is structured in three layers, each proved equivalent to the next:
1. an **intrinsically-correct recursive** model, whose dependently-typed definition
carries its own optimality proof;
2. a **dynamic-programming** model in the Wagner–Fischer style, proved to compute
the same value as the recursive model; and
3. a **C implementation** (`levenshtein.c`), proved with VST to refine the
dynamic-programming model.
## What Is Proved
### Recursive model
`theories/Levenshtein_recursive.v` defines edit scripts as indexed Rocq types:
- `edit s t` is a single insertion, deletion, or update (replacing a character
with a different one) that turns `s` into `t`.
- `chain s t n` is an edit script from `s` to `t` with exactly `n` charged
edits; equal heads are skipped at no cost.
- `levenshtein_chain s t` computes both a distance and a witness edit script.
- `levenshtein_recursive s t` is the numeric distance extracted from that
witness.
The main theorem, `levenshtein_recursive_is_minimal`, proves that the computed
distance is minimal: for every edit script `chain s t n`,
`levenshtein_recursive s t <= n`. Since `levenshtein_chain` also returns a
witness script with exactly that distance, the recursive model computes the true
Levenshtein distance.
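As an informal illustration of the recursion described above, here is a minimal C sketch of a head-peeling edit distance. It is not the Rocq definition and carries no proof or witness script; the name `lev_rec_sketch` and the helper `min3` are hypothetical. It assumes NUL-terminated inputs and runs in exponential time, so it is only suitable for short strings.

```c
#include <stddef.h>
#include <string.h>

/* Smallest of three size_t values (hypothetical helper). */
static size_t min3(size_t a, size_t b, size_t c) {
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Naive recursive edit distance over NUL-terminated strings.
   Illustration only: exponential time, no memoization. */
size_t lev_rec_sketch(const char *s, const char *t) {
    if (*s == '\0') return strlen(t);   /* only insertions remain */
    if (*t == '\0') return strlen(s);   /* only deletions remain */
    if (*s == *t) {
        /* Equal heads are skipped at no cost. */
        return lev_rec_sketch(s + 1, t + 1);
    }
    /* Otherwise charge one edit and take the cheapest continuation. */
    return 1 + min3(lev_rec_sketch(s, t + 1),      /* insert head of t */
                    lev_rec_sketch(s + 1, t),      /* delete head of s */
                    lev_rec_sketch(s + 1, t + 1)); /* update head of s */
}
```

For example, `lev_rec_sketch("kitten", "sitting")` evaluates to 3. Unlike `levenshtein_chain`, this sketch returns only the number, not the edit script that justifies it.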
### Dynamic-programming model
`theories/Levenshtein_dp.v` defines `levenshtein_dp`, a Wagner–Fischer-style
dynamic-programming implementation over Rocq strings. It also contains
index-based cache and loop-state lemmas used by the C proof.
The main theorem, `levenshtein_dp_eq_levenshtein_recursive`, proves that for all
strings `s` and `t`, `levenshtein_dp s t = levenshtein_recursive s t`. The DP
proof first shows that the left-to-right cache traversal computes the recursive
model on reversed inputs, then uses reversal invariance of the recursive
distance to remove the reversals.
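One possible way to picture the reversal step: a left-to-right table is indexed by prefixes of the inputs, whereas a recursion that peels heads works on suffixes, and a prefix of a string is the reverse of a suffix of the reversed string. The C sketch below fills a full Wagner–Fischer table to make the prefix indexing explicit. It is an illustration under assumptions, not the Rocq model: the name `lev_table_sketch`, the NUL-terminated inputs, and the `(size_t)-1` failure sentinel are all hypothetical choices.

```c
#include <stdlib.h>
#include <string.h>

/* Full-table Wagner-Fischer sketch.
   d[i*(n+1) + j] holds the distance between the first i characters of s
   and the first j characters of t. Returns (size_t)-1 if allocation fails. */
size_t lev_table_sketch(const char *s, const char *t) {
    size_t m = strlen(s), n = strlen(t);
    size_t *d = malloc((m + 1) * (n + 1) * sizeof *d);
    if (d == NULL) return (size_t)-1;

    for (size_t i = 0; i <= m; i++) d[i * (n + 1)] = i; /* delete i chars */
    for (size_t j = 0; j <= n; j++) d[j] = j;           /* insert j chars */

    for (size_t i = 1; i <= m; i++) {
        for (size_t j = 1; j <= n; j++) {
            size_t cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
            size_t del = d[(i - 1) * (n + 1) + j] + 1;
            size_t ins = d[i * (n + 1) + (j - 1)] + 1;
            size_t upd = d[(i - 1) * (n + 1) + (j - 1)] + cost;
            size_t best = del < ins ? del : ins;
            d[i * (n + 1) + j] = upd < best ? upd : best;
        }
    }

    size_t result = d[m * (n + 1) + n];
    free(d);
    return result;
}
```

Reversal invariance can be spot-checked on examples: `lev_table_sketch("kitten", "sitting")` and `lev_table_sketch("nettik", "gnittis")` both give 3. Such checks cover only the inputs tried, whereas the theorem covers all strings.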
### C implementation
`theories/Verif_levenshtein.v` proves the generated Clight body of `levenshtein_n`
against a VST function specification. The specification interprets the input
byte arrays as Rocq strings with `bytes_to_string`, requires the usual pointer
and `size_t` bounds, and states that the returned `size_t` is exactly:
```coq
Levenshtein.levenshtein_recursive
(bytes_to_string a)
(bytes_to_string b)
```
The proof connects the C loops to the DP cache model, then uses
`levenshtein_dp_eq_levenshtein_recursive` to conclude that the C result is the
intrinsic Levenshtein distance.
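To give a feel for the shape of function such a specification talks about, here is a hedged C sketch of a length-based edit distance over byte arrays with a single-row cache. It is not the repository's `levenshtein_n` and has not been verified; the name `lev_bytes_sketch`, its signature, the overflow guard, and the `SIZE_MAX` failure sentinel are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: edit distance between byte arrays a[0..alen) and
   b[0..blen), using one cache row instead of a full table.
   Returns SIZE_MAX if the cache cannot be allocated. */
size_t lev_bytes_sketch(const unsigned char *a, size_t alen,
                        const unsigned char *b, size_t blen) {
    if (alen == 0) return blen;
    if (blen == 0) return alen;

    /* Keep (blen + 1) * sizeof(size_t) from overflowing size_t. */
    if (blen >= SIZE_MAX / sizeof(size_t)) return SIZE_MAX;
    size_t *row = malloc((blen + 1) * sizeof *row);
    if (row == NULL) return SIZE_MAX;

    /* Row 0: distance from the empty prefix of a to each prefix of b. */
    for (size_t j = 0; j <= blen; j++) row[j] = j;

    for (size_t i = 1; i <= alen; i++) {
        size_t diag = row[0];   /* value from the previous row, column j-1 */
        row[0] = i;
        for (size_t j = 1; j <= blen; j++) {
            size_t up = row[j]; /* value from the previous row, column j */
            size_t best = diag + ((a[i - 1] == b[j - 1]) ? 0 : 1);
            if (up + 1 < best) best = up + 1;                 /* delete */
            if (row[j - 1] + 1 < best) best = row[j - 1] + 1; /* insert */
            row[j] = best;
            diag = up;
        }
    }

    size_t result = row[blen];
    free(row);
    return result;
}
```

Because lengths are passed explicitly, embedded zero bytes are ordinary data: comparing the 3-byte arrays `"a\0b"` and `"a\0c"` yields 1. The result never exceeds the larger of the two lengths, which is one reason a `size_t` return type is sufficient in a design like this.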
## Files
| File | Description |
| --- | --- |
| `theories/Levenshtein_recursive.v` | Intrinsic edit-script model and proof that `levenshtein_recursive` is minimal among all edit scripts. |
| `theories/Levenshtein_dp.v` | Wagner–Fischer dynamic-programming model `levenshtein_dp`, proved equal to `levenshtein_recursive`. |
| `levenshtein.c` | The C implementation that is verified. |
| `theories/levenshtein.v` | CompCert Clight AST generated from `levenshtein.c` (via `clightgen`). |
| `theories/Verif_levenshtein.v` | VST proof that `levenshtein_n` returns the intrinsic recursive distance for the input byte arrays. |
All `.v` files live under `theories/` and form the Coq theory `EditDistance`,
so modules are referenced as `EditDistance.Levenshtein_dp`, etc.
## Requirements
- Rocq / `coq-core` ≥ 9.0
- [VST](https://vst.cs.princeton.edu/) ≥ 2.16 (which bundles CompCert and Flocq)
- dune ≥ 3.21
These can be installed with opam:
```sh
opam install dune coq-vst
```
## Building
The project is built with [dune](https://dune.build/); the `Makefile` is a thin
frontend to it.
```sh
make # dune build
make clean # dune clean + git clean
make install # dune install
```
Equivalently, run `dune build` directly. Build artifacts go under `_build/`.
## Regenerating the Clight AST
`theories/levenshtein.v` is generated from `levenshtein.c` and should not be edited by hand:
```sh
clightgen -normalize -o theories/levenshtein.v levenshtein.c
```
## Credit
This proof development was carried out with assistance from Claude and Codex. The [minimality proof](https://github.com/bloomberg/crane/pull/17) and the [dynamic programming implementation and proof](https://github.com/bloomberg/crane/pull/25) were done by [Charles C. Norton](https://github.com/CharlesCNorton).