https://github.com/leonardoalt/dl_symb_exec_sol

Symbolic execution engine written in Solidity, based on Difference Logic.

Host: GitHub
URL: https://github.com/leonardoalt/dl_symb_exec_sol
Owner: leonardoalt
License: gpl-3.0
Created: 2022-02-07T12:23:33.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2022-10-06T14:59:04.000Z (almost 3 years ago)
Last Synced: 2025-03-12T18:04:11.863Z (4 months ago)
Language: Solidity
Size: 37.1 KB
Stars: 111
Watchers: 3
Forks: 6
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

**This code base is not meant to be used seriously, it's only a study.**
========================================================================

EVM Symbolic Execution in Solidity
==================================

This repo contains an experimental symbolic execution engine implemented in
Solidity.
If you write both your smart contracts and your tests in Solidity, the analysis
runs as part of the test suite, since the engine is itself a test library.
Therefore, any framework that allows tests in Solidity should be able to run
this symbolic execution without any extra tooling.
Since this code is never going to be deployed, gas usage is not a concern.

The VM opcode handling part was inspired by
https://github.com/Ohalo-Ltd/solevm.

Analysis
--------

During the symbolic run, path constraints are collected, and for every `JUMPI`
opcode the analysis asks a Difference Logic solver (`DLSolver.sol`) whether the
condition can ever be true.
If the condition can never be true, the event `UnreachableBranch(pc)` is
emitted, giving the program counter of that branch.
If the condition can be true, the generated constraints are added to the list
of constraints, and the new `true` branch is executed.
The negation of the generated constraints is added to the constraint list for
the `false` branch.
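
The branching rule described above can be sketched in a few lines of Python.
This is only an illustration, not the repo's Solidity code: the names
`on_jumpi` and `is_feasible` are hypothetical, constraints are modelled as
plain predicates over a single symbolic input, and the solver is replaced by a
brute-force search over a tiny domain. It also assumes, following the
description above, that the `false` branch always continues.

```python
from typing import Callable, List, Tuple

# Assumption: a path constraint is a predicate over one symbolic integer.
Constraint = Callable[[int], bool]


def is_feasible(constraints: List[Constraint]) -> bool:
    """Brute-force stand-in for the DL solver: try every value in 0..15."""
    return any(all(c(x) for c in constraints) for x in range(16))


def on_jumpi(
    branch_pc: int, path: List[Constraint], cond: Constraint
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, List[Constraint]]]]:
    """Handle one JUMPI: return (events, branches still to be explored)."""
    events: List[Tuple[str, int]] = []
    branches: List[Tuple[str, List[Constraint]]] = []
    if is_feasible(path + [cond]):
        # The condition can hold: explore the true branch with it added.
        branches.append(("true", path + [cond]))
    else:
        # The condition can never hold: report the branch as unreachable.
        events.append(("UnreachableBranch", branch_pc))
    # The false branch continues with the negated condition.
    branches.append(("false", path + [lambda x, c=cond: not c(x)]))
    return events, branches
```

For example, calling `on_jumpi(0x14, [lambda x: x < 2], lambda x: x > 5)`
reports the branch at `0x14` as unreachable, because no value is both smaller
than 2 and greater than 5.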

Usage
-----

This repo uses `forge` from [Foundry](https://github.com/gakonst/foundry/).
To run all the tests:

```
$ forge test
```

The tests in this repo include unit tests for the DL solver, and examples of
how to use the symbolic engine.
To run the latter:

```
$ forge test --match symb_run -vvvv
```

The first test, `SymbExecTest::test_symb_run_simple`, shows us that the branch
starting at program counter 0x14 is unreachable! It can therefore be removed.
That branch is `tag_2`, which represents the inner `if` in the sample Yul code.
See the test for a detailed explanation.

In the last test, you should see:

```
[PASS] test_symb_run_unreachable() (gas: 687208634)
Traces:
  [687208634] SymbExecTest::test_symb_run_unreachable()
    ├─ emit UnreachableBranch(pc: 382)
    ├─ emit UnreachableBranch(pc: 425)
    ├─ emit UnreachableBranch(pc: 468)
    ├─ emit UnreachableBranch(pc: 550)
    └─ ← ()
```

This shows that the analysis found 4 unreachable branches in the bytecode!
Check `src/test/SymbExec.t.sol` to see where they are and why.

The analysis for a contract `Analyzed` can be invoked by calling
`symb_run(type(Analyzed).runtimeCode)`, as seen in the tests.

Note that the settings in this repo are **not** using the Solidity compiler's
optimizer on purpose.
The optimizer itself already removes some of these branches from the bytecode.
It is likely that many of the cases that this engine could optimize are already
covered by the compiler.
You will likely notice test result differences if you enable/disable the
optimizer settings in `foundry.toml`.
More tests still need to be run.

Difference Logic (DL)
---------------------

DL is a nice little logic that accepts expressions of the form `a - b <= k`,
where `a` and `b` are variables, and `k` is a constant.
The domain may be the Integers or the Reals.
A DL solver is similar to an LP solver, but a lot simpler.
It takes a set of constraints in the form above (instead of more general linear
constraints), and answers whether it is feasible for all the constraints to be
true at the same time.

The algorithm for solving sets of DL constraints is to represent the
constraints as a weighted graph, such that every constraint `a - b <= k` is an
edge `a -> b` with weight `k`, and check whether the graph has a negative
cycle: the constraints are feasible if and only if it has none.
The latter can be solved, for example, with the Bellman-Ford single source
shortest path algorithm with the negative cycle detection extension.
The proofs of the statements above are left as exercises for the reader.
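
As a rough illustration of the graph-based check, here is a minimal Python
sketch of Bellman-Ford negative cycle detection over DL constraints. It is not
a port of `DLSolver.sol`: the function name `dl_feasible` and the `(a, b, k)`
tuple representation are assumptions made for this example. The intuition is
that summing the constraints along a cycle cancels all variables and leaves
`0 <= (sum of weights)`, which is false when the sum is negative.

```python
from typing import Dict, List, Tuple

# Assumption: the tuple (a, b, k) stands for the constraint a - b <= k.
Constraint = Tuple[str, str, int]


def dl_feasible(constraints: List[Constraint]) -> bool:
    """Return True if all constraints can hold at once (no negative cycle)."""
    if not constraints:
        return True
    nodes = {v for a, b, _ in constraints for v in (a, b)}
    # Starting every node at distance 0 acts like a virtual source connected
    # to all nodes, so a negative cycle anywhere in the graph is found.
    dist: Dict[str, int] = {v: 0 for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for a, b, k in constraints:
            # Relax the edge a -> b with weight k.
            if dist[a] + k < dist[b]:
                dist[b] = dist[a] + k
                changed = True
        if not changed:
            return True  # distances converged, so there is no negative cycle
    return False  # still relaxing after |V| passes: negative cycle
```

For instance, `x - y <= 1` together with `y - x <= -2` forms a cycle of total
weight -1, so `dl_feasible([("x", "y", 1), ("y", "x", -2)])` reports the set as
infeasible, while relaxing the second bound to -1 makes it feasible.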

Encoding
--------

For every `JUMPI` we collect constraints from the condition and convert them to
DL expressions.
For example, if the `JUMPI` condition is `stack_slot_1 < 2`, this becomes
`stack_slot_1 - zero <= 1`, where `zero` is a symbolic variable that always
represents the constant 0 in the DL graph.
The encoding for `GT` is similar.
The encoding for `ISZERO` simply negates the encoding of its argument.
Equalities (`EQ`) `x = y` become two constraints: `x - y <= 0` and
`y - x <= 0`.

In the case of disequalities, that is, the `iszero(eq(...))` constraint for the
true branch of a `JUMPI`, or the negation of an `eq(...)` for the false branch,
we actually do not generate new constraints.
This is because the encoding of a disequality is actually a disjunction:
`a != b <=> !(a <= b && b <= a) <=> a > b || b > a`.
Once disequalities are allowed and the domain is the Integers (which it is in
our case), the DL satisfiability problem becomes NP-hard.
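
The encoding rules above can be written down as a small Python sketch. The
helper names (`encode_lt`, `encode_gt`, `encode_eq`, `negate`) and the
representation of operands as either a variable name or an integer constant
are assumptions for this example, not the repo's API, and the sketch ignores
the 256-bit bounds of EVM words.

```python
from typing import List, Tuple, Union

Operand = Union[str, int]          # variable name or integer constant
Constraint = Tuple[str, str, int]  # (a, b, k) stands for a - b <= k

ZERO = "zero"  # symbolic variable that represents the constant 0


def _split(op: Operand) -> Tuple[str, int]:
    """Write an operand as variable + offset; a constant c is zero + c."""
    return (ZERO, op) if isinstance(op, int) else (op, 0)


def encode_lt(x: Operand, y: Operand) -> List[Constraint]:
    """x < y over the integers is x - y <= -1, adjusted for constants."""
    (vx, cx), (vy, cy) = _split(x), _split(y)
    return [(vx, vy, cy - cx - 1)]


def encode_gt(x: Operand, y: Operand) -> List[Constraint]:
    """x > y is the same as y < x."""
    return encode_lt(y, x)


def encode_eq(x: Operand, y: Operand) -> List[Constraint]:
    """x = y becomes x - y <= 0 and y - x <= 0, adjusted for constants."""
    (vx, cx), (vy, cy) = _split(x), _split(y)
    return [(vx, vy, cy - cx), (vy, vx, cx - cy)]


def negate(constraints: List[Constraint]) -> List[Constraint]:
    """Negate an encoding, as ISZERO does.

    !(a - b <= k) is b - a <= -k - 1 over the integers. Negating the two
    constraints of an equality would give a disjunction, so nothing is
    generated in that case.
    """
    if len(constraints) != 1:
        return []
    a, b, k = constraints[0]
    return [(b, a, -k - 1)]
```

With these helpers, `encode_lt("stack_slot_1", 2)` yields the single
constraint `stack_slot_1 - zero <= 1` from the example above, and
`negate(encode_eq("x", "y"))` yields no constraints at all.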

Code base issues
----------------

All the VM data structures, such as stack, path and constraints, are memory
arrays.
There is a lot of copying of those arrays.
Some of these copies are intentional, since we need to make a new VM when
starting a new `JUMPI` branch, and continue the other branch with the current
VM.
However, at times we simply want to extend an array without copying it
entirely.
Since we can't extend memory arrays natively, we just create a new larger array
from scratch with the previous content copied, plus the desired extension.
The code base would definitely benefit from a memory vector, either natively
or from a library.
