https://github.com/eth-sri/ilf
AI based fuzzer based on imitation learning
- Host: GitHub
- URL: https://github.com/eth-sri/ilf
- Owner: eth-sri
- License: apache-2.0
- Created: 2019-11-23T12:31:16.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-07-26T22:22:55.000Z (almost 3 years ago)
- Last Synced: 2024-05-08T00:15:48.435Z (about 2 years ago)
- Topics: blockchain, fuzzing, imitation-learning, machine-learning, smart-contracts, symbolic-execution, testing
- Language: Python
- Size: 4.73 MB
- Stars: 144
- Watchers: 11
- Forks: 32
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
ILF: AI-based Fuzzer for Ethereum Smart Contracts
=============================================================================================================
ILF is an **I**mitation **L**earning based **F**uzzer for smart contracts. The fuzzing policy, which is used to generate transactions, is represented by an ensemble of neural networks and is learned from thousands of high-quality sequences of transactions generated using symbolic execution. ILF can be used to fuzz any Ethereum smart contract and outputs the coverage and a vulnerability report.
ILF is developed at [SRI Lab, Department of Computer Science, ETH Zurich](https://www.sri.inf.ethz.ch/) as part of the [Machine Learning for Programming](https://www.sri.inf.ethz.ch/research/plml) and [Blockchain Security](https://www.sri.inf.ethz.ch/research/blockchain-security) projects. For more details, please refer to the [ILF CCS'19 paper](https://files.sri.inf.ethz.ch/website/papers/ccs19-ilf.pdf) and [slides](https://files.sri.inf.ethz.ch/website/slides/ccs19-ilf-slides.pdf).
## Setup
We provide a Dockerfile, which we recommend as a starting point. To build and run:
```
$ docker build -t ilf .
$ docker run -it ilf
```
You can also follow the instructions in the Dockerfile to install ILF locally. If you experience build errors on Apple M-series chips, please refer to [#21](https://github.com/eth-sri/ilf/issues/21).
## Usage
### Fuzzing
To fuzz the example provided in the repo with ILF (the `imitation` fuzzing policy) using our pre-trained model in the `model` directory:
```
$ python3 -m ilf --proj ./example/crowdsale/ --contract Crowdsale --fuzzer imitation --model ./model/ --limit 2000
```
The `--fuzzer` argument can be replaced by:
* `random`: a uniformly random fuzzing policy.
* `symbolic`: a symbolic execution fuzzing policy based on depth-first search of block states. This is used for generating training sequences.
* `sym_plus`: an augmentation of `symbolic` which can revisit encountered block states.
* `mix`: a fuzzing policy that randomly chooses `imitation` or `symbolic` for generating each transaction.
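
To make the `mix` idea concrete, here is a minimal sketch of a policy that picks one of several sub-policies at random for each transaction. It is not ILF's implementation: `make_mix_policy`, `imitation_policy`, and `symbolic_policy` are hypothetical stand-ins, and the uniform choice between them is an assumption.

```python
import random
from typing import Callable, Sequence

# A policy maps the current fuzzing state (opaque here) to a transaction description.
Policy = Callable[[object], str]


def make_mix_policy(policies: Sequence[Policy], rng: random.Random) -> Policy:
    """Return a policy that delegates each call to a randomly chosen sub-policy."""
    def mixed(state: object) -> str:
        # Assumption: sub-policies are chosen uniformly at random per transaction.
        return rng.choice(policies)(state)
    return mixed


# Hypothetical stand-ins for the real fuzzers; they only label their output.
def imitation_policy(state: object) -> str:
    return "tx from imitation"


def symbolic_policy(state: object) -> str:
    return "tx from symbolic"


mix = make_mix_policy([imitation_policy, symbolic_policy], random.Random(0))
sequence = [mix(None) for _ in range(6)]
print(sequence)
```

Each element of `sequence` comes from one of the two stand-in policies, chosen independently per transaction.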
For fuzzing new contracts, one needs to provide a Truffle project (formatted as the example in `example/crowdsale`). Then the script `script/extract.py` should be called to extract deployment transactions of the contracts. For the example contract, the script runs as follows:
```
$ rm example/crowdsale/transactions.json
$ python3 script/extract.py --proj example/crowdsale/ --port 8545
```
Note that you need to kill any existing `ganache-cli` processes listening on the same port before calling this script.
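
A quick way to check whether something is already listening on the port is to attempt a TCP connection to it. The sketch below uses only the Python standard library; `port_in_use` is a hypothetical helper, not part of ILF, and it assumes the process listens on `127.0.0.1`.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 8545 is the port passed to script/extract.py in the example above.
    if port_in_use(8545):
        print("Port 8545 is busy; stop the process using it first.")
    else:
        print("Port 8545 is free.")
```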
### Training
For training, one needs to run `symbolic` on a set of training contracts to produce a dataset in a training directory. Usually tens of thousands of contracts are used for training. For demonstration purposes, we show how to produce a small training dataset from our example contract and write it to the `train_data` directory:
```
$ mkdir train_data
$ python3 -m ilf --proj ./example/crowdsale/ --contract Crowdsale --limit 2000 --fuzzer symbolic --dataset_dump_path ./train_data/crowdsale.data
```
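
When many contracts are involved, the command above can be generated once per project. The following is a minimal sketch, assuming each Truffle project lives in its own directory and its contract name is known; `symbolic_command` is a hypothetical helper, and it uses only the flags shown in this README.

```python
import shlex
from pathlib import Path
from typing import List


def symbolic_command(proj: Path, contract: str, out_dir: Path, limit: int = 2000) -> List[str]:
    """Build the argv for one `symbolic` run that dumps a dataset file."""
    # Assumption: the dataset file is named after the project directory.
    dump_path = out_dir / f"{proj.name}.data"
    return [
        "python3", "-m", "ilf",
        "--proj", str(proj),
        "--contract", contract,
        "--limit", str(limit),
        "--fuzzer", "symbolic",
        "--dataset_dump_path", str(dump_path),
    ]


# Hypothetical project list: (project directory, contract name) pairs.
projects = [(Path("example/crowdsale"), "Crowdsale")]

for proj_dir, contract_name in projects:
    cmd = symbolic_command(proj_dir, contract_name, Path("train_data"))
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Building the argument list separately from running it makes it easy to review the commands before launching a long batch.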
Run the scripts to select seed integer values and amount values from the training dataset, and put them into `ilf/fuzzers/imitation/int_values.py` and `ilf/fuzzers/imitation/amounts.py`, respectively:
```
$ python3 script/get_int_values.py --train_dir ./train_data
$ python3 script/get_amounts.py --train_dir ./train_data
```
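
The selection logic of these scripts is not described here. As an illustration only, one plausible way to pick seed values is to keep those that occur most often in the training transactions. The sketch below assumes that approach; `top_k_values` and the sample data are hypothetical and do not reflect the scripts' actual code or the dataset format.

```python
from collections import Counter
from typing import Iterable, List


def top_k_values(values: Iterable[int], k: int) -> List[int]:
    """Return the k most frequent values, most frequent first."""
    counts = Counter(values)
    # Sort by descending frequency; break ties by value so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [value for value, _ in ranked[:k]]


# Hypothetical integer arguments observed in training transactions.
observed = [0, 1, 1, 10**18, 1, 0, 2, 10**18, 10**18, 10**18]
print(top_k_values(observed, 3))
```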
Then the following command performs neural network training and outputs the trained networks in the `new_model` directory:
```
$ mkdir new_model
$ python3 -m ilf --fuzzer imitation --train_dir ./train_data --model ./new_model
```
### Automatically Constructing Truffle Projects
For evaluation and training purposes, one might want to automatically construct Truffle projects from a large set of contracts. To achieve this, one can write a script that produces the files required by a Truffle project, following the format in `example/crowdsale`. The compressed file `truffle_scripts.tar.gz` contains the scripts we used. Those scripts might not run directly but can give you a high-level idea of how things work.
## Citing ILF
```
@inproceedings{He:2019:LFS:3319535.3363230,
author = {He, Jingxuan and Balunovi\'{c}, Mislav and Ambroladze, Nodar and Tsankov, Petar and Vechev, Martin},
title = {Learning to Fuzz from Symbolic Execution with Application to Smart Contracts},
booktitle = {Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security},
series = {CCS '19},
year = {2019},
isbn = {978-1-4503-6747-9},
location = {London, United Kingdom},
pages = {531--548},
numpages = {18},
url = {http://doi.acm.org/10.1145/3319535.3363230},
doi = {10.1145/3319535.3363230},
acmid = {3363230},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {fuzzing, imitation learning, smart contracts, symbolic execution},
}
```
## Contributors
* [Jingxuan He](https://www.sri.inf.ethz.ch/people/jingxuan)
* [Mislav Balunović](https://www.sri.inf.ethz.ch/people/mislav)
* Nodar Ambroladze
* [Petar Tsankov](https://www.sri.inf.ethz.ch/people/petar)
* [Martin Vechev](https://www.sri.inf.ethz.ch/people/martin)
* Anton Permenev
## License and Copyright
* Copyright (c) 2019 [Secure, Reliable, and Intelligent Systems Lab (SRI), ETH Zurich](https://www.sri.inf.ethz.ch/)
* Licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0)