# CorbFuzz
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5105112.svg)](https://doi.org/10.5281/zenodo.5105112)

CorbFuzz is a state-aware fuzzer that generates as many responses from a web application as possible without the need to set up a database, etc.

### Obtaining Source Code
Download from this git repo...
```bash
git clone https://github.com/shouc/corbfuzz.git
```

### Building (Docker)

Make sure Docker is installed on your computer and that you are running as root. The build takes ~10 minutes on [our computer](https://github.com/shouc/corbfuzz/blob/master/REQUIREMENTS.md#our-computer) (to speed it up, you can change `make -j30` on line 51 of `build.sh` to `make -j{number of processors you have}`).

Notice that you **must** add `--shm-size=6g` so that there is enough memory for the fuzzer.

```bash
docker build . -t corbfuzz
docker run --shm-size=6g --name corbfuzz1 -ti corbfuzz # tapping into docker container
```

If you instead want to use our [pre-built Docker image](https://hub.docker.com/repository/docker/shouc/corbfuzz), you can replace the previous procedure with
```bash
docker run --shm-size=6g --name corbfuzz1 -ti shouc/corbfuzz:latest
```

### Building (Ubuntu)

The setup script has been tested on Ubuntu 20.10; make sure you are running as root.

If you don't have Chromium and ChromeDriver on your computer, the following script downloads and sets up Chromium:
```bash
./install_chrome.sh
```

The following script installs all dependencies required by the fuzzer, builds the instrumented PHP, and sets up the environment:
```bash
./build.sh
```

### Running
Start the Redis server required by the fuzzer (not needed for a dry run):
```bash
service redis-server start
```

Run `fake_mysql` first:
```bash
cd /corbfuzz/fake_mysql
nvm use --lts # switch to nodejs 14
rm -f /tmp/rand.sock # remove the existing unix socket
node main.js & # start the fake MySQL server
```

Run the instrumented PHP server on port 1194 (the fuzzer defaults to port 1194):
```bash
cd [Fuzzing Target Location] # for testing, you can use /corbfuzz/test
mkdir cov # if cov directory does not exist
php -dextension=/corbfuzz/hsfuzz.so -S 0.0.0.0:1194 &
```
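
Before starting the fuzzer, you can optionally confirm that the instrumented server is reachable. A minimal check is sketched below; the request path is a placeholder that depends on your target app.
```python3
# Sketch: confirm the instrumented PHP server is reachable on port 1194.
# The request path below is a placeholder; any page served by your target works.
import urllib.request

try:
    with urllib.request.urlopen("http://127.0.0.1:1194/index.php", timeout=5) as resp:
        print("PHP server is up, status:", resp.status)
except Exception as exc:  # e.g. connection refused or HTTP error
    print("PHP server check failed:", exc)
```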

Run the fuzzer from the `/corbfuzz` directory in a Python console:
```bash
cd /corbfuzz
python3
```

Inside the console, type:
```python3
import main
main.main(0)
```
As for when to terminate the fuzzer, the paper uses coverage growth: if coverage growth is 0 over 1 minute, we terminate the fuzzer manually. The automation script described later instead simply gives each fuzzer 7 minutes to run.
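
If you want to automate that heuristic yourself, a minimal sketch is shown below. It assumes the coverage data the fuzzer produces accumulates as files under the target's `cov/` directory; the path is an assumption you may need to adjust.
```python3
# Sketch of a manual stopping heuristic: stop once coverage stops growing for a minute.
# Assumes coverage accumulates as files under the fuzzing target's cov/ directory.
import os
import time

COV_DIR = "/corbfuzz/test/cov"  # adjust to your fuzzing target

def coverage_size(path):
    """Total size in bytes of all coverage files, used as a cheap growth proxy."""
    return sum(
        os.path.getsize(os.path.join(root, f))
        for root, _, files in os.walk(path)
        for f in files
    )

last = coverage_size(COV_DIR)
while True:
    time.sleep(60)
    current = coverage_size(COV_DIR)
    if current == last:  # no growth over the last minute
        print("Coverage plateaued; terminate the fuzzer now.")
        break
    last = current
```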

You can also run multiple fuzzers, whose coverage information is synchronized through Redis. To do so, start multiple PHP servers on ports starting from 1194 (e.g., 1194, 1195, 1196). Then start a fuzzer for each PHP server by running the following Python code asynchronously:
```python3
main.main(1)
main.main(2)
main.main(3)
...
main.main([PORT] - 1194)
```

An example is shown in `start.py`, which starts 10 fuzzers fuzzing ports 1194-1214. To run it, you have to start 10 PHP runtimes; see the Reproduce Data Synthesis Effectiveness Figures section for automation.
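
If you prefer to script the whole setup yourself rather than use `start.py`, a minimal sketch is shown below. The instance count, target directory, and extension path are assumptions to adjust, and it must be run from `/corbfuzz` so that `import main` resolves to the fuzzer entry point.
```python3
# Sketch: start N instrumented PHP servers on consecutive ports and one fuzzer per port.
# Paths and N are assumptions; run this from /corbfuzz.
import multiprocessing
import subprocess
import time

import main  # the fuzzer entry point in /corbfuzz

N = 10
TARGET = "/corbfuzz/test"    # fuzzing target directory (must contain a cov/ subdirectory)
EXT = "/corbfuzz/hsfuzz.so"  # instrumented PHP extension

php_servers = [
    subprocess.Popen(
        ["php", f"-dextension={EXT}", "-S", f"0.0.0.0:{1194 + i}"],
        cwd=TARGET,
    )
    for i in range(N)
]
time.sleep(2)  # give the PHP servers a moment to bind their ports

# main.main(i) fuzzes the PHP server on port 1194 + i
fuzzers = [multiprocessing.Process(target=main.main, args=(i,)) for i in range(N)]
for p in fuzzers:
    p.start()
for p in fuzzers:
    p.join()

for srv in php_servers:
    srv.terminate()
```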

### Project Organization

* `fake_mysql/` implements a fake MySQL server listening on port 3306. It handles queries and generates legitimate MySQL responses. It also listens on `/tmp/rand.sock` for requests to synthesize a specific query result or to be notified of type information.

* `extension/` implements a PHP extension that generates coverage information and aids the data synthesis workflow by generating session/cookie values and dumping their related constraints.

* `php/` is the PHP code base with instrumentation that implements the data synthesis workflow and
interacts with `fake_mysql` through the unix socket. It also hijacks all MySQL connections to our fake MySQL server.

* `scripts/` contains useful scripts for running the oracles and ensuring consistency for the data synthesis workflow.

* `test/` contains PHP code adapted from an existing PHP project for testing the fuzzer.

* `benchmark/` contains utilities for crawling GitHub for PHP repos and generating the data for the data synthesis plots in the paper.
    * `main.py`: Crawls potential PHP repos' URLs from GitHub and saves them to `repos.txt`.
    * `get.py`: Clones the GitHub repos listed in `repos.txt` into the directory specified as argv[1] (e.g., `python3 get.py /tmp/repos`).
    * `run.py`: Fuzzes all the projects in the directory specified as argv[1] and generates intermediate benchmark information for the following Python scripts to consume.
    * `count_edge.py`: Counts all the edges from the coverage data generated by the fuzzer for each project.
    * `count_violation_comp.py`: Counts all the comparison type violations from the data generated by the instrumented PHP for each project.
    * `count_violation_internal.py`: Counts all the internal function type violations from the data generated by the instrumented PHP for each project.

### Reproduce Data Synthesis Effectiveness Figures

If you are not using the Docker container, you have to modify the paths in `run.py` accordingly.

**Setup**
```bash
service redis-server start
cd /corbfuzz/benchmark/
```

**Crawl Web Apps from GitHub**

Create a GitHub access token with full access to `repo` here: https://github.com/settings/tokens/new. The token should be a string starting with `ghp_...`.
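
Before putting the token into `main.py`, you can optionally sanity-check it against the GitHub API. A minimal sketch using only the standard library is shown below; the token string is a placeholder.
```python3
# Sketch: verify the token is valid before editing main.py.
# The token below is a placeholder; paste your own ghp_... string.
# An HTTP 401 error here means the token is invalid.
import json
import urllib.request

TOKEN = "ghp_your_token_here"

req = urllib.request.Request(
    "https://api.github.com/user",
    headers={"Authorization": f"token {TOKEN}"},
)
with urllib.request.urlopen(req) as resp:
    print("Token OK, authenticated as:", json.load(resp)["login"])
```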

```
vim main.py # put your GitHub access token to the specified location
python3 main.py # crawl recent PHP related repos
# Inspect repos.txt; if it is empty or contains limited data (file < 800 bytes), run the previous script at a different time.
mkdir /repos

# if you have trouble making a valid repos.txt, you can rename repos_test.txt to repos.txt under the benchmark directory.
# repos_test.txt contains PHP repos crawled on Aug 23, 2021.
python3 get.py /repos # clone the repos
```

In case GitHub no longer works or the API has changed, you can also replace the previous process by requesting the dataset from [us](mailto:[email protected]) and downloading it directly.
```
wget [DATASET URL]
unzip application.zip
mkdir /repos
mv repos_ok2/* /repos
```

The `/repos` folder contains subfolders named with UUIDs; each subfolder is a GitHub repo.

**Start fuzzing**

A lot of errors may be output, but it is safe to ignore them.

The script exits when fuzzing is done; it takes (number of repos) × (7 minutes). You can also kill everything midway with CTRL-C followed by `pkill -f python3`. The benchmarking results for unfuzzed repos will simply be 0.
```
pwd # => /corbfuzz/benchmark/
python3 run.py /repos
```

**Benchmarking**

For the following scripts, if the printed count is 0, you can safely ignore it (as we did in the paper), because that indicates the fuzzer could not find PHP code inside.

Get the branch coverage count for each app. Each output line is (repo id, edge count):
```
python3 count_edge.py /repos/
```

Get the comparison type violation count for each app. Each output line is (repo id, type violation count, sample size):
```
python3 count_violation_comp.py /repos/
```

Get the internal function type violation count for each app. Each output line is (repo id, type violation count, sample size):
```
python3 count_violation_internal.py /repos/
```

We process the data using Google Docs.
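
If you prefer to process the numbers locally instead, the sketch below (a hypothetical helper, e.g. saved as `to_csv.py`) turns the per-repo output of a counting script into a CSV. It assumes each relevant output line is a printed tuple like `(repo id, count)`; adjust the parsing if the actual format differs.
```python3
# Sketch: pipe the output of one of the count_* scripts into this script to get a CSV,
# e.g.  python3 count_edge.py /repos/ | python3 to_csv.py edges.csv
# Assumes each relevant line looks like "(repo id, count[, sample size])".
import csv
import sys

out_path = sys.argv[1] if len(sys.argv) > 1 else "results.csv"

rows = []
for line in sys.stdin:
    line = line.strip()
    if line.startswith("(") and line.endswith(")"):
        # split the printed tuple into its comma-separated fields
        rows.append([field.strip() for field in line[1:-1].split(",")])

with open(out_path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

print(f"Wrote {len(rows)} rows to {out_path}", file=sys.stderr)
```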

### CORB Oracle
To be released.