Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/seahorn/sea-dsa
A new context, field, and array-sensitive heap analysis for LLVM bitcode based on DSA.
https://github.com/seahorn/sea-dsa
llvm pointer-analysis static-analysis verification
Last synced: 1 day ago
JSON representation
A new context, field, and array-sensitive heap analysis for LLVM bitcode based on DSA.
- Host: GitHub
- URL: https://github.com/seahorn/sea-dsa
- Owner: seahorn
- License: other
- Created: 2017-06-29T16:25:09.000Z (over 7 years ago)
- Default Branch: main
- Last Pushed: 2024-06-13T18:09:15.000Z (8 months ago)
- Last Synced: 2025-01-24T03:38:08.779Z (9 days ago)
- Topics: llvm, pointer-analysis, static-analysis, verification
- Language: C++
- Homepage:
- Size: 1.56 MB
- Stars: 163
- Watchers: 12
- Forks: 30
- Open Issues: 11
-
Metadata Files:
- Readme: README.md
- License: license.txt
Awesome Lists containing this project
README
# SeaDsa: A Points-to Analysis for Verification of Low-level C/C++ #
`SeaDsa` is a context-, field-, and array-sensitive unification-based
points-to analysis for LLVM bitcode inspired
by [DSA](http://llvm.org/pubs/2003-11-15-DataStructureAnalysisTR.ps).
`SeaDsa` is an order of magnitude more scalable and precise than `Dsa`
and a previous implementation of `SeaDsa` thanks to improved handling
of context sensitivity, addition of partial flow-sensitivity, and type-awareness.Although `SeaDsa` can analyze arbitrary LLVM bitcode, it has been
tailored for use in program verification of C/C++ programs. It can be
used as a stand-alone tool or together with
the [SeaHorn](https://github.com/seahorn/seahorn)
verification framework and its analyses.This branch supports LLVM 14.
## Requirements ##
`SeaDsa` is written in C++ and uses the Boost library. The main requirements
are:- C++ compiler supporting c++14
- Boost >= 1.65
- LLVM 14To run tests, install the following packages:
- `sudo pip install lit OutputCheck`
- `sudo easy_install networkx`
- `sudo apt-get install libgraphviz-dev`
- `sudo easy_install pygraphviz`## Project Structure ##
1. The main Points-To Graph data structures, `Graph`, `Cell`, and `Node`, are
defined in `include/Graph.hh` and `src/Graph.cc`.
2. The *Local* analysis is in `include/Local.hh` and `src/DsaLocal.cc`.
3. The *Bottom-Up* analysis is in `include/BottomUp.hh` and
`src/DsaBottomUp.cc`.
4. The *Top-Down* analysis is in `include/TopDown.hh` and `src/DsaTopDown.cc`.
5. The interprocedural node cloner is in `include/Cloner.hh` and
`src/Clonner.cc`.
6. Type handling code is in `include/FieldType.hh`, `include/TypeUtils.hh`,
`src/FieldType.cc`, and `src/TypeUtils.cc`.
7. The allocator function discovery is in `include/AllocWrapInfo.hh` and
`src/AllocWrapInfo.cc`.## Compilation and Usage ##
### Program Verification benchmarks ###
Instructions on running program verification benchmarks, together with recipes
for building real-world projects and our results, can be found in
[tea-dsa-extras](https://github.com/kuhar/tea-dsa-extras).### Integration in other C++ projects (for users) ##
`SeaDsa` contains two directories: `include` and `src`. Since `SeaDsa`
analyzes LLVM bitcode, LLVM header files and libraries must be
accessible when building with `SeaDsa`.If your project uses `cmake` then you just need to add in your
project's `CMakeLists.txt`:include_directories(seadsa/include)
add_subdirectory(seadsa)### Standalone (for developers) ###
If you already installed `llvm-14` on your machine:
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=run -DLLVM_DIR=__here_llvm-14__/share/llvm/cmake ..
cmake --build . --target install
Otherwise:mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=run ..
cmake --build . --target installTo run tests:
cmake --build . --target test-sea-dsa
## Visualizing Memory Graphs and Complete Call Graphs ##
Consider a C program called `tests/c/simple.c`:
``` c
#includetypedef struct S {
int** x;
int** y;
} S;int g;
int main(int argc, char** argv){
S s1, s2;
int* p1 = (int*) malloc(sizeof(int));
int* q1 = (int*) malloc(sizeof(int));
s1.x = &p1;
s1.y = &q1;
*(s1.x) = &g;
return 0;
}```
1. Generate bitcode:
clang -O0 -c -emit-llvm -S tests/c/simple.c -o simple.ll
The option `-O0` is used to disable clang optimizations. In general,
it is a good idea to enable clang optimizations. However, for trivial
examples like `simple.c`, clang simplifies too much so nothing useful
would be observed. The options `-c -emit-llvm -S` generate bitcode in
human-readable format.2. Run `sea-dsa` on the bitcode and print memory graphs to [dot](https://en.wikipedia.org/wiki/DOT_(graph_description_language)) format:
seadsa -sea-dsa=butd-cs -sea-dsa-type-aware -sea-dsa-dot simple.ll
The options `-sea-dsa=butd-cs -sea-dsa-type-aware` enable the analysis
implemented in our FMCAD'19 paper (see References). This command will
generate a `FUN.mem.dot` file for each function `FUN` in the bitcode
program. In our case, the only function is `main` and thus, there is
one file named `main.mem.dot`. The file is generated in the current
directory. If you want to store the `.dot` files in a different
directory `DIR` then add the option `-sea-dsa-dot-outdir=DIR`3. Visualize `main.mem.dot` by transforming it to a `pdf` file:
dot -Tpdf main.mem.dot -o main.mem.pdf
open main.mem.pdf // replace with you favorite pdf viewer
![Example of a memory graph](https://github.com/seahorn/sea-dsa/blob/tea-dsa/tests/expected_graphs/simple.jpg?raw=true)In our memory model, a pointer value is represented by a __cell__
which is a pair of a memory object and offset. Memory objects are
represented as nodes in the memory graph. Edges are between cells.Each node field represents a cell (i.e., an offset in the node). For
instance, the node fields `<0,i32**>` and `<8,i32**>` pointed by `%6`
and `%15`, respectively are two different cells from the same memory
object. The field `<8,i32**>` represents the cell at offset 8 in the
corresponding memory object and its type is `i32**`. Black edges
represent points-to relationships between cells. They are labeled with
a number that represents the offset in the destination node. Blue
edges connect formal parameters of the function with a cell. Purple
edges connect LLVM pointer variables with cells. Nodes can have
markers such as `S` (stack allocated memory), `H` (heap allocate
memory), `M` (modified memory), `R` (read memory), `E` (externally
allocated memory), etc. If a node is red then it means that the
analysis lost field sensitivity for that node. The label `{void}` is
used to denote that the node has been allocated but it has not been
used by the program.`sea-dsa` can also resolve indirect calls. An _indirect call_ is a
call where the callee is not known statically. `sea-dsa` identifies
all possible callees of an indirect call and generates a LLVM call
graph as output.Consider this example in `tests/c/complete_callgraph_5.c`:
``` c
struct class_t;
typedef int (*FN_PTR)(struct class_t *, int);
typedef struct class_t {
FN_PTR m_foo;
FN_PTR m_bar;
} class_t;int foo(class_t *self, int x)
{
if (x > 10) {
return self->m_bar(self, x + 1);
} else
return x;
}int bar (class_t *self, int y) {
if (y < 100) {
return y + self->m_foo(self, 10);
} else
return y - 5;
}int main(void) {
class_t obj;
obj.m_foo = &foo;
obj.m_bar = &bar;
int res;
res = obj.m_foo(&obj, 42);
return 0;
}
```Type the commands:
clang -c -emit-llvm -S tests/c/complete_callgraph_5.c -o ex.ll
sea-dsa --sea-dsa-callgraph-dot ex.llIt generates a `.dot` file called `callgraph.dot` in the current
directory. Again, the `.dot` file can be converted to a `.pdf` file
and opened with the commands:dot -Tpdf callgraph.dot -o callgraph.pdf
open callgraph.pdf![Example of a call graph](https://github.com/seahorn/sea-dsa/blob/tea-dsa/tests/expected_graphs/complete_callgraph_5.jpg?raw=true)
`sea-dsa` can also print some statistics about the call graph
resolution process (note that you need to call `clang` with `-g` to
print file,line, and column information):sea-dsa --sea-dsa-callgraph-stats ex.ll
=== Sea-Dsa CallGraph Statistics ===
** Total number of indirect calls 0
** Total number of resolved indirect calls 3%16 = call i32 %12(%struct.class_t* %13, i32 %15) at tests/c/complete_callgraph_5.c:14:12
RESOLVED
Callees:
i32 bar(%struct.class_t*,i32)
%15 = call i32 %13(%struct.class_t* %14, i32 10) at tests/c/complete_callgraph_5.c:23:16
RESOLVED
Callees:
i32 foo(%struct.class_t*,i32)
%11 = call i32 %10(%struct.class_t* %2, i32 42) at tests/c/complete_callgraph_5.c:36:9
RESOLVED
Callees:
i32 foo(%struct.class_t*,i32)
## Dealing with C/C++ library and external calls ##
The pointer semantics of external calls can be defined by writing a
wrapper that calls any of these functions defined in
`seadsa/seadsa.h`:- `extern void seadsa_alias(const void *p, ...);`
- `extern void seadsa_collapse(const void *p);`
- `extern void seadsa_mk_seq(const void *p, unsigned sz);``seadsa_alias` unifies all argument's cells, `seadsa_collapse` tells
`sea-dsa` to collapse (i.e., loss of field-sensitivity) the cell
pointed by `p`, and `seadsa_mk_seq` tells `sea-dsa` to mark as
_sequence_ the node pointed by `p` with size `sz`.For instance, consider an external call `foo` defined as follows:
extern void* foo(const void*p1, void *p2, void *p3);
Suppose that the returned pointer should be unified to `p2` but not to
`p1`. In addition, we would like to collapse the cell corresponding to
`p3`. Then, we can replace the above prototype of `foo` with the
following definition:#include "seadsa/seadsa.h"
void* foo(const void*p1, void *p2, void*p3) {
void* r = seadsa_new();
seadsa_alias(r,p2);
seadsa_collapse(p3);
return r;
}## References ##
1. "A Context-Sensitive Memory Model for Verification of C/C++
Programs" by A. Gurfinkel and J. A. Navas. In SAS'17.
([Paper](https://jorgenavas.github.io/papers/sea-dsa-SAS17.pdf))
| ([Slides](https://jorgenavas.github.io/slides/sea-dsa-SAS17-slides.pdf))2. "Unification-based Pointer Analysis without Oversharing" by J. Kuderski, J. A. Navas and A. Gurfinkel. In FMCAD'19.
([Paper](https://jorgenavas.github.io/papers/tea-dsa-fmcad19.pdf))