{"id":21882427,"url":"https://github.com/seahorn/sea-dsa","last_synced_at":"2025-04-09T20:10:42.797Z","repository":{"id":26370230,"uuid":"95797011","full_name":"seahorn/sea-dsa","owner":"seahorn","description":"A new context, field, and array-sensitive heap analysis for LLVM bitcode based on DSA.","archived":false,"fork":false,"pushed_at":"2024-06-13T18:09:15.000Z","size":1639,"stargazers_count":165,"open_issues_count":11,"forks_count":30,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-04-09T20:10:38.383Z","etag":null,"topics":["llvm","pointer-analysis","static-analysis","verification"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/seahorn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"license.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-06-29T16:25:09.000Z","updated_at":"2025-03-24T21:53:13.000Z","dependencies_parsed_at":"2024-06-13T20:57:45.242Z","dependency_job_id":"327f1f29-4df2-486e-8416-0a4f00b8d65a","html_url":"https://github.com/seahorn/sea-dsa","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seahorn%2Fsea-dsa","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seahorn%2Fsea-dsa/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seahorn%2Fsea-dsa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seahorn%2Fsea-dsa/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/owners/seahorn","download_url":"https://codeload.github.com/seahorn/sea-dsa/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248103872,"owners_count":21048245,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llvm","pointer-analysis","static-analysis","verification"],"created_at":"2024-11-28T09:29:17.530Z","updated_at":"2025-04-09T20:10:42.761Z","avatar_url":"https://github.com/seahorn.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SeaDsa: A Points-to Analysis for Verification of Low-level C/C++ #\n\n\u003ca href=\"https://github.com/seahorn/sea-dsa/actions\"\u003e\u003cimg src=\"https://github.com/seahorn/sea-dsa/workflows/CI/badge.svg\" title=\"Ubuntu 22.04 LTS 64bit, clang++-14\"/\u003e\u003c/a\u003e\n\n\n`SeaDsa` is a context-, field-, and array-sensitive unification-based\npoints-to analysis for LLVM bitcode inspired\nby [DSA](http://llvm.org/pubs/2003-11-15-DataStructureAnalysisTR.ps).\n`SeaDsa` is an order of magnitude more scalable and precise than `Dsa`\nand a previous implementation of `SeaDsa` thanks to improved handling\nof context sensitivity, addition of partial flow-sensitivity, and type-awareness.  \n\nAlthough `SeaDsa` can analyze arbitrary LLVM bitcode, it has been\ntailored for use in program verification of C/C++ programs. 
It can be\nused as a stand-alone tool or together with\nthe [SeaHorn](https://github.com/seahorn/seahorn)\nverification framework and its analyses.\n\nThis branch supports LLVM 14.\n\n## Requirements ## \n\n`SeaDsa` is written in C++ and uses the Boost library. The main requirements\nare: \n\n- C++ compiler supporting c++14\n- Boost \u003e= 1.65\n- LLVM 14\n\nTo run tests, install the following packages:\n\n- `sudo pip install lit OutputCheck`\n- `sudo easy_install networkx`\n- `sudo apt-get install libgraphviz-dev`\n- `sudo easy_install pygraphviz`\n\n## Project Structure ##\n1. The main Points-To Graph data structures, `Graph`, `Cell`, and `Node`, are\n   defined in `include/Graph.hh` and `src/Graph.cc`.\n2. The *Local* analysis is in `include/Local.hh` and `src/DsaLocal.cc`.\n3. The *Bottom-Up* analysis is in `include/BottomUp.hh` and\n   `src/DsaBottomUp.cc`.\n4. The *Top-Down* analysis is in `include/TopDown.hh` and `src/DsaTopDown.cc`.\n5. The interprocedural node cloner is in `include/Cloner.hh` and\n   `src/Clonner.cc`.\n6. Type handling code is in `include/FieldType.hh`, `include/TypeUtils.hh`, \n   `src/FieldType.cc`, and `src/TypeUtils.cc`.\n7. The allocator function discovery is in `include/AllocWrapInfo.hh` and\n   `src/AllocWrapInfo.cc`.\n\n## Compilation and Usage ##\n\n### Program Verification benchmarks ###\nInstructions on running program verification benchmarks, together with recipes\nfor building real-world projects and our results, can be found in\n[tea-dsa-extras](https://github.com/kuhar/tea-dsa-extras).\n\n### Integration in other C++ projects (for users) ## \n\n`SeaDsa` contains two directories: `include` and `src`. 
Since `SeaDsa`\nanalyzes LLVM bitcode, LLVM header files and libraries must be\naccessible when building with `SeaDsa`.\n\nIf your project uses `cmake` then you just need to add in your\nproject's `CMakeLists.txt`:\n\n\t include_directories(seadsa/include)\n\t add_subdirectory(seadsa)\n\n### Standalone (for developers) ###\n\nIf you already installed `llvm-14` on your machine:\n\n    mkdir build \u0026\u0026 cd build\n\tcmake -DCMAKE_INSTALL_PREFIX=run -DLLVM_DIR=__here_llvm-14__/share/llvm/cmake  ..\n   \tcmake --build . --target install\n\t\nOtherwise:\n\n    mkdir build \u0026\u0026 cd build\n\tcmake -DCMAKE_INSTALL_PREFIX=run ..\n    cmake --build . --target install\n\nTo run tests:\n\n\tcmake --build . --target test-sea-dsa\n\n## Visualizing Memory Graphs and Complete Call Graphs ##\n\nConsider a C program called `tests/c/simple.c`:\n\n``` c\n#include \u003cstdlib.h\u003e\n\ntypedef struct S {\n  int** x;\n  int** y;  \n} S;\n\nint g;\n\nint main(int argc, char** argv){\n\n  S s1, s2;\n\n  int* p1 = (int*) malloc(sizeof(int));\n  int* q1 = (int*) malloc(sizeof(int));  \n  s1.x = \u0026p1;\n  s1.y = \u0026q1;    \n  *(s1.x) = \u0026g;\n  \n  return 0;\n}   \n\n```\n\n1. Generate bitcode:\n\n\t    clang -O0 -c -emit-llvm -S tests/c/simple.c -o simple.ll\n\nThe option `-O0` is used to disable clang optimizations. In general,\nit is a good idea to enable clang optimizations. However, for trivial\nexamples like `simple.c`, clang simplifies too much so nothing useful\nwould be observed. The options `-c -emit-llvm -S` generate bitcode in\nhuman-readable format.\n\n2. Run `sea-dsa` on the bitcode and print memory graphs to [dot](https://en.wikipedia.org/wiki/DOT_(graph_description_language)) format:\n\n\t    seadsa -sea-dsa=butd-cs -sea-dsa-type-aware -sea-dsa-dot  simple.ll\n\nThe options `-sea-dsa=butd-cs -sea-dsa-type-aware` enable the analysis\nimplemented in our FMCAD'19 paper (see References). 
This command will\ngenerate a `FUN.mem.dot` file for each function `FUN` in the bitcode\nprogram. In our case, the only function is `main` and thus, there is\none file named `main.mem.dot`.  The file is generated in the current\ndirectory. If you want to store the `.dot` files in a different\ndirectory `DIR` then add the option `-sea-dsa-dot-outdir=DIR`.\n\n3. Visualize `main.mem.dot` by transforming it to a `pdf` file:\n\n\t\tdot -Tpdf main.mem.dot -o main.mem.pdf\n\t\topen main.mem.pdf  // replace with your favorite pdf viewer \n\t\n![Example of a memory graph](https://github.com/seahorn/sea-dsa/blob/tea-dsa/tests/expected_graphs/simple.jpg?raw=true)\n\nIn our memory model, a pointer value is represented by a __cell__\nwhich is a pair of a memory object and offset. Memory objects are\nrepresented as nodes in the memory graph. Edges are between cells.\n\nEach node field represents a cell (i.e., an offset in the node). For\ninstance, the node fields `\u003c0,i32**\u003e` and `\u003c8,i32**\u003e` pointed to by `%6`\nand `%15`, respectively, are two different cells from the same memory\nobject. The field `\u003c8,i32**\u003e` represents the cell at offset 8 in the\ncorresponding memory object and its type is `i32**`.  Black edges\nrepresent points-to relationships between cells. They are labeled with\na number that represents the offset in the destination node. Blue\nedges connect formal parameters of the function with a cell. Purple\nedges connect LLVM pointer variables with cells.  Nodes can have\nmarkers such as `S` (stack allocated memory), `H` (heap allocated\nmemory), `M` (modified memory), `R` (read memory), `E` (externally\nallocated memory), etc. If a node is red then it means that the\nanalysis lost field sensitivity for that node. The label `{void}` is\nused to denote that the node has been allocated but it has not been\nused by the program.\n\n`sea-dsa` can also resolve indirect calls. An _indirect call_ is a\ncall where the callee is not known statically. 
`sea-dsa` identifies\nall possible callees of an indirect call and generates a LLVM call\ngraph as output.\n\nConsider this example in `tests/c/complete_callgraph_5.c`:\n\n\n``` c\nstruct class_t;\ntypedef int (*FN_PTR)(struct class_t *, int);\ntypedef struct class_t {\n  FN_PTR m_foo;\n  FN_PTR m_bar;\n} class_t;\n\nint foo(class_t *self, int x)\n{\n  if (x \u003e 10) {\n    return self-\u003em_bar(self, x + 1);\n  } else\n    return x;\n}\n\nint bar (class_t *self, int y) {\n  if (y \u003c 100) {\n    return y + self-\u003em_foo(self, 10);\n  } else\n    return y - 5;\n}\n\nint main(void) {\n  class_t obj;\n  obj.m_foo = \u0026foo;\n  obj.m_bar = \u0026bar;\n  int res;\n  res = obj.m_foo(\u0026obj, 42);\n  return 0;\n}\n```\n\nType the commands:\n\n    clang -c -emit-llvm -S tests/c/complete_callgraph_5.c  -o ex.ll\n    sea-dsa --sea-dsa-callgraph-dot ex.ll\n\nIt generates a `.dot` file called `callgraph.dot` in the current\ndirectory. Again, the `.dot` file can be converted to a `.pdf` file\nand opened with the commands:\n\n\tdot -Tpdf callgraph.dot -o callgraph.pdf\n\topen callgraph.pdf  \n\n![Example of a call graph](https://github.com/seahorn/sea-dsa/blob/tea-dsa/tests/expected_graphs/complete_callgraph_5.jpg?raw=true)\n\n`sea-dsa` can also print some statistics about the call graph\nresolution process (note that you need to call `clang` with `-g` to\nprint file,line, and column information):\n\n    sea-dsa --sea-dsa-callgraph-stats ex.ll\n\n\n    === Sea-Dsa CallGraph Statistics === \n    ** Total number of indirect calls 0\n    ** Total number of resolved indirect calls 3\n\n    %16 = call i32 %12(%struct.class_t* %13, i32 %15) at tests/c/complete_callgraph_5.c:14:12\n    RESOLVED\n    Callees:\n\t  i32 bar(%struct.class_t*,i32)\n\t  \n    %15 = call i32 %13(%struct.class_t* %14, i32 10) at tests/c/complete_callgraph_5.c:23:16\n\tRESOLVED\n    Callees:\n      i32 foo(%struct.class_t*,i32)\n\t  \n    %11 = call i32 %10(%struct.class_t* %2, i32 42) at 
tests/c/complete_callgraph_5.c:36:9\n    RESOLVED\n    Callees:\n\t  i32 foo(%struct.class_t*,i32)\n\t\n\n## Dealing with C/C++ library and external calls ##\n\nThe pointer semantics of external calls can be defined by writing a\nwrapper that calls any of these functions defined in\n`seadsa/seadsa.h`:\n\n- `extern void seadsa_alias(const void *p, ...);`\n- `extern void seadsa_collapse(const void *p);`\n- `extern void seadsa_mk_seq(const void *p, unsigned sz);`\n\n`seadsa_alias` unifies the cells of all its arguments, `seadsa_collapse` tells\n`sea-dsa` to collapse (i.e., lose field-sensitivity for) the cell\npointed to by `p`, and `seadsa_mk_seq` tells `sea-dsa` to mark as\n_sequence_ the node pointed to by `p` with size `sz`. \n\nFor instance, consider an external call `foo` defined as follows:\n\n\textern void* foo(const void*p1, void *p2, void *p3);\n\nSuppose that the returned pointer should be unified with `p2` but not with\n`p1`. In addition, we would like to collapse the cell corresponding to\n`p3`. Then, we can replace the above prototype of `foo` with the\nfollowing definition:\n\n\t#include \"seadsa/seadsa.h\"\n\tvoid* foo(const void*p1, void *p2, void*p3) {\n\t\tvoid* r = seadsa_new();\n\t\tseadsa_alias(r,p2);\n\t\tseadsa_collapse(p3);\n\t\treturn r;\n\t}\n\n\n## References ## \n\n1. \"A Context-Sensitive Memory Model for Verification of C/C++\n   Programs\" by A. Gurfinkel and J. A. Navas. In SAS'17.\n   ([Paper](https://jorgenavas.github.io/papers/sea-dsa-SAS17.pdf))\n   | ([Slides](https://jorgenavas.github.io/slides/sea-dsa-SAS17-slides.pdf))\n\n2. \"Unification-based Pointer Analysis without Oversharing\" by J. Kuderski, J. A. Navas and A. Gurfinkel. In FMCAD'19. 
\n   ([Paper](https://jorgenavas.github.io/papers/tea-dsa-fmcad19.pdf))\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseahorn%2Fsea-dsa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fseahorn%2Fsea-dsa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseahorn%2Fsea-dsa/lists"}