{"id":18422444,"url":"https://github.com/sri-csl/clam-prov","last_synced_at":"2025-04-13T12:27:57.283Z","repository":{"id":80766048,"uuid":"372689111","full_name":"SRI-CSL/clam-prov","owner":"SRI-CSL","description":"Provenance Tracking with Clam","archived":false,"fork":false,"pushed_at":"2023-04-13T02:54:47.000Z","size":105,"stargazers_count":3,"open_issues_count":3,"forks_count":0,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-04-13T12:27:56.375Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SRI-CSL.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-01T03:21:35.000Z","updated_at":"2024-04-22T16:15:11.000Z","dependencies_parsed_at":"2024-11-06T04:31:53.490Z","dependency_job_id":"d92f704e-efdc-437d-b7a8-f38c57809918","html_url":"https://github.com/SRI-CSL/clam-prov","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SRI-CSL%2Fclam-prov","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SRI-CSL%2Fclam-prov/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SRI-CSL%2Fclam-prov/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SRI-CSL%2Fclam-prov/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SRI-CSL","download_url":"https://codeload.github.com/SRI-CSL/clam-prov/tar.gz/refs/heads/master","ho
st":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248713571,"owners_count":21149725,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T04:30:03.887Z","updated_at":"2025-04-13T12:27:57.246Z","avatar_url":"https://github.com/SRI-CSL.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Provenance Tracking with Clam # \n\n## Description ## \n\nPropagate tags (numerical identifiers) from many sources to many\nsinks.\n\n### How ### \n\nThe user defines sources and sinks (memory locations) via a\nconfiguration file (option `-add-metadata-config`):\n\n```\nread, 2, clam-prov-type:input\nread, 2, clam-prov-size:3\nwrite, 2, clam-prov-type:output\nwrite, 2, clam-prov-size:3\n```\n\nThis states that the `clam-prov` type of the second parameter of any \ncall to `read` is an input (i.e., _source_) and the `clam-prov` type of\nthe second parameter of any call to `write` is an output (i.e., _sink_).\nAlso, it states that the size of the second parameter of any call to `read`\nis specified by the third parameter and the size of the second parameter\nof any call to `write` is specified by the third parameter. Note that, in \ngeneral, a program will have _many sources_ and _many sinks_ since it can \nhave many calls to `read` and `write`.\n\nThe analysis then assigns a unique numerical identifier (i.e., a tag)\nto each source and it relies on\nthe [Clam static analyzer](https://github.com/seahorn/clam) to\npropagate those tags across memory and function boundaries. 
The output\nof the analysis maps each sink to the set of all tags that may\nreach it. With this, each sink is connected back to a subset of\nsources. Currently, this output is encoded as metadata named\n`clam-prov-tags`. For instance:\n\n```\n%res = call i64 @write(i32 %param, i8* %param1, i64 %param2), !call-site-metadata !7, !clam-prov-tags !9\n...\n!7 = !{!\"4\", !8}\n!8 = !{!\"2\", !\"clam-prov-type:output\", !\"clam-prov-size:3\"}\n!9 = !{i64 2}\n\n```\n\nThis says that the second parameter of the call to `write` (identifier\n`4`) is tagged with the tag set `{2}`.\n\n  \n\n## Requirements ##\n\n- Install [cmake](https://cmake.org/) \u003e= 3.13 \n- Install [llvm 11](https://releases.llvm.org/download.html)\n- Install [gmp](https://gmplib.org/)\n- Install [boost](https://www.boost.org/) \u003e= 1.65\n\n### To run tests ###\n\n- `lit`: `sudo pip install lit`\n- `cmp` utility\n\n## Compilation and Installation ##\n\n``` bash\n./install.sh\n```\n\n## Checking installation ## \n\nTo run some regression tests:\n\n     cmake --build . --target test-all\n\n## Usage ##\n\n     clam-prov.py  test.c -o test.prov.bc\n     \nBy default, `clam-prov.py` uses the file `config/sources-sinks.config`\nfrom the install directory to determine the sources and sinks.\nThe output bitcode file `test.prov.bc` contains `call-site-metadata`\nmetadata indicating the tags assigned to both sources (e.g., read\ncalls) and sinks (e.g., write calls).\n\nTo use a different set of sources and sinks, run the\ncommand:\n\n     clam-prov.py  test.c --add-metadata-config=addMetadata.config -o test.prov.bc\n     \n\n## Output Propagated Tags ##\n\nAlternatively, the same information provided by the LLVM metadata\n`call-site-metadata` can be printed to a file in [DOT](https://graphviz.org/doc/info/lang.html) format. 
The tags\npropagated from sources to sinks can be written out using the argument\n`dependency-map-file` as follows:\n\n     clam-prov.py test.c --add-metadata-config=addMetadata.config --dependency-map-file=dependency_map.dot\n\nThe following is an example output file `dependency_map.dot`:\n\n```\ndigraph clam_prov_dependency_map{\n\"0\" [label=\"function name:read\\ncall site:0\"];\n\"1\" [label=\"function name:read\\ncall site:1\"];\n\"2\" [label=\"function name:write\\ncall site:2\"];\n\"2\" -\u003e \"0\" [label=\"WasDependentOn\"];\n\"2\" -\u003e \"1\" [label=\"WasDependentOn\"];\n}\n```\n\nThe output above says the following:\n* The first call-site (`read`) has the call site tag `0`\n* The second call-site (`read`) has the call site tag `1`\n* The third call-site (`write`) has the call site tag `2`\n* The third call-site (`write`) has the propagated call site tags `0` and `1`.\n\n## Log call-sites (Linux) ##\n\nLogging can be added to a Linux program to emit call-sites using the argument `add-logging-config` as follows:\n\n     clang -Xclang -disable-O0-optnone -c -emit-llvm test.c -o test.bc\n     clam-pp --crab-devirt test.bc -o test.pp.bc\n     clam-prov test.pp.bc --add-metadata-config=addMetadata.config --add-logging-config=call-site-logging.config -o test.out.pp.bc\n\nThe above uses the file `call-site-logging.config` to configure how call-sites are logged when the program is executed. The configuration must have the keys:\n* `output_mode` - Whether to write to a file (at `~/.clam-prov/audit.log`) or to a pipe (at `~/.clam-prov/audit.pipe`). Specify `0` to write to the file, or specify `1` to write to the pipe\n* `max_records` - The maximum number of call-site records to buffer before writing to the file or the pipe\n\nThe output is written as a series of records in binary format. 
Each record contains the following fields in the given order:\n\n* `time in milliseconds` expressed as an unsigned long (8 bytes)\n* `process id` expressed as an integer (4 bytes)\n* `call site tag` expressed as a signed long (8 bytes)\n* `function return value` expressed as a signed long (8 bytes)\n* `name of the function` expressed as a char array (256 bytes)\n\nThe source file [CallSiteLogReader.c](https://github.com/SRI-CSL/clam-prov/blob/master/src/Util/CallSiteLogReader.c) demonstrates how to read the call site log file. \n\nTo generate an executable that logs call-sites from `test.out.pp.bc` (above), the shared library must be linked as follows:\n\n```\nllc -relocation-model=pic test.out.pp.bc -o test.out.pp.s \ngcc -L./install/lib test.out.pp.s -o test.out.pp.native -lclamprovlogger\n```\n\nFinally, the generated executable can be run as:\n\n```\nLD_LIBRARY_PATH=./install/lib ./test.out.pp.native\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsri-csl%2Fclam-prov","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsri-csl%2Fclam-prov","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsri-csl%2Fclam-prov/lists"}