{"id":17632743,"url":"https://github.com/romnn/nvbit-rs","last_synced_at":"2025-05-05T22:37:43.972Z","repository":{"id":63486965,"uuid":"563420972","full_name":"romnn/nvbit-rs","owner":"romnn","description":"Rust bindings to the NVIDIA NVBIT binary instrumentation API","archived":false,"fork":false,"pushed_at":"2023-10-10T23:18:19.000Z","size":9950,"stargazers_count":3,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-09T21:19:16.217Z","etag":null,"topics":["cuda","ffi","gpgpu","instrumentation","nvbit","nvidia","profiling","ptx","rust","sass","tracing"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/romnn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-11-08T15:21:08.000Z","updated_at":"2025-02-03T10:00:33.000Z","dependencies_parsed_at":"2024-10-23T07:19:16.387Z","dependency_job_id":"ab978c43-3e76-400b-a2bd-d0dd874fac74","html_url":"https://github.com/romnn/nvbit-rs","commit_stats":{"total_commits":81,"total_committers":1,"mean_commits":81.0,"dds":0.0,"last_synced_commit":"bfa17553832cfbed7874dcb0854e6a0cf7120dfb"},"previous_names":[],"tags_count":37,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/romnn%2Fnvbit-rs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/romnn%2Fnvbit-rs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/romnn%2Fnvbit-rs/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/romnn%2Fnvbit-rs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/romnn","download_url":"https://codeload.github.com/romnn/nvbit-rs/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252588477,"owners_count":21772685,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","ffi","gpgpu","instrumentation","nvbit","nvidia","profiling","ptx","rust","sass","tracing"],"created_at":"2024-10-23T01:45:29.404Z","updated_at":"2025-05-05T22:37:43.955Z","avatar_url":"https://github.com/romnn.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"## nvbit-rs\n\n##### Test it out\n\n```bash\nmake -j -B -C test-apps/\n```\n\n```bash\n# build the mem trace example tracer\ncargo build --release -p mem_trace\n\n# trace a sample application\nLD_PRELOAD=./target/release/libmem_trace.so ./test-apps/vectoradd/vectoradd 100\n```\n\nDone:\n- implement MessagePack and JSON trace dumping\n- that does not work: clean up the nvbit API such that the context is managed in nvbit-rs and the hooks are just functions\n\n##### Accelsim reference\n```bash\nmake -B -j -C ./tracer_nvbit\nLD_PRELOAD=./tracer_nvbit/tracer_tool/tracer_tool.so ./nvbit-sys/nvbit_release/test-apps/vectoradd/vectoradd\n```\nThis will generate the files `./tracer_nvbit/tracer_tool/traces/kernelslist` and `./tracer_nvbit/tracer_tool/traces/stats.csv`.\n\n##### Our implementation\n```bash\ncargo build --release\nmake -B -j -C 
./examples/accelsim\nLD_PRELOAD=./target/release/libaccelsim.so ./test-apps/vectoradd/vectoradd 100\nLD_PRELOAD=./target/release/libmem_trace.so ./test-apps/vectoradd/vectoradd 100\n```\n\n```bash\ndocker build --platform linux/amd64 -t builder .\ndocker run --rm -i -v \"$PWD\":/src -v \"$PWD\"/buildcache:/cache builder cargo build\n```\n\n```bash\nnvcc -D_FORCE_INLINES -dc -c -std=c++11 -I../nvbit_release/core -Xptxas -cloning=no -Xcompiler -w  -O3 -Xcompiler -fPIC tracer_tool.cu -o tracer_tool.o\n\nnvcc -D_FORCE_INLINES -I../nvbit_release/core -maxrregcount=24 -Xptxas -astoolspatch --keep-device-functions -c inject_funcs.cu -o inject_funcs.o\n\nnvcc -D_FORCE_INLINES -O3 tracer_tool.o inject_funcs.o -L../nvbit_release/core -lnvbit -lcuda -shared -o tracer_tool.so\n```\n\nNow is a good time to introduce workspaces:\n- make the examples individual crates with `Cargo.toml` and `build.rs`\n- write the custom tracing kernels per example\n- this way we might finally include the symbol\n\n\n#### TODO\n- we find that Rust and C++ interop is hard, e.g. 
`nvbit_get_related_functions` returns `std::vector\u003cCUfunction\u003e`, for which there is no easy binding; even using `\u0026cxx::CxxVector\u003cCUfunction\u003e` does not work because `CUfunction` is an FFI struct (by value).\n  - a possible way is to provide a wrapper that copies to a `cxx::Vec\u003cCUfunction\u003e`, I guess (see [this example](https://github.com/dtolnay/cxx/blob/master/book/src/binding/vec.md#example))\n  - since we are tracing, and this would need to be performed for each unseen function, this copy overhead is not acceptable\n    - TODO: find out how often it is called and maybe still do it and measure\n      - UPDATE: `nvbit_get_related_functions` is only called once, so try it in Rust\n\n- other approach: only receive stuff from the channel, a simple struct...\n  - if that works: how can we decide which tracing function to use\n    - (since we cannot write new ones in Rust)\n\n- figure out if we can somehow reuse the same nvbit names by using a namespace??\n- or wrap the calls in Rust, which then calls the `ffi::rust_*` funcs.\n\n- IMPORTANT OBSERVATION:\n  - I almost gave up on `cxx`, because it was only giving me `Unsupported type` errors\n  - I don't import `cxx::UniquePtr` or `cxx::CxxVector` in the `ffi` module, \n    so I assumed I needed to use `cxx::` to reference the types.\n  - the docs don't either, but they use them in the top-level module ...\n  - turns out you *need* to omit the `cxx::` prefix because this is all macro magic ...\n\n#### Done\n- we must include the CUDA inject funcs? 
and the `nvbit_tool.h` into the binary somehow.\n  - maybe statically compile them in the build script\n  - then link them with the `nvbit-sys` crate\n  - then, `nvbit_at_init` _should_ be called - hopefully\n\n#### Example\n\nThe current goal is to get a working example of a tracer written in Rust.\nUsage should be:\n```bash\n# install lld\nsudo apt-get install -y lld\n# create a shared dynamic library that implements the nvbit hooks\ncargo build -p accelsim\n# run the nvbit example CUDA application with the tracer\nLD_PRELOAD=./target/debug/libaccelsim.so nvbit-sys/nvbit_release/test-apps/vectoradd/vectoradd\n```\n\n#### Notes\n\nCheck the installed clang versions:\n```bash\napt list --installed | grep clang\n```\n\nWhen running `clang nvbit.h`, it also complains about missing cassert.\n-std=c++11 \n-I$(NVBIT_PATH)\n\n`\u003ccassert\u003e` is a C++ standard library header, so we need: `clang++ -std=c++11 nvbit.h`.\n\n`bindgen` does not work that well with C++ code, check [this](https://rust-lang.github.io/rust-bindgen/cpp.html).\n\nWe need to pass extra clang arguments so that bindgen can find `#include \u003ccassert\u003e`.\n\nWe will also need to include `nvbit.h`, `nvbit_tool.h`, and tracing injected functions, which require `.cu` files to be compiled and linked with the binary.\n\n[This example](https://github.com/termoshtt/link_cuda_kernel) shows how `.cu` can be compiled and linked with the `cc` crate.\n\nMake sure that the C function hooks of nvbit are not mangled in the shared library:\n```bash\nnm -D ./target/debug/examples/libtracer.so\nnm -D ./target/debug/build/nvbit-sys-08fdef510bde07a0/out/libinstrumentation.a\n```\n\nProblem: we need the `instrument_inst` function to be present in the binary, just like\nfor the example:\n```bash\nnm -D /home/roman/dev/nvbit-sys/tracer_nvbit/tracer_tool/tracer_tool.so | grep instrument\n\n# for a static library:\nnm --debug-syms target/debug/build/accelsim-a67c1762e4619dad/out/libinstrumentation.a | grep instrument\n```\nCurrently, it's not :(\n\nMake sure 
that we link statically:\n```bash\nldd ./target/debug/examples/libtracer.so\n```\n\nCheck what includes `cxx` generated:\n```bash\ntre target/debug/build/nvbit-sys-*/out/cxxbridge/\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fromnn%2Fnvbit-rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fromnn%2Fnvbit-rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fromnn%2Fnvbit-rs/lists"}