{"id":13785730,"url":"https://github.com/memtt/numaprof","last_synced_at":"2026-02-27T04:12:56.057Z","repository":{"id":39648715,"uuid":"120311305","full_name":"memtt/numaprof","owner":"memtt","description":"NUMAPROF is a NUMA memory profliler based on Pintool to track your remote memory accesses.","archived":false,"fork":false,"pushed_at":"2025-06-20T17:12:15.000Z","size":6764,"stargazers_count":47,"open_issues_count":24,"forks_count":8,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-06-20T18:26:45.648Z","etag":null,"topics":["instrumentation","memory","numa","profiler"],"latest_commit_sha":null,"homepage":"https://memtt.github.io/numaprof","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/memtt.png","metadata":{"files":{"readme":"README.md","changelog":"ChangeLog.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"Authors.md","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-02-05T13:54:13.000Z","updated_at":"2025-06-20T17:11:42.000Z","dependencies_parsed_at":"2024-01-31T16:36:41.841Z","dependency_job_id":"cd7cab1a-c6a2-4756-b67c-b6dc112e593b","html_url":"https://github.com/memtt/numaprof","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/memtt/numaprof","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/memtt%2Fnumaprof","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/memtt%2Fnumaprof/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/memtt%2Fnumaprof/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/memtt%2Fnumaprof/manifests
","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/memtt","download_url":"https://codeload.github.com/memtt/numaprof/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/memtt%2Fnumaprof/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29884515,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-26T23:51:21.483Z","status":"online","status_checked_at":"2026-02-27T02:00:06.759Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["instrumentation","memory","numa","profiler"],"created_at":"2024-08-03T19:01:03.904Z","updated_at":"2026-02-27T04:12:56.051Z","avatar_url":"https://github.com/memtt.png","language":"C++","funding_links":[],"categories":["Observation and profiling tools"],"sub_categories":[],"readme":"Numaprof\n========\n\nWhat is it ?\n------------\n\nNumaprof is a NUMA memory profiler. 
The idea is to instrument the read/write operations in \nthe application and check the NUMA location of the thread at \naccess time to compare it to the memory location of the data.\n\nThe tool is currently based on Pintool, a dynamic instrumentation tool from Intel offering roughly\nthe same service as valgrind but supporting threads, so it is faster for parallel applications.\n\nYou can find more details and screenshots on the dedicated website: https://memtt.github.io/numaprof/.\n\n![NUMAPROF GUI](https://memtt.github.io/numaprof/images/screenshots/screenshot-6-2.png)\n\nMetrics\n-------\n\nNumaprof extracts the following metrics per call site and per malloc call site :\n\n * firstTouch : shows where first touches occur from a pinned thread\n * unpinnedFirstTouch : shows where first touches occur from unpinned threads\n * localAccess : counts local accesses (from a pinned thread)\n * remoteAccess : counts remote accesses (from a pinned thread)\n * unpinnedPageAccess : accesses from a pinned thread to a page whose first-touch thread was unpinned\n * unpinnedThreadAccess : accesses from an unpinned thread to a page whose first-touch thread was pinned\n * unpinnedBothAccess : accesses from an unpinned thread to a page set up by an unpinned thread\n * mcdram : accesses to MCDRAM on KNL\n \nDependencies\n------------\n\nNUMAPROF needs:\n\n * CMake (required to build, greater than 2.8.8) : https://cmake.org/\n * Intel Pintool (required, tested : 3.24) : https://software.intel.com/en-us/articles/pin-a-binary-instrumentation-tool-downloads. Mind the licence, which is free only for non-commercial use.\n * Python (required). To run the webserver.\n * Qt5-webkit (optional, greater than : 5.4). To provide a browser embedded view usable over ssh X forwarding instead of the webserver port forwarding.\n * libnuma or numactl devel package. 
This is required to use the profiler.\n * Optionally, you can install google-test and google-mock to avoid the warnings that the embedded in-source version triggers on recent systems (tested with 1.11 under Ubuntu 22.04).\n\nIf you use the git repo (e.g. master branch) instead of a release file :\n\n * NodeJS/npm. To fetch the JavaScript libraries used by the web GUI. Release archives already contain all the required JS files, so NodeJS is not needed in that case.\n * Python pip. To download the dependencies of the web server for the web GUI. Again, you can use a release archive, which already contains all those files.\n\nIf you don't have npm and pip on your server, prefer using the release archive, which already contains all the required\nlibraries and no longer depends on those two commands.\n\nInstall\n-------\n\nFirst download the latest version of pintool (tested : 3.24 on x86_64 arch : https://software.intel.com/en-us/articles/pin-a-binary-instrumentation-tool-downloads) and extract it somewhere.\nTAKE CARE, PINTOOL IS NOT OPEN-SOURCE AND IS FREE ONLY FOR NON-COMMERCIAL USE.\n\nThen use the configure script :\n\n```\nmkdir build\ncd build\n../configure --prefix=PREFIX --with-pintool=PINTOOL_PATH\nmake\nmake install\n```\n\nFor those who prefer cmake, the configure script is just a wrapper providing autotools-like semantics and `--help`.\nYou can of course call cmake directly instead. Note that the script prints the cmake command if you\nuse the `--show` option.\n\nUsage\n-----\n\nSet up your paths (you can also use absolute paths if you don't want to change your env): \n\n```\nexport PATH=PREFIX/bin:$PATH\n```\n\nRun your program using the wrapper:\n\n```\nnumaprof ./benchmark --my-option\n```\n\nThe numaprof GUI is based on a webserver and can be viewed in the browser at http://localhost:8080.\nThe GUI password is currently fixed to admin/admin. 
You can launch the webserver by running : \n\n```\nnumaprof-webview numaprof-1234.json\n```\n\nThe first time you launch the GUI you will need to provide a user/password to secure the interface.\nYou can change the password or add other users by using :\n\n```\nnumaprof-passwd {USER}\n```\n\nThe users are stored in `~/.numaprof/htpasswd` using the `htpasswd` format.\n\nIf you run the webview on a remote node, you can forward the http session to your local browser by using :\n\n```\nssh myhost -L8080:localhost:8080\n```\n\nIf you have Qt5-webkit installed you can also automatically open a browser view over ssh X forwarding by using :\n\n```\nnumaprof-qt5 numaprof-1234.json\n```\n\nMPI Support\n-----------\n\nIf you profile an MPI application you will get one profile per process, so at least\none per rank.\n\nTo name the files with the MPI rank instead of the PID, add the `--mpi` option :\n\n```sh\nmpirun -np 16 numaprof --mpi ./my_program\n```\n\nKcachegrind compatibility\n-------------------------\n\nIf you want to generate the callgrind compatible output, use:\n\n```\nnumaprof-to-callgrind numaprof-45689.json\n```\n\nThen you can open the callgrind file with kcachegrind (http://kcachegrind.sourceforge.net/html/Home.html):\n\n```\nkcachegrind numaprof-45689.callgrind\n```\n\nAvailable options\n-----------------\n\nHere is the config file, which can be given to numaprof-pintool with the `-c FILE` option. 
You can also set a specific entry\nby using `-o SECTION:NAME=value,SECTION2:NAME2=value2`.\n\n```ini\n[output]\nname=numaprof-%1-%2.%3\nindent=true\njson=true\ndumpConfig=false\nsilent=false\nremoveSmall=false\nremoveRatio=0.5\n\n[core]\nskipStackAccesses=true\nthreadCacheEntries=512\nobjectCodePinned=false\nskipBinaries=\naccessBatchSize=0\n\n[info]\nhidden=false\n\n[cache]\n;can be 'dummy' or 'L1' or 'L1_static'\ntype=dummy\nsize=32K\nassociativity=8\n\n[mpi]\nuseRank=false\nrankVar=auto\n\n[emulate]\nnuma=-1\n```\n\nOn huge applications\n--------------------\n\nNUMAPROF has not yet been tested on multi-million line applications, so we expect some slowdown on such big code.\nBut it should be able to work. The web GUI might lag, however, due to too much data. In this case, enable\nthe filtering option at profiling time to remove all entries smaller than 0.2% from the output profile:\n\n```\nnumaprof-pintool -o output:removeSmall=true,output:removeRatio=0.2 ./benchmark --my-option\n```\n\nView on another machine\n-----------------------\n\nIf you want to view the NUMAPROF profile on a machine other than the one you profiled on, you can\ncopy the json file and open it. Ideally the sources need to be placed at the same path as the one\nwhere you profiled.\n\nIf this is not the case you can use the override option of the GUI to redirect some directories :\n\n```sh\nnumaprof-webview -o /home/my_server_user/server_path/project:/home/my_local_user/local_path/project ./numaprof-1234.json\n```\n\nnumactl\n-------\n\nIf you want to profile an application while using the `numactl` tool to set up the memory binding, you need to\nbuild the command line in the following order:\n\n```sh\nnumactl {OPTIONS} numaprof-pintool ./MY_APP\n```\n\nCache simulation\n----------------\n\nNUMAPROF reports all the memory accesses to account for local/remote/MCDRAM. But this is biased compared to\nreality, as your processor has CPU caches which greatly reduce the accesses to the RAM. 
If you want to take\nthis into account, there is currently a lightweight cache simulation infrastructure embedded into NUMAPROF.\nIt currently only provides one L1 cache per thread (32K by default) with an LRU replacement policy.\nThis does not match the multi-level and shared caches of current architectures but can be used,\nfor example, to eliminate spinlocks and accesses to global variables from the profile, as they will almost\ncertainly end up in the cache.\n\nCaution, this is currently an **experimental feature**.\n\nYou can enable it with a command line option and can optionally change its size using\nthe standard way to override config file options via command line (or provide a config file) :\n\n```sh\nnumaprof-pintool --cache L1 -o cache:size=32K -o cache:associativity=8 {YOUR_APP}\n```\n\nNot having a NUMA server for dev\n--------------------------------\n\nIf you want to test the NUMA behavior of your application without having a NUMA server at hand,\nyou can use the option `emulate:numa` to make numaprof run as it would on a NUMA\nserver.\n\nIn the option, give the desired number of NUMA nodes to emulate. It should evenly divide\nthe core count, since the cores will be distributed over the X requested NUMA domains.\n\nNote that in this case NUMAPROF provides a purely theoretical view and does not fetch any\nNUMA information from the OS as it does for a normal run. 
\n\nPointers:\n---------\n\nIf you are looking for pointers to similar tools or interesting related papers, you can refer to the [docs/bibliography.md](https://github.com/memtt/numaprof/blob/master/doc/bibliography.md) file.\n\nLicense\n-------\n\nNumaprof is distributed under the CeCILL-C licence, which is LGPL compatible.\nTake care, NUMAPROF currently strongly depends on Intel Pintool, which is free only for non-commercial use.\n\nI would like to port it to DynamoRIO to avoid this, if someone wants to help!\n\nTo cite\n-------\n\nIf you publish about NUMAPROF, you can cite this research paper as reference :\n\n```\nValat, S., Bouizi, O. (2019). NUMAPROF, A NUMA Memory Profiler. In: Mencagli, G., et al.\nEuro-Par 2018: Parallel Processing Workshops. Euro-Par 2018. Lecture Notes in Computer\nScience(), vol 11339. Springer, Cham. https://doi.org/10.1007/978-3-030-10549-5_13\n```\n\nDiscussion\n----------\n\nYou can join the google group to exchange ideas and ask questions : https://groups.google.com/forum/#!forum/memtt-numaprof.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmemtt%2Fnumaprof","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmemtt%2Fnumaprof","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmemtt%2Fnumaprof/lists"}