{"id":18363256,"url":"https://github.com/andikleen/simple-pt","last_synced_at":"2025-04-05T16:09:15.254Z","repository":{"id":36432855,"uuid":"40737862","full_name":"andikleen/simple-pt","owner":"andikleen","description":"Simple Intel CPU processor tracing on Linux","archived":false,"fork":false,"pushed_at":"2023-03-01T16:46:21.000Z","size":296,"stargazers_count":346,"open_issues_count":12,"forks_count":77,"subscribers_count":24,"default_branch":"master","last_synced_at":"2025-03-29T15:07:27.279Z","etag":null,"topics":["debug","kernel-driver","performance-analysis","performance-tuning","processor-trace","pt-decoder","trace","x86"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andikleen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2015-08-14T22:03:52.000Z","updated_at":"2025-03-23T00:17:47.000Z","dependencies_parsed_at":"2023-10-20T17:06:11.673Z","dependency_job_id":"3bb0589c-bd75-4b2b-b4a6-e390fba8b142","html_url":"https://github.com/andikleen/simple-pt","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andikleen%2Fsimple-pt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andikleen%2Fsimple-pt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andikleen%2Fsimple-pt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andikleen%2Fsimple-pt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/andikleen","download_url":"https
://codeload.github.com/andikleen/simple-pt/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247361691,"owners_count":20926643,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["debug","kernel-driver","performance-analysis","performance-tuning","processor-trace","pt-decoder","trace","x86"],"created_at":"2024-11-05T23:05:46.585Z","updated_at":"2025-04-05T16:09:15.211Z","avatar_url":"https://github.com/andikleen.png","language":"C","readme":"![simple-pt](http://halobates.de/spt-logo.png)\n\n# Introduction\n\nsimple-pt is a simple implementation of Intel Processor Trace (PT) on\nLinux. PT can trace all branches executed by the CPU at the hardware level\nwith moderate overhead. 
simple-pt then decodes the branch trace and\ndisplays a function or instruction level trace.\n\nPT is supported on Intel 5th generation Core (Broadwell), 6th generation Core (Skylake) CPUs,\nand later, as well as Goldmont based Atom CPUs (Intel Joule, Apollo Lake) and later.\n\n# Example\n\n\t% sptcmd  -c tcall taskset -c 0 ./tcall\n\tcpu   0 offset 1027688,  1003 KB, writing to ptout.0\n\t...\n\tWrote sideband to ptout.sideband\n\t% sptdecode --sideband ptout.sideband --pt ptout.0 | less\n\tTIME      DELTA\t INSNs   OPERATION\n\tfrequency 32\n\t0        [+0]     [+   1] _dl_aux_init+436\n\t                  [+   6] __libc_start_main+455 -\u003e _dl_discover_osversion\n\t...\n\t                  [+  13] __libc_start_main+446 -\u003e main\n\t                  [+   9]     main+22 -\u003e f1\n\t                  [+   4]\t      f1+9 -\u003e f2\n\t                  [+   2]\t      f1+19 -\u003e f2\n\t                  [+   5]     main+22 -\u003e f1\n\t                  [+   4]\t      f1+9 -\u003e f2\n\t                  [+   2]\t      f1+19 -\u003e f2\n\t                  [+   5]     main+22 -\u003e f1\n\t...\n\n# Overview\n\nsimple-pt consists of a\n* kernel driver\n* sptcmd to collect data from the kernel driver\n* sptdecode to display function or instruction traces\n* fastdecode to dump raw PT traces\n\nIt uses the [libipt](https://github.com/01org/processor-trace) PT decoding library\n\nNote that Linux 4.1 and later has an [integrated PT implementation](http://lwn.net/Articles/648154/) as part \nof Linux perf. gdb 7.10 also supports full debugging on top of PT. [Intel VTune](https://software.intel.com/en-us/intel-vtune-amplifier-xe)\nalso supports PT.\n\nIf you want a full production system please use one of these. 
simple-pt is an experimental implementation.\n\nSimple PT does *NOT* support:\n\n* It does not support long term tracing of more data than fits in the buffer (no interrupt) (use perf or VTune)\n* It does not support any sampling (use perf or VTune)\n* It requires root rights to collect data (use perf)\n* It does not support interactive debugging (use gdb or hardware debuggers)\n\nSimple PT has the following functionality:\n* set up hardware to processor trace\n* supports a ring buffer of branch data, stopped on events\n* supports flushing buffer on panic\n* does not require patching the kernel (although it cheats a bit using kprobes)\n* set up PT filters, such as kernel filter, or filter ranges\n* start and stop traces at specific kernel addresses, with unlimited number\n* support tracing multiple processes\n* print all function calls in \"ftrace\" style\n* disassembling all executed instructions (requires xed library, optional)\n* simple driver that could be ported to older kernel releases or other operating systems\n* simple code base that is easily changed.\n* modular \"unix style\" design with simple tools that do only one thing\n* can dump branches before panic to kernel log and decode\n\n# Installation\n\n__Note: simple-pt now requires a new version of libipt (2.x), which has\nan incompatible API. Please update.__\n\n__Note: The installation requirements for simple-pt have changed. It now requires\nthe upstream version of libipt. 
No special branches needed anymore.\nAlso udis86 has been replaced with xed.__\n\nBuild and install libipt\n\n\tgit clone https://github.com/01org/processor-trace -b stable/v2.0\n\tcd processor-trace\n\tcmake .\n\tmake\n\tsudo make install\n\tsudo ldconfig\n\nInstall libelf-elf-devel or elfutils-devel or similar depending on your distribution.\n\nOptionally install xed if you want to see disassembled instructions:\n\n\tgit clone https://github.com/intelxed/mbuild.git mbuild\n\tgit clone https://github.com/intelxed/xed\n\tcd xed\n\tmkdir obj\n\tcd obj\n\t../mfile.py\n\tsudo ../mfile.py --prefix=/usr/local install\n\nClone simple-pt\n\n\tgit clone https://github.com/andikleen/simple-pt\n\tcd simple-pt\n\nBuild the kernel module. May require installing kernel includes from your distribution.\n\n\tmake \n\nInstall the kernel module\n\n\tsudo make modules_install\n\nBuild the user tools\n\n\tmake user\n\nIf you installed xed use\n\n\tmake user XED=1\n\nCheck if your system supports PT\n\n\t./ptfeature\n\nRun a trace\n\n\tsudo ./sptcmd -c ls ls\n\tsudo ./sptdecode --sideband ptout.sideband --pt ptout.0 | less\n\nOn recent kernels it may be necessary to disable page table isolation if you\nwant to use process filtering\n\n\tBoot the kernel with the \"nopti\" argument\n\nsptcmd loads and configures the kernel driver. It runs a program with tracing enabled. It always \ndoes a global trace. It writes the PT trace data to trace files for each CPU\n(ptout.N where N is the CPU number). It also writes side band information needed\nto decode the trace into the ptout.sideband file. \n\n-c sets a command filter, tracing only commands with that name. Otherwise\neverything is traced globally.\n\nsptdecode then decodes the trace for a CPU using the side band information.\nTo decode kernel code it needs to run as root to be able to\nread /proc/kcore. 
If it's not run as root, kernel code will not be shown.\n\nAnother way to use simple-pt is to run the workload with PT running\nin the background and only dump on an event.\n\nStart trace and dump trace on event:\n\n\tsudo ./sptcmd --enable\n\t\u003crun workload\u003e\n\t\u003csome event of interest happens and triggers:\u003e\n\tsudo ./sptcmd --dump\n\tsudo ./sptdecode --sideband ptout.sideband --pt ptout.0 | less\n\nAnother way is to use --stop-address or --stop-range to stop the trace\non specific kernel symbols being executed. Note that these options\nonly affect the trace on their current CPU.\n\nRun test suite\n\n\tsudo ./tester\n\n# Design overview\n\nThe kernel driver manages the PT hardware and allocates the trace buffers.\nIt also sets up some custom trace points for the sideband data.\n\nThe simple-pt kernel driver is configured using module parameters. Many can be changed\nat runtime through /sys/module/simple_pt/parameters. A few need a driver reload.\n\nUse\n\tmodinfo simple-pt.ko\n\nto show all allowed parameters. For most parameters sptcmd has options to\nset them up. That is the recommended interface.\n\nsptcmd configures the driver, starts the trace and runs the trace command.\nThe driver sets up a ring buffer and runs the processor trace\nfor each CPU until stopped.  Then it calls sptdump to write the buffer\nfor each CPU to a ptout.N file (N is the number of the CPU).\n\nFor the side band information ftrace with some custom trace points defined\nby the driver is used. 
sptsideband converts the ftrace output into\nthe .sideband files used by the decoder.\n\nsptdecode then reads the PT data, the sideband data, the executables, the kernel code\nthrough /proc/kcore, and uses the libipt decoder to reconstruct the\ntrace.\n\n# Manpages\n\n* [sptcmd](http://halobates.de/spt-man/sptcmd.html)\n* [sptdecode](http://halobates.de/spt-man/sptdecode.html)\n* [ptfeature](http://halobates.de/spt-man/ptfeature.html)\n* [sptarchive](http://halobates.de/spt-man/sptarchive.html)\n* [fastdecode](http://halobates.de/spt-man/fastdecode.html)\n* [sptdump](http://halobates.de/spt-man/sptdump.html)\n\n# Changing the PT buffer sizes\n\nTo change the PT buffer size the driver needs to be loaded manually. The PT\nbuffer size can be changed with the pt_buffer_order parameter.\n\n\trmmod simple_pt # if it was loaded\n\tmodprobe simple_pt pt_buffer_order=10\n\nThe size is specified in 2^n 4K pages. The default is 9 (2MB). The maximum limit\nis the kernel's MAX_ORDER limit, typically 8MB. The allocation may also fail\nif the kernel memory is too fragmented. In this case quitting a large process\nmay help.\n\nWhen ptfeature shows the \"multiple toPA entries\" feature it is possible to\nallocate multiple PT buffers with the pt_num_buffers parameter. All the buffers\nare logically concatenated. The default is one buffer. The maximum is 511\nbuffers.\n\n# Using simple-pt for panic debugging\n\nsimple-pt can be used to print a number of branches before a panic.\n\n\tinsmod simple-pt.ko start=1 print_panic_psbs=4\n\t\u003cpanic system\u003e\n\t\u003ccollect log from serial console\u003e\n\nThe number after print_panic_psbs specifies the length of the logged trace\n(expressed in number of PT sync points)\n\nThe PT information is logged in base64 format to the kernel log.  
It can be recovered\nwith the base64log.py utility\n\n\tbase64log.py \u003c log \u003e ptlog\n\tsptdecode --elf vmlinux --pt ptlog\n\nThis method currently does not support modules or ring 3 code, or multiple\nPT buffers.\n\n# Notes\n\n* To limit the program to one CPU use sptcmd taskset -c CPU ..\n* To demangle C++ symbols pipe output through c++filt\n* To start/stop around specific user code bracket it with dummy syscalls that you\n  can then put a kernel trigger on. The test suite uses personality(21212212) and prctl(12341234).\n  This will be improved in the future.\n* perf or the BIOS may already be using the PT hardware. If you know it's safe you can take\n  over the PT hardware with --force -d.\n* When configuring the driver manually you need to manually reset any parameters you do not want anymore.\n  sptcmd takes care of that automatically.\n* Some Debian kernels are built without CONFIG_KALLSYMS_ALL. When you see a \"Cannot find task_lock\"\nerror message load the simple_pt module like this\n\n\tinsmod simple_pt.ko tasklist_lock_ptr=0x$(grep tasklist_lock /boot/System.map-$(uname -r) | awk ' {print $1}')\n* Various older Linux kernels have problems with ftrace in kernel modules. simple-pt relies on ftrace\noutput for its sideband. \"tester\" has a special test for this. If there are problems, the workarounds\nin \"compat.h\" (e.g. the ifdefs) likely need to be adjusted. Upgrading to a newer kernel should fix the problem too.\n* The time in different ptout files collected on the same system without reboot is synchronized.\nHowever the synchronization is not fine grained enough to directly determine causality of nearby memory accesses.\n\n# Current limitations:\n\n* When kernel tracing is disabled (-K) multiple processes cannot be distinguished by the decoder.\n\n* Enabling/Disabling tracing causes the kernel to modify itself, which can cause the PT decoder\n  to lose synchronization. sptcmd disables trace points. 
The workaround is to keep trace points\n  running after the trace ends with -k, or disable kernel tracing. This can sometimes affect the\n  test suite. If this happens try \"tester -k\"\n\n* sptcmd does not continuously save side band data, so events at the beginning\n  of a trace may not be saved. For complex workloads it may be necessary to increase the trace buffers \n  in /sys/kernel/debug/tracing/buffer_size_kb.\n\n* The decoder does not (currently) support reusing the same address region in a process for\n  different code (for example after dlclose/dlopen)\n\n* Tracing JITed code is not supported.\n\n* On Skylake the trace time occasionally jumps backwards after frequency changes.\n\n* Decoder loses synchronization in some cases where it shouldn't.\n\n* Binaries with spaces in the name are not supported (due to limitations in sptsideband.py)\n\n* On 5.7+ kernels using symbol names located in modules in --start/stop-addr will leak the module count\n  of the module.\n\n* On systems with page table isolation active the -C filter can only filter on user code or kernel code,\n  but not both at the same time. To avoid this boot with pti=off. Note this may make the system\n  susceptible to Meltdown.\n\n# Porting simple-pt\n\nThere is some Linux specific code in the driver, but the basic PT hardware configuration\nshould be straightforward to adapt to other environments. The minimum support needed\nis memory allocation, a mechanism to call a callback on all CPUs (IPIs), and a mechanism\nto establish a shared buffer with the decoding tool (implemented using mmap on a character device).\nWhen suspend-to-ram is supported it's also useful to have a callback after resume\nto reinitialize the hardware.\n\nThe kernel driver is configured using global variables with Linux's moduleparams mechanism.\nThis can be replaced with simple hard coded variables.\n\nThe driver supports Linux \"kprobes\" and \"kallsyms\" to set custom triggers. That code\nis all optional and can be removed. 
Such optional code is generally marked as\noptional.\n\nThe user tools should be portable to POSIX C99 based systems. The code to access the kernel\nimage will need to be adapted.  Porting to non DWARF/ELF based systems will need more work.\n\n# Contact\n\nFor bugs please file a github issue.\n\nAndi Kleen\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandikleen%2Fsimple-pt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandikleen%2Fsimple-pt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandikleen%2Fsimple-pt/lists"}