{"id":15703882,"url":"https://github.com/benediktalkin/kappaprofiler","last_synced_at":"2025-07-19T00:37:31.028Z","repository":{"id":48241828,"uuid":"507590024","full_name":"BenediktAlkin/KappaProfiler","owner":"BenediktAlkin","description":"lightweight simple profiling for python/pytorch","archived":false,"fork":false,"pushed_at":"2023-03-19T08:37:39.000Z","size":46,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-12T15:54:41.085Z","etag":null,"topics":["cuda","profiler","python","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BenediktAlkin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-06-26T14:02:56.000Z","updated_at":"2024-04-24T13:08:38.000Z","dependencies_parsed_at":"2023-12-17T22:42:12.124Z","dependency_job_id":"d1cf838f-523c-4645-a6ad-1300596b0a66","html_url":"https://github.com/BenediktAlkin/KappaProfiler","commit_stats":{"total_commits":48,"total_committers":1,"mean_commits":48.0,"dds":0.0,"last_synced_commit":"2e221a1556e3c1a2c56a8641f0a6c3fe3fa67de6"},"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"purl":"pkg:github/BenediktAlkin/KappaProfiler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenediktAlkin%2FKappaProfiler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenediktAlkin%2FKappaProfiler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenediktAlkin%2FKappaProfiler/releases","manifest
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenediktAlkin%2FKappaProfiler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BenediktAlkin","download_url":"https://codeload.github.com/BenediktAlkin/KappaProfiler/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BenediktAlkin%2FKappaProfiler/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265863793,"owners_count":23840888,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","profiler","python","pytorch"],"created_at":"2024-10-03T20:07:43.422Z","updated_at":"2025-07-19T00:37:31.003Z","avatar_url":"https://github.com/BenediktAlkin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# KappaProfiler\n[![publish](https://github.com/BenediktAlkin/KappaProfiler/actions/workflows/publish.yaml/badge.svg)](https://github.com/BenediktAlkin/KappaProfiler/actions/workflows/publish.yaml)\n\nLightweight profiling utilities for identifying bottlenecks and timing program parts in your python application. 
\n\nAlso supports [async profiling for cuda](https://github.com/BenediktAlkin/KappaProfiler#time-async-operations).\n\n# Setup\n- new install: `pip install kappaprofiler`\n- upgrade to new version: `pip install kappaprofiler --upgrade` \n\n# Usage\n## Time your whole application\n### With decorators\n```\nimport kappaprofiler as kp\nimport time\n\n@kp.profile\ndef main():\n  time.sleep(0.3)  # simulate some operation\n  some_method()\n \n@kp.profile\ndef some_method():\n  time.sleep(0.5)  # simulate some operation\n\nif __name__ == \"__main__\":\n  main()\n  print(kp.profiler.to_string())\n```\nThe result will be (time.sleep calls are not 100% accurate)\n```\n0.82 main\n0.51 main.some_method\n```\n### With contextmanagers\n```\nimport kappaprofiler as kp\nimport time\n\ndef main():\n  with kp.named_profile(\"main\"):\n    time.sleep(0.3)  # simulate some operation\n    with kp.named_profile(\"method\"):\n        some_method()\n  with kp.named_profile(\"main2\"):\n    time.sleep(0.2)  # simulate some operation\n \ndef some_method():\n  time.sleep(0.5)  # simulate some operation\n\nif __name__ == \"__main__\":\n  main()\n  print(kp.profiler.to_string())\n```\nThe result will be (time.sleep calls are not 100% accurate)\n```\n0.82 main\n0.51 main.method\n0.20 main2\n```\n\n## Query nodes\nEach profiling entry is represented by a node from which detailed information can be retrieved\n```\nquery = \"main.some_method\"\nnode = kp.profiler.get_node(query)\nprint(f\"{query} was called {node.count} time and took {node.to_string()} seconds in total\")\n```\n`main.some_method was called 1 time and took 0.51 seconds in total`\n\n## Time only a part of your program\n```\nimport kappaprofiler as kp\nwith kp.Stopwatch() as sw:\n    # some operation\n    ...\nprint(f\"operation took {sw.elapsed_milliseconds} milliseconds\")\nprint(f\"operation took {sw.elapsed_seconds} seconds\")\n```\n\n\n#### Time subparts\n```\nimport kappaprofiler as kp\nimport time\n\nsw1 = kp.Stopwatch()\nsw2 
= kp.Stopwatch()\n\nfor i in range(1, 3):\n    with sw1:\n        # operation1\n        time.sleep(0.1 * i)\n    with sw2:\n        # operation2\n        time.sleep(0.2 * i)\n\nprint(f\"operation1 took {sw1.elapsed_seconds:.2f} seconds (average {sw1.average_lap_time:.2f})\")\nprint(f\"operation2 took {sw2.elapsed_seconds:.2f} seconds (average {sw2.average_lap_time:.2f})\")\n```\n```\noperation1 took 0.32 seconds (average 0.16)\noperation2 took 0.61 seconds (average 0.30)\n```\n\n## Time async operations\nShowcase: timing [cuda](https://developer.nvidia.com/cuda-toolkit) operations in \n[pytorch](https://github.com/pytorch/pytorch)\n\nAsynchronous operations can only be timed properly when the asynchronous call is awaited or a synchronization point is\ncreated after the timing should end. Natively in pytorch this would look something like this:\n```\n# submit a start event to the event stream\nstart_event = torch.cuda.Event(enable_timing=True)\nstart_event.record()\n\n# submit an async operation to the event stream\n...\n\n# submit an end event to the event stream\nend_event = torch.cuda.Event(enable_timing=True)\nend_event.record()\n\n# synchronize\ntorch.cuda.synchronize()\n\nprint(start_event.elapsed_time(end_event))\n```\nwhich is quite a lot of boilerplate for timing one operation.\n\nWith kappaprofiler it looks like this:\n```\nimport kappaprofiler as kp\nimport torch\n\ndef main():\n    device = torch.device(\"cuda\")\n    x = torch.randn(15000, 15000, device=device)\n    with kp.named_profile(\"matmul_wrong\"):\n        # matrix multiplication (@) is asynchronous\n        _ = x @ x\n    # the timing for \"matmul_wrong\" is only the time it took to\n    # submit the x @ x operation to the cuda event stream\n    # not the actual time the x @ x operation took\n\n    with kp.named_profile_async(\"matmul_right\"):\n        _ = x @ x\n    matmul_method(x)\n\n@kp.profile_async\ndef matmul_method(x):\n    _ = x @ x\n\ndef start_async():\n    start_event = 
torch.cuda.Event(enable_timing=True)\n    start_event.record()\n    return start_event\n\ndef end_async(start_event):\n    end_event = torch.cuda.Event(enable_timing=True)\n    end_event.record()\n    torch.cuda.synchronize()\n    # torch.cuda.Event.elapsed_time returns milliseconds but kappaprofiler expects seconds\n    return start_event.elapsed_time(end_event) / 1000\n\n\nif __name__ == \"__main__\":\n    kp.setup_async(start_async, end_async)\n    main()\n    print(kp.profiler.to_string())\n```\n```\n0.56 matmul_wrong\n4.69 matmul_right\n4.72 matmul_method\n```\n\n\u003cb\u003eNOTE: Synchronization points slow down overall program execution, so they should only be used for investigating \nbottlenecks/runtimes\u003c/b\u003e\n\nTo remove all synchronization points in your program either:\n- remove the `kp.setup_async` call -\u003e `kp.named_profile_async`/`kp.profile_async` will default to a noop (NOTE: this\n  removes the node completely, so it's also not possible to query it)\n- replace the `kp.setup_async` call with `kp.setup_async_as_sync` to make the asynchronous calls behave just like the \n  synchronous calls. This will make the async times wrong (like `matmul_wrong` above) but still creates a node for the \n  operation (e.g. for querying how often it was called).\n\n### Multi-process pytorch profiling\nOnly synchronizing cuda operations is not sufficient when multiple processes are used (e.g. 
for multi-gpu training).\nIn addition to cuda synchronization, the processes have to be synced up.\n```\nimport torch.distributed as dist\ndef end_async(start_event):\n    if dist.is_available() and dist.is_initialized():\n        torch.cuda.synchronize()\n        dist.barrier()\n    end_event = torch.cuda.Event(enable_timing=True)\n    end_event.record()\n    torch.cuda.synchronize()\n    # torch.cuda.Event.elapsed_time returns milliseconds but kappaprofiler expects seconds\n    return start_event.elapsed_time(end_event) / 1000\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenediktalkin%2Fkappaprofiler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbenediktalkin%2Fkappaprofiler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbenediktalkin%2Fkappaprofiler/lists"}