{"id":16392814,"url":"https://github.com/LLNL/dftracer","last_synced_at":"2025-03-16T16:31:20.496Z","repository":{"id":157973696,"uuid":"620618199","full_name":"LLNL/dftracer","owner":"LLNL","description":"A multi-level dataflow tracer for capturing I/O calls from workflows.","archived":false,"fork":false,"pushed_at":"2025-03-15T07:18:18.000Z","size":32207,"stargazers_count":15,"open_issues_count":15,"forks_count":9,"subscribers_count":2,"default_branch":"develop","last_synced_at":"2025-03-16T00:34:34.404Z","etag":null,"topics":["deep","dlio","io","learning","profiler"],"latest_commit_sha":null,"homepage":"https://dftracer.readthedocs.io/en/latest/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LLNL.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-29T03:17:54.000Z","updated_at":"2025-03-14T06:21:42.000Z","dependencies_parsed_at":null,"dependency_job_id":"d15b1c4b-d24a-4290-8db5-d315210a272f","html_url":"https://github.com/LLNL/dftracer","commit_stats":null,"previous_names":["hariharan-devarajan/dftracer","hariharan-devarajan/dlio-profiler"],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2Fdftracer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2Fdftracer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2Fdftracer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLNL%2Fdftracer/manifests","
owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LLNL","download_url":"https://codeload.github.com/LLNL/dftracer/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243822292,"owners_count":20353499,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep","dlio","io","learning","profiler"],"created_at":"2024-10-11T04:51:29.773Z","updated_at":"2025-03-16T16:31:20.474Z","avatar_url":"https://github.com/LLNL.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DFTracer\n\n[![Build and Test](https://github.com/LLNL/dftracer/actions/workflows/ci.yml/badge.svg)](https://github.com/LLNL/dftracer/actions/workflows/ci.yml)\n[![Documentation Status](https://readthedocs.org/projects/dftracer/badge/?version=latest)](https://dftracer.readthedocs.io/en/latest/?badge=latest)\n![PyPI - Version](https://img.shields.io/pypi/v/pydftracer?label=PyPI)\n![PyPI - Wheel](https://img.shields.io/pypi/wheel/pydftracer?label=Wheel)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pydftracer?label=Python)\n![PyPI - License](https://img.shields.io/pypi/l/pydftracer?label=License)\n\n## Overview\n\nDFTracer is a tracing tool designed to capture both application-code-level and I/O-call-level events from workflows. It provides a unified tracing interface, an optimized trace format, and a compression mechanism that together enable efficient distributed analysis of large-scale AI-driven workloads.\n\n## Prerequisites\n\nRequirements for DFTracer:\n\n1. Python\u003e=3.7\n1. 
pybind11\n\nRequirements for DFAnalyzer:\n\n1. bokeh\u003e=2.4.2\n1. dask\u003e=2023.5.0\n1. distributed\n1. matplotlib\u003e=3.7.3\n1. numpy\u003e=1.24.3\n1. pandas\u003e=2.0.3\n1. pyarrow\u003e=12.0.1\n1. pybind11\n1. python-intervals\u003e=1.10.0.post1\n1. rich\u003e=13.6.0\n1. seaborn\u003e=0.13.2\n1. [zindex_py](https://github.com/hariharan-devarajan/zindex.git)\n\n## Installation\n\nDFTracer can be installed with `pip`, the standard tool for installing Python packages.\nThis method works in both native Python and Conda environments.\n\n### From PyPI\n\n```bash\npip install pydftracer\n# Quote the extra so that shells such as zsh do not treat the brackets as a glob pattern.\npip install \"pydftracer[dfanalyzer]\"\n```\n\n### From GitHub\n\n```bash\nDFTRACER_VERSION=develop\npip install git+https://github.com/LLNL/dftracer.git@${DFTRACER_VERSION}\npip install \"pydftracer[dfanalyzer] @ git+https://github.com/LLNL/dftracer.git@${DFTRACER_VERSION}\"\n```\n\n### From Source\n\n```bash\ngit clone git@github.com:LLNL/dftracer.git\ncd dftracer\n# Skip this checkout to build the develop branch.\n# For the latest stable version, check out a release tag or the master branch.\ngit checkout tags/\u003cRelease\u003e -b \u003cRelease\u003e\npip install .\n```\n\nFor detailed build instructions, see the [build documentation](https://dftracer.readthedocs.io/en/latest/build.html).\n\n## Usage\n\n```python\nimport os\nfrom multiprocessing import get_context\nfrom time import sleep\n\nimport numpy as np\nfrom dftracer.logger import dftracer, dft_fn\n\ncwd = os.getcwd()\nos.makedirs(f\"{cwd}/data\", exist_ok=True)\n\nlog_inst = dftracer.initialize_log(logfile=None, data_dir=None, process_id=-1)\n# Use a separate name so that the dft_fn class is not shadowed.\ncompute_fn = dft_fn(\"COMPUTE\")\n\n# Minimal Pool initializer; runs once in each newly spawned worker process.\ndef init():\n    print(f\"Initializing process {os.getpid()}\")\n\n# Example of using function decorators\n@compute_fn.log\ndef log_events(index):\n    sleep(1)\n\n# Example of function spawning and implicit I/O calls\ndef posix_calls(val):\n    index, is_spawn = val\n    path = f\"{cwd}/data/demofile{index}.txt\"\n    with open(path, \"w+\") as f:\n        f.write(\"Now the file has more content!\")\n    if is_spawn:\n        print(f\"Calling spawn on {index} with pid {os.getpid()}\")\n        # Finalize DFTracer in the spawned worker so that its trace is written correctly.\n        log_inst.finalize()\n    else:\n        print(f\"Not calling spawn 
on {index} with pid {os.getpid()}\")\n\n# np.savez issues POSIX I/O calls internally, so they are traced implicitly.\ndef npz_calls(index):\n    path = f\"{cwd}/data/demofile{index}.npz\"\n    if os.path.exists(path):\n        os.remove(path)\n    records = np.random.randint(255, size=(8, 8, 1024), dtype=np.uint8)\n    record_labels = [0] * 1024\n    np.savez(path, x=records, y=record_labels)\n\ndef main():\n    log_events(0)\n    npz_calls(1)\n    # The spawned worker re-imports this module, which re-runs dftracer.initialize_log there.\n    with get_context(\"spawn\").Pool(1, initializer=init) as pool:\n        pool.map(posix_calls, ((2, True),))\n    log_inst.finalize()\n\nif __name__ == \"__main__\":\n    main()\n```\n\nBecause this example does not pass `logfile` or `data_dir` to `dftracer.initialize_log`, the `DFTRACER_LOG_FILE` and `DFTRACER_DATA_DIR` environment variables must be set instead.\nBy default, the DFTracer mode is set to `FUNCTION`.\nAn example configuration for running it:\n\n```bash\n# DFTracer appends the app name, the process ID, and the .pfw extension to this prefix for each app and process.\n# The final log file is named ~/log_file-\u003cAPP_NAME\u003e-\u003cPID\u003e.pfw\nexport DFTRACER_LOG_FILE=~/log_file\n# Colon-separated paths to include in tracing\nexport DFTRACER_DATA_DIR=/dev/shm/:/p/gpfs1/$USER/dataset:$PWD/data\n# Enable DFTracer\nexport DFTRACER_ENABLE=1\n```\n\nFor more examples, see the [examples documentation](https://dftracer.readthedocs.io/en/latest/examples.html).\n\n## Documentation\n\n* Building DFTracer: [https://dftracer.readthedocs.io/en/latest/build.html](https://dftracer.readthedocs.io/en/latest/build.html)\n* Integrating DFTracer: [https://dftracer.readthedocs.io/en/latest/examples.html](https://dftracer.readthedocs.io/en/latest/examples.html)\n* Visualizing DFTracer Traces: [https://dftracer.readthedocs.io/en/latest/perfetto.html](https://dftracer.readthedocs.io/en/latest/perfetto.html)\n* Building DFAnalyzer: [https://dftracer.readthedocs.io/en/latest/dfanalyzer_build.html](https://dftracer.readthedocs.io/en/latest/dfanalyzer_build.html)\n\n## Citation and Reference\n\nThe original SC'24 paper describes the design and 
implementation of DFTracer. Please cite both this paper and the code if you use DFTracer in your research.\n\n```bibtex\n@inproceedings{devarajan_dftracer_2024,\n    address = {Atlanta, GA},\n    title = {{DFTracer}: {An} {Analysis}-{Friendly} {Data} {Flow} {Tracer} for {AI}-{Driven} {Workflows}},\n    shorttitle = {{DFTracer}},\n    urldate = {2024-07-31},\n    booktitle = {{SC24}: {International} {Conference} for {High} {Performance} {Computing}, {Networking}, {Storage} and {Analysis}},\n    publisher = {IEEE},\n    author = {Devarajan, Hariharan and Pottier, Loic and Velusamy, Kaushik and Zheng, Huihuo and Yildirim, Izzet and Kogiou, Olga and Yu, Weikuan and Kougkas, Anthony and Sun, Xian-He and Yeom, Jae Seung and Mohror, Kathryn},\n    month = nov,\n    year = {2024},\n}\n\n@misc{devarajan_dftracer_code_2024,\n    type = {Github},\n    title = {Github {DFTracer}},\n    shorttitle = {{DFTracer}},\n    url = {https://github.com/LLNL/dftracer.git},\n    urldate = {2024-07-31},\n    journal = {DFTracer: A multi-level dataflow tracer for capturing I/O calls from workflows.},\n    author = {Devarajan, Hariharan and Pottier, Loic and Velusamy, Kaushik and Zheng, Huihuo and Yildirim, Izzet and Kogiou, Olga and Yu, Weikuan and Kougkas, Anthony and Sun, Xian-He and Yeom, Jae Seung and Mohror, Kathryn},\n    month = jun,\n    year = {2024},\n}\n```\n\n## Acknowledgments\n\nThis work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344; and under the auspices of the National Cancer Institute (NCI) by Frederick National Laboratory for Cancer Research (FNLCR) under Contract 75N91019D00024. This research used resources of the Argonne Leadership Computing Facility, a U.S. Department of Energy (DOE) Office of Science user facility at Argonne National Laboratory and is based on research supported by the U.S. DOE Office of Science-Advanced Scientific Computing Research Program, under Contract No. 
DE-AC02-06CH11357. This work was also supported by the Office of Advanced Scientific Computing Research under the DOE Early Career Research Program. Additionally, this material is based upon work partially supported by LLNL LDRD 23-ERD-045 and 24-SI-005. LLNL-CONF-857447.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLLNL%2Fdftracer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FLLNL%2Fdftracer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FLLNL%2Fdftracer/lists"}