{"id":17229308,"url":"https://github.com/fpoli/view-spark-timeline","last_synced_at":"2025-04-14T01:41:45.784Z","repository":{"id":26848386,"uuid":"111108894","full_name":"fpoli/view-spark-timeline","owner":"fpoli","description":"Visualize in an SVG the timeline of an Apache Spark execution.","archived":false,"fork":false,"pushed_at":"2022-07-05T21:06:36.000Z","size":2188,"stargazers_count":6,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-27T15:52:28.437Z","etag":null,"topics":["cli","spark","visualization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fpoli.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-11-17T13:52:52.000Z","updated_at":"2024-10-19T06:02:45.000Z","dependencies_parsed_at":"2022-09-14T03:31:53.085Z","dependency_job_id":null,"html_url":"https://github.com/fpoli/view-spark-timeline","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fpoli%2Fview-spark-timeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fpoli%2Fview-spark-timeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fpoli%2Fview-spark-timeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fpoli%2Fview-spark-timeline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fpoli","download_url":"https://codeload.github.com/fpoli/view-spark-timeline/tar.gz/refs/heads/master","h
ost":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248809040,"owners_count":21164893,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","spark","visualization"],"created_at":"2024-10-15T04:47:26.675Z","updated_at":"2025-04-14T01:41:45.755Z","avatar_url":"https://github.com/fpoli.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"View Spark Timeline\n===================\n\n.. image:: https://travis-ci.org/fpoli/view-spark-timeline.svg?branch=master\n    :target: https://travis-ci.org/fpoli/view-spark-timeline\n\nCommand line application to visualize the timeline of Apache Spark executions, reading Spark's log files.\n\nCan you spot the bottleneck from the following visualization?\n\n.. image:: docs/example-timeline.svg\n\n\nImage explanation\n-----------------\n\nOn the vertical axis we have the executor cores (grouped by executor).\nOn the horizontal axis we have the time, going from left to right.\nEach task is a horizontal bar that starts at a certain time on a core of an executor and ends after some time.\nThe color normally ranges from green, used for shorter tasks, to red, used for longer tasks. Failed tasks are black.\nAll the white space corresponds to some unused core.\n\nUsually, the greener the image is, the better. 
If there is a bottleneck in the execution, it is easy to spot the guilty task(s).\nOpening the SVG in a browser and moving the mouse over a task should show a tooltip with the task ID.\nIt is then useful to inspect that task using the standard Spark UI.\n\n\nInstallation\n------------\n\nThis project requires Python 3.\n\n.. code-block:: bash\n\n    pip3 install view-spark-timeline\n\n\nExample\n-------\n\n.. code-block:: bash\n\n    view-spark-timeline -i examples/application_1472176676028_555248_1 -o docs/timeline.svg -u 1000\n\n\nOutput:\n\n.. code-block:: text\n\n    Read events from 'examples/application_1472176676028_555248_1'...\n    Total cores: 32\n    Total duration: 312.5s\n    Number of tasks: 2990\n    Min task duration: 0.0s\n    Max task duration: 25.9s\n    Cluster utilization: 57.70%\n    Drawing events...\n    Read events from 'examples/application_1472176676028_555248_1'...\n    SVG size: 1500 160\n    Saving SVG...\n\n\nUsage\n-----\n\n.. code-block:: bash\n\n    view-spark-timeline --help\n\nOutput:\n\n.. code-block:: text\n\n    usage: view-spark-timeline [-h] -i INPUT_LOG -o OUTPUT_IMAGE\n                           [-t TIME_UNCERTAINTY] [-v]\n\n    Visualize the timeline of a Spark execution from its log file. (v0.2.0)\n\n    optional arguments:\n    -h, --help            show this help message and exit\n    -i INPUT_LOG, --input-log INPUT_LOG\n                            path to the spark's application log\n    -o OUTPUT_IMAGE, --output-image OUTPUT_IMAGE\n                            path of the output image\n    -u TIME_UNCERTAINTY, --time-uncertainty TIME_UNCERTAINTY\n                            maximum allowed time uncertainty (in ms) of the\n                            timestamps in the log file. 
An high uncertainty\n                            determines a slower, but more robust, execution.\n                            (Default: 0)\n    -v, --version         print version and exit\n\n\nLicense\n-------\n\nCopyright (c) 2017-2020, Federico Poli \u003cfederpoli@gmail.com\u003e\n\nThis project, except for files in the :literal:`lib` and :literal:`examples` folders, is released under the MIT license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffpoli%2Fview-spark-timeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffpoli%2Fview-spark-timeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffpoli%2Fview-spark-timeline/lists"}