https://github.com/coderjolly/utilisation-analysis

A small glimpse of resource data from IISc's Supercomputer Education and Research Centre (SERC), showing how the data was ingested and extracted to produce results for comparing actual resource utilisation with simulated resource utilisation.

Host: GitHub
URL: https://github.com/coderjolly/utilisation-analysis
Owner: coderjolly
Created: 2023-03-08T15:49:47.000Z (almost 3 years ago)
Default Branch: master
Last Pushed: 2023-05-30T08:16:50.000Z (over 2 years ago)
Last Synced: 2025-03-27T23:45:08.821Z (9 months ago)
Topics: csv-parser-python, data-transformation, data-visualization, flow, plotly-dash, plotly-python
Language: Jupyter Notebook
Homepage:
Size: 1.86 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


## Utilization Analysis

This is a small demonstration of a data pipeline built around the `/data` folder. It uses a system log, `0_sim_log.csv`, which contains the system's utilisation over a period of time, and a supercomputer resource utilisation CSV, `3_apst_03_2021.csv`, which contains the actual system utilisation for that period, in order to generate a simulation of resource utilisation for High Performance Computing (HPC) systems.

The flow of data is illustrated in the diagrams below:

- First, the data is extracted from various HPC sources. For simplicity and demonstration purposes, it is extracted here from a single source, the `/data` folder. `0_sim_log.csv` contains the utilisation of a system over a period of time, and `3_apst_03_2021.csv` contains the actual system utilisation for that period.
- Next, the `data-transformations` scripts and notebooks, located in the `/src` and `notebooks` folders, transform these data files into their respective dataframes, in a form that can be used for the simulation. The final dataframes are then stored in the `/data` folder as CSV files.

![1st data flow](/figures/premiere-pipeline.png)
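The first flow can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the repository's `data-transformations` code: it uses the standard library `csv` module in place of dataframes, the inline strings stand in for `0_sim_log.csv` and `3_apst_03_2021.csv`, and the column names (`timestamp`, `cores_used`) and helper names are assumptions made for the example.

```python
import csv
import io

# Hypothetical stand-ins for 0_sim_log.csv and 3_apst_03_2021.csv.
# The column names are assumptions, not the repository's real schema.
SIM_LOG = """timestamp,cores_used
2021-03-01T00:00,100
2021-03-01T01:00,150
2021-03-01T02:00,120
"""
ACTUAL_LOG = """timestamp,cores_used
2021-03-01T00:00,110
2021-03-01T01:00,140
"""


def read_series(text, value_column):
    """Parse CSV text into a {timestamp: value} mapping."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["timestamp"]: float(row[value_column]) for row in reader}


def build_comparison(simulated, actual):
    """Keep only timestamps present in both series, in sorted order."""
    shared = sorted(simulated.keys() & actual.keys())
    return [
        {"timestamp": ts, "simulated": simulated[ts], "actual": actual[ts]}
        for ts in shared
    ]


def to_csv_text(rows):
    """Serialise the aligned rows back to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["timestamp", "simulated", "actual"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = build_comparison(
    read_series(SIM_LOG, "cores_used"),
    read_series(ACTUAL_LOG, "cores_used"),
)
print(to_csv_text(rows))
```

Aligning on the shared timestamps drops the simulated sample at `02:00`, which has no matching actual reading; whether the real pipeline drops, fills, or interpolates such gaps is not stated in this README.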

- The second data flow takes the final dataframes discussed above and uses them to generate a simulation of the system utilisation, compared against the actual utilisation of resources in the form of a graph or plot. The `gen_ploy` notebook is used to compare the simulations graphically, using `plots.csv` in the `/data` folder.

![2nd data flow](/figures/second-pipeline.png)
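As a rough illustration of the comparison plot, the sketch below builds a Plotly-style figure description as a plain dictionary with one line per series. The row layout (`timestamp`, `simulated`, `actual`), the sample values, and the `build_figure` helper are assumptions for the example; the actual contents of `plots.csv` and of the notebook are not shown in this README.

```python
# Hypothetical rows, in the shape one might load from plots.csv.
# The field names and values are assumptions for illustration only.
rows = [
    {"timestamp": "2021-03-01T00:00", "simulated": 100.0, "actual": 110.0},
    {"timestamp": "2021-03-01T01:00", "simulated": 150.0, "actual": 140.0},
    {"timestamp": "2021-03-01T02:00", "simulated": 120.0, "actual": 125.0},
]


def build_figure(rows):
    """Return a figure dictionary with one line trace per series."""
    timestamps = [row["timestamp"] for row in rows]
    traces = [
        {
            "type": "scatter",
            "mode": "lines",
            "name": name,
            "x": timestamps,
            "y": [row[name] for row in rows],
        }
        for name in ("simulated", "actual")
    ]
    layout = {
        "title": {"text": "Simulated vs actual utilisation"},
        "xaxis": {"title": {"text": "Time"}},
        "yaxis": {"title": {"text": "Utilisation"}},
    }
    return {"data": traces, "layout": layout}


figure = build_figure(rows)
print([trace["name"] for trace in figure["data"]])
```

If Plotly is installed, passing this dictionary to `plotly.graph_objects.Figure(figure)` and calling `.show()` should render the two lines on shared axes.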

- Finally, the third data flow uses the `plot_utilization` Flask application to present the simulated system utilisation alongside the actual utilisation of resources as a graph, using Plotly Dash. The `plot_utilization` application is located in the `/src` folder, and the result is a web application that can be accessed through the browser.

![3rd data flow](/figures/third-pipeline.png)
