https://github.com/coderjolly/utilisation-analysis
This repository offers a small glimpse of resource data from the Supercomputer Education and Research Centre (SERC) at IISc, and of how that data was ingested and extracted to produce results for comparing actual and simulated resource utilisation.
- Host: GitHub
- URL: https://github.com/coderjolly/utilisation-analysis
- Owner: coderjolly
- Created: 2023-03-08T15:49:47.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2023-05-30T08:16:50.000Z (over 2 years ago)
- Last Synced: 2025-03-27T23:45:08.821Z (9 months ago)
- Topics: csv-parser-python, data-transformation, data-visualization, flow, plotly-dash, plotly-python
- Language: Jupyter Notebook
- Homepage:
- Size: 1.86 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Utilization Analysis
This is a small demonstration of a data pipeline that generates a simulation of resource utilisation for High Performance Computing (HPC) systems. It reads two files from the `/data` folder: a system log, `0_sim_log.csv`, which contains system utilisation over a period of time, and a supercomputer resource utilisation CSV, `3_apst_03_2021.csv`, which contains the actual system utilisation for that period.
The flow of data is explained below:
- First, the data is extracted from various HPC sources. For simplicity and demonstration purposes, it is extracted here from a single source, the `/data` folder. `0_sim_log.csv` contains the utilisation of a system over a period of time, and `3_apst_03_2021.csv` contains the actual system utilisation for that period.
- Next, the `data-transformations` scripts and notebooks, located in the `/src` and `notebooks` folders, transform these data files into dataframes that can be used for the simulation. The final dataframes are then stored in the `/data` folder as CSV files.

- The second data flow takes the final dataframes described above and generates a simulation of system utilisation, compared against the actual utilisation of resources in the form of a graph or plot. The `gen_ploy` notebook produces these comparison graphs from `plots.csv` in the `/data` folder.

- Finally, the third data flow uses the `plot_utilization` Flask application to generate a simulation of system utilisation, compared against the actual utilisation of resources, as a graph in a `plotly-Dash` application. The `plot_utilization` Flask application is located in the `/src` folder, and the result is a web application that can be accessed through the browser.
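
As a rough illustration of the first two flows, the sketch below reads a simulated and an actual utilisation series, keeps the timestamps they share, and writes a combined CSV of the kind a plotting step could consume. It is not the repository's code: it uses only the standard library's `csv` module in place of dataframes so that it runs without extra dependencies. The column names `timestamp`, `sim_util`, and `actual_util`, the helper functions, and the sample values are all hypothetical, since the real layouts of `0_sim_log.csv` and `3_apst_03_2021.csv` are not described here.

```python
import csv
import io


def read_series(csv_text, time_col, value_col):
    """Parse CSV text into a {timestamp: float value} mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[time_col]: float(row[value_col]) for row in reader}


def align(simulated, actual):
    """Keep only timestamps present in both series, in sorted order."""
    shared = sorted(simulated.keys() & actual.keys())
    return [
        {"timestamp": t, "simulated": simulated[t], "actual": actual[t]}
        for t in shared
    ]


def to_csv(rows):
    """Serialise aligned rows to CSV text, ready for a plotting step."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["timestamp", "simulated", "actual"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample data standing in for the two input files (assumed columns).
SIM_LOG = "timestamp,sim_util\n2021-03-01,0.62\n2021-03-02,0.71\n2021-03-03,0.55\n"
ACTUAL = "timestamp,actual_util\n2021-03-02,0.68\n2021-03-03,0.59\n2021-03-04,0.80\n"

rows = align(
    read_series(SIM_LOG, "timestamp", "sim_util"),
    read_series(ACTUAL, "timestamp", "actual_util"),
)
print(to_csv(rows))
```

Aligning on shared timestamps before plotting means the simulated and actual curves cover the same period, so any gap between them reflects the simulation rather than missing samples.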
