https://github.com/southernmethodistuniversity/msds_hpc_project_template
MSDS HPC and DS Final Project
- Host: GitHub
- URL: https://github.com/southernmethodistuniversity/msds_hpc_project_template
- Owner: SouthernMethodistUniversity
- Created: 2022-06-09T22:02:28.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2022-06-09T22:27:37.000Z (about 4 years ago)
- Last Synced: 2025-03-03T03:13:21.966Z (over 1 year ago)
- Size: 3.91 KB
- Stars: 1
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
# MSDS HPC and DS Final Project
The goal of the semester project is to produce a single-submit, end-to-end,
performant pipeline for a complex and computationally intensive data analysis
workflow.
## Semester Project Details
- The analysis and dataset, possibly generative, need to be
computationally intensive enough that a reasonable
performance analysis can be conducted.
- The specific dataset, analysis, and performance analysis will be
agreed to at various stages during the semester.
- The pipeline should be single-submit, meaning that a single job is
submitted to the queue system and then the entire pipeline runs, with
each stage executed on appropriate hardware with an appropriately
optimized software stack.
- The deliverable will be a ready-to-present slide deck in your GitHub
repo, *i.e.* a job will be submitted on an SMU HPC cluster and then,
sometime later with zero human interaction, a PDF presentation will
appear in your GitHub repo.
- The presentation should discuss both the dataset analysis and
performance analysis.
- Specific compute resources will be reserved for final testing and
the production run.
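
One possible reading of "single-submit" can be sketched as a small driver that is launched once and queues every stage as a dependent job. The sketch below is illustrative only and rests on several assumptions not stated in this README: that the queue system is Slurm (so `sbatch` with its `--parsable`, `--partition`, and `--dependency=afterok:` options is available), and that the stage names, script paths, and partition names shown are hypothetical placeholders.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Stage:
    name: str       # hypothetical stage label
    script: str     # hypothetical batch script, e.g. under bin/
    partition: str  # hypothetical partition name for this stage's hardware


def sbatch_command(stage: Stage, after_job: Optional[str] = None) -> List[str]:
    """Build the sbatch argument list for one stage (assumes Slurm)."""
    cmd = [
        "sbatch",
        "--parsable",  # print only the job ID (and cluster name, if any)
        f"--job-name={stage.name}",
        f"--partition={stage.partition}",
    ]
    if after_job is not None:
        # Start this stage only if the previous one finished successfully.
        cmd.append(f"--dependency=afterok:{after_job}")
    cmd.append(stage.script)
    return cmd


def run_sbatch(cmd: Sequence[str]) -> str:
    """Submit a job and return its ID; --parsable prints 'jobid[;cluster]'."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip().split(";")[0]


def submit_chain(
    stages: Sequence[Stage],
    submit: Callable[[Sequence[str]], str] = run_sbatch,
) -> List[str]:
    """Queue all stages up front, each depending on the one before it."""
    job_ids: List[str] = []
    previous: Optional[str] = None
    for stage in stages:
        previous = submit(sbatch_command(stage, previous))
        job_ids.append(previous)
    return job_ids


# Hypothetical three-stage pipeline: prepare data, analyze, publish the PDF.
EXAMPLE_STAGES = [
    Stage("prepare", "bin/prepare.sh", "cpu"),
    Stage("analyze", "bin/analyze.sh", "gpu"),
    Stage("publish", "bin/publish.sh", "cpu"),
]
```

Calling `submit_chain(EXAMPLE_STAGES)` on a system with Slurm would queue all three stages at once, after which no further interaction is needed. In this sketch the last stage is where the slide deck would be built and pushed to the repository; how that is done is left open here.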
## Repository Structure
- `bin`: Executable scripts.
- `src`: Source code that is not directly executable.
- `data`: Datasets, where appropriate, and parameter files.
- `docs`: Workflow documentation and the location of the final deliverable.
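
As a small illustration of the layout above, a pipeline could verify that the four directories exist before any work is submitted. This is a minimal sketch, not part of the template itself; the helper name `missing_dirs` is hypothetical.

```python
from pathlib import Path
from typing import List

# Directory names taken from the repository structure listed above.
EXPECTED_DIRS = ("bin", "src", "data", "docs")


def missing_dirs(repo_root: str) -> List[str]:
    """Return the expected top-level directories absent from repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

An empty list means the checkout matches the expected structure; otherwise the returned names can be reported before the job is queued.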