https://github.com/simonskodt/big-data-processes
All weekly exercises in the Spring course Big Data Processes
big-data data-science ethics ml-models
- Host: GitHub
- URL: https://github.com/simonskodt/big-data-processes
- Owner: simonskodt
- Created: 2024-02-11T13:01:26.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-16T12:25:46.000Z (about 2 years ago)
- Last Synced: 2025-02-23T05:13:39.151Z (over 1 year ago)
- Topics: big-data, data-science, ethics, ml-models
- Language: Jupyter Notebook
- Size: 77.5 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README

## About Course
The Big Data Processes course teaches how to manage and use data sets, interpret and visualise data, and understand data in larger contexts. It enables students to identify Big Data trends, understand the value of insights to organizations, and design Big Data processes. It also covers producing analytical insights and understanding the implications of Big Data processes.
## Prerequisites
This course is available to all DIM students. Non-DIM students should have basic literacy in a programming language (for instance R or Python), corresponding to an introductory programming course or equivalent.
## Weekly Exercises
| Weeks | Topics | Exercise Description |
|--------|-----------------------------------------|---------------------------------------|
| Week 1 | Introduction | Opening and examining simple datasets |
| Week 2 | Prediction | Where to get datasets, dataset manipulation, visualisations |
| Week 3 | Classification | Pearson correlation matrix, decision trees for classification, K-NN |
| Week 4 | Ensemble Methods | Splitting and scaling, bagging, boosting, ensemble voting |
| Week 5 | Evaluating | Confusion matrix, scores and metrics, over- and undersampling |
| Week 6 | ML & Climate Change | Using EmissionsTracker from codecarbon |
| Week 7 | Exploratory Data Analysis | Data cleaning, exploration, outliers, and visualisation |
| Week 8 | Power | **NO CODE** |
| Week 9 | Development | **NO CODE** |
| Week 10 | Implementation & Maintenance | **NO CODE** |
| Week 11 | AI Ethics | **NO CODE** |
| Week 12 | International Contexts | **NO CODE** |
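
To give a feel for the Week 5 topics (confusion matrix, scores and metrics), here is a minimal sketch in plain Python. It is not taken from the course notebooks: the function names `confusion_counts` and `precision_recall_f1` and the toy labels are made up for illustration, and the exercises themselves may well use a library such as scikit-learn instead.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def precision_recall_f1(counts):
    """Derive precision, recall and F1 from the confusion counts."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels, invented for this example only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(counts)
print(dict(counts))
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Writing the counts out by hand makes it easier to see why accuracy alone can mislead on imbalanced data, which is where the over- and undersampling part of that week comes in.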