{"id":19198370,"url":"https://github.com/simonskodt/big-data-processes","last_synced_at":"2025-10-29T22:12:07.364Z","repository":{"id":221971757,"uuid":"755908594","full_name":"simonskodt/big-data-processes","owner":"simonskodt","description":"All weekly exercises in the Spring course Big Data Processes","archived":false,"fork":false,"pushed_at":"2024-05-16T12:25:46.000Z","size":81265,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-23T05:13:39.151Z","etag":null,"topics":["big-data","data-science","ethics","ml-models"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/simonskodt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-11T13:01:26.000Z","updated_at":"2024-05-16T12:25:51.000Z","dependencies_parsed_at":"2024-11-09T12:37:04.544Z","dependency_job_id":null,"html_url":"https://github.com/simonskodt/big-data-processes","commit_stats":null,"previous_names":["simonskodt/big-data-processes"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/simonskodt/big-data-processes","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonskodt%2Fbig-data-processes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonskodt%2Fbig-data-processes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonskodt%2Fbig-data-processes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/simonskodt%2Fbig-data-processes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/simonskodt","download_url":"https://codeload.github.com/simonskodt/big-data-processes/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonskodt%2Fbig-data-processes/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264748998,"owners_count":23658117,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","data-science","ethics","ml-models"],"created_at":"2024-11-09T12:21:34.632Z","updated_at":"2025-10-29T22:12:07.300Z","avatar_url":"https://github.com/simonskodt.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Header](./header.png)\n\n## About Course\n\nThe Big Data Processes course teaches management and usage of data sets, interpretation and visualisation of data, and understanding data in larger contexts. It enables the identification of Big Data trends, understanding the value of insights to organizations, and designing Big Data processes. It also promotes the production of analytical insights and understanding the implications of Big Data processes.\n\n## Prerequisites\n\nThis course is available to all DIM students. 
As a non-DIM student, one should have basic literacy in a programming language (for instance R or Python), corresponding to an introductory course in programming or equivalent.\n\n## Weekly Exercises\n\n| Weeks  | Topics                                  | Exercise Description                  |\n|--------|-----------------------------------------|---------------------------------------|\n| Week 1 | Introduction                            | Opening and examining simple datasets |\n| Week 2 | Prediction                              | Where to get datasets, dataset manipulation, visualisations |\n| Week 3 | Classification                          | Pearson correlation matrix, decision trees for classification, K-NN |\n| Week 4 | Ensemble Methods                        | Splitting and scaling, bagging, boosting, ensemble voting |\n| Week 5 | Evaluating                              | Confusion matrix, scores and metrics, over- and undersampling |\n| Week 6 | ML \u0026 Climate Change                     | Using EmissionsTracker from codecarbon |\n| Week 7 | Exploratory Data Analysis               | Data cleaning, exploration, outliers, and visualisation |\n| Week 8 | Power                                   | \u003cspan style=\"color:lightblue\"\u003e**NO CODE**\u003c/span\u003e |\n| Week 9 | Development                             | \u003cspan style=\"color:lightblue\"\u003e**NO CODE**\u003c/span\u003e |\n| Week 10 | Implementation \u0026 Maintenance           | \u003cspan style=\"color:lightblue\"\u003e**NO CODE**\u003c/span\u003e |\n| Week 11 | AI Ethics                              | \u003cspan style=\"color:lightblue\"\u003e**NO CODE**\u003c/span\u003e |\n| Week 12 | International Contexts                 | \u003cspan style=\"color:lightblue\"\u003e**NO CODE**\u003c/span\u003e 
|\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimonskodt%2Fbig-data-processes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsimonskodt%2Fbig-data-processes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimonskodt%2Fbig-data-processes/lists"}