{"id":13471774,"url":"https://github.com/dcai-course/dcai-lab","last_synced_at":"2025-03-26T14:32:32.169Z","repository":{"id":70654308,"uuid":"574646981","full_name":"dcai-course/dcai-lab","owner":"dcai-course","description":"Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽‍💻","archived":false,"fork":false,"pushed_at":"2025-02-24T15:58:39.000Z","size":4651,"stargazers_count":445,"open_issues_count":3,"forks_count":155,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-02-24T16:52:38.723Z","etag":null,"topics":["course","data-centric-ai","data-science","deep-learning","homework","lab","machine-learning"],"latest_commit_sha":null,"homepage":"https://dcai.csail.mit.edu/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dcai-course.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-12-05T19:12:40.000Z","updated_at":"2025-02-24T15:58:44.000Z","dependencies_parsed_at":"2023-12-28T13:41:41.406Z","dependency_job_id":"3c3cbaa3-21f4-45f3-b483-5929a3d16e60","html_url":"https://github.com/dcai-course/dcai-lab","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcai-course%2Fdcai-lab","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcai-course%2Fdcai-lab/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcai-course%2Fdcai-lab/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcai-course%2Fdcai-lab/m
anifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dcai-course","download_url":"https://codeload.github.com/dcai-course/dcai-lab/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245671009,"owners_count":20653467,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["course","data-centric-ai","data-science","deep-learning","homework","lab","machine-learning"],"created_at":"2024-07-31T16:00:49.167Z","updated_at":"2025-03-26T14:32:29.871Z","avatar_url":"https://github.com/dcai-course.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"# Lab assignments for Introduction to Data-Centric AI\n\nThis repository contains the lab assignments for the [Introduction to\nData-Centric AI](https://dcai.csail.mit.edu/) class.\n\nContributions are most welcome! If you have ideas for improving the labs,\nplease open an issue or submit a pull request.\n\nIf you're looking for the 2023 version of the labs, check out the [2023\nbranch](https://github.com/dcai-course/dcai-lab/tree/2023).\n\n## [Lab 1: Data-Centric AI vs. 
Model-Centric AI][lab-1]\n\nThe [first lab assignment][lab-1] walks you through an ML task of building a\ntext classifier, and illustrates the power (and often simplicity) of\ndata-centric approaches.\n\n[lab-1]: data_centric_model_centric/Lab%20-%20Data-Centric%20AI%20vs%20Model-Centric%20AI.ipynb\n\n## [Lab 2: Label Errors][lab-2]\n\n[This lab][lab-2] guides you through writing your own implementation of\nautomatic label error identification using Confident Learning, the technique\ntaught in [today’s lecture][lec-2].\n\n[lab-2]: label_errors/Lab%20-%20Label%20Errors.ipynb\n[lec-2]: https://dcai.csail.mit.edu/lectures/label-errors/\n\n## [Lab 3: Dataset Creation and Curation][lab-3]\n\n[This lab assignment][lab-3] is to analyze an already collected dataset labeled\nby multiple annotators.\n\n[lab-3]: dataset_curation/Lab%20-%20Dataset%20Curation.ipynb\n\n## [Lab 4: Data-centric Evaluation of ML Models][lab-4]\n\n[This lab assignment][lab-4] is to try improving the performance of a given\nmodel solely by improving its training data via some of the various strategies\ncovered here.\n\n[lab-4]: data_centric_evaluation/Lab%20-%20Data-Centric%20Evaluation.ipynb\n\n## [Lab 5: Class Imbalance, Outliers, and Distribution Shift][lab-5]\n\n[The lab assignment][lab-5] for this lecture is to implement and compare\ndifferent methods for identifying outliers. For this lab, we've focused on\nanomaly detection. You are given a clean training dataset consisting of many\npictures of dogs, and an evaluation dataset that contains outliers (non-dogs).\nYour task is to implement and compare various methods for detecting these\noutliers. 
You may implement some of the ideas presented in [today's\nlecture][lec-5], or you can look up other outlier detection algorithms in the\nlinked references or online.\n\n[lab-5]: outliers/Lab%20-%20Outliers.ipynb\n[lec-5]: https://dcai.csail.mit.edu/lectures/imbalance-outliers-shift/\n\n## [Lab 6: Growing or Compressing Datasets][lab-6]\n\n[This lab][lab-6] guides you through an implementation of active learning.\n\n[lab-6]: growing_datasets/Lab%20-%20Growing%20Datasets.ipynb\n\n## [Lab 7: Interpretability in Data-Centric ML][lab-7]\n\n[This lab][lab-7] guides you through finding issues in a dataset’s features by\napplying interpretability techniques.\n\n[lab-7]: interpretable_features/Lab%20-%20Interpretable%20Features.ipynb\n\n## [Lab 8: Encoding Human Priors: Data Augmentation and Prompt Engineering][lab-8]\n\n[This lab][lab-8] guides you through prompt engineering, crafting inputs for large\nlanguage models (LLMs). With these large pre-trained models, even small amounts\nof data can make them very useful. This lab is also [available on\nColab][lab-8-colab].\n\n[lab-8]: prompt_engineering/Lab_Prompt_Engineering.ipynb\n[lab-8-colab]: https://colab.research.google.com/drive/1cipH-u6Jz0EH-6Cd9MPYgY4K0sJZwRJq\n\n## [Lab 9: Data Privacy and Security][lab-9]\n\nThe [lab assignment][lab-9] for this lecture is to implement a membership\ninference attack. You are given a trained machine learning model, available as\na black-box prediction function. Your task is to devise a method to determine\nwhether or not a given data point was in the training set of this model. 
You\nmay implement some of the ideas presented in [today’s lecture][lec-9], or you\ncan look up other membership inference attack algorithms.\n\n\n[lab-9]: membership_inference/Lab%20-%20Membership%20Inference.ipynb\n[lec-9]: https://dcai.csail.mit.edu/lectures/data-privacy-security/\n\n## License\n\nCopyright (c) by the instructors of Introduction to Data-Centric AI (dcai.csail.mit.edu).\n\ndcai-lab is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.\n\ndcai-lab is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n\nSee [GNU Affero General Public LICENSE](https://github.com/dcai-course/dcai-lab/blob/master/LICENSE.txt) for details.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdcai-course%2Fdcai-lab","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdcai-course%2Fdcai-lab","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdcai-course%2Fdcai-lab/lists"}