https://github.com/dcai-course/dcai-lab
Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽‍💻
- Host: GitHub
- URL: https://github.com/dcai-course/dcai-lab
- Owner: dcai-course
- License: agpl-3.0
- Created: 2022-12-05T19:12:40.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2023-12-28T12:24:54.000Z (11 months ago)
- Last Synced: 2024-08-01T16:27:41.844Z (3 months ago)
- Topics: course, data-centric-ai, data-science, deep-learning, homework, lab, machine-learning
- Language: Jupyter Notebook
- Homepage: https://dcai.csail.mit.edu/
- Size: 4.44 MB
- Stars: 409
- Watchers: 13
- Forks: 145
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# Lab assignments for Introduction to Data-Centric AI
This repository contains the lab assignments for the [Introduction to
Data-Centric AI](https://dcai.csail.mit.edu/) class.

Contributions are most welcome! If you have ideas for improving the labs,
please open an issue or submit a pull request.

If you're looking for the 2023 version of the labs, check out the [2023
branch](https://github.com/dcai-course/dcai-lab/tree/2023).

## [Lab 1: Data-Centric AI vs. Model-Centric AI][lab-1]
The [first lab assignment][lab-1] walks you through an ML task of building a
text classifier, and illustrates the power (and often simplicity) of
data-centric approaches.

[lab-1]: data_centric_model_centric/Lab%20-%20Data-Centric%20AI%20vs%20Model-Centric%20AI.ipynb
## [Lab 2: Label Errors][lab-2]
[This lab][lab-2] guides you through writing your own implementation of
automatic label error identification using Confident Learning, the technique
taught in [today's lecture][lec-2].

[lab-2]: label_errors/Lab%20-%20Label%20Errors.ipynb
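As a rough illustration of the idea, and not the lab's reference solution, the sketch below applies a simplified Confident Learning rule: each class gets a threshold equal to the average predicted probability of that class among examples given that label, and an example is flagged when the class it confidently belongs to differs from its given label. It assumes plain Python lists for the given `labels` and for out-of-sample `pred_probs`; the helper name `flag_label_issues` is hypothetical.

```python
def flag_label_issues(labels, pred_probs):
    """Simplified Confident Learning sketch (hypothetical helper).

    labels:     given (possibly noisy) class index for each example
    pred_probs: out-of-sample predicted class probabilities, one row per example
    Returns (confident_joint, indices of likely label errors).
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold t_j: average predicted probability of class j
    # among the examples whose given label is j.
    thresholds = []
    for j in range(n_classes):
        probs_j = [p[j] for p, y in zip(pred_probs, labels) if y == j]
        thresholds.append(sum(probs_j) / len(probs_j) if probs_j else float("inf"))

    # Confident joint: rows are given labels, columns are confidently
    # predicted labels. Off-diagonal entries are likely label errors.
    joint = [[0] * n_classes for _ in range(n_classes)]
    issues = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        confident = [j for j in range(n_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # not confidently in any class, so not counted
        guess = max(confident, key=lambda j: p[j])
        joint[y][guess] += 1
        if guess != y:
            issues.append(i)
    return joint, issues


# Toy usage: examples 2 and 5 look mislabeled.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8],
              [0.1, 0.9], [0.3, 0.7], [0.7, 0.3]]
joint, issues = flag_label_issues(labels, pred_probs)
print(issues)  # [2, 5]
```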
[lec-2]: https://dcai.csail.mit.edu/lectures/label-errors/

## [Lab 3: Dataset Creation and Curation][lab-3]
[This lab assignment][lab-3] is to analyze an already collected dataset labeled
by multiple annotators.

[lab-3]: dataset_curation/Lab%20-%20Dataset%20Curation.ipynb
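One common first pass over multi-annotator data, sketched here as an assumption rather than the lab's prescribed method, is to take a majority-vote consensus label per example, record how strongly the annotators agreed, and score each annotator by how often they match the consensus. The input layout (one dict of `annotator -> label` per example) and the function names are hypothetical.

```python
from collections import Counter


def majority_vote(annotations):
    """annotations: one dict per example mapping annotator id -> label
    (annotators may skip examples). Returns (consensus labels, agreement)."""
    consensus, agreement = [], []
    for votes in annotations:
        # Most common label wins; ties go to the label seen first.
        label, count = Counter(votes.values()).most_common(1)[0]
        consensus.append(label)
        agreement.append(count / len(votes))  # fraction of annotators agreeing
    return consensus, agreement


def annotator_agreement(annotations, consensus):
    """Fraction of each annotator's labels that match the consensus label."""
    matches, totals = Counter(), Counter()
    for votes, label in zip(annotations, consensus):
        for annotator, vote in votes.items():
            totals[annotator] += 1
            matches[annotator] += vote == label
    return {a: matches[a] / totals[a] for a in totals}


annotations = [{"a": "cat", "b": "cat", "c": "dog"},
               {"a": "dog", "b": "dog"},
               {"a": "cat", "c": "cat"}]
consensus, agreement = majority_vote(annotations)
print(consensus)                                    # ['cat', 'dog', 'cat']
print(annotator_agreement(annotations, consensus))  # {'a': 1.0, 'b': 1.0, 'c': 0.5}
```

Majority vote treats every annotator as equally reliable, so it is only a baseline; weighting votes by each annotator's estimated accuracy is one natural refinement.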
## [Lab 4: Data-centric Evaluation of ML Models][lab-4]
[This lab assignment][lab-4] is to try to improve the performance of a given
model solely by improving its training data, using some of the strategies
covered in the lecture.

[lab-4]: data_centric_evaluation/Lab%20-%20Data-Centric%20Evaluation.ipynb
## [Lab 5: Class Imbalance, Outliers, and Distribution Shift][lab-5]
[The lab assignment][lab-5] for this lecture is to implement and compare
different methods for identifying outliers. For this lab, we've focused on
anomaly detection. You are given a clean training dataset consisting of many
pictures of dogs, and an evaluation dataset that contains outliers (non-dogs).
Your task is to implement and compare various methods for detecting these
outliers. You may implement some of the ideas presented in [today's
lecture][lec-5], or you can look up other outlier detection algorithms in the
linked references or online.

[lab-5]: outliers/Lab%20-%20Outliers.ipynb
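A simple baseline for this setup, shown here as a hedged sketch, is a k-nearest-neighbor distance score: an evaluation image whose feature vector lies far from its nearest training images is likely an outlier. It assumes each image has already been mapped to a numeric feature vector (for example, an embedding from a pretrained network); the function name and the choice of `k` are illustrative.

```python
import math


def knn_outlier_scores(train_feats, eval_feats, k=5):
    """Mean Euclidean distance from each evaluation point to its k nearest
    points in the clean training set; larger scores suggest outliers."""
    scores = []
    for x in eval_feats:
        dists = sorted(math.dist(x, t) for t in train_feats)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


# Toy 2-D "features": four training points near the origin, one nearby
# evaluation point and one far away.
train = [(0, 0), (0, 1), (1, 0), (1, 1)]
evaluation = [(0.5, 0.5), (5, 5)]
scores = knn_outlier_scores(train, evaluation, k=2)

# Rank evaluation points from most to least anomalous; in practice one
# would flag the top of this ranking or pick a score threshold.
ranking = sorted(range(len(evaluation)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [1, 0]
```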
[lec-5]: https://dcai.csail.mit.edu/lectures/imbalance-outliers-shift/

## [Lab 6: Growing or Compressing Datasets][lab-6]
[This lab][lab-6] guides you through an implementation of active learning.

[lab-6]: growing_datasets/Lab%20-%20Growing%20Datasets.ipynb
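Active learning repeatedly trains a model, asks it which unlabeled examples it is least sure about, and sends those out for labeling. A minimal sketch of the selection step, assuming least-confidence uncertainty sampling (one of several possible acquisition strategies) and a list of predicted class probabilities for the unlabeled pool, might look like this; the function name is hypothetical.

```python
def least_confident(pred_probs, batch_size):
    """Least-confidence uncertainty sampling: return the indices of the
    `batch_size` unlabeled examples whose top predicted probability is lowest."""
    order = sorted(range(len(pred_probs)), key=lambda i: max(pred_probs[i]))
    return order[:batch_size]


# Predicted class probabilities for a small unlabeled pool.
pool_probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.5, 0.5]]
print(least_confident(pool_probs, batch_size=2))  # [3, 1]
```

In a full loop, the selected examples are labeled, added to the training set, and the model is retrained before the next round of selection.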
## [Lab 7: Interpretability in Data-Centric ML][lab-7]
[This lab][lab-7] guides you through finding issues in a dataset's features by
applying interpretability techniques.

[lab-7]: interpretable_features/Lab%20-%20Interpretable%20Features.ipynb
## [Lab 8: Encoding Human Priors: Data Augmentation and Prompt Engineering][lab-8]
[This lab][lab-8] guides you through prompt engineering: crafting inputs for
large language models (LLMs). Because these models are large and pretrained,
even small amounts of data can make them very useful. This lab is also
[available on Colab][lab-8-colab].

[lab-8]: prompt_engineering/Lab_Prompt_Engineering.ipynb
[lab-8-colab]: https://colab.research.google.com/drive/1cipH-u6Jz0EH-6Cd9MPYgY4K0sJZwRJq

## [Lab 9: Data Privacy and Security][lab-9]
The [lab assignment][lab-9] for this lecture is to implement a membership
inference attack. You are given a trained machine learning model, available as
a black-box prediction function. Your task is to devise a method to determine
whether or not a given data point was in the training set of this model. You
may implement some of the ideas presented in [today's lecture][lec-9], or you
can look up other membership inference attack algorithms.

[lab-9]: membership_inference/Lab%20-%20Membership%20Inference.ipynb
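A classic baseline, sketched here under assumptions rather than as the lab's solution, is a confidence-threshold attack: models are often more confident on points they were trained on, so a point whose true label receives a high predicted probability is guessed to be a member. The sketch assumes the black-box function, called `predict_proba` here, takes one data point and returns a list of class probabilities, and that a few known non-members are available to set the threshold; all names are hypothetical.

```python
def true_label_confidence(predict_proba, points, labels):
    """Probability the black-box model assigns to each point's true label."""
    return [predict_proba(x)[y] for x, y in zip(points, labels)]


def calibrate_threshold(predict_proba, nonmembers, nonmember_labels, fpr=0.05):
    """Pick a threshold so that roughly a fraction `fpr` of known
    non-members would be (wrongly) flagged as members."""
    scores = sorted(true_label_confidence(predict_proba, nonmembers, nonmember_labels))
    idx = min(len(scores) - 1, int(len(scores) * (1 - fpr)))
    return scores[idx]


def guess_membership(predict_proba, points, labels, threshold):
    """Guess "member" (True) when the true-label confidence exceeds the threshold."""
    return [s > threshold for s in true_label_confidence(predict_proba, points, labels)]


# Toy stand-in for the black-box model: a lookup table of class probabilities.
table = {0: [0.99, 0.01], 1: [0.02, 0.98], 2: [0.6, 0.4], 3: [0.45, 0.55]}


def toy_model(x):
    return table[x]


threshold = calibrate_threshold(toy_model, [2, 3], [0, 1], fpr=0.5)
print(guess_membership(toy_model, [0, 1, 2, 3], [0, 1, 0, 1], threshold))
# [True, True, False, False]
```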
[lec-9]: https://dcai.csail.mit.edu/lectures/data-privacy-security/

## License
Copyright (c) by the instructors of Introduction to Data-Centric AI (dcai.csail.mit.edu).
dcai-lab is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
dcai-lab is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See [GNU Affero General Public LICENSE](https://github.com/dcai-course/dcai-lab/blob/master/LICENSE.txt) for details.