https://github.com/dcai-course/dcai-lab

Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽‍💻

course data-centric-ai data-science deep-learning homework lab machine-learning

Host: GitHub
URL: https://github.com/dcai-course/dcai-lab
Owner: dcai-course
License: agpl-3.0
Created: 2022-12-05T19:12:40.000Z (almost 3 years ago)
Default Branch: master
Last Pushed: 2025-02-24T15:58:39.000Z (7 months ago)
Last Synced: 2025-02-24T16:52:38.723Z (7 months ago)
Topics: course, data-centric-ai, data-science, deep-learning, homework, lab, machine-learning
Language: Jupyter Notebook
Homepage: https://dcai.csail.mit.edu/
Size: 4.44 MB
Stars: 445
Watchers: 13
Forks: 155
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

# Lab assignments for Introduction to Data-Centric AI

This repository contains the lab assignments for the [Introduction to
Data-Centric AI](https://dcai.csail.mit.edu/) class.

Contributions are most welcome! If you have ideas for improving the labs,
please open an issue or submit a pull request.

If you're looking for the 2023 version of the labs, check out the [2023
branch](https://github.com/dcai-course/dcai-lab/tree/2023).

## [Lab 1: Data-Centric AI vs. Model-Centric AI][lab-1]

The [first lab assignment][lab-1] walks you through an ML task of building a
text classifier, and illustrates the power (and often simplicity) of
data-centric approaches.

[lab-1]: data_centric_model_centric/Lab%20-%20Data-Centric%20AI%20vs%20Model-Centric%20AI.ipynb

## [Lab 2: Label Errors][lab-2]

[This lab][lab-2] guides you through writing your own implementation of
automatic label error identification using Confident Learning, the technique
taught in [today’s lecture][lec-2].

[lab-2]: label_errors/Lab%20-%20Label%20Errors.ipynb

[lec-2]: https://dcai.csail.mit.edu/lectures/label-errors/
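
As a rough illustration of the idea (not the lab's reference solution), here is a
minimal sketch of a simplified Confident Learning rule. It assumes you already
have out-of-sample predicted probabilities for every example; the function name
`find_label_issues` and the toy data are hypothetical. An example is flagged when
the class it most confidently belongs to, judged against per-class thresholds,
differs from its given label.

```python
def find_label_issues(labels, pred_probs):
    """Return indices of likely label errors (simplified Confident Learning).

    labels: list of given (possibly noisy) integer labels.
    pred_probs: list of per-class probability lists, ideally out-of-sample.
    Assumes every class appears at least once in `labels`.
    """
    num_classes = len(pred_probs[0])

    # Per-class threshold: the mean predicted probability of class j over
    # the examples whose given label is j.
    thresholds = []
    for j in range(num_classes):
        probs_j = [p[j] for p, y in zip(pred_probs, labels) if y == j]
        thresholds.append(sum(probs_j) / len(probs_j))

    issues = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        # Classes whose probability clears that class's own threshold.
        confident = [j for j in range(num_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # no class is confident enough; leave the example alone
        best = max(confident, key=lambda j: p[j])
        if best != y:
            issues.append(i)
    return issues


# Toy data: example 2 is labeled 0 but confidently predicted as class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
              [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
print(find_label_issues(labels, pred_probs))  # [2]
```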

## [Lab 3: Dataset Creation and Curation][lab-3]

[This lab assignment][lab-3] is to analyze an already collected dataset labeled
by multiple annotators.

[lab-3]: dataset_curation/Lab%20-%20Dataset%20Curation.ipynb
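
One common starting point for multi-annotator data is a majority-vote consensus
label plus each annotator's agreement with that consensus. The sketch below is
only an illustration under that assumption; the data layout and the function
name `consensus_and_agreement` are hypothetical and not taken from the lab.

```python
from collections import Counter


def consensus_and_agreement(annotations):
    """Compute majority-vote labels and per-annotator agreement.

    annotations: dict mapping example id -> dict of annotator id -> label.
    Ties are broken arbitrarily (first label encountered wins).
    """
    # Majority vote per example.
    consensus = {}
    for example_id, votes in annotations.items():
        counts = Counter(votes.values())
        consensus[example_id] = counts.most_common(1)[0][0]

    # Fraction of each annotator's labels that match the consensus.
    matches = Counter()
    totals = Counter()
    for example_id, votes in annotations.items():
        for annotator, label in votes.items():
            totals[annotator] += 1
            if label == consensus[example_id]:
                matches[annotator] += 1
    agreement = {a: matches[a] / totals[a] for a in totals}
    return consensus, agreement


annotations = {
    "a": {"ann1": "cat", "ann2": "cat", "ann3": "dog"},
    "b": {"ann1": "dog", "ann2": "dog", "ann3": "dog"},
    "c": {"ann1": "cat", "ann2": "cat", "ann3": "dog"},
}
consensus, agreement = consensus_and_agreement(annotations)
print(consensus)
print(agreement)
```

An annotator with low agreement is not necessarily wrong, since the consensus
itself can be mistaken, so treat this as a first diagnostic only.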

## [Lab 4: Data-centric Evaluation of ML Models][lab-4]

[This lab assignment][lab-4] is to try improving the performance of a given
model solely by improving its training data via some of the various strategies
covered here.

[lab-4]: data_centric_evaluation/Lab%20-%20Data-Centric%20Evaluation.ipynb

## [Lab 5: Class Imbalance, Outliers, and Distribution Shift][lab-5]

[The lab assignment][lab-5] for this lecture is to implement and compare
different methods for identifying outliers. For this lab, we've focused on
anomaly detection. You are given a clean training dataset consisting of many
pictures of dogs, and an evaluation dataset that contains outliers (non-dogs).
Your task is to implement and compare various methods for detecting these
outliers. You may implement some of the ideas presented in [today's
lecture][lec-5], or you can look up other outlier detection algorithms in the
linked references or online.

[lab-5]: outliers/Lab%20-%20Outliers.ipynb

[lec-5]: https://dcai.csail.mit.edu/lectures/imbalance-outliers-shift/
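
To give a flavor of one possible method, here is a minimal sketch of
k-nearest-neighbor distance scoring. It assumes each image has already been
turned into a numeric feature vector (for example, by a pretrained network);
the function name `knn_outlier_scores`, the choice of `k`, and the toy 2-D
features are hypothetical.

```python
import math


def knn_outlier_scores(train_features, test_features, k=3):
    """Score each test point by its mean distance to its k nearest
    training points. Higher scores suggest the point is more of an outlier."""
    scores = []
    for x in test_features:
        # Euclidean distances to every clean training point, smallest first.
        dists = sorted(math.dist(x, t) for t in train_features)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


# Toy features: the second test point lies far from the training data.
train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
test = [(0.5, 0.5), (10.0, 10.0)]
scores = knn_outlier_scores(train, test)
print(scores)
```

Turning scores into outlier decisions requires a threshold, which you would
typically pick by looking at the score distribution on the clean training set.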

## [Lab 6: Growing or Compressing Datasets][lab-6]

[This lab][lab-6] guides you through an implementation of active learning.

[lab-6]: growing_datasets/Lab%20-%20Growing%20Datasets.ipynb
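
The README does not say which query strategy the lab uses. As one common
example, the sketch below shows a single step of entropy-based uncertainty
sampling: given a model's predicted probabilities on the unlabeled pool, pick
the examples the model is least sure about. The function names are
hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(pred_probs, batch_size):
    """Return indices of the `batch_size` most uncertain pool examples."""
    ranked = sorted(range(len(pred_probs)),
                    key=lambda i: entropy(pred_probs[i]),
                    reverse=True)
    return ranked[:batch_size]


# Predicted probabilities for four unlabeled examples.
pool_probs = [[0.99, 0.01], [0.5, 0.5], [0.7, 0.3], [0.9, 0.1]]
print(select_for_labeling(pool_probs, batch_size=2))  # [1, 2]
```

In a full loop you would label the selected examples, add them to the training
set, retrain, and repeat.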

## [Lab 7: Interpretability in Data-Centric ML][lab-7]

[This lab][lab-7] guides you through finding issues in a dataset’s features by
applying interpretability techniques.

[lab-7]: interpretable_features/Lab%20-%20Interpretable%20Features.ipynb

## [Lab 8: Encoding Human Priors: Data Augmentation and Prompt Engineering][lab-8]

[This lab][lab-8] guides you through prompt engineering: crafting inputs for
large language models (LLMs). Even small amounts of data can make these large
pre-trained models very useful. This lab is also [available on
Colab][lab-8-colab].

[lab-8]: prompt_engineering/Lab_Prompt_Engineering.ipynb

[lab-8-colab]: https://colab.research.google.com/drive/1cipH-u6Jz0EH-6Cd9MPYgY4K0sJZwRJq
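
As a small illustration of how a handful of labeled examples can be folded into
an LLM input, here is a sketch that assembles a few-shot prompt as plain text.
The prompt layout, the function name `build_few_shot_prompt`, and the example
data are assumptions for illustration, not the format used in the lab.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot classification prompt as a single string.

    examples: list of (text, label) pairs shown to the model as demonstrations.
    query: the new text the model should label.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # End with an open "Label:" slot for the model to complete.
    lines.append(f"Text: {query}")
    lines.append("Label:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Great battery life.", "positive"), ("Broke after a week.", "negative")],
    "The screen is stunning.",
)
print(prompt)
```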

## [Lab 9: Data Privacy and Security][lab-9]

The [lab assignment][lab-9] for this lecture is to implement a membership
inference attack. You are given a trained machine learning model, available as
a black-box prediction function. Your task is to devise a method to determine
whether or not a given data point was in the training set of this model. You
may implement some of the ideas presented in [today’s lecture][lec-9], or you
can look up other membership inference attack algorithms.

[lab-9]: membership_inference/Lab%20-%20Membership%20Inference.ipynb

[lec-9]: https://dcai.csail.mit.edu/lectures/data-privacy-security/
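
One simple baseline, sketched below, guesses that a point was in the training
set when the model is very confident in its true label, since models often fit
training points more tightly than unseen ones. Everything here is an assumption
for illustration: the `predict_proba` interface, the threshold of 0.9, and the
toy model, which stands in for the black-box function the lab provides.

```python
def membership_scores(predict_proba, examples, labels):
    """Score each point by the model's confidence in its true label."""
    return [predict_proba(x)[y] for x, y in zip(examples, labels)]


def infer_membership(predict_proba, examples, labels, threshold=0.9):
    """Guess membership: True when confidence in the true label is at or
    above `threshold` (an arbitrary value that would need tuning)."""
    scores = membership_scores(predict_proba, examples, labels)
    return [s >= threshold for s in scores]


def toy_predict_proba(x):
    """Hypothetical black box that is overconfident on two memorized points."""
    memorized = {(1.0, 2.0): [0.02, 0.98], (3.0, 1.0): [0.97, 0.03]}
    return memorized.get(tuple(x), [0.6, 0.4])


points = [(1.0, 2.0), (5.0, 5.0), (3.0, 1.0)]
true_labels = [1, 0, 0]
guesses = infer_membership(toy_predict_proba, points, true_labels)
print(guesses)  # [True, False, True]
```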

## License

Copyright (c) by the instructors of Introduction to Data-Centric AI (dcai.csail.mit.edu).

dcai-lab is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

dcai-lab is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See the [GNU Affero General Public License](https://github.com/dcai-course/dcai-lab/blob/master/LICENSE.txt) for details.
