Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/c17hawke/data-augmentation-dmls
Summary of Data Augmentation topics from book - "Huyen, C. (2022). Designing Machine Learning Systems"
data-augmentation python
Last synced: 23 days ago
- Host: GitHub
- URL: https://github.com/c17hawke/data-augmentation-dmls
- Owner: c17hawke
- Created: 2023-12-01T18:17:32.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2023-12-02T05:06:40.000Z (11 months ago)
- Last Synced: 2023-12-02T07:29:56.559Z (11 months ago)
- Topics: data-augmentation, python
- Language: Jupyter Notebook
- Homepage:
- Size: 17.7 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data-Augmentation
Data augmentation is a family of techniques used to increase the amount of training data.
According to the author [1], there are three kinds of data augmentation techniques:
1. Simple label-preserving transformation
2. Perturbation
3. Data synthesis

| | Simple label-preserving transformation | Perturbation | Data synthesis |
| --- | --- | --- | --- |
| What? | Randomly modifying the data while preserving the label. | Adding noise to the data while preserving the label. | Generating synthetic data, for example with GANs. Costly DALL-E-like services can also be used. |
| Examples in CV | Random flipping, random rotation, etc. | Adding noise patches, or changing a single pixel value | Using CycleGAN to synthesize new samples. |
| Examples in NLP | Replacing words in a sentence with their synonyms | Adding random symbols or words to a sentence | Using templating to generate new samples |
| Why? | To increase the number of training samples per label/class | To improve model performance as well as evaluate it (i.e. how robust the model is to adversarial attacks) | To increase the amount of training data using GAN techniques. |

---
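
The ideas in the table can be sketched in a few lines of plain Python. This is only a minimal illustration, not the code from the notebooks: the function names, the tiny `SYNONYMS` table, and the sentence template are all hypothetical, and an image is represented as a nested list of grayscale values so that only the standard library is needed.

```python
import itertools
import random

# Hypothetical synonym table, made up for this illustration only.
SYNONYMS = {"good": ["great", "fine"], "movie": ["film"]}


def horizontal_flip(image):
    """Simple label-preserving transformation (CV): mirror every row."""
    return [row[::-1] for row in image]


def add_gaussian_noise(image, sigma, rng):
    """Perturbation (CV): add zero-mean noise and clamp to the 0-255 range."""
    return [
        [min(255, max(0, round(pixel + rng.gauss(0, sigma)))) for pixel in row]
        for row in image
    ]


def replace_with_synonyms(sentence, rng):
    """Simple label-preserving transformation (NLP): swap known words for a synonym."""
    return " ".join(
        rng.choice(SYNONYMS[word]) if word in SYNONYMS else word
        for word in sentence.split()
    )


def fill_template(template, slots):
    """Data synthesis (NLP): one sentence per combination of slot values."""
    keys = list(slots)
    return [
        template.format(**dict(zip(keys, values)))
        for values in itertools.product(*(slots[key] for key in keys))
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
image = [[0, 50, 100], [150, 200, 250]]  # toy 2x3 grayscale "image"

print(horizontal_flip(image))
print(add_gaussian_noise(image, sigma=5.0, rng=rng))
print(replace_with_synonyms("a good movie", rng))
print(fill_template(
    "Find me a {cuisine} restaurant within {distance} miles.",
    {"cuisine": ["Thai", "Greek"], "distance": ["2", "5"]},
))
```

In each case the label of the original sample is assumed to carry over to the augmented one; whether that assumption holds (for example, whether a flipped image still belongs to the same class) depends on the task.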
## Example notebooks -
- Example notebooks for CV - [link](https://github.com/c17hawke/Data-augmentation-DMLS/tree/main/notebooks/CV)
- Example notebooks for NLP - [link](https://github.com/c17hawke/Data-augmentation-DMLS/tree/main/notebooks/NLP)

## References -
- [1] Huyen, C. (2022). Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications. O'Reilly Media.