https://github.com/chiphuyen/metaflow-transformers-tutorials
Metaflow tutorials for ODSC West 2021
- Host: GitHub
- URL: https://github.com/chiphuyen/metaflow-transformers-tutorials
- Owner: chiphuyen
- Created: 2021-11-11T05:21:54.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2021-11-16T19:08:26.000Z (over 4 years ago)
- Last Synced: 2025-03-28T04:41:39.522Z (about 1 year ago)
- Topics: machine-learning, metaflow
- Language: Jupyter Notebook
- Homepage:
- Size: 24.8 MB
- Stars: 64
- Watchers: 4
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-metaflow - metaflow-transformers-tutorials - HuggingFace DistilBERT fine-tuning tutorials by Chip Huyen. (Examples & Tutorials)
README
To get started with Metaflow, first run these simple flows:
1. `helloworld.py` - a simple hello world flow
2. `counter_branch.py` - try out artifacts
3. `parameters.py` - try out parameters
4. `foreach.py` - try out foreaches (parallel tasks)
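In Metaflow, a foreach (`self.next(self.process, foreach='items')`) fans one step out into a parallel task per item, and a later join step receives all of their results. The sketch below mimics that fan-out/join shape in plain Python with `concurrent.futures` so it runs without Metaflow installed. It is an analogy, not Metaflow's API and not the contents of `foreach.py`; the step names and items are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def start():
    # Hypothetical "start" step: produce the list to fan out over.
    return ["alpha", "beta", "gamma"]


def process(item):
    # One task per item, like one branch of a foreach.
    return item.upper()


def join(results):
    # Join step: gather the outputs of every branch into one result.
    return sorted(results)


def run_foreach():
    items = start()
    # Run the per-item tasks in parallel; map() returns results in input order.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(process, items))
    return join(results)


if __name__ == "__main__":
    print(run_foreach())  # ['ALPHA', 'BETA', 'GAMMA']
```

In Metaflow itself, each branch would run as a separate task (locally as a process, or remotely), and the join step would read the branch results from its `inputs` argument.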
After these simple examples, you can take a look at a more realistic case:
In this tutorial, we'll fine-tune HuggingFace's DistilBERT model for sentiment
analysis on the IMDB dataset.
First, we'll show how to do it without Metaflow.
1. `sent_analysis_train.py` is the training code (about 6-7 minutes on a small subset of 100 samples on my Mac)
2. `sent_analysis_predict.py` is the prediction code (about 30 seconds)
We'll live-code the conversion of the training code into a Metaflow flow.
See `sent_analysis_metaflow.py` for instructions.
We'll run `python sent_analysis_metaflow.py --no-pylint run --mode small`
to train a model on 100 samples locally.
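The `--mode small` option suggests that the flow selects a subset of the data based on a mode setting; in a Metaflow flow such an option would typically be declared as a `Parameter` (for example `Parameter('mode', default='small')`), but how `sent_analysis_metaflow.py` actually does it is an assumption here. The sketch below shows only the subset-selection logic in plain Python. The sample counts (100 for small, 40,000 for the full run) come from this README; the helper name `select_subset` and the mode name `full` are hypothetical.

```python
# Samples per mode: 100 and 40,000 are the sizes mentioned in this README.
# The "full" mode name and this helper are hypothetical, for illustration only.
SAMPLES_PER_MODE = {"small": 100, "full": 40_000}


def select_subset(examples, mode="small"):
    """Return the first N examples for the given mode."""
    if mode not in SAMPLES_PER_MODE:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(SAMPLES_PER_MODE)}"
        )
    return examples[: SAMPLES_PER_MODE[mode]]


if __name__ == "__main__":
    data = list(range(50_000))  # stand-in for tokenized IMDB reviews
    print(len(select_subset(data, "small")))  # 100
    print(len(select_subset(data, "full")))   # 40000
```

Taking the first N rows keeps the example short; a real training run would more likely shuffle before slicing so the small subset has a mix of positive and negative reviews.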
We'll show how Metaflow automatically saves the trained model as an artifact, which we can then load for predictions.
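Metaflow persists every attribute assigned to `self` in a step as an artifact, pickling it into its datastore (a local `.metaflow` directory by default). With the real flow, a saved model would be read back through Metaflow's Client API, for example `Flow('SentimentFlow').latest_successful_run.data.model`, where `SentimentFlow` and `model` are hypothetical names. The plain-Python sketch below only mimics the underlying idea (pickle named values per run, load them later); it is not how Metaflow's datastore is laid out, and `save_artifacts`/`load_artifact` are hypothetical helpers.

```python
import pickle
import tempfile
from pathlib import Path


def save_artifacts(run_dir: Path, **artifacts):
    """Pickle each named value to its own file, loosely like Metaflow saving self.* attributes."""
    run_dir.mkdir(parents=True, exist_ok=True)
    for name, value in artifacts.items():
        with open(run_dir / f"{name}.pkl", "wb") as f:
            pickle.dump(value, f)


def load_artifact(run_dir: Path, name: str):
    """Load one previously saved value by name, e.g. a trained model for prediction."""
    with open(run_dir / f"{name}.pkl", "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp) / "run_1"
        # Stand-in "model": a small dict instead of a real DistilBERT checkpoint.
        save_artifacts(run_dir, model={"weights": [0.1, 0.2]}, accuracy=0.9)
        print(load_artifact(run_dir, "model"))  # {'weights': [0.1, 0.2]}
```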
Finally, we'll use the `@batch` decorator to train on the full dataset (40,000 samples) on AWS Batch.
We'll need a GPU, since training on the full dataset on a CPU would take a long time.
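A rough back-of-envelope estimate shows why a GPU is worth it. The sketch below scales the 6-7 minutes measured for 100 samples on a Mac up to 40,000 samples, assuming training time grows linearly with sample count. That assumption is a simplification: if part of the 6-7 minutes is fixed overhead (such as loading DistilBERT), linear scaling overstates the total, so treat the result as an order of magnitude only.

```python
# Figures from this README: 6-7 minutes for 100 samples; 40,000 samples in the full run.
SMALL_SAMPLES = 100
FULL_SAMPLES = 40_000
SMALL_MINUTES = (6, 7)


def estimate_full_hours(small_minutes, small_n=SMALL_SAMPLES, full_n=FULL_SAMPLES):
    """Linearly scale a small-run time (minutes) to the full dataset, in hours."""
    scale = full_n / small_n  # 400x more samples
    return small_minutes * scale / 60


if __name__ == "__main__":
    low, high = (estimate_full_hours(m) for m in SMALL_MINUTES)
    print(f"~{low:.0f}-{high:.0f} hours on CPU")  # ~40-47 hours on CPU
```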