https://github.com/ksm26/pretraining-llms
Master the essential steps of pretraining large language models (LLMs). Learn to create high-quality datasets, configure model architectures, execute training runs, and assess model performance for efficient and effective LLM pretraining.
- Host: GitHub
- URL: https://github.com/ksm26/pretraining-llms
- Owner: ksm26
- Created: 2024-07-29T12:27:40.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-07T12:51:30.000Z (about 1 year ago)
- Last Synced: 2024-08-07T17:26:53.108Z (about 1 year ago)
- Topics: ai-training, cost-effective-pretraining, data-preparation, depth-upscaling, developer-advocacy, high-quality-datasets, hugging-face, large-language-models, llm-evaluation, machine-learning, meta-llama, model-configuration, model-initialization, performance-assessment, pretraining-llms, text-generation, training-runs
- Language: Jupyter Notebook
- Homepage: https://www.deeplearning.ai/short-courses/pretraining-llms/
- Size: 29.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# [Pretraining LLMs](https://www.deeplearning.ai/short-courses/pretraining-llms/)
Welcome to the "Pretraining LLMs" course! The course dives into the essential steps of pretraining large language models (LLMs).
## Course Summary
In this course, you'll explore pretraining, the foundational step in training LLMs, which involves teaching an LLM to predict the next token using vast text datasets. You'll learn the essential steps to pretrain an LLM, understand the associated costs, and discover cost-effective methods by leveraging smaller, existing open-source models.
**Detailed Learning Outcomes:**
1. **Pretraining Basics**: Understand the scenarios where pretraining is the optimal choice for model performance. Compare text generation across different versions of the same model to grasp the performance differences between base, fine-tuned, and specialized pretrained models (a generation-comparison sketch follows this list).
2. **Creating High-Quality Datasets**: Learn how to create and clean a high-quality training dataset using web text and existing datasets, and how to package this data for use with the Hugging Face library (see the dataset sketch below).
3. **Model Configuration**: Explore ways to configure and initialize a model for training, including modifying Meta's Llama models and initializing weights either randomly or from other models (see the configuration sketch below).
4. **Executing Training Runs**: Learn how to configure and execute a training run to train your own model effectively (see the training-run sketch below).
5. **Performance Assessment**: Assess your trained model's performance and explore common evaluation strategies for LLMs, including benchmark tasks used to compare different models' performance (see the evaluation sketch below).
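The sketches below are not taken from the course notebooks; they are minimal, illustrative examples of each step using the Hugging Face libraries, with placeholder model IDs and file names that you would replace with your own. First, comparing text generation between a base checkpoint and a fine-tuned counterpart:

```python
# Hedged sketch: run the same prompt through a base model and an
# instruction-tuned variant and compare the outputs. The model IDs are
# placeholders, not the checkpoints used in the course.
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = "The key steps in pretraining a large language model are"

for model_id in ["<base-model-id>", "<fine-tuned-model-id>"]:  # hypothetical IDs
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    print(model_id, "->", tokenizer.decode(outputs[0], skip_special_tokens=True))
```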
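Next, a minimal sketch of dataset creation and packaging with the `datasets` library; the file name, quality filter, and sequence length are illustrative assumptions rather than the course's choices:

```python
# Hedged sketch: load raw web text, apply a crude quality filter, tokenize,
# and save the result to disk so a training run can consume it.
from datasets import load_dataset
from transformers import AutoTokenizer

raw = load_dataset("text", data_files={"train": "web_text.txt"})["train"]

# Drop very short lines as a simple (and deliberately naive) quality filter.
raw = raw.filter(lambda example: len(example["text"].split()) > 20)

tokenizer = AutoTokenizer.from_pretrained("<base-model-id>")  # placeholder
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=2048),
    batched=True,
    remove_columns=["text"],
)
tokenized.save_to_disk("packaged_pretraining_data")
```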
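A sketch of the two initialization paths from step 3, assuming the `transformers` Llama classes; the architecture hyperparameters are illustrative, not the course's configuration:

```python
from transformers import AutoModelForCausalLM, LlamaConfig, LlamaForCausalLM

# Option 1: define a small Llama-style architecture and initialize it with
# random weights for pretraining from scratch.
config = LlamaConfig(
    hidden_size=1024,
    intermediate_size=4096,
    num_hidden_layers=12,
    num_attention_heads=16,
    vocab_size=32000,
)
model_from_scratch = LlamaForCausalLM(config)

# Option 2: start from an existing open-source checkpoint and continue
# pretraining it on your own data.
model_from_checkpoint = AutoModelForCausalLM.from_pretrained("<existing-model-id>")  # placeholder
```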
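A sketch of a training run built around the Hugging Face `Trainer`, reusing the dataset saved above; every argument value here is an illustrative default rather than the course's setting:

```python
from datasets import load_from_disk
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "<base-model-id>"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # many base models ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

dataset = load_from_disk("packaged_pretraining_data")

args = TrainingArguments(
    output_dir="pretraining-run",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=3e-4,
    num_train_epochs=1,
    logging_steps=50,
    save_steps=500,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    # mlm=False gives standard next-token (causal LM) labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```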
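Finally, a sketch of one common evaluation route, perplexity on held-out text; benchmark-suite evaluation is the other route the course covers and is not shown here:

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<trained-model-id>"  # placeholder for your trained checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = "Pretraining teaches a language model to predict the next token."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # For a causal LM, passing the input IDs as labels yields the average
    # next-token cross-entropy loss; exponentiating it gives perplexity.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print("perplexity:", math.exp(loss.item()))
```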
## Key Points
- **Pretraining Process**: Gain in-depth knowledge of the steps to pretrain an LLM, from data preparation to model configuration and performance assessment.
- **Model Architecture Configuration**: Explore various options for configuring your model's architecture, including modifying Meta's Llama models and innovative pretraining techniques like Depth Upscaling, which can reduce training costs by up to 70% (a sketch of the idea follows this list).
- **Practical Implementation**: Learn how to pretrain a model from scratch and how to continue pretraining an existing model on your own data.
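The sketch below illustrates the depth-upscaling idea in simplified form: grow a deeper model by reusing and repeating the transformer layers of an existing smaller model rather than training the added depth from scratch. It assumes a Llama-architecture checkpoint, uses a placeholder model ID and an arbitrary four extra layers, and is not the course's implementation:

```python
import copy

from transformers import LlamaForCausalLM

small = LlamaForCausalLM.from_pretrained("<small-llama-model-id>")  # placeholder

# Reuse the small model's configuration, but declare extra transformer layers.
config = copy.deepcopy(small.config)
config.num_hidden_layers = small.config.num_hidden_layers + 4  # illustrative depth

upscaled = LlamaForCausalLM(config)

# Copy the existing layers, then fill the added depth with copies of the top
# layers (a simplification of how depth upscaling duplicates layer blocks).
source_layers = list(small.model.layers) + list(small.model.layers)[-4:]
for target, source in zip(upscaled.model.layers, source_layers):
    target.load_state_dict(source.state_dict())

# Embeddings, final norm, and LM head are reused from the small model as well.
upscaled.model.embed_tokens.load_state_dict(small.model.embed_tokens.state_dict())
upscaled.model.norm.load_state_dict(small.model.norm.state_dict())
upscaled.lm_head.load_state_dict(small.lm_head.state_dict())
```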
## About the Instructors
- **Sung Kim**: CEO of Upstage, bringing extensive expertise in LLM pretraining and optimization.
- **Lucy Park**: Chief Scientific Officer of Upstage, with a deep background in scientific research and LLM development.

To enroll in the course or for further information, visit [deeplearning.ai](https://www.deeplearning.ai/short-courses/).