https://github.com/saforem2/llm-workshop-talk

Simple tutorial on creating Small(-ish) LLMs (pt. 2 🎉!!)

Host: GitHub
URL: https://github.com/saforem2/llm-workshop-talk
Owner: saforem2
Created: 2024-02-12T20:58:36.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-09-10T14:51:56.000Z (over 1 year ago)
Last Synced: 2025-04-23T17:59:05.396Z (9 months ago)
Language: Python
Homepage: https://saforem2.github.io/llm-workshop-talk/
Size: 5.04 MB
Stars: 3
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: docs/README 2.html

README

          

Creating Small(-ish) LLMs

Sam Foreman

Argonne National Laboratory

February 13, 2024


LLMs from Scratch





Emergent Abilities





Emergent abilities of Large Language Models Yao et al. (2023)



Training LLMs





Life-Cycle of the LLM





Forward Pass


Generating Text
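
Text generation with a decoder-only model is autoregressive: feed in the current context, take the logits for the last position, turn them into a probability distribution, sample one token, append it, and repeat. Below is a minimal, dependency-free sketch of that loop, modeled loosely on the `generate` method in nanoGPT's `model.py` (crop to `block_size`, divide by `temperature`, optional top-k, softmax, sample). `logits_fn` is a hypothetical stand-in for the model: any callable that maps a list of token ids to a list of next-token logits.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def sample_next(
    logits: Sequence[float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample one token id from a list of next-token logits."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng if rng is not None else random.Random()
    scaled = [x / temperature for x in logits]
    if top_k is not None:
        # keep the k largest logits (ties at the cutoff survive), mask the rest
        k = max(1, min(top_k, len(scaled)))
        cutoff = sorted(scaled, reverse=True)[k - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]
    # numerically stable softmax; masked (-inf) entries get probability 0
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate(
    logits_fn: Callable[[List[int]], Sequence[float]],  # hypothetical model stand-in
    idx: Sequence[int],
    max_new_tokens: int,
    block_size: int,
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Autoregressively extend `idx` by `max_new_tokens` sampled token ids."""
    rng = rng if rng is not None else random.Random()
    out = list(idx)
    for _ in range(max_new_tokens):
        context = out[-block_size:]  # crop to the model's context window
        logits = logits_fn(context)
        out.append(sample_next(logits, temperature, top_k, rng))
    return out
```

With `top_k=1` the loop reduces to greedy decoding; temperatures below 1 sharpen the distribution and temperatures above 1 flatten it.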


Life-Cycle of the LLM: Pre-training



Life-Cycle of the LLM: Fine-Tuning



Assistant Models



saforem2/wordplay 🎮💬




Fork of Andrej Karpathy’s nanoGPT







saforem2/wordplay 🎮💬





Install

python3 -m pip install "git+https://github.com/saforem2/wordplay.git"

python3 -c 'import wordplay; print(wordplay.__file__)'

# ./wordplay/src/wordplay/__init__.py


Dependencies





transformers for Hugging Face Transformers (to load GPT-2 checkpoints)



datasets for Hugging Face Datasets (if you want to use OpenWebText)



tiktoken for OpenAI’s fast BPE code



wandb for optional logging



tqdm for progress bars
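
To see which of these optional packages are present in the current environment, you can ask Python's import system for each one without actually importing it. This is a small sketch using the standard library's `importlib.util.find_spec`; the `OPTIONAL_DEPS` list and the `missing_packages` helper are illustrative names that simply mirror the list above.

```python
import importlib.util
from typing import Iterable, List

# Import names of the optional dependencies listed above (illustrative).
OPTIONAL_DEPS = ["transformers", "datasets", "tiktoken", "wandb", "tqdm"]


def missing_packages(names: Iterable[str] = OPTIONAL_DEPS) -> List[str]:
    """Return the names whose top-level module cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

Any name it reports can be installed with `python3 -m pip install <name>`.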



Quick Start





We start by training a character-level GPT on the works of Shakespeare.



Download the data file (~1 MB) and convert the raw text into one large stream of integers:



python3 data/shakespeare_char/prepare.py

This will create data/shakespeare_char/{train.bin, val.bin}.
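
Character-level preparation boils down to building a vocabulary of the distinct characters, mapping each character to an integer id, splitting the id stream into train and validation parts, and writing the ids to disk as 16-bit integers. The sketch below illustrates that idea with the standard library only. The 90/10 split, the sorted vocabulary and the uint16 on-disk format are assumptions modeled on nanoGPT's `shakespeare_char/prepare.py`, and the helper names (`build_vocab`, `split_ids`, `write_bin`, `prepare`) are hypothetical.

```python
import array
import os
from typing import Dict, List, Tuple


def build_vocab(text: str) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map each distinct character to an integer id (sorted for determinism)."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text: str, stoi: Dict[str, int]) -> List[int]:
    return [stoi[ch] for ch in text]


def decode(ids: List[int], itos: Dict[int, str]) -> str:
    return "".join(itos[i] for i in ids)


def split_ids(text: str, stoi: Dict[str, int], train_frac: float = 0.9) -> Tuple[List[int], List[int]]:
    """Encode the text and split it into train / validation id streams."""
    n = int(len(text) * train_frac)
    return encode(text[:n], stoi), encode(text[n:], stoi)


def write_bin(ids: List[int], path: str) -> None:
    """Write ids as unsigned 16-bit ints in native byte order."""
    # 'H' is unsigned short, which is 2 bytes on all common platforms
    arr = array.array("H", ids)
    with open(path, "wb") as f:
        arr.tofile(f)


def prepare(text: str, out_dir: str) -> Dict[str, int]:
    """Write train.bin / val.bin into out_dir and return the vocabulary."""
    stoi, _ = build_vocab(text)
    train_ids, val_ids = split_ids(text, stoi)
    write_bin(train_ids, os.path.join(out_dir, "train.bin"))
    write_bin(val_ids, os.path.join(out_dir, "val.bin"))
    return stoi
```

To read a split back, `array.array("H")` with `fromfile`, or `numpy.memmap(path, dtype=numpy.uint16, mode="r")`, recovers the same ids.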





Model: model.py

https://github.com/saforem2/wordplay/blob/master/src/wordplay/model.py
from __future__ import annotations  # keeps the GPTModelConfig annotation lazy

import logging
import math

import torch
import torch.nn as nn
from torch.nn import functional as F

log = logging.getLogger(__name__)

# GPTModelConfig (n_embd, n_head, bias, dropout, block_size, ...) is
# wordplay's model config and is not shown here.


class CausalSelfAttention(nn.Module):
    def __init__(self, config: GPTModelConfig):
        super().__init__()
        assert config.n_embd % config.n_head == 0
        # key, query, value projections for all heads, but in a batch
        self.c_attn = nn.Linear(
            config.n_embd,
            3 * config.n_embd,
            bias=config.bias
        )
        # output projection
        self.c_proj = nn.Linear(
            config.n_embd,
            config.n_embd,
            bias=config.bias
        )
        # regularization
        self.attn_dropout = nn.Dropout(config.dropout)
        self.resid_dropout = nn.Dropout(config.dropout)
        self.n_head = config.n_head
        self.n_embd = config.n_embd
        self.dropout = config.dropout
        # flash attention make GPU go brrrrr but support is only in
        # PyTorch >= 2.0
        self.flash = hasattr(
            torch.nn.functional,
            'scaled_dot_product_attention'
        )
        # if self.flash and RANK == 0:
        #     log.warning(
        #         f'Using torch.nn.functional.scaled_dot_product_attention'
        #         '(Flash Attn)'
        #     )
        if not self.flash:
            log.warning(
                "WARNING: using slow attention. "
                "Flash Attention requires PyTorch >= 2.0"
            )
            # causal mask to ensure that attention is only applied to the left
            # in the input sequence
            self.register_buffer(
                "bias",
                torch.tril(
                    torch.ones(
                        config.block_size,
                        config.block_size
                    )
                ).view(1, 1, config.block_size, config.block_size)
            )

    def forward(self, x):
        # batch size, sequence length, embedding dimensionality (n_embd)
        B, T, C = x.size()

        # calculate query, key, values for all heads in batch and move head
        # forward to be the batch dim
        q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
        # (B, nh, T, hs)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # (B, nh, T, hs)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # (B, nh, T, hs)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # causal self-attention; Self-attend:
        # (B, nh, T, hs) x (B, nh, hs, T) -> (B, nh, T, T)
        if self.flash:
            # efficient attention using Flash Attention CUDA kernels
            y = torch.nn.functional.scaled_dot_product_attention(
                q,
                k,
                v,
                attn_mask=None,
                dropout_p=(self.dropout if self.training else 0),
                is_causal=True
            )
        else:
            # manual implementation of attention
            att = (q @ k.transpose(-2, -1)) * (1.0 / math.sqrt(k.size(-1)))
            att = att.masked_fill(
                self.bias[:, :, :T, :T] == 0,  # type:ignore
                float('-inf')
            )
            att = F.softmax(att, dim=-1)
            att = self.attn_dropout(att)
            y = att @ v  # (B, nh, T, T) x (B, nh, T, hs) -> (B, nh, T, hs)
        # re-assemble all head outputs side by side
        y = y.transpose(1, 2).contiguous().view(B, T, C)

        # output projection
        y = self.resid_dropout(self.c_proj(y))
        return y
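
The `else` branch of `forward` is the textbook computation: scores `q @ k^T` scaled by `1/sqrt(hs)`, future positions set to `-inf` via the lower-triangular `bias` buffer, a softmax over the last dimension, then a weighted sum of the values. For intuition, here is a dependency-free sketch of that same computation for a single head and a single sequence (no batching, projections or dropout). The function names are illustrative; this is not a drop-in replacement for the module.

```python
import math
from typing import List

Vector = List[float]


def softmax(row: Vector) -> Vector:
    """Numerically stable softmax; -inf entries map to probability 0."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q: List[Vector], k: List[Vector], v: List[Vector]) -> List[Vector]:
    """Single-head causal attention for one sequence of T positions."""
    T, hs = len(q), len(q[0])
    scale = 1.0 / math.sqrt(hs)
    out = []
    for t in range(T):
        # position t may only attend to positions s <= t (the causal mask)
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) * scale if s <= t else float("-inf")
            for s in range(T)
        ]
        weights = softmax(scores)
        # weighted sum of the value vectors
        out.append([sum(w * v[s][d] for s, w in enumerate(weights)) for d in range(len(v[0]))])
    return out
```

Because masked positions get weight 0, changing a later value vector never changes earlier outputs, which is exactly the property the causal mask exists to guarantee.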


class LayerNorm(nn.Module):
    """
    LayerNorm but with an optional bias.
    (PyTorch's nn.LayerNorm did not support bias=False before version 2.1)
    """

    def __init__(self, ndim, bias):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(ndim))
        self.bias = nn.Parameter(torch.zeros(ndim)) if bias else None

    def forward(self, input):
        return F.layer_norm(
            input,
            self.weight.shape,
            self.weight,
            self.bias,
            1e-5
        )
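
`F.layer_norm` normalizes each vector over its last dimension, `y = (x - mean) / sqrt(var + eps) * weight + bias`, using the biased (population) variance; when the module is built with `bias=False`, `self.bias` is `None` and the additive term is skipped. A plain-Python sketch of that formula for a single vector, using the same `eps=1e-5` as above (the `layer_norm` helper is illustrative):

```python
import math
from typing import List, Optional


def layer_norm(
    x: List[float],
    weight: List[float],
    bias: Optional[List[float]] = None,
    eps: float = 1e-5,
) -> List[float]:
    """Normalize one vector over its (only) dimension, then scale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n  # biased (population) variance
    inv_std = 1.0 / math.sqrt(var + eps)
    y = [(xi - mean) * inv_std * w for xi, w in zip(x, weight)]
    if bias is not None:  # mirrors LayerNorm(..., bias=False) -> self.bias is None
        y = [yi + b for yi, b in zip(y, bias)]
    return y
```

With unit weights and no bias, the output has zero mean and (up to `eps`) unit variance.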
