https://github.com/lukasdrews97/dumblellm
Decoder-only LLM trained on the Harry Potter books.
- Host: GitHub
- URL: https://github.com/lukasdrews97/dumblellm
- Owner: LukasDrews97
- Created: 2024-12-11T15:47:42.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-12-20T11:37:49.000Z (5 months ago)
- Last Synced: 2025-02-10T14:26:41.335Z (3 months ago)
- Topics: byte-pair-encoding, flash-attention, grouped-query-attention, large-language-model, rotary-position-embedding, transformer
- Language: Python
- Homepage:
- Size: 235 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
README
# DumbleLLM - Custom Large Language Model
This repository contains the code for a decoder-only transformer, similar to Llama or GPT. It was trained on an English corpus built from the seven Harry Potter books and has roughly 75M trainable parameters.
# Technical Features
- Tokenization: Byte pair encoding (sentencepiece)
- FlashAttention, Grouped Query Attention
- Rotary Position Embeddings
- Key Value Cache
- Sampling: top-p, top-k (sketched below, together with the attention path)
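
The attention stack itself is not shown in this README; the following is a minimal sketch, under stated assumptions, of how these pieces commonly fit together: rotary position embeddings applied to queries and keys, grouped-query attention with 8 query heads sharing 4 key/value heads (matching the configuration below), a simple key/value cache, and PyTorch's `scaled_dot_product_attention`, which can dispatch to a FlashAttention kernel. All function names, shapes, and the toy usage are illustrative, not the repository's API.

```python
import torch
import torch.nn.functional as F

def rope_freqs(head_dim: int, seq_len: int, theta: float = 10_000.0) -> torch.Tensor:
    # One complex rotation factor per (position, dimension pair).
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)
    return torch.polar(torch.ones_like(angles), angles)         # (seq_len, head_dim // 2)

def apply_rope(x: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    # x: (batch, heads, seq_len, head_dim); rotate consecutive dimension pairs.
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    x_rot = x_c * freqs[None, None, :, :]
    return torch.view_as_real(x_rot).flatten(-2).type_as(x)

def gqa_attention(q, k, v, kv_cache=None):
    # q: (B, 8, T, Dh); k, v: (B, 4, T, Dh) -- fewer key/value heads than query heads.
    if kv_cache is not None:                          # decoding: extend the cached keys/values
        k = torch.cat([kv_cache[0], k], dim=2)
        v = torch.cat([kv_cache[1], v], dim=2)
    new_cache = (k, v)
    # Repeat each KV head so every query head has a partner (grouped-query attention).
    k = k.repeat_interleave(q.shape[1] // k.shape[1], dim=1)
    v = v.repeat_interleave(q.shape[1] // v.shape[1], dim=1)
    # SDPA dispatches to a FlashAttention kernel when the hardware/dtype allow it.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=kv_cache is None)
    return out, new_cache

# Toy usage: prefill 16 tokens, then decode one more token against the cache.
B, Hq, Hkv, T, Dh = 1, 8, 4, 16, 96
freqs = rope_freqs(Dh, 2048)
q, k, v = (torch.randn(B, h, T, Dh) for h in (Hq, Hkv, Hkv))
q, k = apply_rope(q, freqs[:T]), apply_rope(k, freqs[:T])
out, cache = gqa_attention(q, k, v)

q1, k1, v1 = (torch.randn(B, h, 1, Dh) for h in (Hq, Hkv, Hkv))
q1, k1 = apply_rope(q1, freqs[T:T + 1]), apply_rope(k1, freqs[T:T + 1])
out1, cache = gqa_attention(q1, k1, v1, kv_cache=cache)
```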
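
For the sampling step, a hedged sketch of top-k and top-p (nucleus) filtering over the last-position logits; the function name `sample_next_token` and the default values are illustrative, not taken from the repository.

```python
import torch

def sample_next_token(logits: torch.Tensor, top_k: int = 50, top_p: float = 0.9,
                      temperature: float = 1.0) -> torch.Tensor:
    # logits: (batch, vocab_size) for the last position only.
    logits = logits / temperature

    # Top-k: drop everything below the k-th highest logit.
    if top_k > 0:
        top_k = min(top_k, logits.size(-1))
        kth = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
        logits = logits.masked_fill(logits < kth, float("-inf"))

    # Top-p (nucleus): keep the smallest set of tokens whose probabilities sum to >= p.
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
        sorted_probs = torch.softmax(sorted_logits, dim=-1)
        cum = sorted_probs.cumsum(dim=-1)
        # Remove a token once the mass *before* it already exceeds p (keeps the crossing token).
        remove = (cum - sorted_probs) > top_p
        sorted_logits = sorted_logits.masked_fill(remove, float("-inf"))
        logits = torch.full_like(logits, float("-inf")).scatter(-1, sorted_idx, sorted_logits)

    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)               # (batch, 1) sampled token ids
```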
# Training Configuration

| **Parameter** | **Value** |
|------------------------|-------------|
| Layers | 4 |
| Model Dimension | 768 |
| Context Length | 1024 |
| Attention Heads | 8 |
| Key/Value Heads | 4 |
| Vocabulary Size | 32000 |
| RoPE Theta | 10000 |
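
For reference, the table can be expressed as a small config object, which also allows a back-of-the-envelope parameter count. The FFN hidden size and whether the input/output embeddings are tied are not stated in this README; `ffn_hidden=2048` and untied embeddings are assumptions that happen to land near the stated ~75M.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    n_layers: int = 4
    d_model: int = 768
    context_length: int = 1024
    n_heads: int = 8
    n_kv_heads: int = 4
    vocab_size: int = 32_000
    rope_theta: float = 10_000.0
    ffn_hidden: int = 2048         # assumption: not listed in the README
    tied_embeddings: bool = False  # assumption: separate input/output embedding matrices

def approx_params(cfg: ModelConfig) -> int:
    head_dim = cfg.d_model // cfg.n_heads
    attn = (cfg.d_model * cfg.d_model                       # Q projection
            + 2 * cfg.d_model * cfg.n_kv_heads * head_dim   # K and V projections (grouped)
            + cfg.d_model * cfg.d_model)                    # output projection
    ffn = 3 * cfg.d_model * cfg.ffn_hidden                  # SwiGLU-style FFN (assumed)
    emb = cfg.vocab_size * cfg.d_model
    return cfg.n_layers * (attn + ffn) + emb * (1 if cfg.tied_embeddings else 2)

print(f"{approx_params(ModelConfig()) / 1e6:.1f}M parameters")  # roughly 75M
```

With those assumptions the estimate comes out to about 75M: roughly 26M across the four transformer blocks and 49M in the two embedding matrices.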
# Roadmap

- [x] ~~Grouped Query Attention~~
- [x] ~~Rotary Position Embeddings~~
- [x] ~~Key Value Cache~~
- [ ] Distributed training
- [ ] Finetuning with (Q)LoRA
- [ ] Add Mixture of Experts model

# Example Prompts
TODO