https://github.com/lukasdrews97/dumblellm
Decoder-only LLM trained on the Harry Potter books.
- Host: GitHub
- URL: https://github.com/lukasdrews97/dumblellm
- Owner: LukasDrews97
- Created: 2024-12-11T15:47:42.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-12-20T11:37:49.000Z (5 months ago)
- Last Synced: 2025-02-10T14:26:41.335Z (3 months ago)
- Topics: byte-pair-encoding, flash-attention, grouped-query-attention, large-language-model, rotary-position-embedding, transformer
- Language: Python
- Homepage:
- Size: 235 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
README
# DumbleLLM - Custom Large Language Model
This repository contains the code for a decoder-only transformer, similar to Llama or GPT. It was trained on an English corpus built from the seven Harry Potter books and has roughly 75M trainable parameters.
# Technical Features
- Tokenization: Byte pair encoding (sentencepiece)
- FlashAttention, Grouped Query Attention
- Rotary Position Embeddings
- Key Value Cache
- Sampling: top-p, top-k (sketched below, together with the attention path)
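
The attention stack itself is not shown in this README; the following is a minimal sketch, under stated assumptions, of how these pieces commonly fit together: rotary position embeddings applied to queries and keys, grouped-query attention with 8 query heads sharing 4 key/value heads (matching the configuration below), a simple key/value cache, and PyTorch's `scaled_dot_product_attention`, which can dispatch to a FlashAttention kernel. All function names, shapes, and the toy usage are illustrative, not the repository's API.

```python
import torch
import torch.nn.functional as F

def rope_freqs(head_dim: int, seq_len: int, theta: float = 10_000.0) -> torch.Tensor:
    # One complex rotation factor per (position, dimension pair).
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)
    return torch.polar(torch.ones_like(angles), angles)         # (seq_len, head_dim // 2)

def apply_rope(x: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    # x: (batch, heads, seq_len, head_dim); rotate consecutive dimension pairs.
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    x_rot = x_c * freqs[None, None, :, :]
    return torch.view_as_real(x_rot).flatten(-2).type_as(x)

def gqa_attention(q, k, v, kv_cache=None):
    # q: (B, 8, T, Dh); k, v: (B, 4, T, Dh) -- fewer key/value heads than query heads.
    if kv_cache is not None:                          # decoding: extend the cached keys/values
        k = torch.cat([kv_cache[0], k], dim=2)
        v = torch.cat([kv_cache[1], v], dim=2)
    new_cache = (k, v)
    # Repeat each KV head so every query head has a partner (grouped-query attention).
    k = k.repeat_interleave(q.shape[1] // k.shape[1], dim=1)
    v = v.repeat_interleave(q.shape[1] // v.shape[1], dim=1)
    # SDPA dispatches to a FlashAttention kernel when the hardware/dtype allow it.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=kv_cache is None)
    return out, new_cache

# Toy usage: prefill 16 tokens, then decode one more token against the cache.
B, Hq, Hkv, T, Dh = 1, 8, 4, 16, 96
freqs = rope_freqs(Dh, 2048)
q, k, v = (torch.randn(B, h, T, Dh) for h in (Hq, Hkv, Hkv))
q, k = apply_rope(q, freqs[:T]), apply_rope(k, freqs[:T])
out, cache = gqa_attention(q, k, v)

q1, k1, v1 = (torch.randn(B, h, 1, Dh) for h in (Hq, Hkv, Hkv))
q1, k1 = apply_rope(q1, freqs[T:T + 1]), apply_rope(k1, freqs[T:T + 1])
out1, cache = gqa_attention(q1, k1, v1, kv_cache=cache)
```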
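
For the sampling step, a hedged sketch of top-k and top-p (nucleus) filtering over the last-position logits; the function name `sample_next_token` and the default values are illustrative, not taken from the repository.

```python
import torch

def sample_next_token(logits: torch.Tensor, top_k: int = 50, top_p: float = 0.9,
                      temperature: float = 1.0) -> torch.Tensor:
    # logits: (batch, vocab_size) for the last position only.
    logits = logits / temperature

    # Top-k: drop everything below the k-th highest logit.
    if top_k > 0:
        top_k = min(top_k, logits.size(-1))
        kth = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
        logits = logits.masked_fill(logits < kth, float("-inf"))

    # Top-p (nucleus): keep the smallest set of tokens whose probabilities sum to >= p.
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
        sorted_probs = torch.softmax(sorted_logits, dim=-1)
        cum = sorted_probs.cumsum(dim=-1)
        # Remove a token once the mass *before* it already exceeds p (keeps the crossing token).
        remove = (cum - sorted_probs) > top_p
        sorted_logits = sorted_logits.masked_fill(remove, float("-inf"))
        logits = torch.full_like(logits, float("-inf")).scatter(-1, sorted_idx, sorted_logits)

    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)               # (batch, 1) sampled token ids
```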
# Training Configuration

| **Parameter** | **Value** |
|------------------------|-------------|
| Layers | 4 |
| Model Dimension | 768 |
| Context Length | 1024 |
| Attention Heads | 8 |
| Key/Value Heads | 4 |
| Vocabulary Size | 32000 |
| RoPE Theta | 10000 |
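
For reference, the table can be expressed as a small config object, which also allows a back-of-the-envelope parameter count. The FFN hidden size and whether the input/output embeddings are tied are not stated in this README; `ffn_hidden=2048` and untied embeddings are assumptions that happen to land near the stated ~75M.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    n_layers: int = 4
    d_model: int = 768
    context_length: int = 1024
    n_heads: int = 8
    n_kv_heads: int = 4
    vocab_size: int = 32_000
    rope_theta: float = 10_000.0
    ffn_hidden: int = 2048         # assumption: not listed in the README
    tied_embeddings: bool = False  # assumption: separate input/output embedding matrices

def approx_params(cfg: ModelConfig) -> int:
    head_dim = cfg.d_model // cfg.n_heads
    attn = (cfg.d_model * cfg.d_model                       # Q projection
            + 2 * cfg.d_model * cfg.n_kv_heads * head_dim   # K and V projections (grouped)
            + cfg.d_model * cfg.d_model)                    # output projection
    ffn = 3 * cfg.d_model * cfg.ffn_hidden                  # SwiGLU-style FFN (assumed)
    emb = cfg.vocab_size * cfg.d_model
    return cfg.n_layers * (attn + ffn) + emb * (1 if cfg.tied_embeddings else 2)

print(f"{approx_params(ModelConfig()) / 1e6:.1f}M parameters")  # roughly 75M
```

With those assumptions the estimate comes out to about 75M: roughly 26M across the four transformer blocks and 49M in the two embedding matrices.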
# Roadmap

- [x] ~~Grouped Query Attention~~
- [x] ~~Rotary Position Embeddings~~
- [x] ~~Key Value Cache~~
- [ ] Distributed training
- [ ] Finetuning with (Q)LoRA
- [ ] Add Mixture of Experts model

# Example Prompts
TODO