https://github.com/lukasdrews97/dumblellm

Decoder-only LLM trained on the Harry Potter books.

byte-pair-encoding flash-attention grouped-query-attention large-language-model rotary-position-embedding transformer


# DumbleLLM - Custom Large Language Model

This repository contains the code for a decoder-only transformer, similar to Llama or GPT. It was trained on an English corpus built from the seven Harry Potter books and has roughly 75M trainable parameters.

# Technical Features

- Tokenization: Byte-pair encoding (SentencePiece)
- FlashAttention, Grouped-Query Attention
- Rotary Position Embeddings (RoPE)
- Key-Value (KV) Cache
- Sampling: top-p (nucleus), top-k
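Top-k and top-p sampling are typically combined: top-k first caps the candidate set, then top-p keeps the smallest prefix of high-probability tokens whose cumulative mass reaches p. A minimal sketch in plain Python (the function name and list-based interface are illustrative, not the repository's API):

```python
import math

def top_k_top_p_filter(logits, k=50, p=0.9):
    """Return renormalized probabilities after top-k, then top-p filtering.

    Hypothetical helper (not from the repo): `logits` is a plain list of floats.
    """
    # Numerically stable softmax over the raw logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]

    # Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens.
    kept = order[:k]

    # Top-p: of those, keep the smallest prefix whose mass reaches p.
    cumulative, nucleus = 0.0, []
    for i in kept:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break

    # Renormalize over the surviving tokens.
    mass = sum(probs[i] for i in nucleus)
    return {i: probs[i] / mass for i in nucleus}

# A peaked distribution collapses to a two-token nucleus.
filtered = top_k_top_p_filter([5.0, 4.0, 1.0, 0.5, 0.1], k=4, p=0.9)
```

A sampler would then draw the next token from the returned distribution instead of the full vocabulary, which suppresses low-probability tail tokens.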

# Training Configuration
| **Parameter** | **Value** |
|------------------------|-------------|
| Layers | 4 |
| Model Dimension | 768 |
| Context Length | 1024 |
| Attention Heads | 8 |
| Key/Value Heads | 4 |
| Vocabulary Size | 32000 |
| RoPE Theta | 10000 |
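As a sanity check, the table above is consistent with the stated ~75M trainable parameters under common Llama-style assumptions. The sketch below assumes untied input/output embeddings, a SwiGLU feed-forward with hidden size 2048, RMSNorm, and no bias terms; none of these choices are stated in the README, so the exact total may differ:

```python
def estimate_params(vocab=32000, d_model=768, n_layers=4,
                    n_heads=8, n_kv_heads=4, ffn_hidden=2048):
    """Rough parameter count for a Llama-style decoder (assumed architecture)."""
    head_dim = d_model // n_heads
    # Attention: Q and output projections are d x d; K and V are
    # shrunk to n_kv_heads by grouped-query attention.
    attn = 2 * d_model * d_model + 2 * d_model * head_dim * n_kv_heads
    # SwiGLU feed-forward: gate, up, and down projections.
    ffn = 3 * d_model * ffn_hidden
    # Two RMSNorm weight vectors per layer.
    norms = 2 * d_model
    per_layer = attn + ffn + norms
    # Token embedding plus an untied LM head.
    embeddings = 2 * vocab * d_model
    # Final norm adds one more d_model-sized vector.
    return n_layers * per_layer + embeddings + d_model

total = estimate_params()  # about 75.1M under these assumptions
```

Note that with only 4 layers, roughly two thirds of the parameters sit in the (untied) embedding matrices.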

# Roadmap
- [x] ~~Grouped Query Attention~~
- [x] ~~Rotary Position Embeddings~~
- [x] ~~Key Value Cache~~
- [ ] Distributed training
- [ ] Finetuning with (Q)LoRA
- [ ] Add Mixture of Experts model

# Example Prompts
TODO