Llama2 inference in CHICKEN Scheme
https://github.com/iraikov/llama-chicken
- Host: GitHub
- URL: https://github.com/iraikov/llama-chicken
- Owner: iraikov
- License: MIT
- Created: 2025-06-13T23:43:32.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-06-14T00:11:51.000Z (4 months ago)
- Last Synced: 2025-06-14T00:28:26.410Z (4 months ago)
- Topics: chicken-scheme, llama2, llm, llm-inference, scheme-language
- Language: Scheme
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# LLAMA CHICKEN Scheme
A high-performance LLAMA2 inference implementation in CHICKEN Scheme,
based on Andrej Karpathy's
[llama2.c](https://github.com/karpathy/llama2.c) and its OCaml port
[llama2.ml](https://github.com/jackpeck/llama2.ml).

### System Dependencies
- **CHICKEN Scheme 5.0+**: [The Scheme implementation](https://call-cc.org/)
- **BLAS Library**: For optimized linear algebra (OpenBLAS, Intel MKL, or system BLAS)
- **C Compiler**: GCC or Clang for compiling extensions

## 🛠️ Installation
### 1. Install CHICKEN Scheme
```bash
# Ubuntu/Debian
sudo apt-get install chicken-bin libchicken-dev

# macOS with Homebrew
brew install chicken

# From source
wget https://code.call-cc.org/releases/5.3.0/chicken-5.3.0.tar.gz
tar xzf chicken-5.3.0.tar.gz
cd chicken-5.3.0
make PLATFORM=linux PREFIX=/usr/local
sudo make PLATFORM=linux PREFIX=/usr/local install
```

### 2. Install BLAS Library
```bash
# Ubuntu/Debian
sudo apt-get install libopenblas-dev

# macOS with Homebrew
brew install openblas

# CentOS/RHEL
sudo yum install openblas-devel
```

### 3. Install Required CHICKEN Extensions
```bash
chicken-install llama
```

## Quick Start
### Model Checkpoint
Download this 15M parameter model trained on the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset (~60MB download):
```bash
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
```

### Basic Text Generation
Ensure that the file `tokenizer.bin` is in the current directory, then run:
```bash
# Generate text with default settings
llama-cli -c stories15M.bin -p "Once upon a time"

# Creative generation with temperature
llama-cli -c stories15M.bin -t 0.8 -s 100 -p "The meaning of life is"

# Deterministic generation
llama-cli -c stories15M.bin -t 0.0 -s 50 -p "To be or not to be"
```

### Verify Model Checkpoint
```bash
llama-cli -c stories15M.bin --verify-checkpoint
```

## API Documentation
### Core Data Types
#### `config`
Model configuration parameters.
```scheme
(make-config dim hidden-dim n-layers n-heads n-kv-heads vocab-size seq-len shared-weights)
```

**Fields:**
- `dim`: Model embedding dimension
- `hidden-dim`: FFN hidden layer dimension
- `n-layers`: Number of transformer layers
- `n-heads`: Number of attention heads
- `n-kv-heads`: Number of key-value heads
- `vocab-size`: Vocabulary size
- `seq-len`: Maximum sequence length
- `shared-weights`: Whether to share input/output embeddings
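
For illustration, a hand-built configuration might look like the sketch below. The numeric values are assumptions chosen to resemble a small TinyStories-style model; in practice the configuration is read from the checkpoint header rather than written by hand.

```scheme
;; Illustrative only: a small hypothetical model configuration.
;; Real values come from the checkpoint header, not hand-written code.
(define demo-config
  (make-config 288     ; dim: embedding dimension
               768     ; hidden-dim: FFN hidden dimension
               6       ; n-layers
               6       ; n-heads
               6       ; n-kv-heads
               32000   ; vocab-size
               256     ; seq-len
               #t))    ; shared-weights
```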
#### `transformer-weights`
Container for all model parameters.
```scheme
(make-transformer-weights token-embedding-table rms-att-weight wq wk wv wo
rms-ffn-weight w1 w2 w3 rms-final-weight
freq-cis-real freq-cis-imag wcls)
```

#### `run-state`
Runtime state for transformer computation.
```scheme
(make-run-state x xb q k v att key-cache value-cache xb2 hb hb2 logits)
```

**Fields:**
- `x`: Current hidden state
- `xb`, `xb2`: Temporary buffers
- `q`, `k`, `v`: Query, Key, Value vectors
- `att`: Attention scores
- `key-cache`, `value-cache`: Attention caches
- `hb`, `hb2`: FFN hidden buffers
- `logits`: Output logits
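
A rough sketch of allocating a run-state by hand is shown below, assuming SRFI-4 `f32vector` buffers sized following llama2.c conventions. The helper and the buffer sizes are assumptions, not part of the library API; `run` normally sets the state up from the checkpoint, and grouped-query models use a smaller key/value dimension than shown here.

```scheme
(import srfi-4)

;; Hypothetical sizes, following llama2.c conventions; adjust to your model.
(define (make-demo-run-state dim hidden-dim n-layers n-heads seq-len vocab-size)
  (let ((fv (lambda (n) (make-f32vector n 0.0))))
    (make-run-state (fv dim)                        ; x: current hidden state
                    (fv dim)                        ; xb: temporary buffer
                    (fv dim)                        ; q: query vector
                    (fv dim)                        ; k: key vector
                    (fv dim)                        ; v: value vector
                    (fv (* n-heads seq-len))        ; att: attention scores
                    (fv (* n-layers seq-len dim))   ; key-cache
                    (fv (* n-layers seq-len dim))   ; value-cache
                    (fv dim)                        ; xb2: temporary buffer
                    (fv hidden-dim)                 ; hb: FFN hidden buffer
                    (fv hidden-dim)                 ; hb2: FFN hidden buffer
                    (fv vocab-size))))              ; logits
```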
### High-Level Functions

#### `(run args)`
Main inference function.
```scheme
(define args (make-args "model.bin" "tokenizer.bin" 0.8 100 "Hello world" #f))
(run args)
```
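
The positional arguments to `make-args` appear to mirror the CLI options documented below: checkpoint path, tokenizer path, temperature, number of steps, prompt text, and a seed (or `#f`).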
#### `(bpe-encode text vocab vocab-scores)`
Tokenize text using Byte-Pair Encoding.
```scheme
(bpe-encode "Hello world" vocab vocab-scores)
;; => (15496 1776)
```

#### `(transformer token pos config state weights)`
Run transformer forward pass.
```scheme
(transformer token-id position config state weights)
;; => updated state with new logits
```

### Transformer Components
The modular architecture provides fine-grained control over transformer computation:
#### Token Processing
```scheme
;; Load token embedding
(token-embedding-lookup state weights token-id)

;; Get positional frequencies
(let-values (((freq-real freq-imag)
(get-rope-frequencies weights position head-size)))
...)
```

#### Attention Components
```scheme
;; Attention normalization
(attention-rmsnorm state weights layer-idx config)

;; Compute Q, K, V matrices
(compute-qkv state weights layer-idx config)

;; Apply rotary position embedding
(apply-rope state config freq-real freq-imag)

;; Cache key-value pairs
(cache-kv state layer-idx position config)

;; Compute attention scores and apply
(compute-attention state layer-idx position config)

;; Output projection
(attention-output state weights layer-idx config)
```
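
As in llama2.c, `compute-attention` presumably performs scaled dot-product attention over the cached keys and values: for each head the scores are `softmax((q · k) / sqrt(head-size))`, which then weight the cached values.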
#### Feed-Forward Network
```scheme
;; FFN normalization
(ffn-rmsnorm state weights layer-idx config)

;; Compute W1 and W3 projections
(compute-ffn-w1w3 state weights layer-idx config)

;; Apply SwiGLU activation
(apply-swiglu state config)

;; Final projection
(ffn-output state weights layer-idx config)
```
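
As in llama2.c, the SwiGLU step presumably computes `w2 (silu(w1 x) * (w3 x))`, where `silu(z) = z * sigmoid(z)` and `*` denotes element-wise multiplication.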
#### Layer Processing
```scheme
;; Process complete transformer layer
(process-transformer-layer state weights layer-idx position config
freq-real freq-imag)
```
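
For orientation, a minimal sketch of how a forward pass might compose these components is shown below. The loop structure, the extra parameters, and the omission of the final normalization and classifier projection are simplifications, not the library's actual `transformer` implementation.

```scheme
;; Illustrative sketch of a forward pass built from the layer-level API.
;; Assumes `n-layers` and `head-size` are known for the model.
(define (forward-pass-sketch token pos config state weights n-layers head-size)
  ;; Load the token's embedding into the hidden state
  (token-embedding-lookup state weights token)
  ;; Fetch the RoPE frequencies for this position
  (let-values (((freq-real freq-imag)
                (get-rope-frequencies weights pos head-size)))
    ;; Run every transformer layer in sequence
    (do ((layer 0 (+ layer 1)))
        ((= layer n-layers))
      (process-transformer-layer state weights layer pos config
                                 freq-real freq-imag)))
  ;; The final rmsnorm and classifier projection into logits are omitted here.
  state)
```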
### Utility Functions

#### Vector Operations
```scheme
;; RMS normalization
(rmsnorm output input weights)

;; Matrix-vector multiplication
(matmul output input matrix rows cols)

;; Softmax activation
(softmax output input size)

;; Vector accumulation (residual connections)
(accum target source)
```
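
A small hypothetical example of the buffer-style calling convention these helpers appear to use (destination buffer first, filled in place); a 2x2 identity matrix keeps the result independent of storage-order details:

```scheme
(import srfi-4)

;; Hypothetical toy example: multiply by a 2x2 identity matrix, then softmax.
(define in  (f32vector 1.0 2.0))
(define mat (f32vector 1.0 0.0
                       0.0 1.0))   ; 2x2 identity, so out = in
(define out (make-f32vector 2 0.0))

(matmul out in mat 2 2)            ; out <- mat * in
(define probs (make-f32vector 2 0.0))
(softmax probs out 2)              ; probs <- softmax(out)
(accum out in)                     ; out <- out + in (residual-style add)
```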
#### Sampling Functions
```scheme
;; Greedy sampling (argmax)
(argmax logits-vector)

;; Probabilistic sampling
(sample probability-vector random-state)
```

### CLI Options
| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--help` | `-h` | Show help message | - |
| `--checkpoint` | `-c` | Model checkpoint file (required) | - |
| `--tokenizer` | `-k` | Tokenizer file | `tokenizer.bin` |
| `--temperature` | `-t` | Sampling temperature (0.0-2.0) | `0.0` |
| `--steps` | `-s` | Number of tokens to generate | `256` |
| `--prompt` | `-p` | Input prompt text | `""` |
| `--seed` | | Random seed for sampling | Random |
| `--verify-checkpoint` | | Verify checkpoint integrity | `false` |

## 🔧 Configuration
### Model Files
- **Checkpoint**: Binary file containing model weights (`.bin`)
- **Tokenizer**: Binary file containing vocabulary and BPE merge rules

### Temperature Guidelines
- **0.0**: Deterministic (greedy sampling)
- **0.1-0.3**: Focused, coherent output
- **0.5-0.8**: Balanced creativity and coherence
- **0.9-1.2**: Creative, diverse output
- **1.5+**: Highly random, experimental
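
As a sketch of how these settings typically translate into sampling (following the llama2.c recipe; the helper below is illustrative and not part of the library's API): at temperature 0.0 take the argmax, otherwise divide each logit by the temperature, apply softmax, and sample from the resulting distribution.

```scheme
(import srfi-4)

;; Illustrative helper, not part of the llama egg API.
(define (sample-next-token logits vocab-size temperature random-state)
  (if (= temperature 0.0)
      ;; Greedy: pick the most likely token
      (argmax logits)
      ;; Otherwise scale logits by 1/temperature, normalize, and sample
      (let ((scaled (make-f32vector vocab-size 0.0))
            (probs  (make-f32vector vocab-size 0.0)))
        (do ((i 0 (+ i 1)))
            ((= i vocab-size))
          (f32vector-set! scaled i (/ (f32vector-ref logits i) temperature)))
        (softmax probs scaled vocab-size)
        (sample probs random-state))))
```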
## Examples

### Interactive REPL Usage
```scheme
(import llama)

;; Load model
(define config (make-config 512 2048 8 8 8 32000 2048 #t))
(define weights (load-checkpoint "model.bin"))
(define state (make-run-state ...))

;; Generate single token
(transformer 1 0 config state weights)
(argmax (run-state-logits state))

;; Custom sampling
(define probs (softmax (make-f32vector 32000) (run-state-logits state) 32000))
(sample probs random-state)
```
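
Building on the single-token example above, a rough sketch of an autoregressive loop is shown below; the BOS token id of 1 and greedy `argmax` decoding are assumptions borrowed from llama2.c, and converting ids back to text via the tokenizer is omitted.

```scheme
;; Illustrative greedy generation loop; returns the generated token ids in order.
(define (generate-ids config state weights n-steps)
  (let loop ((pos 0) (token 1) (acc '()))   ; token 1 = BOS in llama2.c
    (if (= pos n-steps)
        (reverse acc)
        (begin
          (transformer token pos config state weights)
          (let ((next (argmax (run-state-logits state))))
            (loop (+ pos 1) next (cons next acc)))))))
```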
### Batch Processing
```scheme
;; Process multiple prompts
(define prompts '("Hello world" "The meaning of life" "Once upon a time"))

(for-each (lambda (prompt)
(printf "Prompt: ~A\n" prompt)
(let ((args (make-args "model.bin" "tokenizer.bin" 0.5 50 prompt #f)))
(run args)
(newline)))
prompts)
```

## License
MIT License - see LICENSE file for details.
## Acknowledgments
- Original LLAMA2 paper and implementation by Meta AI
- Andrej Karpathy's C implementation of LLAMA2 [llama2.c](https://github.com/karpathy/llama2.c)
- The LLAMA2 Common Lisp port [llama.cl](https://github.com/snunez1/llama.cl)
- The LLAMA2 OCaml port [llama2.ml](https://github.com/jackpeck/llama2.ml)
- BLAS library maintainers for high-performance linear algebra
- CHICKEN Scheme community for excellent libraries
## Original README.md
For instructions on converting models to/from the `.bin` format, training, and
other background, see the [original repo](https://github.com/karpathy/llama2.c).