Llama2 inference in CHICKEN Scheme
https://github.com/iraikov/llama-chicken
- Host: GitHub
- URL: https://github.com/iraikov/llama-chicken
- Owner: iraikov
- License: MIT
- Created: 2025-06-13T23:43:32.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-06-14T00:11:51.000Z (4 months ago)
- Last Synced: 2025-06-14T00:28:26.410Z (4 months ago)
- Topics: chicken-scheme, llama2, llm, llm-inference, scheme-language
- Language: Scheme
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# LLAMA CHICKEN Scheme
A high-performance LLAMA2 inference implementation in CHICKEN Scheme,
based on Andrej Karpathy's
[llama2.c](https://github.com/karpathy/llama2.c) and its OCaml port
[llama2.ml](https://github.com/jackpeck/llama2.ml).

### System Dependencies
- **CHICKEN Scheme 5.0+**: [The Scheme implementation](https://call-cc.org/)
- **BLAS Library**: For optimized linear algebra (OpenBLAS, Intel MKL, or system BLAS)
- **C Compiler**: GCC or Clang for compiling extensions

## 🛠️ Installation
### 1. Install CHICKEN Scheme
```bash
# Ubuntu/Debian
sudo apt-get install chicken-bin libchicken-dev

# macOS with Homebrew
brew install chicken

# From source
wget https://code.call-cc.org/releases/5.3.0/chicken-5.3.0.tar.gz
tar xzf chicken-5.3.0.tar.gz
cd chicken-5.3.0
make PLATFORM=linux PREFIX=/usr/local
sudo make PLATFORM=linux PREFIX=/usr/local install
```

### 2. Install BLAS Library
```bash
# Ubuntu/Debian
sudo apt-get install libopenblas-dev

# macOS with Homebrew
brew install openblas

# CentOS/RHEL
sudo yum install openblas-devel
```

### 3. Install Required CHICKEN Extensions
```bash
chicken-install llama
```

## Quick Start
### Model Checkpoint
Download this 15M parameter model trained on the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset (~60MB download):
```bash
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
```

### Basic Text Generation
Ensure that the file `tokenizer.bin` is in the current directory, then run:
```bash
# Generate text with default settings
llama-cli -c stories15M.bin -p "Once upon a time"

# Creative generation with temperature
llama-cli -c stories15M.bin -t 0.8 -s 100 -p "The meaning of life is"

# Deterministic generation
llama-cli -c stories15M.bin -t 0.0 -s 50 -p "To be or not to be"
```

### Verify Model Checkpoint
```bash
llama-cli -c stories15M.bin --verify-checkpoint
```

## API Documentation
### Core Data Types
#### `config`
Model configuration parameters.
```scheme
(make-config dim hidden-dim n-layers n-heads n-kv-heads vocab-size seq-len shared-weights)
```

**Fields:**
- `dim`: Model embedding dimension
- `hidden-dim`: FFN hidden layer dimension
- `n-layers`: Number of transformer layers
- `n-heads`: Number of attention heads
- `n-kv-heads`: Number of key-value heads
- `vocab-size`: Vocabulary size
- `seq-len`: Maximum sequence length
- `shared-weights`: Whether to share input/output embeddings
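
For illustration, a hand-built configuration might look like the sketch below. The numeric values are assumptions chosen to resemble a small TinyStories-style model; in practice the configuration is read from the checkpoint header rather than written by hand.

```scheme
;; Illustrative only: a small hypothetical model configuration.
;; Real values come from the checkpoint header, not hand-written code.
(define demo-config
  (make-config 288     ; dim: embedding dimension
               768     ; hidden-dim: FFN hidden dimension
               6       ; n-layers
               6       ; n-heads
               6       ; n-kv-heads
               32000   ; vocab-size
               256     ; seq-len
               #t))    ; shared-weights
```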
#### `transformer-weights`
Container for all model parameters.
```scheme
(make-transformer-weights token-embedding-table rms-att-weight wq wk wv wo
rms-ffn-weight w1 w2 w3 rms-final-weight
freq-cis-real freq-cis-imag wcls)
```

#### `run-state`
Runtime state for transformer computation.
```scheme
(make-run-state x xb q k v att key-cache value-cache xb2 hb hb2 logits)
```

**Fields:**
- `x`: Current hidden state
- `xb`, `xb2`: Temporary buffers
- `q`, `k`, `v`: Query, Key, Value vectors
- `att`: Attention scores
- `key-cache`, `value-cache`: Attention caches
- `hb`, `hb2`: FFN hidden buffers
- `logits`: Output logits
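
A rough sketch of allocating a run-state by hand is shown below, assuming SRFI-4 `f32vector` buffers sized following llama2.c conventions. The helper and the buffer sizes are assumptions, not part of the library API; `run` normally sets the state up from the checkpoint, and grouped-query models use a smaller key/value dimension than shown here.

```scheme
(import srfi-4)

;; Hypothetical sizes, following llama2.c conventions; adjust to your model.
(define (make-demo-run-state dim hidden-dim n-layers n-heads seq-len vocab-size)
  (let ((fv (lambda (n) (make-f32vector n 0.0))))
    (make-run-state (fv dim)                        ; x: current hidden state
                    (fv dim)                        ; xb: temporary buffer
                    (fv dim)                        ; q: query vector
                    (fv dim)                        ; k: key vector
                    (fv dim)                        ; v: value vector
                    (fv (* n-heads seq-len))        ; att: attention scores
                    (fv (* n-layers seq-len dim))   ; key-cache
                    (fv (* n-layers seq-len dim))   ; value-cache
                    (fv dim)                        ; xb2: temporary buffer
                    (fv hidden-dim)                 ; hb: FFN hidden buffer
                    (fv hidden-dim)                 ; hb2: FFN hidden buffer
                    (fv vocab-size))))              ; logits
```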
### High-Level Functions

#### `(run args)`
Main inference function.
```scheme
(define args (make-args "model.bin" "tokenizer.bin" 0.8 100 "Hello world" #f))
(run args)
```
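
The positional arguments to `make-args` appear to mirror the CLI options documented below: checkpoint path, tokenizer path, temperature, number of steps, prompt text, and a seed (or `#f`).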
#### `(bpe-encode text vocab vocab-scores)`
Tokenize text using Byte-Pair Encoding.
```scheme
(bpe-encode "Hello world" vocab vocab-scores)
;; => (15496 1776)
```

#### `(transformer token pos config state weights)`
Run transformer forward pass.
```scheme
(transformer token-id position config state weights)
;; => updated state with new logits
```

### Transformer Components
The modular architecture provides fine-grained control over transformer computation:
#### Token Processing
```scheme
;; Load token embedding
(token-embedding-lookup state weights token-id)

;; Get positional frequencies
(let-values (((freq-real freq-imag)
(get-rope-frequencies weights position head-size)))
...)
```

#### Attention Components
```scheme
;; Attention normalization
(attention-rmsnorm state weights layer-idx config)

;; Compute Q, K, V matrices
(compute-qkv state weights layer-idx config)

;; Apply rotary position embedding
(apply-rope state config freq-real freq-imag)

;; Cache key-value pairs
(cache-kv state layer-idx position config)

;; Compute attention scores and apply
(compute-attention state layer-idx position config)

;; Output projection
(attention-output state weights layer-idx config)
```
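
As in llama2.c, `compute-attention` presumably performs scaled dot-product attention over the cached keys and values: for each head the scores are `softmax((q · k) / sqrt(head-size))`, which then weight the cached values.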
#### Feed-Forward Network
```scheme
;; FFN normalization
(ffn-rmsnorm state weights layer-idx config)

;; Compute W1 and W3 projections
(compute-ffn-w1w3 state weights layer-idx config)

;; Apply SwiGLU activation
(apply-swiglu state config)

;; Final projection
(ffn-output state weights layer-idx config)
```
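
As in llama2.c, the SwiGLU step presumably computes `w2 (silu(w1 x) * (w3 x))`, where `silu(z) = z * sigmoid(z)` and `*` denotes element-wise multiplication.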
#### Layer Processing
```scheme
;; Process complete transformer layer
(process-transformer-layer state weights layer-idx position config
freq-real freq-imag)
```
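
For orientation, a minimal sketch of how a forward pass might compose these components is shown below. The loop structure, the extra parameters, and the omission of the final normalization and classifier projection are simplifications, not the library's actual `transformer` implementation.

```scheme
;; Illustrative sketch of a forward pass built from the layer-level API.
;; Assumes `n-layers` and `head-size` are known for the model.
(define (forward-pass-sketch token pos config state weights n-layers head-size)
  ;; Load the token's embedding into the hidden state
  (token-embedding-lookup state weights token)
  ;; Fetch the RoPE frequencies for this position
  (let-values (((freq-real freq-imag)
                (get-rope-frequencies weights pos head-size)))
    ;; Run every transformer layer in sequence
    (do ((layer 0 (+ layer 1)))
        ((= layer n-layers))
      (process-transformer-layer state weights layer pos config
                                 freq-real freq-imag)))
  ;; The final rmsnorm and classifier projection into logits are omitted here.
  state)
```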
### Utility Functions

#### Vector Operations
```scheme
;; RMS normalization
(rmsnorm output input weights)

;; Matrix-vector multiplication
(matmul output input matrix rows cols)

;; Softmax activation
(softmax output input size)

;; Vector accumulation (residual connections)
(accum target source)
```
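
A small hypothetical example of the buffer-style calling convention these helpers appear to use (destination buffer first, filled in place); a 2x2 identity matrix keeps the result independent of storage-order details:

```scheme
(import srfi-4)

;; Hypothetical toy example: multiply by a 2x2 identity matrix, then softmax.
(define in  (f32vector 1.0 2.0))
(define mat (f32vector 1.0 0.0
                       0.0 1.0))   ; 2x2 identity, so out = in
(define out (make-f32vector 2 0.0))

(matmul out in mat 2 2)            ; out <- mat * in
(define probs (make-f32vector 2 0.0))
(softmax probs out 2)              ; probs <- softmax(out)
(accum out in)                     ; out <- out + in (residual-style add)
```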
#### Sampling Functions
```scheme
;; Greedy sampling (argmax)
(argmax logits-vector)

;; Probabilistic sampling
(sample probability-vector random-state)
```

### CLI Options
| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--help` | `-h` | Show help message | - |
| `--checkpoint` | `-c` | Model checkpoint file (required) | - |
| `--tokenizer` | `-k` | Tokenizer file | `tokenizer.bin` |
| `--temperature` | `-t` | Sampling temperature (0.0-2.0) | `0.0` |
| `--steps` | `-s` | Number of tokens to generate | `256` |
| `--prompt` | `-p` | Input prompt text | `""` |
| `--seed` | | Random seed for sampling | Random |
| `--verify-checkpoint` | | Verify checkpoint integrity | `false` |

## 🔧 Configuration
### Model Files
- **Checkpoint**: Binary file containing model weights (`.bin`)
- **Tokenizer**: Binary file containing vocabulary and BPE merge rules

### Temperature Guidelines
- **0.0**: Deterministic (greedy sampling)
- **0.1-0.3**: Focused, coherent output
- **0.5-0.8**: Balanced creativity and coherence
- **0.9-1.2**: Creative, diverse output
- **1.5+**: Highly random, experimental
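
As a sketch of how these settings typically translate into sampling (following the llama2.c recipe; the helper below is illustrative and not part of the library's API): at temperature 0.0 take the argmax, otherwise divide each logit by the temperature, apply softmax, and sample from the resulting distribution.

```scheme
(import srfi-4)

;; Illustrative helper, not part of the llama egg API.
(define (sample-next-token logits vocab-size temperature random-state)
  (if (= temperature 0.0)
      ;; Greedy: pick the most likely token
      (argmax logits)
      ;; Otherwise scale logits by 1/temperature, normalize, and sample
      (let ((scaled (make-f32vector vocab-size 0.0))
            (probs  (make-f32vector vocab-size 0.0)))
        (do ((i 0 (+ i 1)))
            ((= i vocab-size))
          (f32vector-set! scaled i (/ (f32vector-ref logits i) temperature)))
        (softmax probs scaled vocab-size)
        (sample probs random-state))))
```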
## Examples

### Interactive REPL Usage
```scheme
(import llama)

;; Load model
(define config (make-config 512 2048 8 8 8 32000 2048 #t))
(define weights (load-checkpoint "model.bin"))
(define state (make-run-state ...))

;; Generate single token
(transformer 1 0 config state weights)
(argmax (run-state-logits state))

;; Custom sampling
(define probs (softmax (make-f32vector 32000) (run-state-logits state) 32000))
(sample probs random-state)
```
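
Building on the single-token example above, a rough sketch of an autoregressive loop is shown below; the BOS token id of 1 and greedy `argmax` decoding are assumptions borrowed from llama2.c, and converting ids back to text via the tokenizer is omitted.

```scheme
;; Illustrative greedy generation loop; returns the generated token ids in order.
(define (generate-ids config state weights n-steps)
  (let loop ((pos 0) (token 1) (acc '()))   ; token 1 = BOS in llama2.c
    (if (= pos n-steps)
        (reverse acc)
        (begin
          (transformer token pos config state weights)
          (let ((next (argmax (run-state-logits state))))
            (loop (+ pos 1) next (cons next acc)))))))
```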
### Batch Processing
```scheme
;; Process multiple prompts
(define prompts '("Hello world" "The meaning of life" "Once upon a time"))

(for-each (lambda (prompt)
(printf "Prompt: ~A\n" prompt)
(let ((args (make-args "model.bin" "tokenizer.bin" 0.5 50 prompt #f)))
(run args)
(newline)))
prompts)
```

## License
MIT License - see LICENSE file for details.
## Acknowledgments
- Original LLAMA2 paper and implementation by Meta AI
- Andrej Karpathy's C implementation of LLAMA2 [llama2.c](https://github.com/karpathy/llama2.c)
- The LLAMA2 Common Lisp port [llama.cl](https://github.com/snunez1/llama.cl)
- The LLAMA2 OCaml port [llama2.ml](https://github.com/jackpeck/llama2.ml)
- BLAS library maintainers for high-performance linear algebra
- CHICKEN Scheme community for excellent libraries
## Original README.md
For instructions on converting models to/from the `.bin` format, training, and
other background, see the [original repo](https://github.com/karpathy/llama2.c).