Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
GPT-2 inference engine written in Zig
- Host: GitHub
- URL: https://github.com/EugenHotaj/zig_gpt2
- Owner: EugenHotaj
- Created: 2023-06-07T02:25:03.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-07-24T01:38:54.000Z (over 1 year ago)
- Last Synced: 2024-05-22T18:12:30.041Z (7 months ago)
- Topics: gpt-2, inference-engine, zig
- Language: Zig
- Size: 17.2 MB
- Stars: 26
- Watchers: 4
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-zig - EugenHotaj/zig_gpt2
README
# zig_gpt2
GPT-2 inference engine written in Zig. Generation time: ~28ms per token.

### Features:
* No third-party dependencies besides BLAS (Accelerate or OpenBLAS).
* No memory allocations at runtime (see the sketch after this list).
* Can run [NanoGPT](https://github.com/karpathy/nanoGPT).
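To give a feel for the "no allocations at runtime" design, here is a minimal, hypothetical sketch (not the repo's actual code): every scratch buffer is sized from the model configuration and allocated once before generation, then reused for every layer and token. The struct and field names below are made up for illustration.

```zig
const std = @import("std");

/// Hypothetical illustration only: all scratch buffers are allocated once,
/// up front, and reused for every layer and every generated token.
const Buffers = struct {
    hidden: []f32, // activations, n_ctx * n_embd
    attn_scores: []f32, // attention logits, n_head * n_ctx * n_ctx

    fn init(allocator: std.mem.Allocator, n_ctx: usize, n_embd: usize, n_head: usize) !Buffers {
        return .{
            .hidden = try allocator.alloc(f32, n_ctx * n_embd),
            .attn_scores = try allocator.alloc(f32, n_head * n_ctx * n_ctx),
        };
    }
};
```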
### How to Run:

Download the GPT-2 checkpoint from OpenAI.
```bash
python3 download_weights.py
```

Build the Zig binary and run it with a prompt to generate completions:
```bash
zig build -Doptimize=ReleaseFast
./zig-out/bin/zig_gpt2 "Marcus Aurelius said"
```

### How to Test:
Generate test data by forwarding random tensors through PyTorch ops.
```bash
python3 generate_test_data.py
```

Run the tests, which verify that the Zig ops produce the same output as PyTorch (a standalone sketch of such a check follows the block below).
```bash
zig build test
```
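For illustration only: the actual tests load tensors produced by `generate_test_data.py` and compare the Zig ops element-wise against the PyTorch outputs. The self-contained sketch below is not the repo's test code; it only shows the general shape of an op-level check, using a softmax invariant instead of reference data.

```zig
const std = @import("std");

// Standalone sketch, not the repo's test code: a numerically stable softmax
// and a test asserting that its output sums to one.
fn softmax(x: []f32) void {
    var max: f32 = -std.math.inf(f32);
    for (x) |v| max = @max(max, v);
    var sum: f32 = 0.0;
    for (x) |*v| {
        v.* = @exp(v.* - max);
        sum += v.*;
    }
    for (x) |*v| v.* /= sum;
}

test "softmax output sums to one" {
    var logits = [_]f32{ 0.5, -1.0, 2.0, 0.0 };
    softmax(&logits);
    var total: f32 = 0.0;
    for (logits) |p| total += p;
    try std.testing.expectApproxEqAbs(@as(f32, 1.0), total, 1e-5);
}
```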
---

### TODO
Implementation:
* ✅ Implement basic ops: Embedding, Linear, LayerNorm, GELU, Softmax, CausalSelfAttention (a scalar GELU sketch appears at the end of this section).
* ✅ Implement transformer modules: MLP, Transformer block.
* ✅ Implement the full GPT model.
* ✅ Implement sampling from the model.
* ✅ Implement BPE encoding/decoding.
Efficiency:
* ✅ Replace custom linear algebra kernels with BLAS.
* ✅ Stream output as each new token is generated.
* ✅ Create central set of memory buffers and reuse them for each layer. No allocations at runtime.
* ✅ Add KV cache.
* Parallelize `softmax` and `gelu` operations.
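As a point of reference for the basic ops above (and for the remaining parallelization work), here is a minimal scalar sketch of GPT-2's tanh-approximated GELU. It is an illustration written for this page, not the repo's actual kernel.

```zig
const std = @import("std");

// Illustration only (not the repo's kernel): GPT-2's tanh-approximated GELU,
// applied in place over a slice of activations, one element at a time.
pub fn gelu(x: []f32) void {
    const sqrt_2_over_pi: f32 = 0.7978845608028654; // sqrt(2 / pi)
    for (x) |*v| {
        const u = v.*;
        v.* = 0.5 * u * (1.0 + std.math.tanh(sqrt_2_over_pi * (u + 0.044715 * u * u * u)));
    }
}
```

Since each element is independent, parallelizing an op like this mostly amounts to splitting the slice across threads.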