https://github.com/gitctrlx/llama.rs

Llama from scratch in Rust.

Host: GitHub
URL: https://github.com/gitctrlx/llama.rs
Owner: gitctrlx
License: apache-2.0
Created: 2025-08-04T09:44:17.000Z (11 months ago)
Default Branch: main
Last Pushed: 2026-01-22T02:52:33.000Z (5 months ago)
Last Synced: 2026-01-22T16:20:42.815Z (5 months ago)
Language: Rust
Homepage:
Size: 52.3 MB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE-APACHE

# llama.rs

A pure Rust implementation of the LLaMA model for inference and educational purposes. Supports LLaMA 1, 2, and 3 architectures.

This repository demonstrates how to run LLaMA inference with minimal dependencies, making it ideal for learning and understanding transformer internals.

## Features

- **HF-aligned Architecture** – Follows the **`HuggingFace`** reference implementation, with a clean, structured codebase that mirrors the official model layouts

- **Parallel MHA** – Multi-head attention parallelized with **Rayon** for a 2-4x speedup on multi-core systems

- **Minimal Dependencies** – Uses only `byteorder`, `rayon`, `rand`, and `thiserror`

- **Educational** – Line-by-line readable transformer implementation with inline documentation

- **Type-safe** – Leverages Rust's type system for memory safety without garbage collection overhead
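
To make the "Parallel MHA" point concrete, here is a minimal sketch of single-token attention in which every head is computed on its own thread. It is not taken from this repository: the function names and the flat cache layout are assumptions made for illustration, and it uses `std::thread::scope` from the standard library in place of Rayon so that it compiles without extra crates. Rayon would schedule the same per-head work on a work-stealing pool instead of spawning one OS thread per head.

```rust
use std::thread;

/// Numerically stable in-place softmax.
fn softmax(x: &mut [f32]) {
    let max = x.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0;
    for v in x.iter_mut() {
        *v = (*v - max).exp();
        sum += *v;
    }
    for v in x.iter_mut() {
        *v /= sum;
    }
}

/// Attention for one query token over all cached positions.
/// Hypothetical layout (an assumption of this sketch):
///   q:       [n_heads * head_dim]
///   k_cache: [n_pos, n_heads * head_dim], row-major
///   v_cache: same shape as k_cache
fn multi_head_attention(q: &[f32], k_cache: &[f32], v_cache: &[f32], n_heads: usize) -> Vec<f32> {
    let dim = q.len();
    let head_dim = dim / n_heads;
    let n_pos = k_cache.len() / dim;
    let mut out = vec![0.0f32; dim];

    thread::scope(|s| {
        // Each head owns a disjoint slice of the output, so no locking is needed.
        for (h, out_h) in out.chunks_mut(head_dim).enumerate() {
            s.spawn(move || {
                let lo = h * head_dim;
                let q_h = &q[lo..lo + head_dim];
                let scale = 1.0 / (head_dim as f32).sqrt();

                // Scaled dot product of the query with every cached key of this head.
                let mut scores: Vec<f32> = (0..n_pos)
                    .map(|t| {
                        let k = &k_cache[t * dim + lo..t * dim + lo + head_dim];
                        q_h.iter().zip(k).map(|(a, b)| a * b).sum::<f32>() * scale
                    })
                    .collect();
                softmax(&mut scores);

                // Weighted sum of the cached values of this head.
                for (t, w) in scores.iter().enumerate() {
                    let v = &v_cache[t * dim + lo..t * dim + lo + head_dim];
                    for (o, x) in out_h.iter_mut().zip(v) {
                        *o += w * x;
                    }
                }
            });
        }
    });
    out
}
```

Heads are independent of one another, which is why this step parallelizes cleanly: the only shared data is read-only, and each thread writes to its own output slice.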

## Usage

```sh
cargo run --release -- <model.bin> <tokenizer.bin> [prompt] [options]
```

### Options

| Flag | Description | Default |
| ------ | ----------- | --------- |
| `--temp` | Sampling temperature (0 = greedy) | 1.0 |
| `--topp` | Top-p (nucleus) sampling | 0.9 |
| `--steps` | Max tokens to generate | 256 |
| `--seed` | Random seed | 0 |
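
The `--temp` and `--topp` options interact as follows. This is a minimal sketch of temperature plus top-p (nucleus) sampling, not the repository's actual code: `argmax`, `sample`, and the `coin` parameter are hypothetical names, and the random number is passed in by the caller so the example needs no external crate. It assumes `logits` is non-empty.

```rust
/// Index of the largest logit (greedy decoding).
fn argmax(x: &[f32]) -> usize {
    let mut best = 0;
    for (i, &v) in x.iter().enumerate() {
        if v > x[best] {
            best = i;
        }
    }
    best
}

/// Pick the next token id from raw logits.
/// `coin` is a uniform random number in [0, 1) supplied by the caller.
fn sample(logits: &[f32], temp: f32, topp: f32, coin: f32) -> usize {
    // Temperature 0 means greedy decoding: always take the most likely token.
    if temp == 0.0 {
        return argmax(logits);
    }

    // Temperature-scaled softmax, keeping each probability paired with its token id.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let mut probs: Vec<(usize, f32)> = logits
        .iter()
        .enumerate()
        .map(|(i, &l)| (i, ((l - max) / temp).exp()))
        .collect();
    let total: f32 = probs.iter().map(|&(_, p)| p).sum();
    for p in probs.iter_mut() {
        p.1 /= total;
    }

    // Sort by probability, most likely first.
    probs.sort_by(|a, b| b.1.total_cmp(&a.1));

    // Keep the smallest prefix whose cumulative probability reaches `topp`.
    let mut mass = 0.0;
    let mut cutoff = probs.len();
    for (n, &(_, p)) in probs.iter().enumerate() {
        mass += p;
        if mass >= topp {
            cutoff = n + 1;
            break;
        }
    }

    // Sample inside the truncated set by rescaling the coin to its total mass.
    let target = coin * mass;
    let mut cdf = 0.0;
    for &(id, p) in &probs[..cutoff] {
        cdf += p;
        if target < cdf {
            return id;
        }
    }
    probs[cutoff - 1].0 // fallback for floating-point rounding
}
```

Lower temperatures sharpen the distribution before truncation, and a smaller `topp` discards more of the low-probability tail. With the defaults from the table (1.0 and 0.9), sampling is restricted to the most likely tokens that together cover 90% of the probability mass.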

### Example

```sh
cargo run --release -- stories15M.bin tokenizer.bin "Once upon a time" --temp 0.8 --steps 128
```

The examples use small models trained by [`Andrej Karpathy`](https://github.com/karpathy/llama2.c?tab=readme-ov-file#models) for demonstration.
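
For readers curious about what is inside a file such as `stories15M.bin`, the sketch below reads the model header. It assumes the legacy llama2.c export layout, where the file starts with seven little-endian 32-bit integers describing the model, followed by the weights. Whether this repository parses the header in exactly this way is not confirmed by the README; `Config`, `read_i32_le`, and `read_config` are hypothetical names, and the standard library is used here where the project itself lists `byteorder` as a dependency.

```rust
use std::io::{self, Read};

/// Model hyperparameters stored at the start of a llama2.c-style checkpoint
/// (assumed layout: seven little-endian i32 values in this order).
#[derive(Debug, PartialEq)]
struct Config {
    dim: i32,        // transformer embedding width
    hidden_dim: i32, // feed-forward inner width
    n_layers: i32,   // number of transformer blocks
    n_heads: i32,    // number of query heads
    n_kv_heads: i32, // number of key/value heads
    vocab_size: i32, // may be negative in llama2.c to flag unshared output weights
    seq_len: i32,    // maximum sequence length
}

/// Read one little-endian i32 from any byte source.
fn read_i32_le<R: Read>(r: &mut R) -> io::Result<i32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(i32::from_le_bytes(buf))
}

/// Read the seven header fields in file order.
fn read_config<R: Read>(r: &mut R) -> io::Result<Config> {
    Ok(Config {
        dim: read_i32_le(r)?,
        hidden_dim: read_i32_le(r)?,
        n_layers: read_i32_le(r)?,
        n_heads: read_i32_le(r)?,
        n_kv_heads: read_i32_le(r)?,
        vocab_size: read_i32_le(r)?,
        seq_len: read_i32_le(r)?,
    })
}
```

Because `read_config` accepts any `Read` implementation, it can be pointed at a `std::fs::File` opened on the checkpoint or at an in-memory byte slice for testing.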

## Related Work

If you're interested in LLaMA implementations in other languages:

- **[llama.go](https://github.com/gitctrlx/llama.go)** – Pure Go implementation

- **[llama.np](https://github.com/gitctrlx/llama.np)** – NumPy-based implementation

- **[llama.cu](https://github.com/gitctrlx/llama.cu)** – CUDA-accelerated implementation

## License

This project is licensed under either of the following licenses, at your option:

- Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or [https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0))

- MIT license ([LICENSE-MIT](LICENSE-MIT) or [https://opensource.org/licenses/MIT](https://opensource.org/licenses/MIT))

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in `llama.rs` by you, as defined in the Apache-2.0 license, shall be dually licensed as above, without any additional terms or conditions.
