Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cafaxo/Llama2.jl
Julia package for inference and training of Llama-style language models
- Host: GitHub
- URL: https://github.com/cafaxo/Llama2.jl
- Owner: cafaxo
- License: mit
- Created: 2023-07-28T14:54:33.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-07-18T19:01:49.000Z (5 months ago)
- Last Synced: 2024-07-19T01:42:53.550Z (5 months ago)
- Language: Julia
- Homepage:
- Size: 53.7 KB
- Stars: 125
- Watchers: 5
- Forks: 19
- Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-generative-ai-meets-julia-language - Llama2.jl - Llama2.jl provides simple code for inference and training of llama2-based language models based on [llama2.c](https://github.com/karpathy/llama2.c). It supports loading quantized weights in GGUF format (`q4_K_S` variant). Other similar projects: [LanguageModels.jl](https://github.com/rai-llc/LanguageModels.jl) (Models)
README
# Llama2.jl
Llama2.jl supports inference and training of Llama-style language models.
## Installation
This package is not yet registered. It can be installed by running
```julia
pkg> add https://github.com/cafaxo/Llama2.jl
```
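The package can equivalently be added through the functional Pkg API, which is handy in scripts or CI; this is a minimal sketch rather than a command taken from the README itself:

```julia
julia> using Pkg

julia> Pkg.add(url="https://github.com/cafaxo/Llama2.jl")
```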
## Usage
We currently support:
- GGUF models: Llama 2, Llama 3, and Phi-3 (some quantization variants may not work)
- Andrej Karpathy's llama2.c format

For example, here is some output from Llama 3:
```julia
julia> using Llama2

julia> model = load_gguf_model("Meta-Llama-3-8B.Q4_K_S.gguf")
LanguageModel(
ModelConfig(
dim = 4096,
hidden_dim = 14336,
n_layers = 32,
n_heads = 32,
n_kv_heads = 8,
vocab_size = 128256,
seq_len = 512,
rope_freq_base = 500000.0,
))

julia> sample(model, "The Julia programming language is"; temperature=0.0f0)
The Julia programming language is a high-level, high-performance dynamic language for technical computing. The language's feature set, based on modern application development platforms, includes support for metaprogramming, type declarations, multiple dispatch, and parallel computing. It also provides a sophisticated ecosystem of tools, libraries, and toolchains all accessible from the Julia REPL. Julia is an open-source project and is released under the MIT license.
```

Andrej Karpathy's llama2.c models can be found at https://huggingface.co/karpathy/tinyllamas.
With these models, the [tokenizer.bin](https://github.com/karpathy/llama2.c/raw/b4bb47bb7baf0a5fb98a131d80b4e1a84ad72597/tokenizer.bin) file is also required.

Here is an output sample from the 42M tinyllama model:
```julia
julia> using Llama2

julia> download("https://huggingface.co/karpathy/tinyllamas/resolve/main/stories42M.bin", "stories42M.bin")
"stories42M.bin"

julia> download("https://raw.githubusercontent.com/karpathy/llama2.c/b4bb47bb7baf0a5fb98a131d80b4e1a84ad72597/tokenizer.bin", "tokenizer.bin")
"tokenizer.bin"

julia> model = load_karpathy_model("stories42M.bin", "tokenizer.bin");
julia> sample(model, "Tim was happy."; temperature = 0.8f0)
Tim was happy. He had a new toy. It was a big red car. He wanted to play with it all day.
Tim took his car outside. He found a drain. It was a big drain. Tim put his car on the drain. The car went down the drain.
Tim was sad. He missed his car. He went home. His mom saw him. She said, "Don't worry, we will get your car back." Tim was glad. He knew his mom would help him. They went to the drain. Tim's car came back. He was happy again.
-------
achieved tok/s: 282.80
```
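The example above fetches the files with Base `download`; if you prefer the `Downloads` standard library, the same files can be retrieved like this (a sketch under that assumption, not part of the original README):

```julia
julia> using Downloads

julia> Downloads.download("https://huggingface.co/karpathy/tinyllamas/resolve/main/stories42M.bin", "stories42M.bin")
"stories42M.bin"

julia> Downloads.download("https://raw.githubusercontent.com/karpathy/llama2.c/b4bb47bb7baf0a5fb98a131d80b4e1a84ad72597/tokenizer.bin", "tokenizer.bin")
"tokenizer.bin"
```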
## Experimental training support
Llama2.jl can train very small Llama-style models on the CPU:
```julia
julia> using Llama2

julia> text = read("tinyshakespeare.txt", String);
julia> tokenizer = CharTokenizer(text);
julia> tokens = encode(text, tokenizer);
julia> config = ModelConfig(dim=64, hidden_dim=96, n_layers=4, n_heads=4, n_kv_heads=4, vocab_size=length(tokenizer.id_to_token), seq_len=128)
ModelConfig(
dim = 64,
hidden_dim = 96,
n_layers = 4,
n_heads = 4,
n_kv_heads = 4,
vocab_size = 65,
seq_len = 128,
)

julia> weights = train(config, tokens; n_tokens=4_000_000, batch_size=4);
Training a model with 148160 parameters...
Progress: 100%|███████████████████████████| Time: 0:01:31 (11.66 ms/it)
iteration: 7812 / 7812
training_loss: 1.497

julia> model = LanguageModel(config, tokenizer, weights);
julia> sample(model, "Julia is"; stop_on_special_token=false, bos_token=false)
Julia is to seen all?

FLORIZEL:
Now the sir?

JOHN ORTHNROLIO:
Talksabated with me with a more thou Vrequitest.
The city good of
-------
achieved tok/s: 11479.52
```

## Acknowledgements
This project started as a port of Andrej Karpathy's llama2.c (https://github.com/karpathy/llama2.c).
The quantization code is a port from https://github.com/ggerganov/llama.cpp.