https://github.com/lwch/llama2.go
Port of Facebook's LLaMA 2 model in pure Go that uses little memory
- Host: GitHub
- URL: https://github.com/lwch/llama2.go
- Owner: lwch
- License: mit
- Created: 2023-09-15T10:08:35.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-03-08T17:15:09.000Z (about 1 year ago)
- Last Synced: 2024-10-18T21:59:25.087Z (6 months ago)
- Topics: golang, inference, llama, llama2
- Language: Go
- Size: 165 KB
- Stars: 38
- Watchers: 2
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# llama2.go
A port of Facebook's LLaMA 2 model in pure Go that uses little memory.
## memory usage
| Model | Precision | Memory | Memory (Cached Params) |
| ----- | --------- | ------ | --------------------- |
| 7B | bf16 | 600M+ | 17G+ |
| 13B | bf16 | 1G+ | 32G+ |
| 70B | bf16 | 3G+ | untested |

## usage
1. download the model from [huggingface](https://huggingface.co/lwch/llama2.go)
2. build this project
```shell
go build
```
3. text completion
```shell
cat << EOF | ./llama2 text-completion -m 7B.model [--cache]
2023/10/20 17:02:36 [INFO]model loaded
2023/10/20 17:02:36 [INFO]model info:
2023/10/20 17:02:36 [INFO] + embedding dim: 4096
2023/10/20 17:02:36 [INFO] + layers: 32
2023/10/20 17:02:36 [INFO] + heads: 32
2023/10/20 17:02:36 [INFO] + kv_heads: 32
2023/10/20 17:02:36 [INFO] + q_embedding each head: 128
2023/10/20 17:02:36 [INFO] + kv_embedding each head: 128
2023/10/20 17:02:36 [INFO] + norm_eps: 1.000000e-05
2023/10/20 17:02:36 [INFO]warm up model...
2023/10/20 17:02:43 [INFO]warm up done
cost: 805.096668ms, prompt: [], inference: [ The]
cost: 794.034869ms, prompt: [Tra], inference: [vel]
cost: 791.610401ms, prompt: [ns], inference: [it]
cost: 785.028969ms, prompt: [late], inference: [.]
cost: 797.443303ms, prompt: [ Eng], inference: [
]
cost: 789.511594ms, prompt: [lish], inference: [ to]
cost: 795.109561ms, prompt: [ to], inference: [ H]
cost: 786.051182ms, prompt: [ Fre], inference: [ch]
cost: 785.656581ms, prompt: [nc], inference: [sh]
cost: 802.650179ms, prompt: [h], inference: [
]
cost: 789.006086ms, prompt: [:], inference: [
]
cost: 785.612863ms, prompt: [
], inference: [What]
cost: 790.960297ms, prompt: [
], inference: [
]
cost: 788.613766ms, prompt: [se], inference: [ek]
cost: 793.616298ms, prompt: [a], inference: [
]
cost: 788.519639ms, prompt: [ ott], inference: [om]
cost: 786.268165ms, prompt: [er], inference: [
]
cost: 799.88383ms, prompt: [ =>], inference: [
]
cost: 790.308092ms, prompt: [ lo], inference: [b]
cost: 789.359057ms, prompt: [ut], inference: [ier]
cost: 792.335992ms, prompt: [re], inference: [
]
cost: 799.485639ms, prompt: [ de], inference: [ la]
cost: 790.988376ms, prompt: [ mer], inference: [
]
cost: 792.840929ms, prompt: [
], inference: [
]
cost: 789.421174ms, prompt: [pe], inference: [as]
cost: 796.805358ms, prompt: [pper], inference: [ =>]
cost: 797.496379ms, prompt: [min], inference: [ =>]
cost: 799.876148ms, prompt: [t], inference: [ =>]
cost: 795.272906ms, prompt: [ =>], inference: [ po]
cost: 800.170421ms, prompt: [ ment], inference: [he]
cost: 793.94568ms, prompt: [he], inference: [ po]
cost: 798.776807ms, prompt: [ poi], inference: [il]
cost: 792.626643ms, prompt: [vr], inference: [ée]
cost: 793.800323ms, prompt: [ée], inference: [
]
cost: 802.375475ms, prompt: [
], inference: [
]
cost: 792.565524ms, prompt: [pl], inference: [aster]
cost: 802.16297ms, prompt: [ush], inference: [ =>]
cost: 802.694426ms, prompt: [ gir], inference: [ll]
cost: 792.011098ms, prompt: [af], inference: [ =>]
cost: 794.957684ms, prompt: [e], inference: [ =>]
cost: 793.161168ms, prompt: [ =>], inference: [ gir]
cost: 808.839479ms, prompt: [ gir], inference: [raf]
cost: 793.328555ms, prompt: [af], inference: [ou]
cost: 798.369926ms, prompt: [e], inference: [ dou]
cost: 799.757471ms, prompt: [ pel], inference: [uche]
cost: 791.91408ms, prompt: [uche], inference: [
]
cost: 795.313464ms, prompt: [
], inference: [
]
cost: 802.494011ms, prompt: [che], inference: [ese]
cost: 801.804049ms, prompt: [ese], inference: [ c]
cost: 799.417178ms, prompt: [ =>], inference: [ from]
cost: 794.279909ms, inference: [age]
cost: 794.393727ms, inference: [
]
cost: 800.798864ms, inference: [
]
cost: 803.566172ms, inference: [Tra]
cost: 805.789997ms, inference: [ans]
cost: 798.737076ms, inference: [late]
cost: 801.030808ms, inference: [ Fre]
cost: 801.655272ms, inference: [nc]
cost: 798.778104ms, inference: [ch]
cost: 807.62774ms, inference: [ to]
cost: 800.958739ms, inference: [ English]
cost: 798.907121ms, inference: [:]
cost: 796.359556ms, inference: [
]
cost: 807.457454ms, inference: [
]
cost: 810.997564ms, inference: [lo]
cost: 803.872655ms, inference: [ut]
=====================================
Translate English to French:

sea otter => loutre de mer
peppermint => menthe poivrée
plush girafe => girafe peluche
cheese => fromage

Traanslate Frencch to English:

lout
2023/10/20 17:03:35 [INFO]total cost: 52.582370596s, 1.27token/s
```
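Each `cost:` line above is the latency of one decoding step, and the closing `1.27token/s` figure is essentially the reciprocal of the average per-step latency. A quick back-of-envelope sketch of that arithmetic (the ~790ms value is read off the log above, not computed by the project):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Representative per-step latency taken from the "cost:" lines above,
	// which hover between roughly 785ms and 810ms.
	perToken := 790 * time.Millisecond

	// Throughput is the reciprocal of per-token latency; this prints
	// 1.27 token/s, matching the summary line at the end of the run.
	fmt.Printf("throughput: %.2f token/s\n", 1/perToken.Seconds())
}
```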
4. chat
```shell
./llama2 chat -m 7B.chat.model [--cache]
2023/10/17 10:13:50 [INFO]model loaded
2023/10/17 10:13:50 [INFO]model info:
2023/10/17 10:13:50 [INFO] + embedding dim: 4096
2023/10/17 10:13:50 [INFO] + layers: 32
2023/10/17 10:13:50 [INFO] + heads: 32
2023/10/17 10:13:50 [INFO] + kv_heads: 32
2023/10/17 10:13:50 [INFO] + q_embedding each head: 128
2023/10/17 10:13:50 [INFO] + kv_embedding each head: 128
2023/10/17 10:13:50 [INFO] + norm_eps: 1.000000e-06
2023/10/17 10:13:50 [INFO]warm up model...
2023/10/17 10:14:10 [INFO]warm up done
Enter system prompt (optional):
Enter user prompt: What's your name?
thinking.................
Hello! My name is LLaMA, I'm a large language model trained by a team of researcher at Meta AI.
Enter user prompt: Where are you from?
thinking................
I'm just an AI, I don't have a physical body or a specific location where I "come from." I was created by a group of researcher at Meta AI and my primary function is to assist and converse with users like you through the internet. I'm excited to be here and help you with any questions or topics you'd like to discuss!
```

Note: if you have enough memory, you can run with the `--cache` flag to keep the parameters cached in memory.
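As a back-of-envelope check of the memory-usage table above (my own arithmetic, not from the project): bf16 stores two bytes per parameter, so the cached-params column is bounded below by roughly 2 bytes × parameter count, with activations and the KV cache accounting for the rest.

```go
package main

import "fmt"

func main() {
	const bytesPerParam = 2.0 // bf16 is 16 bits per parameter

	// Weight-only lower bounds; runtime buffers and the KV cache come on
	// top, which is why the table's cached figures are somewhat higher.
	models := map[string]float64{"7B": 7e9, "13B": 13e9, "70B": 70e9}
	for name, params := range models {
		fmt.Printf("%-3s needs at least ~%.0f GiB for bf16 weights\n",
			name, params*bytesPerParam/(1<<30))
	}
}
```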
## convert model by yourself
[Request](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and download the LLaMA 2 model; the model directory looks like this:
```
.
├── llama-2-7b
│   ├── consolidated.00.pth
│   └── params.json
└── tokenizer.model
```

Use the command below to convert the LLaMA model into this project's format.
NOTE: make sure you have enough hard disk space.
```shell
./llama2 convert --output 7B.model ~/llama/llama-2-7b
```

NOTE: quantization support will be added in the future.
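For reference, the `params.json` in Meta's checkpoint is a small JSON file describing the architecture; for 7B its values line up with the model info printed in the logs above (embedding dim 4096, 32 layers, 32 heads). Below is a minimal sketch of reading it in Go; the field names follow Meta's llama release, and this is illustrative, not this project's actual converter:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Params mirrors the architecture fields in Meta's params.json.
type Params struct {
	Dim     int     `json:"dim"`      // embedding dim, 4096 for 7B
	NLayers int     `json:"n_layers"` // 32 for 7B
	NHeads  int     `json:"n_heads"`  // 32 for 7B
	NormEps float64 `json:"norm_eps"` // e.g. 1e-05
}

func main() {
	data, err := os.ReadFile("llama-2-7b/params.json")
	if err != nil {
		panic(err)
	}
	var p Params
	if err := json.Unmarshal(data, &p); err != nil {
		panic(err)
	}
	// Per-head embedding size: dim / n_heads = 4096 / 32 = 128,
	// matching the "q_embedding each head: 128" log line above.
	fmt.Println("head dim:", p.Dim/p.NHeads)
}
```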
## cluster computing
Cluster computing will be implemented in the future.