# LLaMa 7b in Rust

[![](https://dcbadge.vercel.app/api/server/AtUhGqBDP5)](https://discord.gg/AtUhGqBDP5)

This repo contains the popular [LLaMa 7b](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
language model, fully implemented in the Rust programming language!

Uses [dfdx](https://github.com/coreylowman/dfdx) tensors and CUDA acceleration.

**This runs LLaMa directly in f16, and most CPUs have no hardware acceleration for f16 arithmetic, so CPU inference is very slow.** Using CUDA is heavily recommended.

Here is the 7b model running on an A10 GPU:

![](llama-7b-a10.gif)

# How To Run

## (Once) Setting up model weights

### Download model weights
1. Install git-lfs. On Ubuntu you can run `sudo apt install git-lfs`.
2. Activate git-lfs with `git lfs install`.
3. Run one of the following commands to download the model weights in pytorch format (a download sanity check follows this list):
    1. LLaMa 7b (~25 GB): `git clone https://huggingface.co/decapoda-research/llama-7b-hf`
    2. LLaMa 13b (~75 GB): `git clone https://huggingface.co/decapoda-research/llama-13b-hf`
    3. LLaMa 65b (~244 GB): `git clone https://huggingface.co/decapoda-research/llama-65b-hf`
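
If git-lfs was not active when you cloned, git leaves small text pointer stubs in place of the real weight files. A quick sanity check (the `*.bin` shard layout is what the Hugging Face repos ship; adjust the directory name for the size you cloned):

```bash
# The cloned repo should be tens of GB, not a few hundred KB.
du -sh llama-7b-hf

# Weight shards should be multi-GB binaries, not ~130-byte
# git-lfs pointer stubs.
ls -lh llama-7b-hf/*.bin | head
```

If the files are tiny, run `git lfs pull` inside the cloned directory to fetch the actual weights.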

### Convert the model
1. (Optional) Run `python3.x -m venv <venv-dir>` to create a python virtual environment, where `x` is your preferred python version and `<venv-dir>` is where the environment should live.
2. (Optional, requires 1.) Run `source <venv-dir>/bin/activate` (or `<venv-dir>\Scripts\activate` if on Windows) to activate the environment.
3. Run `pip install numpy torch`.
4. Run `python convert.py` to convert the model weights to a rust-understandable format (the whole flow is sketched after this list):
    1. LLaMa 7b: `python convert.py`
    2. LLaMa 13b: `python convert.py llama-13b-hf`
    3. LLaMa 65b: `python convert.py llama-65b-hf`
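
Putting those steps together, a typical conversion session for the 7b weights might look like this (the `venv` directory name is just an example):

```bash
# Create and activate a throwaway virtual environment.
python3 -m venv venv
source venv/bin/activate

# Install the two dependencies convert.py needs.
pip install numpy torch

# Convert the 7b weights cloned in the previous step.
python convert.py
```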

## (Once) Compile

You can compile with normal Rust commands:

With CUDA:
```bash
cargo build --release -F cuda
```

Without CUDA:
```bash
cargo build --release
```
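
Building with `-F cuda` assumes the CUDA toolkit is installed and discoverable; if that build fails, it is worth confirming the toolchain is visible before digging into the code (generic checks, not specific to this repo):

```bash
# Confirm the CUDA compiler is on PATH and the driver can see a GPU.
nvcc --version
nvidia-smi
```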

## Run the executable

Point `--model` at a converted model directory (the `<...>` placeholders are yours to fill in):
```bash
./target/release/llama-dfdx --model <model-dir> generate "<prompt>"
./target/release/llama-dfdx --model <model-dir> chat
./target/release/llama-dfdx --model <model-dir> file <prompt-file>
```
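
For example, a generation call might look like the following; the model path here is illustrative, so point it at wherever `convert.py` wrote the converted weights:

```bash
./target/release/llama-dfdx --model llama-7b-hf generate "The three laws of robotics are"
```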

To see what commands/custom args you can use:
```bash
./target/release/llama-dfdx --help
```