# LLaMa 7b in Rust

[![](https://dcbadge.vercel.app/api/server/AtUhGqBDP5)](https://discord.gg/AtUhGqBDP5)

This repo contains the popular [LLaMa 7b](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
language model, fully implemented in the Rust programming language!

Uses [dfdx](https://github.com/coreylowman/dfdx) tensors and CUDA acceleration.

**This runs LLaMa directly in f16, and most CPUs have no hardware acceleration for f16 arithmetic, so CPU inference is very slow.** Using CUDA is heavily recommended.

Here is the 7b model running on an A10 GPU:

![](llama-7b-a10.gif)

# How To Run

## (Once) Setting up model weights

### Download model weights
1. Install git-lfs. On Ubuntu you can run `sudo apt install git-lfs`.
2. Activate git-lfs with `git lfs install`.
3. Run one of the following commands to download the model weights in pytorch format (a download sanity check follows this list):
    1. LLaMa 7b (~25 GB): `git clone https://huggingface.co/decapoda-research/llama-7b-hf`
    2. LLaMa 13b (~75 GB): `git clone https://huggingface.co/decapoda-research/llama-13b-hf`
    3. LLaMa 65b (~244 GB): `git clone https://huggingface.co/decapoda-research/llama-65b-hf`
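
If git-lfs was not active when you cloned, git leaves small text pointer stubs in place of the real weight files. A quick sanity check (the `*.bin` shard layout is what the Hugging Face repos ship; adjust the directory name for the size you cloned):

```bash
# The cloned repo should be tens of GB, not a few hundred KB.
du -sh llama-7b-hf

# Weight shards should be multi-GB binaries, not ~130-byte
# git-lfs pointer stubs.
ls -lh llama-7b-hf/*.bin | head
```

If the files are tiny, run `git lfs pull` inside the cloned directory to fetch the actual weights.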

### Convert the model
1. (Optional) Run `python3.x -m venv <venv-dir>` to create a python virtual environment, where `x` is your preferred python version and `<venv-dir>` is where the environment should live.
2. (Optional, requires 1.) Run `source <venv-dir>/bin/activate` (or `<venv-dir>\Scripts\activate` if on Windows) to activate the environment.
3. Run `pip install numpy torch`.
4. Run `python convert.py` to convert the model weights to a rust-understandable format (the whole flow is sketched after this list):
    1. LLaMa 7b: `python convert.py`
    2. LLaMa 13b: `python convert.py llama-13b-hf`
    3. LLaMa 65b: `python convert.py llama-65b-hf`
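
Putting those steps together, a typical conversion session for the 7b weights might look like this (the `venv` directory name is just an example):

```bash
# Create and activate a throwaway virtual environment.
python3 -m venv venv
source venv/bin/activate

# Install the two dependencies convert.py needs.
pip install numpy torch

# Convert the 7b weights cloned in the previous step.
python convert.py
```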

## (Once) Compile

You can compile with normal Rust commands:

With CUDA:
```bash
cargo build --release -F cuda
```

Without CUDA:
```bash
cargo build --release
```
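
Building with `-F cuda` assumes the CUDA toolkit is installed and discoverable; if that build fails, it is worth confirming the toolchain is visible before digging into the code (generic checks, not specific to this repo):

```bash
# Confirm the CUDA compiler is on PATH and the driver can see a GPU.
nvcc --version
nvidia-smi
```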

## Run the executable

Point `--model` at a converted model directory (the `<...>` placeholders are yours to fill in):
```bash
./target/release/llama-dfdx --model <model-dir> generate "<prompt>"
./target/release/llama-dfdx --model <model-dir> chat
./target/release/llama-dfdx --model <model-dir> file <prompt-file>
```
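
For example, a generation call might look like the following; the model path here is illustrative, so point it at wherever `convert.py` wrote the converted weights:

```bash
./target/release/llama-dfdx --model llama-7b-hf generate "The three laws of robotics are"
```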

To see what commands/custom args you can use:
```bash
./target/release/llama-dfdx --help
```