Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/coreylowman/llama-dfdx
LLaMa 7b with CUDA acceleration implemented in rust. Minimal GPU memory needed!
- Host: GitHub
- URL: https://github.com/coreylowman/llama-dfdx
- Owner: coreylowman
- License: MIT
- Created: 2023-04-28T15:01:29.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-07-27T13:46:57.000Z (over 1 year ago)
- Last Synced: 2024-10-10T06:32:35.293Z (3 months ago)
- Topics: cuda, deep-learning, inference, language-model, llama, neural-network, rust, rust-lang
- Language: Rust
- Homepage:
- Size: 396 KB
- Stars: 100
- Watchers: 3
- Forks: 5
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
- awesome-llm-and-aigc - coreylowman/llama-dfdx : [LLaMa 7b](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/) with CUDA acceleration implemented in rust. Minimal GPU memory needed! (Summary)
- awesome-cuda-triton-hpc - coreylowman/llama-dfdx : [LLaMa 7b](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/) with CUDA acceleration implemented in rust. Minimal GPU memory needed! (Frameworks)
- awesome-rust-list - coreylowman/llama-dfdx : [LLaMa 7b](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/) with CUDA acceleration implemented in rust. Minimal GPU memory needed! (Machine Learning)
README
# LLaMa 7b in rust
[![](https://dcbadge.vercel.app/api/server/AtUhGqBDP5)](https://discord.gg/AtUhGqBDP5)
This repo contains the popular [LLaMa 7b](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
language model, fully implemented in the rust programming language! It uses [dfdx](https://github.com/coreylowman/dfdx) tensors and CUDA acceleration.
**This runs LLaMa directly in f16, meaning there is no hardware acceleration on CPU.** Using CUDA is heavily recommended.
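Before building with the `cuda` feature, it's worth confirming that a CUDA-capable GPU and driver are visible. This uses standard NVIDIA tooling, not anything from this repo:

```bash
# Lists visible NVIDIA GPUs, driver version, and free/used VRAM
nvidia-smi
```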
Here is the 7b model running on an A10 GPU:
![](llama-7b-a10.gif)
# How To Run
## (Once) Setting up model weights
### Download model weights
1. Install git lfs. On Ubuntu you can run `sudo apt install git-lfs`.
2. Activate git lfs with `git lfs install`.
3. Run one of the following commands to download the model weights in pytorch format (sizes listed per model; a combined sketch follows this list):
1. LLaMa 7b (~25 GB): `git clone https://huggingface.co/decapoda-research/llama-7b-hf`
2. LLaMa 13b (~75 GB): `git clone https://huggingface.co/decapoda-research/llama-13b-hf`
3. LLaMa 65b (~244 GB): `git clone https://huggingface.co/decapoda-research/llama-65b-hf`
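Putting the download steps together for the 7b model (a sketch assuming Ubuntu; note the `decapoda-research` Hugging Face repos are community mirrors and may no longer be available):

```bash
# git-lfs is required, otherwise the clone only fetches pointer files
sudo apt install git-lfs
git lfs install

# ~25 GB of pytorch-format weights
git clone https://huggingface.co/decapoda-research/llama-7b-hf
```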
### Convert the model
1. (Optional) Run `python3.x -m venv <venv-dir>` to create a python virtual environment, where `x` is your preferred python version and `<venv-dir>` is the directory to create it in
2. (Optional, requires 1.) Run `source <venv-dir>/bin/activate` (or `<venv-dir>\Scripts\activate` on Windows) to activate the environment
3. Run `pip install numpy torch`
4. Run `python convert.py` to convert the model weights to a format the rust code can read (the steps are combined in a sketch after this list):
a. LLaMa 7b: `python convert.py`
b. LLaMa 13b: `python convert.py llama-13b-hf`
c. LLaMa 65b: `python convert.py llama-65b-hf`
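The conversion steps combined, for the 7b model (a sketch; the `.venv` directory name is arbitrary):

```bash
# One-off python environment for the conversion script
python3 -m venv .venv
source .venv/bin/activate
pip install numpy torch

# Reads the cloned llama-7b-hf checkpoint and writes the rust-readable format
python convert.py
```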
## (Once) Compile
You can compile with normal rust commands:

With CUDA:
```bash
cargo build --release -F cuda
```

Without CUDA:
```bash
cargo build --release
```

## Run the executable
With default args:
```bash
./target/release/llama-dfdx --model <model-dir> generate "<prompt>"
./target/release/llama-dfdx --model <model-dir> chat
./target/release/llama-dfdx --model <model-dir> file <path-to-prompt-file>
```

To see what commands/custom args you can use:
```bash
./target/release/llama-dfdx --help
```
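For example, assuming the converted 7b weights live in `llama-7b-hf/` (the directory name from the download step; the prompt text is arbitrary):

```bash
./target/release/llama-dfdx --model llama-7b-hf generate "The rust programming language is"
```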