https://github.com/adamydwang/mobilellama
a lightweight C++ LLaMA inference engine for mobile devices
- Host: GitHub
- URL: https://github.com/adamydwang/mobilellama
- Owner: adamydwang
- License: MIT
- Created: 2023-10-18T16:50:53.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-28T14:56:56.000Z (over 2 years ago)
- Last Synced: 2025-10-28T22:42:11.149Z (5 months ago)
- Topics: cpp, inference, llama, llm, openblas
- Language: C++
- Homepage:
- Size: 40 KB
- Stars: 15
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# MobileLLaMA
## A Lightweight C++ Implementation of LLaMA for Mobile Devices
# 1. Milestones
- [ ] Naive version: pure C++ implementation of matrix operations
- [x] C++ inference engine
- [ ] Model conversion tool: converts a PyTorch model to the MobileLLaMA format
- [ ] OpenBLAS version: speed up matrix operations with OpenBLAS
# 2. Build and Run
## How to build
```
$ git clone --recurse-submodules https://github.com/adamydwang/mobilellama.git
$ cd mobilellama
$ cd deps && bash build.sh  # build third-party dependencies
$ mkdir ../build && cd ../build && cmake .. && make
```
***Note:*** executables and libraries are written to the *bin* and *lib* directories.
## How to run the demo
***Caution: the model conversion tool is not ready yet, so the demo cannot run.***
```
$ cd bin
$ ./demo ${model_path} ${tokenizer_path}
```