https://github.com/DaveBben/esp32-llm
Running a LLM on the ESP32
- Host: GitHub
- URL: https://github.com/DaveBben/esp32-llm
- Owner: DaveBben
- Created: 2024-09-03T22:31:37.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-09-04T03:07:08.000Z (almost 2 years ago)
- Last Synced: 2025-01-03T16:31:19.401Z (over 1 year ago)
- Language: C
- Homepage: https://www.youtube.com/watch?v=E6E_KrfyWFQ
- Size: 95.1 MB
- Stars: 230
- Watchers: 3
- Forks: 17
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
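- StarryDivineSky - DaveBben/esp32-llm - S3FH4R2 because it has 2MB of embedded PSRAM. With the following changes to llama2.c, I was able to reach 19.13 tok/s: using both cores of the ESP32 for math operations; using some special dot product functions from the ESP-DSP library that are designed for the ESP32-S3 and make use of the few SIMD instructions the ESP32-S3 has; raising the CPU speed to 240 MHz and the PSRAM speed to 80 MHz, and increasing the instruction cache size. (A01_Text Generation_Text Dialogue / Large language dialogue models and data)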
README
# Running a LLM on the ESP32


## Summary
I wanted to see if it was possible to run a Large Language Model (LLM) on the ESP32. Surprisingly, it is possible, though probably not very useful.
The "Large" Language Model used is actually quite small. It is a 260K parameter [tinyllamas checkpoint](https://huggingface.co/karpathy/tinyllamas/tree/main/stories260K) trained on the [tiny stories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset.
The LLM implementation is done using [llama2.c](https://github.com/karpathy/llama2.c) with minor optimizations to make it run faster on the ESP32.
## Hardware
LLMs require a great deal of memory. Even this small one still requires 1 MB of RAM. I used the [ESP32-S3FH4R2](https://www.mouser.com/ProductDetail/Espressif-Systems/ESP32-S3FH4R2?qs=tlsG%2FOw5FFjPrwkmZSBQNA%3D%3D) because it has 2 MB of embedded PSRAM.
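
As a rough check on that figure, the weight storage can be estimated from the parameter count. The sketch below assumes the checkpoint holds about 260,000 parameters stored as 32-bit floats (the format the stock `llama2.c` `run.c` reads); the helper name `weight_bytes_f32` is made up for illustration. It counts weights only, so activations and the key/value cache come on top.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed to hold n_params weights,
   assuming each weight is stored as a 32-bit float. */
size_t weight_bytes_f32(size_t n_params) {
    return n_params * sizeof(float);
}
```

With 260,000 parameters this comes to roughly 1.04 million bytes, which is in line with the 1 MB figure and leaves about half of the 2 MB PSRAM for everything else.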
## Optimizing Llama2.c for the ESP32
With the following changes to `llama2.c`, I am able to achieve **19.13 tok/s**:
1. Utilizing both cores of the ESP32 during math-heavy operations.
2. Utilizing some special [dot product functions](https://github.com/espressif/esp-dsp/tree/master/modules/dotprod/float) from the [ESP-DSP library](https://github.com/espressif/esp-dsp) that are designed for the ESP32-S3. These functions utilize some of the [few SIMD instructions](https://bitbanksoftware.blogspot.com/2024/01/surprise-esp32-s3-has-few-simd.html) the ESP32-S3 has.
3. Maxing out the CPU speed at 240 MHz and the PSRAM speed at 80 MHz, and increasing the instruction cache size.
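
To illustrate the first two points, here is a minimal, portable sketch of splitting a matrix-vector product across two workers. It is not the code from this repository: it uses POSIX threads so it can run on a desktop, whereas on the ESP32-S3 one would presumably use FreeRTOS tasks pinned to each core. The names `dot_f32`, `matmul_slice_t`, `matmul_rows` and `matmul_two_workers` are invented for this example, and the plain-C `dot_f32` stands in for the spot where an ESP-DSP routine such as `dsps_dotprod_f32` could be called instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Plain-C dot product. Assumption: on the ESP32-S3 this is the place
   where an ESP-DSP dot product routine would be substituted. */
static float dot_f32(const float *a, const float *b, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Work description for one worker: a contiguous range of output rows. */
typedef struct {
    float *out;        /* output vector, length d */
    const float *x;    /* input vector, length n */
    const float *w;    /* weight matrix, d rows by n columns, row-major */
    int n;
    int row_begin;     /* first row handled by this worker */
    int row_end;       /* one past the last row handled */
} matmul_slice_t;

/* Computes out[r] = dot(w[r, :], x) for every row in the slice. */
static void *matmul_rows(void *arg) {
    const matmul_slice_t *s = arg;
    for (int r = s->row_begin; r < s->row_end; r++) {
        s->out[r] = dot_f32(s->w + (size_t)r * s->n, s->x, s->n);
    }
    return NULL;
}

/* out = W * x, with the rows divided between a helper thread and the caller.
   The argument order mirrors the matmul in llama2.c. */
void matmul_two_workers(float *out, const float *x, const float *w, int n, int d) {
    int mid = d / 2;
    matmul_slice_t first_half = {out, x, w, n, 0, mid};
    matmul_slice_t second_half = {out, x, w, n, mid, d};

    pthread_t helper;
    int rc = pthread_create(&helper, NULL, matmul_rows, &second_half);
    assert(rc == 0);

    matmul_rows(&first_half);           /* the caller handles the first half */

    rc = pthread_join(helper, NULL);    /* wait until the second half is done */
    assert(rc == 0);
}
```

Each output row depends only on its own row of weights, so the two halves never write to the same element and no locking is needed beyond the final join. Creating a thread per call keeps the sketch short; a long-lived worker that is signalled for each matmul would avoid that overhead in a real inference loop.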
## Setup
This requires the [ESP-IDF](https://docs.espressif.com/projects/esp-idf/en/stable/esp32/get-started/index.html#installation) toolchain to be installed. Then build and flash the project:
```
idf.py build
idf.py -p /dev/{DEVICE_PORT} flash
```