https://github.com/DaveBben/esp32-llm

Running a LLM on the ESP32



Host: GitHub
URL: https://github.com/DaveBben/esp32-llm
Owner: DaveBben
Created: 2024-09-03T22:31:37.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-09-04T03:07:08.000Z (almost 2 years ago)
Last Synced: 2025-01-03T16:31:19.401Z (over 1 year ago)
Language: C
Homepage: https://www.youtube.com/watch?v=E6E_KrfyWFQ
Size: 95.1 MB
Stars: 230
Watchers: 3
Forks: 17
Open Issues: 1
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

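StarryDivineSky - DaveBben/esp32-llm - S3FH4R2 because it has 2MB of embedded PSRAM. With the following changes to llama2.c, I was able to achieve 19.13 tok/s: utilizing both cores of the ESP32 during math operations; utilizing some special dot product functions from the ESP-DSP library that are designed for the ESP32-S3 and use some of the few SIMD instructions the ESP32-S3 has; raising the CPU speed to 240 MHz and the PSRAM speed to 80 MHz, and increasing the instruction cache size. (A01_Text Generation_Text Dialogue / Large language dialogue models and data)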

README

# Running a LLM on the ESP32
![LLM on ESP32](/ESP32_LLM.jpg)
![LLM Output](/llm_output.gif)

## Summary
I wanted to see if it was possible to run a Large Language Model (LLM) on the ESP32. Surprisingly, it is possible, though probably not very useful.

The "Large" Language Model used is actually quite small. It is a 260K parameter [tinyllamas checkpoint](https://huggingface.co/karpathy/tinyllamas/tree/main/stories260K) trained on the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset.

The LLM is implemented with [llama2.c](https://github.com/karpathy/llama2.c), with minor optimizations to make it run faster on the ESP32.

## Hardware
LLMs require a great deal of memory. Even this small one still requires 1 MB of RAM. I used the [ESP32-S3FH4R2](https://www.mouser.com/ProductDetail/Espressif-Systems/ESP32-S3FH4R2?qs=tlsG%2FOw5FFjPrwkmZSBQNA%3D%3D) because it has 2 MB of embedded PSRAM.
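The 1 MB figure lines up with simple arithmetic if each of the 260K parameters is stored as a 32-bit float. That storage format is an assumption here, not something stated above, and the helper names `weight_bytes` and `fits_in_psram` are hypothetical. The sketch counts only the weights; activations and the KV cache need additional RAM on top.

```c
#include <stddef.h>

/* Bytes needed to hold n_params weights, assuming one 32-bit float each
 * (assumption: the storage format is not stated in this README). */
static size_t weight_bytes(size_t n_params) {
    return n_params * sizeof(float);
}

/* Returns 1 if the weights alone fit in the given amount of PSRAM.
 * Run-time buffers (activations, KV cache) are not counted here. */
static int fits_in_psram(size_t n_params, size_t psram_bytes) {
    return weight_bytes(n_params) <= psram_bytes;
}
```

For example, `weight_bytes(260000)` is 1,040,000 bytes, roughly 1 MB, and `fits_in_psram(260000, 2u * 1024 * 1024)` is true for the 2 MB of embedded PSRAM.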

## Optimizing llama2.c for the ESP32

With the following changes to `llama2.c`, I am able to achieve **19.13 tok/s**:

1. Using both cores of the ESP32 during math-heavy operations.
2. Using some special [dot product functions](https://github.com/espressif/esp-dsp/tree/master/modules/dotprod/float) from the [ESP-DSP library](https://github.com/espressif/esp-dsp) that are designed for the ESP32-S3. These functions use some of the [few SIMD instructions](https://bitbanksoftware.blogspot.com/2024/01/surprise-esp32-s3-has-few-simd.html) the ESP32-S3 has.
3. Maxing out the CPU speed at 240 MHz and the PSRAM speed at 80 MHz, and increasing the instruction cache size.
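To make the first two items concrete, here is a minimal, host-portable sketch of how a `llama2.c`-style matrix-vector product could be split by rows so that two workers each handle half. It is not the repository's actual code: the names `dot_f32`, `matmul_rows`, and `matmul_split` are hypothetical, and the two halves run back to back here rather than on separate cores. On an ESP32-S3 build, the reference dot product could presumably be swapped for ESP-DSP's `dsps_dotprod_f32`, and each half could run in its own FreeRTOS task pinned to one core.

```c
#include <stddef.h>

/* Portable reference dot product. On an ESP32-S3 build, ESP-DSP's
 * dsps_dotprod_f32(a, b, &sum, len) could be called here instead. */
static float dot_f32(const float *a, const float *b, int len) {
    float sum = 0.0f;
    for (int i = 0; i < len; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Compute out[r] = dot(W[r, :], x) for rows in [row_start, row_end).
 * W is row-major with n columns, matching llama2.c's matmul layout. */
static void matmul_rows(float *out, const float *x, const float *w,
                        int n, int row_start, int row_end) {
    for (int r = row_start; r < row_end; r++) {
        out[r] = dot_f32(&w[(size_t)r * n], x, n);
    }
}

/* Two-way row split of W (d rows, n columns) times x. Each half writes
 * a disjoint slice of out, so the halves need no locking between them.
 * Here they run sequentially; on the ESP32 each call would run in its
 * own task, one per core (hypothetical arrangement). */
static void matmul_split(float *out, const float *x, const float *w,
                         int n, int d) {
    int mid = d / 2;
    matmul_rows(out, x, w, n, 0, mid);  /* would run on core 0 */
    matmul_rows(out, x, w, n, mid, d);  /* would run on core 1 */
}
```

Splitting by output rows keeps the two workers independent: they share read-only inputs (`x` and `w`) and never write to the same element of `out`, so the only synchronization needed is waiting for both halves to finish.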

## Setup
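This requires the [ESP-IDF](https://docs.espressif.com/projects/esp-idf/en/stable/esp32/get-started/index.html#installation) toolchain to be installed. Build the project, then flash it to the board, replacing `{DEVICE_PORT}` with your device's serial port: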

```
idf.py build
idf.py -p /dev/{DEVICE_PORT} flash
```
