# Running an LLM on the ESP32
![LLM on ESP32](/ESP32_LLM.jpg)
![LLM Output](/llm_output.gif)

## Summary
I wanted to see if it was possible to run a Large Language Model (LLM) on the ESP32. Surprisingly, it is possible, though probably not very useful.

The "Large" Language Model used here is actually quite small: a 260K-parameter [tinyllamas checkpoint](https://huggingface.co/karpathy/tinyllamas/tree/main/stories260K) trained on the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset.

The LLM implementation is based on [llama2.c](https://github.com/karpathy/llama2.c), with minor optimizations to make it run faster on the ESP32.

## Hardware
LLMs require a great deal of memory, and even this small one needs about 1 MB of RAM. I used the [ESP32-S3FH4R2](https://www.mouser.com/ProductDetail/Espressif-Systems/ESP32-S3FH4R2?qs=tlsG%2FOw5FFjPrwkmZSBQNA%3D%3D) because it has 2 MB of embedded PSRAM.

## Optimizing llama2.c for the ESP32

With the following changes to `llama2.c`, I was able to achieve **19.13 tok/s**:

1. Using both cores of the ESP32 during math-heavy operations.
2. Using the optimized [dot product functions](https://github.com/espressif/esp-dsp/tree/master/modules/dotprod/float) from the [ESP-DSP library](https://github.com/espressif/esp-dsp), which are designed for the ESP32-S3. These functions take advantage of the [few SIMD instructions](https://bitbanksoftware.blogspot.com/2024/01/surprise-esp32-s3-has-few-simd.html) the ESP32-S3 has.
3. Raising the CPU clock to its maximum of 240 MHz and the PSRAM clock to 80 MHz, and increasing the instruction cache size.

## Setup
This requires the [ESP-IDF](https://docs.espressif.com/projects/esp-idf/en/stable/esp32/get-started/index.html#installation) toolchain to be installed. Then build and flash:

```
idf.py build
idf.py -p /dev/{DEVICE_PORT} flash
```
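The figure of roughly 1 MB in the Hardware section matches a quick estimate of the weights alone. This assumes the checkpoint stores every parameter as a 32-bit float, as upstream llama2.c's float checkpoints do, and treats "260K" as approximately 260,000 parameters:

$$
260{,}000\ \text{parameters} \times 4\ \frac{\text{bytes}}{\text{parameter}} = 1{,}040{,}000\ \text{bytes} \approx 1\ \text{MB}
$$

The runtime buffers and KV cache that llama2.c allocates come on top of this. Even the weights alone exceed the ESP32-S3's 512 KB of internal SRAM, so the external PSRAM is what makes the model fit.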
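The first optimization, using both cores for math-heavy work, can be sketched for the matrix-vector product that dominates llama2.c's forward pass. This is a minimal sketch, not the port's actual code, and it rests on several assumptions:

- It uses the row-major `W (d,n) @ x (n,) -> xout (d,)` convention of upstream llama2.c's `matmul()`.
- It uses POSIX threads, which ESP-IDF provides on top of FreeRTOS. The README does not say how the port dispatches work to the second core.
- The names `matmul_job`, `matmul_rows`, and `matmul_two_workers` are hypothetical.

On the ESP32 the helper would also need to be pinned to the other core, for example with FreeRTOS's `xTaskCreatePinnedToCore`.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical work item: compute rows [row_start, row_end) of W @ x. */
typedef struct {
    float *xout;       /* output vector, length d */
    const float *x;    /* input vector, length n */
    const float *w;    /* weights, d rows of n floats, row-major */
    int n;
    int row_start;
    int row_end;
} matmul_job;

/* Worker body: a plain dot product of each assigned weight row with x. */
static void *matmul_rows(void *arg) {
    matmul_job *job = (matmul_job *)arg;
    for (int i = job->row_start; i < job->row_end; i++) {
        const float *row = job->w + (size_t)i * job->n;
        float val = 0.0f;
        for (int j = 0; j < job->n; j++) {
            val += row[j] * job->x[j];
        }
        job->xout[i] = val;
    }
    return NULL;
}

/* W (d,n) @ x (n,) -> xout (d,). A helper thread computes the first half of
   the rows while the calling thread computes the rest; returns 0 on success. */
int matmul_two_workers(float *xout, const float *x, const float *w, int n, int d) {
    int half = d / 2;
    matmul_job jobs[2] = {
        {xout, x, w, n, 0, half},
        {xout, x, w, n, half, d},
    };
    pthread_t helper;
    if (pthread_create(&helper, NULL, matmul_rows, &jobs[0]) != 0) {
        return -1;
    }
    matmul_rows(&jobs[1]);           /* caller handles the second half */
    return pthread_join(helper, NULL);
}
```

Having the caller do half the rows means only one extra thread per call. Spawning a thread for every matmul is still costly, so a real port would more likely keep a persistent worker on the second core and signal it for each layer.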
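The second optimization works because each output element of a matrix-vector product is one dot product of a weight row with the input vector. The inner loop can therefore be handed to an optimized kernel.

ESP-DSP declares `dsps_dotprod_f32(src1, src2, dest, len)` in `dsps_dotprod.h` and maps it to an ESP32-S3-specific implementation when one is available. Check the ESP-DSP documentation for the exact signature. That library only builds for Espressif targets, so the sketch below uses a hypothetical portable stand-in, `dotprod_f32`, with the same argument order so it can run on a host. The README does not show how the port calls ESP-DSP, so `matmul_dotprod` is illustrative as well.

```c
#include <stddef.h>

/* Hypothetical portable stand-in with ESP-DSP's argument order
   (src1, src2, dest, len). Returns 0 on success, mirroring ESP_OK. */
static int dotprod_f32(const float *src1, const float *src2, float *dest, int len) {
    if (src1 == NULL || src2 == NULL || dest == NULL || len < 0) {
        return -1;
    }
    float acc = 0.0f;
    for (int i = 0; i < len; i++) {
        acc += src1[i] * src2[i];
    }
    *dest = acc;
    return 0;
}

/* W (d,n) @ x (n,) -> xout (d,): one dot-product call per weight row.
   On the ESP32-S3 the call inside the loop is where an optimized kernel
   such as dsps_dotprod_f32 would be swapped in. */
int matmul_dotprod(float *xout, const float *x, const float *w, int n, int d) {
    for (int i = 0; i < d; i++) {
        if (dotprod_f32(w + (size_t)i * n, x, &xout[i], n) != 0) {
            return -1;
        }
    }
    return 0;
}
```

Structuring the matmul this way also combines naturally with the two-core split, since each core can run the same per-row dot-product loop over its own range of rows.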
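For context on the **19.13 tok/s** figure: 1000 / 19.13 ≈ 52.3 ms per generated token. Upstream llama2.c's `run.c` reports "achieved tok/s" by dividing the tokens generated after the first one by the elapsed time since that first token. This sketch assumes the port keeps that timing logic, and the helper names `tokens_per_second` and `ms_per_token` are hypothetical.

```c
/* Throughput in the style of upstream llama2.c: tokens generated after the
   first one, divided by elapsed milliseconds since that first token. */
double tokens_per_second(int tokens_after_first, long elapsed_ms) {
    if (tokens_after_first <= 0 || elapsed_ms <= 0) {
        return 0.0;
    }
    return tokens_after_first / (double)elapsed_ms * 1000.0;
}

/* Average latency per token, in milliseconds, for a given throughput. */
double ms_per_token(double tok_per_s) {
    return tok_per_s > 0.0 ? 1000.0 / tok_per_s : 0.0;
}
```

Excluding the first token keeps one-time startup costs out of the steady-state figure.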