https://github.com/mryndzionek/esp32s3_eye_kws_demo
Keyword spotting on ESP32-S3-EYE
- Host: GitHub
- URL: https://github.com/mryndzionek/esp32s3_eye_kws_demo
- Owner: mryndzionek
- License: mit
- Created: 2024-07-31T15:24:01.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-24T08:24:18.000Z (about 1 year ago)
- Last Synced: 2025-04-01T14:14:41.112Z (6 months ago)
- Topics: edgeml, esp32-s3, esp32-s3-eye, keyword-spotting, lightweight, machine-learning, tinyml, wakeword
- Language: C
- Homepage:
- Size: 2.12 MB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# esp32s3_eye_kws_demo
Speech recognition is based on the [Sha-RNN](https://github.com/microsoft/EdgeML/blob/master/docs/publications/Sha-RNN.pdf)
architecture and on examples from the same repository. The cell type in this model is [FastGRNN](https://github.com/microsoft/EdgeML/blob/master/docs/publications/FastGRNN.pdf).
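The control flow of a two-layer shallow RNN can be sketched as follows. This is a minimal illustration of the idea as described in the Sha-RNN paper (split the input window into bricks, run the first-layer cell over each brick from a zero state, then feed the per-brick final states to the second-layer cell). It is not the code from this repository: `rnn_layer_t`, `rnn_step_fn`, `sharnn_forward`, `SHARNN_MAX_HID` and `toy_step` are hypothetical names, and the toy cell exists only to exercise the loops.

```c
#include <stddef.h>
#include <string.h>

#define SHARNN_MAX_HID 64 /* illustrative upper bound on the first-layer hidden size */

/* Hypothetical cell interface: consume one input vector x, update state h in place. */
typedef void (*rnn_step_fn)(const void *params, const float *x, float *h);

typedef struct {
    rnn_step_fn step;   /* cell update, e.g. a FastGRNN or FastRNN step */
    const void *params; /* opaque weights for that cell */
    size_t n_in;        /* input vector length */
    size_t n_hid;       /* hidden state length */
} rnn_layer_t;

/*
 * Run a two-layer shallow RNN over n_steps input vectors stored back to back in x.
 * out must hold l2->n_hid floats and receives the final second-layer state.
 * Returns 0 on success, -1 on inconsistent sizes.
 */
int sharnn_forward(const rnn_layer_t *l1, const rnn_layer_t *l2,
                   const float *x, size_t n_steps, size_t brick_len, float *out)
{
    float h1[SHARNN_MAX_HID];

    if (brick_len == 0 || n_steps % brick_len != 0) return -1;
    if (l1->n_hid > SHARNN_MAX_HID || l2->n_in != l1->n_hid) return -1;

    memset(out, 0, l2->n_hid * sizeof(float));

    for (size_t b = 0; b < n_steps / brick_len; b++) {
        /* Each brick starts from a zero state, so bricks are independent. */
        memset(h1, 0, l1->n_hid * sizeof(float));
        for (size_t t = 0; t < brick_len; t++) {
            l1->step(l1->params, x + (b * brick_len + t) * l1->n_in, h1);
        }
        /* The brick's final state is one input step for the second layer. */
        l2->step(l2->params, h1, out);
    }
    return 0;
}

/* Toy scalar cell used only to exercise the control flow: h <- 2*h + x. */
void toy_step(const void *params, const float *x, float *h)
{
    (void)params;
    h[0] = 2.0f * h[0] + x[0];
}
```

With the toy cell in both layers, inputs `{1, 2, 3, 4}` and `brick_len = 2`, the first layer yields 4 and 10 for the two bricks and the second layer yields 18. Because bricks do not depend on each other, a streaming implementation can in principle reuse brick results across overlapping windows.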
A more detailed view of the data flow through the network, with specific vector/matrix sizes, is shown in a diagram in the repository README.
The inference is run nine times a second. The CPU utilization due to inference is only ~24%.
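As a rough back-of-the-envelope check, these two figures imply a per-inference time. The estimate assumes that the ~24% figure is the share of one core's time spent in inference and that each of the nine runs costs about the same. Neither assumption is stated in the README.

$$
T_{\text{period}} = \frac{1\,\text{s}}{9} \approx 111\,\text{ms}, \qquad
T_{\text{inference}} \approx 0.24 \times 111\,\text{ms} \approx 27\,\text{ms}
$$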
A FastRNN cell is also supported (the cell type can be changed via `menuconfig`).

A bigger, LSTM-based model with ~550 ms inference time can be found [here](https://github.com/mryndzionek/esp32s3_eye_kws_demo/tree/lstm_model).
It is slightly more accurate, especially for the `up` label.

https://github.com/user-attachments/assets/861b4d5a-1f38-4653-9b4f-e0f713c1e0ba
## Notes
A number of TinyML model conversion frameworks were tested,
but none gave satisfactory results. The main problem seems
to be that the graphs exported from PyTorch (or other
training-oriented NN frameworks) contain a lot of additional
information that is needed only for training and that
obscures the essential structure needed for inference.
The README contrasts an ONNX graph exported directly from PyTorch
with [this](https://github.com/mryndzionek/esp32s3_eye_kws_demo/blob/main/main/fast_grnn.c), which is
all the "manually-transpiled" code needed for inference (~170 lines of C) ...