https://github.com/mryndzionek/esp32s3_eye_kws_demo

Keyword spotting on ESP32-S3-EYE

edgeml esp32-s3 esp32-s3-eye keyword-spotting lightweight machine-learning tinyml wakeword


Host: GitHub
URL: https://github.com/mryndzionek/esp32s3_eye_kws_demo
Owner: mryndzionek
License: mit
Created: 2024-07-31T15:24:01.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-08-24T08:24:18.000Z (about 1 year ago)
Last Synced: 2025-04-01T14:14:41.112Z (6 months ago)
Topics: edgeml, esp32-s3, esp32-s3-eye, keyword-spotting, lightweight, machine-learning, tinyml, wakeword
Language: C
Homepage:
Size: 2.12 MB
Stars: 4
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# esp32s3_eye_kws_demo

Speech recognition is based on the [ShaRNN](https://github.com/microsoft/EdgeML/blob/master/docs/publications/Sha-RNN.pdf)
architecture and on examples from the same repository (microsoft/EdgeML). The cell type in this model is [FastGRNN](https://github.com/microsoft/EdgeML/blob/master/docs/publications/FastGRNN.pdf).
A more detailed view of the data flow through the network, with specific vector/matrix sizes:

![sharnn](images/sharnn.png)

Inference runs nine times a second. The CPU utilization due to inference is only ~24%.
The FastRNN cell is also supported (selectable via `menuconfig`).
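
As a rough back-of-the-envelope check, the two figures above imply an average cost per inference. The sketch below assumes the ~24% refers to one core and is spent entirely on inference; the helper name is hypothetical and the result is an estimate, not a measured value.

```c
#include <assert.h>

/*
 * Average time per inference (in milliseconds) implied by a CPU load
 * fraction and an inference rate. Hypothetical helper for illustration.
 */
static double inference_ms(double cpu_fraction, double runs_per_second)
{
    assert(runs_per_second > 0.0);
    return 1000.0 * cpu_fraction / runs_per_second;
}
```

With `cpu_fraction = 0.24` and `runs_per_second = 9.0` this gives roughly 27 ms per inference.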

A bigger, LSTM-based model with ~550 ms inference time can be found [here](https://github.com/mryndzionek/esp32s3_eye_kws_demo/tree/lstm_model).
It is slightly more accurate, especially on the `up` label.

https://github.com/user-attachments/assets/861b4d5a-1f38-4653-9b4f-e0f713c1e0ba

## Notes

A number of TinyML model conversion frameworks were tested,
but none gave satisfactory results. The main problem seems
to be that the graphs exported from PyTorch (or other
training-oriented NN frameworks) contain a lot of additional
information that is needed only for training and that
obscures the essential structure needed for inference.
Here is, for example, an ONNX graph exported directly from PyTorch:

![graph](images/pytorch_graph.png)

and [this](https://github.com/mryndzionek/esp32s3_eye_kws_demo/blob/main/main/fast_grnn.c) is
all the "manually transpiled" code needed for inference (~170 lines of C) ...
