https://github.com/cnadler86/mp_esp_dl_models

MicroPython binding for the ESP-DL AI vision models on ESP32, such as face detection and recognition, ImageNet classification, and pedestrian (human) detection
ai esp32 face-detection face-recognition imagenet-classifier micropython pedestrian-detection

Host: GitHub
URL: https://github.com/cnadler86/mp_esp_dl_models
Owner: cnadler86
License: mit
Created: 2025-03-07T13:42:42.000Z (8 months ago)
Default Branch: master
Last Pushed: 2025-04-18T07:14:38.000Z (7 months ago)
Last Synced: 2025-04-18T21:17:20.942Z (7 months ago)
Topics: ai, esp32, face-detection, face-recognition, imagenet-classifier, micropython, pedestrian-detection
Language: C++
Homepage:
Size: 75.2 KB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-micropython - mp_esp_dl_models - MicroPython binding for the ESP DL vision models like face detection. (Libraries / AI)
trackawesomelist - mp\_esp\_dl\_models (⭐0) - MicroPython binding for the ESP DL vision models like face detection. (Recently Updated / [Mar 25, 2025](/content/2025/03/25/README.md))

# ESP DL MicroPython Binding

This is a MicroPython binding for ESP-DL (Deep Learning) models that enables face detection, face recognition, human detection, and image classification on ESP32 devices.

## Donate

I spent a lot of time and effort to make this. If you find this project useful, please consider donating to support my work.

[![Donate](https://img.shields.io/badge/Donate-PayPal-blue.svg)](https://www.paypal.me/cnadler)

## Available Models

- `FaceDetector`: Detects faces in images and provides bounding boxes and facial features

- `FaceRecognizer`: Recognizes enrolled faces and manages a face database

- `HumanDetector`: Detects people in images and provides bounding boxes

- `ImageNet`: Classifies images into predefined categories

## Installation & Building

### Precompiled Images

You can get precompiled images in two ways:

1. Download them from the artifacts of a successful workflow run in the Actions section

2. Fork the repo and start the action manually

### Building from Source

1. Clone the required repositories:

```sh
git clone https://github.com/cnadler86/mp_esp_dl_models.git
git clone https://github.com/cnadler86/micropython-camera-API.git
git clone https://github.com/cnadler86/mp_jpeg.git
```

2. Build the firmware:

Make sure you have the complete ESP32 build environment for MicroPython available.

```sh
cd boards/
idf.py -D MICROPY_DIR= -D MICROPY_BOARD= -D MICROPY_BOARD_VARIANT= -B build- build
cd build-
python ~/micropython/ports/esp32/makeimg.py sdkconfig bootloader/bootloader.bin partition_table/partition-table.bin micropython.bin firmware.bin micropython.uf2
```

## Module Usage

### Common Requirements

All models require input images in RGB888 format. You can use [mp_jpeg](https://github.com/cnadler86/mp_jpeg/) to decode camera images to the correct format.
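Since RGB888 stores 3 bytes per pixel, a quick length check can catch a buffer in the wrong format or size before it reaches a model. The sketch below is not part of the binding: `rgb888_size` and `check_framebuffer` are hypothetical helper names, and it assumes the decoded image is a tightly packed bytes-like object that supports `len()`.

```python
def rgb888_size(width, height):
    """Number of bytes in a tightly packed RGB888 image (3 bytes per pixel)."""
    return width * height * 3


def check_framebuffer(framebuffer, width=320, height=240):
    """Return the buffer unchanged, or raise ValueError if its length is wrong.

    Assumption: the buffer has no row padding, so its length must equal
    width * height * 3.
    """
    expected = rgb888_size(width, height)
    if len(framebuffer) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(framebuffer)))
    return framebuffer
```

With the default 320x240 dimensions this works out to 230400 bytes per frame, which is also a useful figure when estimating memory use.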

### FaceDetector

The FaceDetector module detects faces in images and can optionally provide facial feature points.

#### Constructor

```python
FaceDetector(width=320, height=240, features=True)
```

**Parameters:**

- `width` (int, optional): Input image width. Default: 320

- `height` (int, optional): Input image height. Default: 240

- `features` (bool, optional): Whether to return facial feature points. Default: True

#### Methods

- **run(framebuffer)**

  

  Detects faces in the provided image.

  **Parameters:**

  - `framebuffer`: RGB888 image data (required)

  **Returns:**

  List of dictionaries with detection results, each containing:

  - `score`: Detection confidence (float)

  - `box`: Bounding box coordinates [x1, y1, x2, y2]

  - `features`: Facial feature points [(x,y) coordinates for: left eye, right eye, nose, left mouth, right mouth] if enabled, None otherwise
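The result dictionaries described above can be post-processed with plain Python. The following is a minimal sketch, not part of the binding: `confident_faces` and `box_size` are hypothetical helpers, and the 0.5 threshold is an arbitrary illustration value rather than a recommendation from the project.

```python
def confident_faces(results, min_score=0.5):
    """Keep detections with score >= min_score, highest score first.

    Accepts None or an empty list and returns an empty list in that case.
    """
    kept = [r for r in (results or []) if r["score"] >= min_score]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept


def box_size(box):
    """Return (width, height) of a box given as [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = box
    return x2 - x1, y2 - y1
```

Filtering on the caller's side keeps the detector call itself unchanged, so the threshold can be tuned per application.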

### FaceRecognizer

The FaceRecognizer module manages a database of faces and can recognize previously enrolled faces.

#### Constructor

```python
FaceRecognizer(width=320, height=240, db_path="face.db")
```

**Parameters:**

- `width` (int, optional): Input image width. Default: 320

- `height` (int, optional): Input image height. Default: 240

- `db_path` (str, optional): Path to the face database file. Default: "face.db"

#### Methods

- **run(framebuffer)**

  

  Detects and recognizes faces in the provided image.

  **Parameters:**

  - `framebuffer`: RGB888 image data (required)

  **Returns:**

  List of dictionaries with recognition results, each containing:

  - `score`: Detection confidence

  - `box`: Bounding box coordinates [x1, y1, x2, y2]

  - `features`: Facial feature points (if enabled)

  - `person`: Recognition result containing:

    - `id`: Face ID

    - `similarity`: Match confidence (0-1)

    - `name`: Person name (if provided during enrollment)

- **enroll(framebuffer, validate=False, name=None)**

  

  Enrolls a new face in the database.

  **Parameters:**

  - `framebuffer`: RGB888 image data

  - `validate` (bool, optional): Check if face is already enrolled. Default: False

  - `name` (str, optional): Name to associate with the face. Default: None

  **Returns:**

  - ID of the enrolled face

- **delete_face(id)**

  

  Deletes a face from the database.

  **Parameters:**

  - `id` (int): ID of the face to delete

- **print_database()**

  

  Prints the contents of the face database.
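Because `run` reports a `similarity` value for each recognized `person`, an application will usually want to decide which match, if any, to accept. The sketch below assumes the result layout documented above; `best_match` is a hypothetical helper and the 0.6 cut-off is only an example value, not one taken from the project.

```python
def best_match(results, min_similarity=0.6):
    """Return the 'person' dict with the highest similarity, or None.

    Faces without a 'person' entry, or with a similarity below
    min_similarity, are ignored.
    """
    best = None
    for face in results or []:
        person = face.get("person")
        if not person:
            continue
        if person["similarity"] < min_similarity:
            continue
        if best is None or person["similarity"] > best["similarity"]:
            best = person
    return best
```

Returning `None` when nothing clears the threshold lets the caller treat "unknown face" as an explicit case.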

### HumanDetector

The HumanDetector module detects people in images.

#### Constructor

```python
HumanDetector(width=320, height=240)
```

**Parameters:**

- `width` (int, optional): Input image width. Default: 320

- `height` (int, optional): Input image height. Default: 240

#### Methods

- **run(framebuffer)**

  

  Detects people in the provided image.

  **Parameters:**

  - `framebuffer`: RGB888 image data

  **Returns:**

  List of dictionaries with detection results, each containing:

  - `score`: Detection confidence

  - `box`: Bounding box coordinates [x1, y1, x2, y2]
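When several people are detected, one simple heuristic is to treat the largest bounding box as the nearest person. This is only an illustrative sketch over the documented result format; `box_area` and `largest_detection` are hypothetical names, not functions provided by the binding.

```python
def box_area(box):
    """Area in pixels of a box given as [x1, y1, x2, y2]; 0 if degenerate."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def largest_detection(results):
    """Return the detection with the largest bounding box, or None if there is none."""
    largest = None
    for det in results or []:
        if largest is None or box_area(det["box"]) > box_area(largest["box"]):
            largest = det
    return largest
```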

### ImageNet

The ImageNet module classifies images into predefined categories.

#### Constructor

```python
ImageNet(width=320, height=240)
```

**Parameters:**

- `width` (int, optional): Input image width. Default: 320

- `height` (int, optional): Input image height. Default: 240

#### Methods

- **run(framebuffer)**

  

  Classifies the provided image.

  **Parameters:**

  - `framebuffer`: RGB888 image data

  **Returns:**

  List alternating between class names and confidence scores:

  `[class1, score1, class2, score2, ...]`
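The flat, alternating list is easier to work with once it is regrouped into pairs. A minimal sketch, assuming exactly the layout shown above; `pair_classes` and `top_class` are hypothetical helper names.

```python
def pair_classes(flat):
    """Turn [class1, score1, class2, score2, ...] into [(class1, score1), ...].

    A trailing element without a partner is ignored.
    """
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat) - 1, 2)]


def top_class(flat):
    """Return the (class, score) pair with the highest score, or None if empty."""
    best = None
    for name, score in pair_classes(flat):
        if best is None or score > best[1]:
            best = (name, score)
    return best
```

For example, `pair_classes(["cat", 0.8, "dog", 0.1])` yields `[("cat", 0.8), ("dog", 0.1)]`.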

## Usage Examples

### Face Detection Example

```python
from espdl import FaceDetector
import camera
from jpeg import Decoder

# Initialize components
cam = camera.Camera()
decoder = Decoder()
face_detector = FaceDetector()

# Capture and process image
img = cam.capture()
framebuffer = decoder.decode(img)  # Convert to RGB888
results = face_detector.run(framebuffer)

if results:
    for face in results:
        print(f"Face detected with confidence: {face['score']}")
        print(f"Bounding box: {face['box']}")
        if face['features']:
            print(f"Facial features: {face['features']}")
```

### Face Recognition Example

```python
from espdl import FaceRecognizer
import camera
from jpeg import Decoder

# Initialize components
cam = camera.Camera()
decoder = Decoder()
recognizer = FaceRecognizer(db_path="/faces.db")

# Enroll a face
img = cam.capture()
framebuffer = decoder.decode(img)
face_id = recognizer.enroll(framebuffer, name="John")
print(f"Enrolled face with ID: {face_id}")

# Later, recognize faces
img = cam.capture()
framebuffer = decoder.decode(img)
results = recognizer.run(framebuffer)

if results:
    for face in results:
        if face['person']:
            print(f"Recognized {face['person']['name']} (ID: {face['person']['id']})")
            print(f"Similarity: {face['person']['similarity']}")
```

## Benchmark results

The following table shows the frames per second (fps) achieved by each model at different frame sizes. The results were measured with a 2 MP camera on an ESP32-S3.

| Frame Size  | FaceDetector (fps) | HumanDetector (fps) |
|-------------|--------------------|---------------------|
| QQVGA       | 14.5               | 6.6                 |
| R128x128    | 21                 | 6.6                 |
| QCIF        | 19.7               | 6.5                 |
| HQVGA       | 18                 | 6.3                 |
| R240X240    | 16.7               | 6.1                 |
| QVGA        | 15.2               | 6.6                 |
| CIF         | 13                 | 5.5                 |
| HVGA        | 11.9               | 5.3                 |
| VGA         | 8.2                | 4.4                 |
| SVGA        | 6.2                | 3.8                 |
| XGA         | 4.1                | 2.8                 |
| HD          | 3.6                | 2.6                 |
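For latency budgeting it can help to read these figures as time per frame, which is simply 1000 divided by the fps. The sketch below converts a few values copied from the table; `frame_time_ms` is a hypothetical helper and the dictionary is only an excerpt, not additional measurements.

```python
# A few FaceDetector values copied from the benchmark table (fps).
FACE_DETECTOR_FPS = {"QVGA": 15.2, "VGA": 8.2, "HD": 3.6}


def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


for size, fps in FACE_DETECTOR_FPS.items():
    print("%s: %.1f ms per frame" % (size, frame_time_ms(fps)))
```

At QVGA, for instance, 15.2 fps corresponds to roughly 66 ms per frame for face detection.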

## Notes & Best Practices

1. **Image Format**: Always ensure input images are in RGB888 format. Use mp_jpeg for JPEG decoding from camera.

2. **Memory Management**: 

   - Close/delete detector objects when no longer needed

   - Consider memory constraints when choosing image dimensions

3. **Face Recognition**:

   - Enroll faces in good lighting conditions

   - Multiple enrollments of the same person can improve recognition

   - Use `validate=True` during enrollment to avoid duplicates

4. **Storage**:

   - Face database is persistent across reboots

   - Consider backing up the face database file
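One way to back up the database is to copy the file in small chunks, which keeps RAM use low on a microcontroller. This is a sketch under assumptions: it presumes the database at `db_path` is an ordinary file reachable through `open()`, which the source does not confirm, and `backup_file` is a hypothetical helper, not part of the binding.

```python
def backup_file(src_path, dst_path, chunk_size=512):
    """Copy src_path to dst_path in chunks and return the number of bytes copied.

    Assumption: both paths are on a filesystem accessible via open().
    """
    copied = 0
    with open(src_path, "rb") as src:
        with open(dst_path, "wb") as dst:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                dst.write(chunk)
                copied += len(chunk)
    return copied
```

A call such as `backup_file("face.db", "face.db.bak")` would then duplicate the default database file; the backup file name here is only an example.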
