# AI_whiteboard

## Idea
The idea of this project is to transform any wall or surface into an interactive whiteboard using only an ordinary RGB camera and your hand.
I hope you'll find it interesting!

## Hardware
- Jetson Xavier NX **JetPack 4.4**
- Raspberry Pi Camera + ArduCam (8MP IMX219 Sensor Module)

**Note:** The system also works on the Jetson Nano and TX2.
## Details
To use the AI whiteboard correctly, find a wall or flat surface and place the camera at a distance of about 1 meter. Any wall or surface will do, but the system works more accurately on dark or light monotone walls/surfaces.
We capture an image from the camera and crop it to a square. Next, we use **a hand detector [1] (YOLO [3], a deep neural network)** to find a hand in the image. If a hand is found, we crop it out of the image and feed it to **a fingertip detector [1] (a modified VGG16 deep neural network)**. Finally, if fingertips are detected, we use their coordinates to control the whiteboard (see the control section below).
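In outline, the per-frame loop could look like the following sketch (the detector functions here are stand-ins for the real YOLO and VGG16 models; the actual implementation lives in `ai_whiteboard.py`):

```python
import cv2

def crop_square(frame):
    """Crop the captured frame to a centered square."""
    h, w = frame.shape[:2]
    side = min(h, w)
    y0, x0 = (h - side) // 2, (w - side) // 2
    return frame[y0:y0 + side, x0:x0 + side]

def detect_hand(square):
    """Stand-in for the YOLO hand detector [3]: returns (x1, y1, x2, y2) or None."""
    return None  # the real detector returns a box when confidence exceeds the threshold

def detect_fingertips(hand_crop):
    """Stand-in for the modified-VGG16 fingertip detector [1]."""
    return []    # the real detector returns per-finger (x, y) coordinates

cap = cv2.VideoCapture(0)  # a Raspberry Pi Camera is opened via GStreamer instead
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    square = crop_square(frame)
    box = detect_hand(square)
    if box is not None:
        x1, y1, x2, y2 = box
        tips = detect_fingertips(square[y1:y2, x1:x2])
        # the fingertip coordinates then drive the whiteboard (see the control table below)
cap.release()
```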
## Launch AI Whiteboard

##### 1. Set up your NVIDIA Jetson device or use your PC.
- [Jetson Xavier NX](https://developer.nvidia.com/embedded/learn/get-started-jetson-xavier-nx-devkit) with [JetPack 4.4](https://developer.nvidia.com/jetpack-sdk-44-archive) (CUDA 10.2, TensorRT 7.1.3, cuDNN 8.0)
- [Install TensorFlow 1.15.3](https://docs.nvidia.com/deeplearning/frameworks/install-tf-jetson-platform/index.html)

##### 2. Download the AI Whiteboard project: `$ git clone https://github.com/preste-ai/camera_ai_whiteboard.git`
##### 3. Install packages
You can install the needed packages via pip using the `requirements.txt` file:
```bash
pip3 install -r requirements.txt
```

##### 4. Download the [weights or TensorRT engines](https://drive.google.com/drive/folders/1eDBqbZfoY7XJ3fYv8FEMJ5AZe_3n0sjU?usp=sharing) and put them in `weights` or `weights/engines`.
**Note:** The provided TensorRT engines work correctly **only** on Jetson Xavier NX devices, because TensorRT runs device-specific profiling during the optimization phase. If you want to use these models (engines) on other Jetson devices, please convert the .h5 models with the `h5_to_trt.py` script on your platform.
##### 5. Set up the power mode (ID=2, 15W, 6 cores): `$ sudo /usr/sbin/nvpmodel -m 2`
##### 6. Launch the device's fan: `$ sudo jetson_clocks --fan`
##### 7. Run the AI whiteboard script
Check the `config.py` file and set the needed parameters:
- whiteboard_w : 200 - whiteboard width (px) (displayed on the captured camera image)
- whiteboard_h : 200 - whiteboard height (px) (displayed on the captured camera image)
- cam_w : 320 - width (px) of a captured image
- cam_h : 240 - height (px) of a captured image
- framerate : 60 - camera capture framerate (for Raspberry Pi Camera)
- zoom_koef : 2 - zoom coefficient to resize whiteboard_w and whiteboard_h
- confidence_ft_threshold : 0.5 - confidence threshold of the Fingertips detector
- confidence_hd_threshold : 0.8 - confidence threshold of the Hand detector
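Based on the list above, `config.py` presumably contains something like the following sketch (the values match the documented defaults; the exact layout may differ):

```python
# config.py -- a sketch based on the documented defaults
whiteboard_w = 200  # whiteboard width (px), drawn over the captured image
whiteboard_h = 200  # whiteboard height (px)
cam_w = 320         # captured image width (px)
cam_h = 240         # captured image height (px)
framerate = 60      # capture framerate (Raspberry Pi Camera)
zoom_koef = 2       # zoom coefficient applied to whiteboard_w / whiteboard_h
confidence_ft_threshold = 0.5  # Fingertips detector confidence threshold
confidence_hd_threshold = 0.8  # Hand detector confidence threshold
```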
---
Run from the project root directory:

**Jetson Devices**
```bash
python3 ai_whiteboard.py --rpc --trt
```
- rpc : use a Raspberry Pi Camera. Default: False
- trt : use TensorRT engines. Default: False
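With `--rpc`, a CSI camera on a Jetson device is typically opened through a GStreamer pipeline. The sketch below shows the common `nvarguscamerasrc` pattern (not necessarily the project's exact pipeline string), using the `cam_w`, `cam_h` and `framerate` values from `config.py`:

```python
import cv2

def gstreamer_pipeline(w=320, h=240, fps=60):
    """Standard nvarguscamerasrc pipeline for a CSI camera on Jetson."""
    return (
        f"nvarguscamerasrc ! "
        f"video/x-raw(memory:NVMM), width={w}, height={h}, framerate={fps}/1 ! "
        f"nvvidconv ! video/x-raw, format=BGRx ! "
        f"videoconvert ! video/x-raw, format=BGR ! appsink"
    )

cap = cv2.VideoCapture(gstreamer_pipeline(), cv2.CAP_GSTREAMER)
```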
**Laptop**
```bash
python3 ai_whiteboard.py
```

###### Control gestures (combinations)
| To draw | To move | To erase | To clean | To save |
|:---------------:|:---------------:|:---------------:|:---------------:|:---------------:|
| *(gesture image)* | *(gesture image)* | *(gesture image)* | *(gesture image)* | *(gesture image)* |
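The gesture images are not reproduced here. Conceptually, each action is triggered by a particular combination of detected fingertips; the mapping below is purely hypothetical (the actual combinations are defined in the project code):

```python
# Hypothetical mapping from detected fingertips to whiteboard actions.
# A key marks which fingers are raised: (thumb, index, middle, ring, little).
GESTURES = {
    (0, 1, 0, 0, 0): 'draw',   # index finger only
    (0, 1, 1, 0, 0): 'move',   # index + middle
    (1, 1, 1, 1, 1): 'erase',  # open hand
    (1, 0, 0, 0, 1): 'clean',  # thumb + little finger
    (0, 1, 0, 0, 1): 'save',   # index + little finger
}

def action_for(fingers_up):
    """Return the whiteboard action for a detected finger combination, if any."""
    return GESTURES.get(tuple(fingers_up))
```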
## Training Hand-detector
#### Custom Dataset
A [custom dataset](https://drive.google.com/drive/folders/1rFHtl6A4EKokuOQk-9vqvWV0WiKyRfco?usp=sharing) was collected and labeled (12,000 images) for training. For labeling I used [CVAT](https://github.com/openvinotoolkit/cvat).
- Train: 9,500 images
- Validation: 1,000 images
- Test: 1,500 images

*(sample dataset images omitted)*
Run from the project root directory:
```bash
python3 yolo_train.py
```

## Testing Hand-detector
Run from the project root directory:
```bash
python3 yolo_test.py
```

## Convert .h5 model to TensorRT engine [2]
The transformation takes place in 3 stages:
1. Freeze the graph and remove training nodes (.h5 -> .pb)
2. Convert the frozen graph to ONNX (.pb -> .onnx)
3. Convert the ONNX model to a TensorRT engine (.onnx -> .engine)

Run from the project root directory:
```bash
python3 h5_to_trt.py --folder weights --weights_file yolo --fp 16
```
- folder : path to the folder with the model weights
- weights_file : weights file name (**without .h5**)
- fp : TensorRT engine precision (16 or 32)
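Stage 2 is commonly done with the `tf2onnx` converter; stage 3 presumably uses the TensorRT Python API (JetPack 4.4 ships TensorRT 7.x). Below is a minimal sketch of stage 3 with hypothetical file paths, mirroring the `--fp 16` option:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, engine_path, fp16=True):
    """Parse an ONNX model and serialize a TensorRT engine (TensorRT 7.x API)."""
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(explicit_batch) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 28  # 256 MiB
        builder.fp16_mode = fp16              # corresponds to --fp 16
        with open(onnx_path, 'rb') as f:
            if not parser.parse(f.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                raise RuntimeError('ONNX parsing failed')
        engine = builder.build_cuda_engine(network)
        with open(engine_path, 'wb') as f:
            f.write(engine.serialize())

# hypothetical paths
build_engine('weights/yolo.onnx', 'weights/engines/yolo_fp16.engine')
```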
Metrics for **Hand detection** after model conversion.
To determine whether a detection is correct, we use the [IoU](https://medium.com/towards-artificial-intelligence/understanding-iou-metric-in-object-detection-1e5532f06a76) value: if the IoU is greater than 0.5, the detector is considered to have predicted the hand correctly; otherwise it is not.
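For reference, IoU for two axis-aligned boxes in (x1, y1, x2, y2) form can be computed as in this short sketch:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

# identical boxes have IoU 1.0; a detection counts as correct when IoU > 0.5
assert iou((0, 0, 10, 10), (0, 0, 10, 10)) == 1.0
```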
The results are given below.

| | Keras model (before training) | Keras model (after training) | TensorRT engine (fp32) | TensorRT engine (fp16) |
|:---------------:|:---------------:|:---------------:|:---------------:|:---------------:|
| Accuracy | 72.68 % | 89.14 % | 89.14 % | 89.07 % |
| Precision | 84.80 % | 99.45 % | 99.45 % | 99.45 % |
| Recall | 50.78 % | 77.24 % | 77.24 % | 77.10 % |

## Solution performance (Hand detector + Fingertips detector)
Captured image shape: 320x240
Jetson Xavier NX, power mode ID 2 (15W, 6 cores)

| | Keras model | TensorRT engine (fp32) | TensorRT engine (fp16) |
|:---------------:|:---------------:|:---------------:|:---------------:|
| Average FPS | 12 | 33 | 60 |

## References
1. Unified Gesture and Fingertip Detection: https://github.com/MahmudulAlam/Unified-Gesture-and-Fingertip-Detection
2. TensorRT developer guide: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#python_topics
3. YOLO9000: Better, Faster, Stronger: https://arxiv.org/abs/1612.08242