Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
[Research] Multimodal Emotion Recognition for On-device AI
https://github.com/eesunmoon/on-device_multimodal_er
[Research] Multimodal Emotion Recognition for On-device AI
- Host: GitHub
- URL: https://github.com/eesunmoon/on-device_multimodal_er
- Owner: EesunMoon
- Created: 2024-10-14T19:44:26.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-12-13T00:14:25.000Z (10 days ago)
- Last Synced: 2024-12-22T05:14:22.313Z (about 17 hours ago)
- Topics: artificial-intelligence, data-analysis, deep-learning, embedded-systems, emotion-recognition, heart-rate-analysis, multimodal-fusion, npu, on-device, python, speech-processing, speech-recognition, tensorflow, wearable-devices
- Language: Python
- Homepage:
- Size: 56.6 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# On-device Multimodal Emotion Recognition on Neural Processing Unit (NPU)
**Optimizing AI for low latency and power consumption in real-time applications.**

---
## Project Summary
As part of a government agency project, I led the development of an **on-device multimodal emotion recognition system** on NPUs (Neural Processing Units). The project focused on optimizing real-time AI applications for **high emotion classification accuracy**, **low latency**, and **power efficiency**, addressing constraints typical of edge systems, such as limited model size and computational resources.
### Objectives
1. **Enhancing emotion recognition performance** by leveraging multimodal data sources, including:
- Heart rate (HR)
- EEG
- Speech
- Images
2. **Implementing a scalable real-time system** by embedding models on NPUs to reduce latency and power consumption.
---
## Project Workflow
### Overall Architecture
![figure1](https://github.com/user-attachments/assets/e22babde-a2ad-42d1-bf5e-509ebed0e3f7)

### Detailed Structures of Emotion Recognition Models
![figure2](https://github.com/user-attachments/assets/ed881ac7-39db-447f-a180-429580abd3cd)

### 1. Model Design and Optimization
- **Simplified Architectures**: Developed deep learning models using lightweight architectures like CNNs and dense layers to balance performance and complexity (a minimal sketch follows this list).
- **Hyperparameter Tuning**: Conducted ablation studies to fine-tune parameters such as optimizer type, number of epochs, batch size, and loss functions.
- **Multimodal Fusion**: Adopted a **score-based fusion method** to combine outputs from multiple models at the decision level, avoiding additional neural network complexity.
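For concreteness, here is a minimal sketch of what one such lightweight per-modality classifier could look like in TensorFlow/Keras. The input shape, layer sizes, and four-class label set are illustrative assumptions, not the project's actual configuration.

```python
import tensorflow as tf

def build_modality_branch(input_shape=(128, 64, 1), num_classes=4):
    """Illustrative lightweight CNN for a single modality (e.g. speech
    spectrograms). All sizes here are assumptions for the sketch."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),  # cheaper than Flatten + Dense
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_modality_branch()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # a small parameter count keeps the on-device footprint low
```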
### 2. NPU Deployment
- Converted models into **ONNX format** and compiled them using the **MXQ compiler** for compatibility with Mobilint's NPU chips (see the export sketch after this list).
- Applied **quantization techniques** (Max, Percentile, and Max-Percentile) to compress models, optimizing based on an efficiency metric combining:
- Metric: accuracy-increase ratio × compression ratio
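A minimal export sketch, assuming the trained models are Keras models and using the open-source `tf2onnx` converter; the MXQ compilation itself is Mobilint's proprietary toolchain and is not shown. The product form of `quantization_efficiency` is an assumed reading of the metric bullet above.

```python
import tensorflow as tf
import tf2onnx

# Tiny stand-in model; in the project this would be a trained modality branch.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 64, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),
])

# Keras -> ONNX; the resulting .onnx file is what the MXQ compiler consumes.
spec = (tf.TensorSpec((None, 128, 64, 1), tf.float32, name="input"),)
tf2onnx.convert.from_keras(model, input_signature=spec, opset=13,
                           output_path="modality_branch.onnx")

def quantization_efficiency(acc_quant, acc_fp32, size_fp32, size_quant):
    """Assumed product form of the selection metric:
    accuracy-increase ratio x compression ratio."""
    return (acc_quant / acc_fp32) * (size_fp32 / size_quant)
```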
## Optimization Methods

### Multimodal Fusion and Simplified Models
- Built individual models for HR, EEG, speech, and image data.
- Focused on reducing model parameters while preserving accuracy, using simple architectures like CNNs and dense layers.
- Used **score-based fusion** to integrate outputs without additional network complexity, as in the sketch below.
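A minimal NumPy sketch of this decision-level fusion, assuming each branch emits a class-probability vector; the equal modality weights and four-class label set are illustrative assumptions.

```python
import numpy as np

def score_fusion(score_list, weights=None):
    """Decision-level (score-based) fusion: average the per-modality
    class-probability vectors. Equal weights are an assumption; the
    README does not say how modalities were weighted."""
    scores = np.stack(score_list)            # (n_modalities, n_classes)
    if weights is None:
        weights = np.full(len(score_list), 1.0 / len(score_list))
    fused = np.average(scores, axis=0, weights=weights)
    return int(np.argmax(fused)), fused

# Hypothetical softmax outputs from the HR, EEG, speech, and image branches
hr     = np.array([0.10, 0.70, 0.15, 0.05])
eeg    = np.array([0.20, 0.55, 0.15, 0.10])
speech = np.array([0.05, 0.80, 0.10, 0.05])
image  = np.array([0.15, 0.60, 0.20, 0.05])
label, fused = score_fusion([hr, eeg, speech, image])
```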
### Quantization Techniques
- Converted models into NPU-compatible formats via **ONNX** and the **MXQ compiler**.
- Applied three quantization methods to determine the best compression:
- **MAX**: Clipping ranges based on minimum and maximum values.
- **Percentile**: Clipping ranges based on top percentile values.
- **Max-Percentile**: Clipping ranges based on the top percentile of maximum values.
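A NumPy sketch of how these three clipping strategies could derive ranges from calibration activations. The percentile values and the reading of "Max-Percentile" as a percentile taken over per-batch maxima are assumptions, not the MXQ compiler's documented behavior.

```python
import numpy as np

def clip_range_max(acts):
    """MAX: clip to the observed extremes of the calibration activations."""
    return float(acts.min()), float(acts.max())

def clip_range_percentile(acts, q=99.9):
    """Percentile: clip at the q-th percentile, discarding outliers."""
    return (float(np.percentile(acts, 100.0 - q)),
            float(np.percentile(acts, q)))

def clip_range_max_percentile(batch_acts, q=99.0):
    """Max-Percentile (assumed reading): collect per-batch extremes, then
    clip at their q-th percentile rather than the single global extreme."""
    maxima = [b.max() for b in batch_acts]
    minima = [b.min() for b in batch_acts]
    return (float(np.percentile(minima, 100.0 - q)),
            float(np.percentile(maxima, q)))

# Example with synthetic calibration batches
batches = [np.random.randn(256) for _ in range(32)]
acts = np.concatenate(batches)
print(clip_range_max(acts))
print(clip_range_percentile(acts))
print(clip_range_max_percentile(batches))
```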
---
## Evaluation Metrics
### Emotion Classification Accuracy
- Achieved **99.68% accuracy**, ensuring reliable and robust emotion recognition in real-time applications.

### Latency
- Compared model size before and after compression.
- Achieved a **1.47x reduction** in model size.

### Power Consumption
- Measured power usage with an outlet power meter.
- Found a **3.12x reduction** in power consumption for NPU-based models compared to GPU-based models.

## Key Findings
- The system achieved significant improvements in **efficiency and scalability**, making it suitable for real-time AI applications.
- Successfully implemented at the **Korean Institute of Science and Technology** as part of a government initiative.
- Findings were presented at an academic conference, and a related paper is currently under review.
- Reinforced my passion for developing **efficient, real-world AI systems**.

## Insights on Clipping Range for Quantization
- **MAX**: Activations clipped using minimum and maximum values.
- **Percentile**: Activations clipped using the top percentile of values.
- **Max-Percentile**: Activations clipped using the top percentile of maximum values.

---
## Conclusion
This project demonstrated the viability of deploying **real-time AI systems on edge devices** by optimizing multimodal emotion recognition models for **low latency** and **power efficiency**. It solidified my passion for creating **practical and scalable AI solutions** for real-world applications.