https://github.com/wangwilly/gaze-correction-cam

The Gaze Correction Camera project is an advanced real-time gaze correction system designed to enhance video communication by improving eye contact. Leveraging state-of-the-art computer vision and deep learning techniques, this system dynamically adjusts the user's eye gaze direction during live video calls.



Host: GitHub
URL: https://github.com/wangwilly/gaze-correction-cam
Owner: WangWilly
License: bsd-3-clause
Created: 2025-05-11T04:38:54.000Z (about 1 year ago)
Default Branch: master
Last Pushed: 2026-03-05T02:12:45.000Z (4 months ago)
Last Synced: 2026-03-05T07:37:24.456Z (4 months ago)
Topics: computer-vision, eye-detection, eye-tracking, gaze-estimation, gaze-tracking, image-to-image-translation, machine-learning, macos, python, tensorflow
Language: Python
Homepage:
Size: 128 KB
Stars: 41
Watchers: 2
Forks: 5
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE


# Gaze Correction Camera

macOS Application Preview ([🔗 Download](https://drive.google.com/file/d/1E47OZ66YPab1QuTbxN97hL2u3GYwyUbz/view?usp=drive_link))

## Overview

This project implements a gaze correction system for video communication. It uses computer vision and deep learning to adjust eye gaze direction in real time, giving more natural eye contact during video calls. ([read more](./docs/orignal_doc.md))

## Demo

## Prerequisites

Environment:

```text
ProductName: macOS
ProductVersion: 15.2
BuildVersion: 24C101
```

The following dependencies are required to run this application:

- [Python 3.12+](https://www.python.org/downloads/)
- [Poetry](https://python-poetry.org/docs/) for dependency management
- [CMake](https://cmake.org/download/) (required for building dlib)
- [pkg-config](https://www.freedesktop.org/wiki/Software/pkg-config/) (required for certain dependencies)

## Installation

1. Install system dependencies:

```bash
brew install pkg-config
brew install cmake
```

2. Install Python dependencies using Poetry:

```bash
poetry install
```

3. Download pretrained model files:

Download the following files from [GitHub Releases](https://github.com/WangWilly/gaze-correction-cam/releases) and place them in the appropriate directories:
- **Face landmark detector**: `shape_predictor_68_face_landmarks.dat`
  - Place in: `lm_feat/shape_predictor_68_face_landmarks.dat`
- **Gaze correction model weights**: FLX model (Left and Right eye models)
  - Place in: `weights/warping_model/flx/12/L/` and `weights/warping_model/flx/12/R/`
  - Required files per directory: `checkpoint`, `L.data-00000-of-00001` / `R.data-00000-of-00001`, `L.index` / `R.index`, `L.meta` / `R.meta`
- **(Optional) MediaPipe model**: `face_landmarker.task` (for MediaPipe backend)
  - Place in: `models/face_landmarker.task`
  - Download from [MediaPipe Solutions](https://developers.google.com/mediapipe/solutions/vision/face_landmarker)

## Usage

### Single Window Application (Recommended)

Run the simplified single-window gaze correction application:

```bash
# Using default dlib backend
poetry run python bin_single_window.py

# Using MediaPipe backend (requires face_landmarker.task)
poetry run python bin_single_window.py --backend mediapipe

# Specify camera device
poetry run python bin_single_window.py --camera 0
```

#### Controls

| Key | Action |
| --- | ----------------------------- |
| `g` | Toggle gaze correction on/off |
| `c` | Toggle calibration mode |
| `q` | Quit application |

#### Calibration Mode Controls

When calibration mode is enabled (press `c`):

| Key | Action |
| ---------------------------- | -------------------------------------- |
| Arrow keys (`↑` `↓` `←` `→`) | Adjust camera offset X/Y (±0.5 cm) |
| `+` / `-` | Adjust camera offset Z depth (±0.5 cm) |
| `[` / `]` | Adjust focal length (±10 pixels) |
| `r` | Reset to default values |

The calibration overlay displays:

- Current camera offset (X, Y, Z in cm)
- Estimated eye position (X, Y, Z in cm)
- Current focal length (in pixels)
- Top-view diagram showing camera, screen, and eye positions
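
The calibration controls above amount to a small state update per key press. A minimal sketch of that idea follows. The `Calibration` class, the string key names, the default focal length, and the sign of each step are all assumptions made for illustration, not the application's actual code.

```python
from dataclasses import dataclass, replace

OFFSET_STEP_CM = 0.5  # step per key press, from the controls table
FOCAL_STEP_PX = 10.0  # step per key press, from the controls table


@dataclass(frozen=True)
class Calibration:
    # Placeholder defaults; the project's real defaults are not documented here.
    offset_x: float = 0.0
    offset_y: float = 0.0
    offset_z: float = 0.0
    focal_length: float = 600.0


DEFAULTS = Calibration()

# Hypothetical key names mapped to (field, delta). Directions are assumed.
KEY_DELTAS = {
    "left": ("offset_x", -OFFSET_STEP_CM),
    "right": ("offset_x", +OFFSET_STEP_CM),
    "up": ("offset_y", +OFFSET_STEP_CM),
    "down": ("offset_y", -OFFSET_STEP_CM),
    "+": ("offset_z", +OFFSET_STEP_CM),
    "-": ("offset_z", -OFFSET_STEP_CM),
    "]": ("focal_length", +FOCAL_STEP_PX),
    "[": ("focal_length", -FOCAL_STEP_PX),
}


def apply_key(state: Calibration, key: str) -> Calibration:
    """Return the calibration that results from a single key press."""
    if key == "r":
        return DEFAULTS  # reset to default values
    if key not in KEY_DELTAS:
        return state  # unrelated keys leave the calibration untouched
    field, delta = KEY_DELTAS[key]
    return replace(state, **{field: getattr(state, field) + delta})


state = apply_key(DEFAULTS, "right")
state = apply_key(state, "]")
print(state)
```

Keeping the state immutable and returning a new value on each key press makes the reset path trivial and keeps the overlay free to read the current values at any time.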

## System Requirements

- macOS with camera access permissions
- Sufficient GPU resources for real-time processing
- Webcam or video capture device

## Architecture & Module Documentation

### System Overview

This is a **real-time gaze correction system** that redirects eye gaze in video streams to create natural eye contact during video calls. The system uses face detection, facial landmarks, and deep learning models to warp eye regions.

### File Structure & Module Organization

#### 1. Entry Points (bin\_\*.py)

##### bin_single_window.py ⭐ _Main Application_

- **Purpose**: Single-window gaze correction app with real-time controls
- **Features**:
  - Auto-detects camera resolution
  - Toggle gaze correction on/off (`g` key)
  - Calibration mode for camera offset adjustment (`c` key)
  - Supports multiple backends (dlib/MediaPipe)
- **Flow**: `Camera Input → FacePredictor → GazeCorrector → Display Output`

##### bin_focal_length_calibration.py

- Standalone tool for camera focal length calibration

##### bin_test_mediapipe_detection.py

- Test utility for MediaPipe face detection

#### 2. Core Modules (displayers/)

The `displayers/` directory contains the main business logic components:

##### face_predictor.py - Face Detection & Landmark Extraction

**Purpose**: Abstract interface for face detection backends

**Key Classes**:

- `FacePredictor` (ABC): Interface for face detection
- `DlibFacePredictor`: Implementation using dlib (68 landmarks)
- `MediaPipeFacePredictor`: Implementation using Google MediaPipe
- Data classes: `FaceData`, `EyeData`, `EyeLandmarks`

**Process**: `Input Frame → Face Detection → Landmark Prediction → Eye Extraction → EyeData`

**Output**: `FaceData` containing:

- Left/right eye images (normalized 48×64)
- Anchor maps (feature point maps for spatial guidance)
- Eye center coordinates
- Original positions in frame
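
The anchor map has 12 channels for an eye described by 6 landmark points, which suggests two channels per landmark. One plausible construction, offered here purely as an assumption, stores for every pixel its x and y offset to each landmark. The function name and the landmark coordinates below are made up for illustration.

```python
from typing import List, Tuple

EYE_HEIGHT, EYE_WIDTH = 48, 64  # normalized eye patch size from this README


def build_anchor_map(landmarks: List[Tuple[float, float]]) -> List[List[List[float]]]:
    """Return a height x width x (2 * len(landmarks)) nested list.

    Channel 2*i holds (pixel_x - landmark_x) and channel 2*i + 1 holds
    (pixel_y - landmark_y) for landmark i. This layout is an assumption.
    """
    anchor_map = []
    for row in range(EYE_HEIGHT):
        row_values = []
        for col in range(EYE_WIDTH):
            channels = []
            for lx, ly in landmarks:
                channels.append(col - lx)
                channels.append(row - ly)
            row_values.append(channels)
        anchor_map.append(row_values)
    return anchor_map


# Six invented landmark positions (x, y) inside the 64x48 patch.
landmarks = [(8, 24), (22, 14), (42, 14), (56, 24), (42, 34), (22, 34)]
anchors = build_anchor_map(landmarks)
print(len(anchors), len(anchors[0]), len(anchors[0][0]))  # 48 64 12
```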

##### gaze_corrector.py - Gaze Correction Model

**Purpose**: Wraps TensorFlow models for eye gaze correction

**Key Classes**:

- `GazeModel`: TensorFlow model wrapper (loads L/R eye models)
- `GazeCorrector`: High-level interface for gaze correction
- `CameraConfig`: Camera geometry (focal length, IPD, camera offset)

**Process**:

```
EyeData + Camera Geometry → TF Model Inference → Warped Eye Image
↓
Angle Calculation (3D geometry)
```

**Components**:

1. **Model Loading**: Loads separate L/R eye TensorFlow models from `weights/`
2. **Angle Calculation**: Computes gaze redirection angle based on:
   - Eye position in 3D space
   - Camera position relative to screen
   - Target gaze direction (toward camera)
3. **Eye Warping**: Applies learned transformation to redirect gaze

**Camera Geometry**:

- `focal_length`: Camera focal length (pixels)
- `ipd`: Inter-pupillary distance (cm)
- `camera_offset`: Camera position (X, Y, Z) relative to screen center
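
To make the role of these three parameters concrete, here is a rough sketch of how a redirection angle could be derived from them with a pinhole camera model. It assumes the user is looking at the screen center and should appear to look at the camera. The function names, the axis convention (x right, y down, z from camera toward the user), and the sample values are assumptions; the project's own formula may differ.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def estimate_eye_position(
    eye_px: Tuple[float, float],
    principal_point: Tuple[float, float],
    ipd_px: float,
    focal_length_px: float,
    ipd_cm: float,
) -> Vec3:
    """Back-project an eye pixel to camera-frame centimeters (pinhole model)."""
    # Similar triangles: real IPD / depth == pixel IPD / focal length.
    depth = focal_length_px * ipd_cm / ipd_px
    x = (eye_px[0] - principal_point[0]) * depth / focal_length_px
    y = (eye_px[1] - principal_point[1]) * depth / focal_length_px
    return (x, y, depth)


def redirection_angles(eye: Vec3, camera_offset: Vec3) -> Tuple[float, float]:
    """Return (horizontal, vertical) redirection in degrees.

    camera_offset is the camera position relative to the screen center, so
    in the camera frame the screen center sits at -camera_offset.
    """
    to_camera = (-eye[0], -eye[1], -eye[2])
    to_screen = tuple(-c - e for c, e in zip(camera_offset, eye))

    def angles(v):
        # -v[2] is the distance covered toward the camera plane.
        return math.atan2(v[0], -v[2]), math.atan2(v[1], -v[2])

    cam_h, cam_v = angles(to_camera)
    scr_h, scr_v = angles(to_screen)
    return math.degrees(cam_h - scr_h), math.degrees(cam_v - scr_v)


# Eye on the optical axis, camera 10 cm above the screen center (y points down).
eye = estimate_eye_position((320, 240), (320, 240), ipd_px=63,
                            focal_length_px=500, ipd_cm=6.3)
print(eye)
print(redirection_angles(eye, camera_offset=(0.0, -10.0, 0.0)))
```

With these sample values the eye sits about 50 cm from the camera, and the vertical redirection is roughly 11 degrees upward, which matches the intuition that a webcam above the screen makes the user appear to look down.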

##### dis_single_window.py - Application Orchestrator

**Purpose**: Main application logic coordinating all components

**Key Class**: `SingleWindowGazeCorrector`

**Responsibilities**:

1. Camera capture and frame processing
2. FacePredictor → GazeCorrector pipeline
3. Real-time toggle controls
4. Calibration mode UI
5. Composite frame rendering

**Pipeline**:

```
Camera Frame
↓
Resize for Face Detection (320×240)
↓
FacePredictor.list_eye_data()
↓
For each eye:
- If gaze_enabled: GazeCorrector.correct_eye()
- Else: Use original eye image
↓
Composite corrected eyes onto original frame
↓
Draw status overlay
↓
Display in window
```
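
The per-eye branch in the pipeline above can be sketched as a small function. The `Eye` class and the two callables are stand-ins invented for this example; in the application the corresponding roles are played by `FacePredictor.list_eye_data()` and `GazeCorrector.correct_eye()`, whose exact signatures are not shown in this README.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]  # stand-in for an image array


@dataclass
class Eye:
    image: Image
    top_left: Tuple[int, int]  # (row, col) in the original frame


def process_frame(
    frame: Image,
    detect_eyes: Callable[[Image], List[Eye]],
    correct_eye: Callable[[Eye], Image],
    gaze_enabled: bool,
) -> List[Eye]:
    """Return the eye patches that should be composited onto the frame."""
    outputs = []
    for eye in detect_eyes(frame):
        # Use the corrected patch only while gaze correction is toggled on.
        image = correct_eye(eye) if gaze_enabled else eye.image
        outputs.append(Eye(image=image, top_left=eye.top_left))
    return outputs


# Stub detector and corrector, just enough to exercise the control flow.
def fake_detect(frame: Image) -> List[Eye]:
    return [Eye(image=[[1, 2], [3, 4]], top_left=(0, 0))]


def fake_correct(eye: Eye) -> Image:
    return [[255 - value for value in row] for row in eye.image]


frame = [[0] * 4 for _ in range(4)]
print(process_frame(frame, fake_detect, fake_correct, gaze_enabled=True)[0].image)
print(process_frame(frame, fake_detect, fake_correct, gaze_enabled=False)[0].image)
```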

#### 3. TensorFlow Models (tf_models/)

##### flx.py - FLX Model Architecture

**Purpose**: Defines the neural network architecture for gaze correction

**Key Components**:

- `encoder()`: Encodes gaze angle into spatial feature map
- `trans_module()`: Transformation module with skip connections
- `apply_lcm()`: Light color modulation for realistic rendering
- `inference()`: Main forward pass combining all components

**Architecture**:

```
Eye Image + Anchor Map + Angle
↓
[Feature Extraction CNN]
↓
[Angle Encoder] → Spatial Feature Map
↓
[Transformation Module (Dense CNN)]
↓
[Flow Field Generation]
↓
[Spatial Transformer] → Warped Image
↓
[Light Color Modulation]
↓
Corrected Eye Image
```

##### transformation.py - Spatial Transformer

**Purpose**: Implements differentiable image warping

**Key Functions**:

- `meshgrid()`: Generates coordinate grid
- `interpolate()`: Bilinear interpolation for smooth warping
- `apply_transformation()`: Applies flow field to warp image

**Used for**: Applying learned pixel displacement fields to eye images
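
As an illustration of what warping by a flow field with bilinear interpolation means, the sketch below works on a single-channel image stored as nested lists. It is a plain-Python analogue written for this README, not the TensorFlow code in `transformation.py`; the function names and the border-clamping behavior are assumptions.

```python
import math
from typing import List

Grid = List[List[float]]


def bilinear_sample(image: Grid, x: float, y: float) -> float:
    """Sample image at a fractional (x, y), clamping to the border."""
    height, width = len(image), len(image[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    wx, wy = x - x0, y - y0
    # Blend horizontally on the two neighboring rows, then vertically.
    top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
    bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def warp(image: Grid, flow_x: Grid, flow_y: Grid) -> Grid:
    """Backward warp: output[y][x] samples image at (x + flow_x, y + flow_y)."""
    return [
        [
            bilinear_sample(image, x + flow_x[y][x], y + flow_y[y][x])
            for x in range(len(image[0]))
        ]
        for y in range(len(image))
    ]


image = [[0, 10], [20, 30]]
half_right = [[0.5, 0.5], [0.5, 0.5]]  # every pixel looks half a pixel to its right
no_shift = [[0.0, 0.0], [0.0, 0.0]]
print(warp(image, half_right, no_shift))  # [[5.0, 10.0], [25.0, 30.0]]
```

Because every step is a weighted sum, the same computation is differentiable with respect to the flow values, which is what allows a network to learn the displacement field.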

##### tf_utils.py

- Common TensorFlow utilities
- CNN/DNN blocks with batch normalization

#### 4. Utilities (utils/)

##### config.py - Configuration Management

**Purpose**: Centralized configuration using argparse

**Parameters**:

- Model dimensions (height=48, width=64, ef_dim=12)
- Camera parameters (focal length, IPD, camera offset)
- Network settings (IP, ports for multi-process mode)
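
A minimal sketch of an argparse-based configuration in this spirit follows. Only `--backend` and `--camera` appear in the Usage section; the other flag names and the default camera index are hypothetical and chosen only to mirror the parameters listed above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Gaze correction settings (sketch)")
    # Flags shown in the Usage section; dlib is the documented default backend.
    parser.add_argument("--backend", choices=["dlib", "mediapipe"], default="dlib")
    parser.add_argument("--camera", type=int, default=0)  # default index is assumed
    # Hypothetical flag names for the model dimensions listed above.
    parser.add_argument("--height", type=int, default=48)
    parser.add_argument("--width", type=int, default=64)
    parser.add_argument("--ef-dim", type=int, default=12)
    return parser


args = build_parser().parse_args(["--backend", "mediapipe", "--camera", "1"])
print(args.backend, args.camera, args.height, args.width, args.ef_dim)
# mediapipe 1 48 64 12
```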

##### logger.py - Logging Utility

**Purpose**: Formatted logging with timestamps and thread IDs

**Format**: `2026-01-27 10:30:45.123 Python[12345:67890] +[ClassName]: Message`
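
A format like this can be approximated with the standard `logging` module, as sketched below. The `get_logger` helper and the use of the logger name as the class name are assumptions; the project's `logger.py` may be implemented differently.

```python
import logging

# Timestamp with milliseconds, process and thread IDs, then the class name.
LOG_FORMAT = (
    "%(asctime)s.%(msecs)03d Python[%(process)d:%(thread)d] "
    "+[%(name)s]: %(message)s"
)
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def get_logger(class_name: str) -> logging.Logger:
    """Return a logger that writes lines in the format above to stderr."""
    logger = logging.getLogger(class_name)
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the root logger from printing a duplicate
    return logger


get_logger("GazeCorrector").info("Model loaded")
```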

### Data Flow Pipeline

```
┌──────────────────────────────────────────────────────────────────┐
│                         MAIN APPLICATION                         │
│                      (bin_single_window.py)                      │
└─────────────┬────────────────────────────────────────────────────┘
              │
              ↓
┌─────────────────────────────┐
│  Camera Capture (OpenCV)    │
│  Original: 640×480          │
└─────────────┬───────────────┘
              │
              ↓
┌─────────────────────────────┐
│  Resize for Detection       │
│  Downscaled: 320×240        │
└─────────────┬───────────────┘
              │
              ↓
┌──────────────────────────────────────────────────────────────────┐
│                       FACE DETECTION LAYER                       │
│                  (displayers/face_predictor.py)                  │
├──────────────────────────────────────────────────────────────────┤
│  • Detect face(s) in frame                                       │
│  • Predict 68 facial landmarks (dlib) OR                         │
│  • Predict 478 landmarks (MediaPipe)                             │
│  • Extract eye regions (6 points per eye)                        │
│  • Resize eye images to 48×64                                    │
│  • Generate anchor maps (landmark feature maps)                  │
└─────────────┬────────────────────────────────────────────────────┘
              │
              ↓ Output: List[FaceData]
              │
┌──────────────────────────────────────────────────────────────────┐
│  FaceData {                                                      │
│    left_eye: EyeData {                                           │
│      image: 48×64×3 (normalized)                                 │
│      anchor_map: 48×64×12 (feature points)                       │
│      center: (x, y)                                              │
│      top_left: (row, col)                                        │
│    }                                                             │
│    right_eye: EyeData {...}                                      │
│  }                                                               │
└─────────────┬────────────────────────────────────────────────────┘
              │
              ↓
┌──────────────────────────────────────────────────────────────────┐
│                      GAZE CORRECTION LAYER                       │
│                  (displayers/gaze_corrector.py)                  │
├──────────────────────────────────────────────────────────────────┤
│  For each eye:                                                   │
│  1. Calculate 3D eye position from landmarks                     │
│  2. Compute gaze redirection angle (toward camera)               │
│  3. Feed to TensorFlow model:                                    │
│     • Eye image (48×64×3)                                        │
│     • Anchor map (48×64×12)                                      │
│     • Gaze angle (θx, θy)                                        │
│  4. Model outputs warped eye image                               │
└─────────────┬────────────────────────────────────────────────────┘
              │
              ↓
┌──────────────────────────────────────────────────────────────────┐
│                         TensorFlow MODEL                         │
│                        (tf_models/flx.py)                        │
├──────────────────────────────────────────────────────────────────┤
│  [Encoder] → Angle to spatial feature map                        │
│  [CNN Feature Extraction] → Image features                       │
│  [Transformation Module] → Flow field prediction                 │
│  [Spatial Transformer] → Apply warping                           │
│  [Light Color Module] → Adjust lighting                          │
└─────────────┬────────────────────────────────────────────────────┘
              │
              ↓ Corrected Eye Image (48×64×3)
              │
┌──────────────────────────────────────────────────────────────────┐
│                       COMPOSITE & DISPLAY                        │
│                      (dis_single_window.py)                      │
├──────────────────────────────────────────────────────────────────┤
│  1. Resize corrected eyes back to original size                  │
│  2. Paste onto original 640×480 frame at eye positions           │
│  3. Draw status overlay (GAZE ON/OFF)                            │
│  4. Draw calibration overlay (if enabled)                        │
│  5. Display in OpenCV window                                     │
└──────────────────────────────────────────────────────────────────┘
```
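
The last stage of the diagram, resizing a corrected eye patch and pasting it back at its original position, can be sketched in plain Python as follows. Nearest-neighbor resizing is used here only to keep the example dependency-free; the application presumably relies on OpenCV for this, and both function names are invented.

```python
from typing import List, Tuple

Image = List[List[int]]  # single-channel stand-in for a frame


def resize_nearest(patch: Image, out_height: int, out_width: int) -> Image:
    """Resize by picking the nearest source pixel for each output pixel."""
    in_height, in_width = len(patch), len(patch[0])
    return [
        [patch[r * in_height // out_height][c * in_width // out_width]
         for c in range(out_width)]
        for r in range(out_height)
    ]


def paste(frame: Image, patch: Image, top_left: Tuple[int, int]) -> None:
    """Copy patch into frame in place, clipping at the frame borders."""
    top, left = top_left
    for r, row in enumerate(patch):
        for c, value in enumerate(row):
            fr, fc = top + r, left + c
            if 0 <= fr < len(frame) and 0 <= fc < len(frame[0]):
                frame[fr][fc] = value


frame = [[0] * 6 for _ in range(4)]
corrected_eye = [[1, 2], [3, 4]]  # stands in for the 48×64 model output
paste(frame, resize_nearest(corrected_eye, 2, 4), top_left=(1, 1))
for row in frame:
    print(row)
```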

### Key Design Patterns

#### 1. Dependency Injection

- `FacePredictor` is injectable → easy to swap backends (dlib ↔ MediaPipe)
- `GazeCorrector` is injectable → testable and modular

#### 2. Abstract Interface

- `FacePredictor` is an abstract base class
- Implementations: `DlibFacePredictor`, `MediaPipeFacePredictor`
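
The combination of an abstract interface and constructor injection might look like the sketch below. `FacePredictor` and `list_eye_data` are names taken from this README, but the method signature, the `FakeFacePredictor` test double, and the `App` class are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class FacePredictor(ABC):
    """Interface that every face detection backend implements."""

    @abstractmethod
    def list_eye_data(self, frame) -> List[dict]:
        """Return one entry per detected eye."""


class FakeFacePredictor(FacePredictor):
    """Test double that always reports a left and a right eye."""

    def list_eye_data(self, frame) -> List[dict]:
        return [{"side": "left"}, {"side": "right"}]


class App:
    """Receives its predictor from outside instead of constructing one."""

    def __init__(self, predictor: FacePredictor) -> None:
        self.predictor = predictor

    def count_eyes(self, frame) -> int:
        return len(self.predictor.list_eye_data(frame))


app = App(FakeFacePredictor())
print(app.count_eyes(frame=None))  # 2
```

Swapping dlib for MediaPipe, or for a fake in a unit test, then only changes the object passed to the constructor.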

#### 3. Configuration Objects

- Dataclasses for configuration (immutable, type-safe)
- `DisplayConfig`, `CameraConfig`, `GazeModelConfig`, etc.

#### 4. Separation of Concerns

- Face detection ≠ Gaze correction
- Display logic ≠ Model inference
- Configuration ≠ Business logic

### Module Responsibilities

| Module | Input | Output | Responsibility |
| --------------------- | -------------------------- | ---------------- | --------------------------------- |
| **face_predictor** | Frame (BGR) | `List[FaceData]` | Detect faces, extract eye regions |
| **gaze_corrector** | `FaceData` + Camera Config | Corrected frame | Apply gaze correction model |
| **flx.py** | Eye image + Anchor + Angle | Warped eye | Neural network inference |
| **transformation.py** | Flow field + Image | Warped image | Spatial transformation |
| **dis_single_window** | Camera stream | Display window | Orchestrate pipeline, UI |

### How It Works (High-Level)

1. **Capture** video frame from webcam
2. **Detect** the face and extract facial landmarks (68 with dlib, 478 with MediaPipe)
3. **Extract** left/right eye regions (48×64 each)
4. **Calculate** 3D eye position and required gaze angle
5. **Infer** a warping flow field with the trained CNN
6. **Warp** eye image using spatial transformer
7. **Composite** corrected eyes back onto original frame
8. **Display** result in real-time

The key innovation is the **learned warping transformation** that realistically redirects gaze while preserving eye appearance, lighting, and texture.

## References

The implementation is based on research in gaze correction techniques using warping-based convolutional neural networks.
