{"id":28288972,"url":"https://github.com/wangwilly/gaze-correction-cam","last_synced_at":"2026-03-12T07:07:30.599Z","repository":{"id":292618968,"uuid":"981427329","full_name":"WangWilly/gaze-correction-cam","owner":"WangWilly","description":"The Gaze Correction Camera project is an advanced real-time gaze correction system designed to enhance video communication by improving eye contact. Leveraging state-of-the-art computer vision and deep learning techniques, this system dynamically adjusts the user's eye gaze direction during live video calls.","archived":false,"fork":false,"pushed_at":"2026-03-05T02:12:45.000Z","size":131,"stargazers_count":41,"open_issues_count":5,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2026-03-05T07:37:24.456Z","etag":null,"topics":["computer-vision","eye-detection","eye-tracking","gaze-estimation","gaze-tracking","image-to-image-translation","machine-learning","macos","python","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WangWilly.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-11T04:38:54.000Z","updated_at":"2026-03-05T02:06:26.000Z","dependencies_parsed_at":"2025-07-25T23:30:36.815Z","dependency_job_id":"891111fc-280e-4b1a-b46d-a1d5612a5f35","html_url":"https://github.com/WangWilly/gaze-correction-cam","commit_stats":null,"previous_names":["wangwilly/gaze-correction-cam"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/WangWilly/ga
ze-correction-cam","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WangWilly%2Fgaze-correction-cam","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WangWilly%2Fgaze-correction-cam/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WangWilly%2Fgaze-correction-cam/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WangWilly%2Fgaze-correction-cam/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WangWilly","download_url":"https://codeload.github.com/WangWilly/gaze-correction-cam/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WangWilly%2Fgaze-correction-cam/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30417686,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-12T06:40:58.731Z","status":"ssl_error","status_checked_at":"2026-03-12T06:40:40.296Z","response_time":114,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","eye-detection","eye-tracking","gaze-estimation","gaze-tracking","image-to-image-translation","machine-learning","macos","python","tensorflow"],"created_at":"2025-05-22T00:13:49.432Z","updated_at":"2026-03-12T07:07:30.582Z","avatar_url":"https://github.com/WangWilly.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Gaze Correction Camera\n\n\u003cimg src=\"https://github.com/user-attachments/assets/66e2355a-20d7-4ac5-b711-cb1b2ff653d7\" style=\"width: 160px; display: block;\"\u003e\n\nmacOS Application Preview ([🔗 Download](https://drive.google.com/file/d/1E47OZ66YPab1QuTbxN97hL2u3GYwyUbz/view?usp=drive_link))\n\n## Overview\n\nThis project implements a gaze correction system for video communication that uses computer vision and deep learning techniques to adjust eye gaze direction in real time, providing a more natural eye contact experience during video calls. 
([study more](./docs/orignal_doc.md))\n\n## Demo\n\n\u003c!--\nSource - Adapted from Stack Overflow\nRetrieved 2026-01-27\n--\u003e\n\n\u003cdiv style=\"position: relative; display: inline-block;\"\u003e\n  \u003c!-- Video Thumbnail --\u003e\n  \u003ca href=\"https://www.youtube.com/watch?v=tOobANsNzOQ\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.youtube.com/vi/tOobANsNzOQ/0.jpg\" style=\"width: 320px; display: block;\"\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n## Prerequisites\n\nEnvironment:\n\n```text\nProductName:            macOS\nProductVersion:         15.2\nBuildVersion:           24C101\n```\n\nThe following dependencies are required to run this application:\n\n- [Python 3.12+](https://www.python.org/downloads/)\n- [Poetry](https://python-poetry.org/docs/) for dependency management\n- [CMake](https://cmake.org/download/) (required for building dlib)\n- [pkg-config](https://www.freedesktop.org/wiki/Software/pkg-config/) (required for certain dependencies)\n\n## Installation\n\n1. Install system dependencies:\n\n   ```bash\n   brew install pkg-config\n   brew install cmake\n   ```\n\n2. Install Python dependencies using Poetry:\n\n   ```bash\n   poetry install\n   ```\n\n3. 
Download pretrained model files:\n\n   Download the following files from [GitHub Releases](https://github.com/WangWilly/gaze-correction-cam/releases) and place them in the appropriate directories:\n   - **Face landmark detector**: `shape_predictor_68_face_landmarks.dat`\n     - Place in: `lm_feat/shape_predictor_68_face_landmarks.dat`\n   - **Gaze correction model weights**: FLX model (Left and Right eye models)\n     - Place in: `weights/warping_model/flx/12/L/` and `weights/warping_model/flx/12/R/`\n     - Required files per directory: `checkpoint`, `L.data-00000-of-00001` / `R.data-00000-of-00001`, `L.index` / `R.index`, `L.meta` / `R.meta`\n\n   - **(Optional) MediaPipe model**: `face_landmarker.task` (for MediaPipe backend)\n     - Place in: `models/face_landmarker.task`\n     - Download from [MediaPipe Solutions](https://developers.google.com/mediapipe/solutions/vision/face_landmarker)\n\n## Usage\n\n### Single Window Application (Recommended)\n\nRun the simplified single-window gaze correction application:\n\n```bash\n# Using default dlib backend\npoetry run python bin_single_window.py\n\n# Using MediaPipe backend (requires face_landmarker.task)\npoetry run python bin_single_window.py --backend mediapipe\n\n# Specify camera device\npoetry run python bin_single_window.py --camera 0\n```\n\n#### Controls\n\n| Key | Action                        |\n| --- | ----------------------------- |\n| `g` | Toggle gaze correction on/off |\n| `c` | Toggle calibration mode       |\n| `q` | Quit application              |\n\n#### Calibration Mode Controls\n\nWhen calibration mode is enabled (press `c`):\n\n| Key                          | Action                                 |\n| ---------------------------- | -------------------------------------- |\n| Arrow keys (`↑` `↓` `←` `→`) | Adjust camera offset X/Y (±0.5 cm)     |\n| `+` / `-`                    | Adjust camera offset Z depth (±0.5 cm) |\n| `[` / `]`                    | Adjust focal length (±10 pixels)       
|\n| `r`                          | Reset to default values                |\n\nThe calibration overlay displays:\n\n- Current camera offset (X, Y, Z in cm)\n- Estimated eye position (X, Y, Z in cm)\n- Current focal length (in pixels)\n- Top-view diagram showing camera, screen, and eye positions\n\n## System Requirements\n\n- macOS with camera access permissions\n- Sufficient GPU resources for real-time processing\n- Webcam or video capture device\n\n## Architecture \u0026 Module Documentation\n\n### System Overview\n\nThis is a **real-time gaze correction system** that redirects eye gaze in video streams to create natural eye contact during video calls. The system uses face detection, facial landmarks, and deep learning models to warp eye regions.\n\n### File Structure \u0026 Module Organization\n\n#### 1. Entry Points (bin\\_\\*.py)\n\n##### bin_single_window.py ⭐ *Main Application*\n\n- **Purpose**: Single-window gaze correction app with real-time controls\n- **Features**:\n  - Auto-detects camera resolution\n  - Toggle gaze correction on/off (`g` key)\n  - Calibration mode for camera offset adjustment (`c` key)\n  - Supports multiple backends (dlib/MediaPipe)\n- **Flow**: `Camera Input → FacePredictor → GazeCorrector → Display Output`\n\n##### bin_focal_length_calibration.py\n\n- Standalone tool for camera focal length calibration\n\n##### bin_test_mediapipe_detection.py\n\n- Test utility for MediaPipe face detection\n\n#### 2. 
Core Modules (displayers/)\n\nThe `displayers/` directory contains the main business logic components:\n\n##### face_predictor.py - Face Detection \u0026 Landmark Extraction\n\n**Purpose**: Abstract interface for face detection backends\n\n**Key Classes**:\n\n- `FacePredictor` (ABC): Interface for face detection\n- `DlibFacePredictor`: Implementation using dlib (68 landmarks)\n- `MediaPipeFacePredictor`: Implementation using Google MediaPipe\n- Data classes: `FaceData`, `EyeData`, `EyeLandmarks`\n\n**Process**: `Input Frame → Face Detection → Landmark Prediction → Eye Extraction → EyeData`\n\n**Output**: `FaceData` containing:\n\n- Left/right eye images (normalized 48×64)\n- Anchor maps (feature point maps for spatial guidance)\n- Eye center coordinates\n- Original positions in frame\n\n##### gaze_corrector.py - Gaze Correction Model\n\n**Purpose**: Wraps TensorFlow models for eye gaze correction\n\n**Key Classes**:\n\n- `GazeModel`: TensorFlow model wrapper (loads L/R eye models)\n- `GazeCorrector`: High-level interface for gaze correction\n- `CameraConfig`: Camera geometry (focal length, IPD, camera offset)\n\n**Process**:\n\n```\nEyeData + Camera Geometry → TF Model Inference → Warped Eye Image\n                          ↓\n                    Angle Calculation (3D geometry)\n```\n\n**Components**:\n\n1. **Model Loading**: Loads separate L/R eye TensorFlow models from `weights/`\n2. **Angle Calculation**: Computes gaze redirection angle based on:\n   - Eye position in 3D space\n   - Camera position relative to screen\n   - Target gaze direction (toward camera)\n3. 
**Eye Warping**: Applies learned transformation to redirect gaze\n\n**Camera Geometry**:\n\n- `focal_length`: Camera focal length (pixels)\n- `ipd`: Inter-pupillary distance (cm)\n- `camera_offset`: Camera position (X, Y, Z) relative to screen center\n\n##### dis_single_window.py - Application Orchestrator\n\n**Purpose**: Main application logic coordinating all components\n\n**Key Class**: `SingleWindowGazeCorrector`\n\n**Responsibilities**:\n\n1. Camera capture and frame processing\n2. FacePredictor → GazeCorrector pipeline\n3. Real-time toggle controls\n4. Calibration mode UI\n5. Composite frame rendering\n\n**Pipeline**:\n\n```\nCamera Frame\n    ↓\nResize for Face Detection (320×240)\n    ↓\nFacePredictor.list_eye_data()\n    ↓\nFor each eye:\n    - If gaze_enabled: GazeCorrector.correct_eye()\n    - Else: Use original eye image\n    ↓\nComposite corrected eyes onto original frame\n    ↓\nDraw status overlay\n    ↓\nDisplay in window\n```\n\n#### 3. TensorFlow Models (tf_models/)\n\n##### flx.py - FLX Model Architecture\n\n**Purpose**: Defines the neural network architecture for gaze correction\n\n**Key Components**:\n\n- `encoder()`: Encodes gaze angle into spatial feature map\n- `trans_module()`: Transformation module with skip connections\n- `apply_lcm()`: Light color modulation for realistic rendering\n- `inference()`: Main forward pass combining all components\n\n**Architecture**:\n\n```\nEye Image + Anchor Map + Angle\n    ↓\n[Feature Extraction CNN]\n    ↓\n[Angle Encoder] → Spatial Feature Map\n    ↓\n[Transformation Module (Dense CNN)]\n    ↓\n[Flow Field Generation]\n    ↓\n[Spatial Transformer] → Warped Image\n    ↓\n[Light Color Modulation]\n    ↓\nCorrected Eye Image\n```\n\n##### transformation.py - Spatial Transformer\n\n**Purpose**: Implements differentiable image warping\n\n**Key Functions**:\n\n- `meshgrid()`: Generates coordinate grid\n- `interpolate()`: Bilinear interpolation for smooth warping\n- `apply_transformation()`: Applies flow field 
to warp image\n\n**Used for**: Applying learned pixel displacement fields to eye images\n\n##### tf_utils.py\n\n- Common TensorFlow utilities\n- CNN/DNN blocks with batch normalization\n\n#### 4. Utilities (utils/)\n\n##### config.py - Configuration Management\n\n**Purpose**: Centralized configuration using argparse\n\n**Parameters**:\n\n- Model dimensions (height=48, width=64, ef_dim=12)\n- Camera parameters (focal length, IPD, camera offset)\n- Network settings (IP, ports for multi-process mode)\n\n##### logger.py - Logging Utility\n\n**Purpose**: Formatted logging with timestamps and thread IDs\n\n**Format**: `2026-01-27 10:30:45.123 Python[12345:67890] +[ClassName]: Message`\n\n### Data Flow Pipeline\n\n```\n┌─────────────────────────────────────────────────────────────────┐\n│                        MAIN APPLICATION                         │\n│                     (bin_single_window.py)                      │\n└──────────────────────┬──────────────────────────────────────────┘\n                       │\n                       ↓\n         ┌─────────────────────────────┐\n         │   Camera Capture (OpenCV)   │\n         │   Original: 640×480         │\n         └─────────────┬───────────────┘\n                       │\n                       ↓\n         ┌─────────────────────────────┐\n         │  Resize for Detection       │\n         │  Downscaled: 320×240        │\n         └─────────────┬───────────────┘\n                       │\n                       ↓\n┌──────────────────────────────────────────────────────────────────┐\n│                    FACE DETECTION LAYER                          │\n│                  (displayers/face_predictor.py)                  │\n├──────────────────────────────────────────────────────────────────┤\n│  • Detect face(s) in frame                                       │\n│  • Predict 68 facial landmarks (dlib) OR                         │\n│  • Predict 478 landmarks (MediaPipe)                             │\n│  • Extract eye 
regions (6 points per eye)                        │\n│  • Resize eye images to 48×64                                    │\n│  • Generate anchor maps (landmark feature maps)                  │\n└─────────────┬────────────────────────────────────────────────────┘\n              │\n              ↓ Output: List[FaceData]\n              │\n┌─────────────────────────────────────────────────────────────────┐\n│  FaceData {                                                     │\n│    left_eye: EyeData {                                          │\n│      image: 48×64×3 (normalized)                                │\n│      anchor_map: 48×64×12 (feature points)                      │\n│      center: (x, y)                                             │\n│      top_left: (row, col)                                       │\n│    }                                                            │\n│    right_eye: EyeData {...}                                     │\n│  }                                                              │\n└─────────────┬───────────────────────────────────────────────────┘\n              │\n              ↓\n┌──────────────────────────────────────────────────────────────────┐\n│                   GAZE CORRECTION LAYER                          │\n│                 (displayers/gaze_corrector.py)                   │\n├──────────────────────────────────────────────────────────────────┤\n│  For each eye:                                                   │\n│    1. Calculate 3D eye position from landmarks                   │\n│    2. Compute gaze redirection angle (toward camera)             │\n│    3. Feed to TensorFlow model:                                  │\n│       • Eye image (48×64×3)                                      │\n│       • Anchor map (48×64×12)                                    │\n│       • Gaze angle (θx, θy)                                      │\n│    4. 
Model outputs warped eye image                             │\n└─────────────┬────────────────────────────────────────────────────┘\n              │\n              ↓\n┌──────────────────────────────────────────────────────────────────┐\n│                      TensorFlow MODEL                            │\n│                     (tf_models/flx.py)                           │\n├──────────────────────────────────────────────────────────────────┤\n│  [Encoder] → Angle to spatial feature map                        │\n│  [CNN Feature Extraction] → Image features                       │\n│  [Transformation Module] → Flow field prediction                 │\n│  [Spatial Transformer] → Apply warping                           │\n│  [Light Color Module] → Adjust lighting                          │\n└─────────────┬────────────────────────────────────────────────────┘\n              │\n              ↓ Corrected Eye Image (48×64×3)\n              │\n┌──────────────────────────────────────────────────────────────────┐\n│                    COMPOSITE \u0026 DISPLAY                           │\n│                (dis_single_window.py)                            │\n├──────────────────────────────────────────────────────────────────┤\n│  1. Resize corrected eyes back to original size                  │\n│  2. Paste onto original 640×480 frame at eye positions           │\n│  3. Draw status overlay (GAZE ON/OFF)                            │\n│  4. Draw calibration overlay (if enabled)                        │\n│  5. Display in OpenCV window                                     │\n└──────────────────────────────────────────────────────────────────┘\n```\n\n### Key Design Patterns\n\n#### 1. Dependency Injection\n\n- `FacePredictor` is injectable → easy to swap backends (dlib ↔ MediaPipe)\n- `GazeCorrector` is injectable → testable and modular\n\n#### 2. Abstract Interface\n\n- `FacePredictor` is abstract base class\n- Implementations: `DlibFacePredictor`, `MediaPipeFacePredictor`\n\n#### 3. 
Configuration Objects\n\n- Dataclasses for configuration (immutable, type-safe)\n- `DisplayConfig`, `CameraConfig`, `GazeModelConfig`, etc.\n\n#### 4. Separation of Concerns\n\n- Face detection ≠ Gaze correction\n- Display logic ≠ Model inference\n- Configuration ≠ Business logic\n\n### Module Responsibilities\n\n| Module                | Input                      | Output           | Responsibility                    |\n| --------------------- | -------------------------- | ---------------- | --------------------------------- |\n| **face_predictor**    | Frame (BGR)                | `List[FaceData]` | Detect faces, extract eye regions |\n| **gaze_corrector**    | `FaceData` + Camera Config | Corrected frame  | Apply gaze correction model       |\n| **flx.py**            | Eye image + Anchor + Angle | Warped eye       | Neural network inference          |\n| **transformation.py** | Flow field + Image         | Warped image     | Spatial transformation            |\n| **dis_single_window** | Camera stream              | Display window   | Orchestrate pipeline, UI          |\n\n### How It Works (High-Level)\n\n1. **Capture** video frame from webcam\n2. **Detect** face and extract 68 facial landmarks\n3. **Extract** left/right eye regions (48×64 each)\n4. **Calculate** 3D eye position and required gaze angle\n5. **Inference** through trained CNN to generate warping flow field\n6. **Warp** eye image using spatial transformer\n7. **Composite** corrected eyes back onto original frame\n8. 
**Display** the result in real time\n\nThe key innovation is the **learned warping transformation** that realistically redirects gaze while preserving eye appearance, lighting, and texture.\n\n## References\n\nThe implementation is based on research in gaze correction techniques using warping-based convolutional neural networks.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwangwilly%2Fgaze-correction-cam","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwangwilly%2Fgaze-correction-cam","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwangwilly%2Fgaze-correction-cam/lists"}