{"id":25347618,"url":"https://github.com/who-else-but-arjun/ethos_round2","last_synced_at":"2026-02-06T15:03:20.163Z","repository":{"id":267039858,"uuid":"875190586","full_name":"who-else-but-arjun/Ethos_Round2","owner":"who-else-but-arjun","description":"This repository consists the source code for the image enhancement pipeline built during the second round of Ethos Hackathon at IIT Guwahati. This pipeline integrates SRCNN, LIME and DeblurGAN for the enhancement of low quality CCTV frames. Also integrates VGGface face recognition model.","archived":false,"fork":false,"pushed_at":"2025-07-30T13:11:51.000Z","size":3043,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-29T17:46:26.042Z","etag":null,"topics":["cnn","deblurgan","gan","lime","neural-networks","srcnn","super-resolution"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/who-else-but-arjun.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-19T10:31:43.000Z","updated_at":"2025-07-30T13:11:55.000Z","dependencies_parsed_at":"2025-02-14T14:57:38.203Z","dependency_job_id":"90f20a4e-1f1e-4cdd-bdb2-e3e8898a276a","html_url":"https://github.com/who-else-but-arjun/Ethos_Round2","commit_stats":null,"previous_names":["who-else-but-arjun/ethos_round2"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/who-else-but-arjun/Ethos_Round2","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/who
-else-but-arjun%2FEthos_Round2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/who-else-but-arjun%2FEthos_Round2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/who-else-but-arjun%2FEthos_Round2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/who-else-but-arjun%2FEthos_Round2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/who-else-but-arjun","download_url":"https://codeload.github.com/who-else-but-arjun/Ethos_Round2/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/who-else-but-arjun%2FEthos_Round2/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29165711,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-06T14:37:12.680Z","status":"ssl_error","status_checked_at":"2026-02-06T14:36:22.973Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cnn","deblurgan","gan","lime","neural-networks","srcnn","super-resolution"],"created_at":"2025-02-14T14:57:32.746Z","updated_at":"2026-02-06T15:03:20.147Z","avatar_url":"https://github.com/who-else-but-arjun.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Abstract\n\nIn this project, we propose a facial image enhancement pipeline\nspecifically tailored for low-quality 
CCTV footage, which is frequently\naffected by issues such as poor lighting, low resolution, and motion\nblur. The proposed pipeline integrates four core components: face\nextraction using the OpenCV Haar Cascade Classifier, resolution\nenhancement through a Super-Resolution Convolutional Neural Network\n(SRCNN), low-light correction with LIME (Low-light Image Enhancement\nvia Illumination Map Estimation), and deblurring via the\nDeblurGANv2 model. The objective is to improve the visual quality and\nclarity of facial images captured in challenging conditions, enhancing\ntheir usability for identification purposes. Comparative results show\nsignificant improvements in image clarity, enabling more accurate and\nefficient facial recognition from previously unusable footage. The\nsystem shows potential for real-world security and surveillance\napplications.\n\n[GitHub link to the\nMVP](https://github.com/who-else-but-arjun/Ethos_Round2).\n[Drive link to the\nWeights](https://drive.google.com/drive/folders/1jJ3jcYPZE-fgxxHvT9lR5Q-vjiOeb0PT?usp=sharing).\n\n# Introduction\n\nFootage captured by CCTV often lacks the necessary quality for accurate\nfacial recognition due to a combination of factors like low resolution,\nmotion blur, and poor lighting conditions. These issues make it\ndifficult to extract meaningful data for identification, leading to gaps\nin surveillance and law enforcement. The challenge is to develop a\nsystem that can automatically detect, enhance, and deblur faces from\nsuch low-quality footage in a computationally efficient manner. The\nobjective of this project is to create an end-to-end pipeline capable of\nenhancing facial images extracted from CCTV footage. The pipeline\nleverages a combination of deep learning and traditional image\nprocessing techniques to address the challenges posed by poor lighting,\nmotion blur, and low resolution. 
Specifically, this system will:\n\n-   Detect and extract faces from CCTV footage using the OpenCV Haar\n    Cascade Classifier.\n\n-   Enhance the resolution and quality of the extracted images using a\n    pre-trained SRCNN.\n\n-   Improve the illumination conditions of the image frames\n    extracted from video recordings using LIME.\n\n-   Restore clarity to motion-blurred images using DeblurGANv2.\n\n# Steps to run the code\n\n## Table of Contents  \n- [Features](#features)  \n- [Project Structure](#project-structure)  \n- [Setup](#setup)  \n- [How to Run](#how-to-run)  \n- [Usage Guide](#usage-guide)  \n- [Models Used](#models-used)  \n\n## Features  \n- **Face Detection and Recognition** using VGGFace with ResNet50.  \n- **Super-Resolution** enhancement using SRCNN.  \n- **Image Deblurring** using DeblurGANv2.  \n- **Interactive Dashboards** powered by **Streamlit** for real-time predictions on video or images.  \n- **Low-Light Enhancement** using LIME (Low-light Image Enhancement via Illumination Map Estimation).  \n- Supports **custom datasets** for face recognition training.\n\n---\n\n## Project Structure  \n```bash\n├── dashboard.py           # Dashboard for predictions\n├── live.py                # Live feed prediction dashboard\n├── pipeline.py            # Preprocessing and enhancements\n├── SRCNN.py               # Super-resolution model\n├── DeblurGANv2.py         # Deblurring model\n├── LIME.py                # Low-light image enhancement (LIME)\n├── VGGFace.py             # VGG16 model with classification layers\n├── VGGFace_Resnet50.ipynb # Train face recognition model\n├── crop.py                # Face extraction from dataset\n├── layer_utils.py         # Custom layers for models\n├── requirements.txt       # Python dependencies\n└── README.md              # Project documentation (this file)\n```\n\n## Setup  \n\n### Prerequisites  \nEnsure you have **Python 3.11** and **TensorFlow 2.17** installed.  \n\n1. 
**Clone the repository:**  \n   ```bash\n   git clone \u003crepository-url\u003e\n   cd \u003crepository-directory\u003e\n   ```\n\n2. **Download model weights** from the provided Google Drive link and place them in the appropriate locations within the project.\n\n---\n\n## How to Run  \n\n1. **Run the Streamlit Dashboard:**  \n   ```bash\n   streamlit run dashboard.py\n   ```  \n   This dashboard allows you to upload images or videos for face recognition and enhancement.\n\n2. **Run the Live Video Prediction Dashboard:**  \n   ```bash\n   streamlit run live.py\n   ```  \n   This version works with a **pre-recorded video** for predictions.\n\n---\n\n## Usage Guide  \n\n1. **Dataset Preparation:**  \n   Create folders in the following structure:  \n   ```bash\n   Dataset/{Name_of_suspect}/images.png\n   Headsets/{Name_of_suspect}/1.png\n   ```\n\n2. **Extract Faces from Dataset:**  \n   Run the `crop.py` file to extract faces from the dataset images and store them in the `Headsets` folder.  \n   ```bash\n   python crop.py\n   ```\n\n3. **Train the Face Recognition Model:**  \n   Use the `VGGFace_Resnet50.ipynb` notebook to train the model on your own dataset. Training continues until the desired accuracy is achieved, and the weights are saved for future use.  \n\n4. **Prediction on Dashboard:**  \n   - Upload an image or video on the **dashboard** to perform face recognition.  \n   - For **live predictions**, use `live.py` with a pre-recorded video.\n\n---\n\n## Models Used  \n\n1. **SRCNN** (`SRCNN.py`):  \n   - Used for **super-resolution** enhancement of images.  \n\n2. **DeblurGANv2** (`DeblurGANv2.py`):  \n   - Used for **deblurring** the images.  \n\n3. **LIME** (`LIME.py`):  \n   - Used for **low-light enhancement** of images via illumination map estimation.\n\n4. 
**VGGFace** (`VGGFace.py`):  \n   - Uses **VGG16 architecture** for feature extraction and includes custom layers for face classification.\n\n---\n# Methodology\n\n## Face Detection using OpenCV Haar Cascade Classifier\n\nThe first step in the pipeline is to detect faces in the CCTV footage.\nFor this, we utilized the Haar Cascade Classifier from OpenCV. The\nclassifier identifies human faces by scanning the image at multiple\nscales and detecting features that resemble facial characteristics.\nWhile efficient and fast, the method is sensitive to extreme lighting\nvariations and occlusions.\n\n-   **Pre-trained Weights:** The Haar Cascade model is pre-trained using\n    a large set of positive and negative images, which allows it to\n    generalize across diverse face structures and detect faces\n    efficiently. The pre-trained XML file for the Haar Cascade model is\n    provided in OpenCV's official GitHub repository, making it readily\n    accessible for integration. [OpenCV Haar Cascade Pre-trained\n    Weights](https://github.com/opencv/opencv/tree/master/data/haarcascades)\n\n-   **Why Chosen:** This model was chosen for its speed and efficiency\n    in real-time video processing, which is crucial for the high volume\n    of data captured by CCTV systems.\n\n## Image Enhancement using SRCNN (Super-Resolution Convolutional Neural Network)\n\nAfter detecting faces, we applied the SRCNN model to enhance the\nresolution of the extracted facial images. SRCNN is a deep\nlearning-based approach that upscales low-resolution images and restores\nhigh-frequency details. 
The network consists of multiple convolutional\nlayers that learn the mapping from low-resolution to high-resolution\nimages, helping restore critical facial features.\n\n-   **Pre-trained Weights:** The SRCNN model was employed with\n    pre-trained weights available from a public TensorFlow\n    implementation of the paper. [SRCNN Pre-trained\n    Weights](https://github.com/tegg89/SRCNN-Tensorflow). These weights\n    were trained on large datasets of low- and high-resolution image\n    pairs, allowing the model to perform super-resolution on a wide\n    variety of images.\n\n-   **Why Chosen:** SRCNN was selected for its balance between\n    effectiveness and computational efficiency. Although more advanced\n    models such as SRGAN or EDSR could provide even better results,\n    SRCNN offered a faster and more straightforward approach, making it\n    well-suited for the scale of this project.\n\n## Improving Poor Lighting Conditions using LIME\n\nThe LIME algorithm models an image as the product of its reflectance ($R$)\nand illumination ($T$), expressed mathematically as\n$I(x) = R(x) \\times T(x)$. To estimate the illumination map $T(x)$, the\nalgorithm forms an initial estimate by taking, at each pixel, the maximum\nvalue across the RGB channels, under the assumption that the brightest\nchannel contains the most relevant illumination information.\n\nTo ensure smooth transitions between illumination values, a weighting\nmatrix $W$ is constructed based on the first-order derivatives in both\nvertical and horizontal directions. This strategy helps maintain\nconsistency across adjacent pixels. LIME further refines the\nillumination map through iterative optimization. 
The process involves\nsolving multiple subproblems: illumination estimation using Fourier\ntransforms ($T$ subproblem), reflectance estimation through gradient\ndescent ($G$ subproblem), and auxiliary updates involving parameters $Z$\nand $u$.\n\nThe enhanced brightness is achieved by applying gamma correction to the\nillumination map, which compresses the dynamic range for more balanced\nlighting. The final enhanced image is produced by dividing the original\nimage by the illumination map, followed by clamping pixel values to\nmaintain valid intensity levels. This process effectively mitigates\nlow-light conditions, resulting in an image with uniform brightness and\nimproved visibility.\n\n-   **Why Chosen:** LIME is a simple and efficient method for image\n    enhancement, operating directly in the spatial domain without the\n    need for large datasets or complex computations. It preserves\n    natural lighting, introduces fewer artifacts, and is more resilient\n    to noise than traditional methods like histogram equalization, which\n    often amplify noise and produce unnatural results. Compared to\n    Retinex-based models, LIME offers a faster, simpler solution for\n    estimating illumination. Unlike deep learning models such as SRCNN\n    or GANs, which require extensive training and computational power,\n    LIME is lightweight and can be applied universally without training,\n    making it ideal for practical use in low-light conditions.\n\n## Deblurring with DeblurGANv2\n\nMany facial images extracted from CCTV footage suffer from motion blur\ndue to subject movement. To address this, we employed DeblurGANv2, a\nstate-of-the-art deblurring model based on generative adversarial\nnetworks (GANs). 
The model consists of a generator (a convolutional\nneural network) that learns to restore blurred images and a\ndiscriminator that assesses the quality of the deblurred\nimages. [DeblurGANv2 Pre-trained\nWeights](https://github.com/KupynOrest/DeblurGANv2)\n\n-   The architecture is based on a U-Net-like structure with residual\n    blocks. It uses InstanceNormalization layers instead of the more\n    traditional BatchNormalization to avoid batch size sensitivity and\n    to enhance performance on images. The architecture consists of:\n\n    -   **Reflection Padding 2D:** Initial reflection padding,\n        convolution, and activation to preserve edge information, which\n        is essential for image processing tasks.\n\n    -   **Downsampling layers:** These progressively reduce the image\n        size while increasing the feature depth.\n\n    -   **Residual blocks:** These allow the model to better capture\n        fine details and maintain information across multiple scales.\n\n    -   **Upsampling layers:** These restore the image to its\n        original size after the downsampling.\n\n    -   **Final convolution and tanh activation:** Produces the output\n        image with pixel values between \\[-1, 1\\].\n\n-   **Pre-trained Weights:** DeblurGANv2 is used with pre-trained\n    weights available in the official implementation repository. 
These\n    weights were trained on extensive datasets of blurred and sharp\n    image pairs, making the model effective at handling various types of\n    motion blur.\n\n-   **Image Pre-processing:** Before feeding an image into the\n    generator, the pixel values are normalized from the range \\[0, 255\\]\n    to \\[-1, 1\\], which is the input range required by the generator\n    (due to the tanh output activation).\n\n-   **Image Post-processing:** After the generator outputs the deblurred\n    image, the pixel values are rescaled from the \\[-1, 1\\] range back\n    to \\[0, 255\\] for displaying or converting the image back to a\n    format that can be saved (e.g., PNG or JPEG).\n\n-   **Why Chosen:** DeblurGANv2 was chosen due to its ability to handle\n    complex blurring patterns and its superior performance compared to\n    traditional deblurring techniques. It stands out as an advanced\n    model for image deblurring due to its GAN-based adversarial\n    training, multi-scale feature handling via a Feature Pyramid\n    Network (FPN), high perceptual quality with\n    improved loss functions, and robustness to real-world conditions. Its\n    ability to restore sharpness in images and handle a wide range of\n    blur scenarios with minimal computational overhead made it an ideal\n    choice for this pipeline.\n\n![Model Architecture](Model.png)\n\n# Discussion\n\n## Challenges and Limitations\n\n-   **Lighting and Occlusion:** While the Haar Cascade Classifier is\n    efficient, it falters in cases of occlusion or poor lighting. Future\n    iterations could integrate more robust face detection models, such\n    as deep learning-based detectors that can handle a broader range of\n    scenarios.\n\n-   **Real-time Processing:** Although the current pipeline can process\n    individual frames efficiently, real-time processing of continuous\n    video streams remains a challenge. 
Incorporating GPU acceleration\n    and optimizing the models for faster inference times will be\n    essential for real-time deployment.\n\n-   **Fixed Image Size:** The pre-trained model used for deblurring\n    assumes all input images are resized to 256x256 pixels during\n    preprocessing. This size constraint may degrade the quality of the\n    output, especially if the original images have a higher resolution.\n\n## Future Enhancement Implementation: Face Recognition for Security and Surveillance Systems\n\nThe project's future enhancement focuses on creating a real-time face\nrecognition system for identifying suspects in surveillance footage. The\nsystem uses the VGGFace model, based on VGG16 for feature extraction, while\nthe classification layer is custom-trained on a dataset of suspect\nfaces. The enhancement includes an image processing pipeline to ensure\nhigh-quality face detection and recognition during live tracking.\n\n## Concept of Transfer Learning\n\nThe system leverages transfer learning, a machine learning technique\nwhere a model developed for one task is reused as the starting point for a\ndifferent but related task. In this implementation:\n\n-   VGG16 was pretrained on large-scale face recognition data as part of\n    VGGFace. 
It has already learned general features of human faces,\n    such as edges, shapes, and facial structures.\n\n-   Instead of training a model from scratch, the pretrained VGG16\n    feature extraction layers were retained, allowing the system to\n    reuse the knowledge gained from extensive training on facial\n    features.\n\n-   A custom classification layer was then added on top of these feature\n    extraction layers and trained specifically to recognize the\n    faces of suspects in our dataset.\n\n## System Architecture\n\n-   **Dataset Structure:** The dataset was organized such that each\n    suspect has a dedicated folder containing at least ten face images:\n    `Dataset/Class_name/images`\n\n-   **Suspect Database:** A JSON file was maintained alongside the\n    dataset, containing each suspect's unique details (name, id,\n    status), which are used for suspect identification during\n    recognition.\n\n-   **Model Training:**\n\n    -   The VGG16 feature extraction layers were frozen to retain the\n        pretrained weights, and only the custom classification layer was\n        trained to map the extracted features to suspect classes.\n\n    -   The training process involved resizing images to 224x224 pixels\n        and using callbacks to monitor training performance until the\n        model achieved the desired accuracy.\n\n    -   After training, the model's weights were saved for future use in\n        real-time face recognition.\n\n![Sample Output Images](Sample.png)\n\n## Real-Time Face Recognition System\n\nThe live face recognition system works as follows:\n\n-   **Loading the Model:** The trained VGGFace model, along with the\n    custom classification layer, is loaded from the saved weights.\n\n-   **Live Video Stream:** A live camera feed captures real-time footage\n    of suspects.\n\n-   **Face Detection:** The OpenCV Haar Cascade Classifier is employed\n    to detect faces in each frame of the video stream.\n\n-   **Image 
Enhancement Pipeline:** Each detected face is passed through\n    the following enhancement stages before recognition.\n\n    -   **SRCNN:** Enhances the resolution of the detected faces,\n        improving the clarity of low-resolution images.\n\n    -   **LIME:** Increases the illumination in poorly lit image frames.\n\n    -   **DeblurGANv2:** Removes motion blur from images, ensuring the\n        detected faces are sharp and suitable for recognition.\n\n-   **Face Recognition:** The enhanced face images are resized to\n    224x224 pixels and passed through the VGGFace model, which predicts\n    the class ID corresponding to the suspect.\n\n-   **Suspect Identification:** The predicted class ID is\n    cross-referenced with the suspect database stored in the JSON file\n    to retrieve the suspect's details, such as name, id, and status.\n\n## Challenges and Solutions\n\n-   **Image Quality:** The live video feed or the video recording of the\n    suspect might include low-quality images due to lighting or motion.\n    These are handled by the SRCNN, LIME, and DeblurGANv2\n    stages, so that the images are of sufficient quality for\n    recognition.\n\n-   **Real-Time Processing:** The system is optimized for real-time\n    performance, ensuring that each video frame is processed quickly\n    without significant delays in face detection and recognition.\n\n-   **Multiple Faces:** The system is capable of handling multiple faces\n    in real time, raising an alert for each suspect\n    recognized in the frame.\n\n# Results\n\nThe face recognition system successfully identifies suspects in\nreal time. 
By leveraging transfer learning with the VGGFace model and\nimage enhancement techniques, the system achieved high accuracy and\nefficiency in recognition.\n\n-   **Recognition Accuracy:** The combination of the VGGFace model,\n    transfer learning, and image enhancement allowed for accurate\n    suspect identification even in challenging conditions.\n\n-   **Efficient Processing:** The system handled real-time video frames\n    smoothly, maintaining high performance in face detection and\n    recognition.\n\n## Implementing a Dashboard for Real-Time Monitoring\n\nThe Streamlit interface acts as an interactive dashboard for users to\nengage with the facial reconstruction and recognition pipeline. On the\nleft-hand side of the interface, users can upload pre-recorded videos in\nformats such as MP4 or AVI. Upon uploading, the video is displayed with\nplayback options, and users can also view terminal logs detailing the\nprocessing steps in real time. On the right-hand side, suspect details,\nsuch as name, ID, and status, are presented in a structured table format\nfor easy reference. As a video is processed, the system analyzes each\nframe to detect and classify faces. If a match is found, the interface\ndisplays both the detected face from the video and the actual suspect image\nside by side for visual comparison, along with the prediction confidence\nand class information. The use of Streamlit provides a smooth,\ninteractive experience without the need for complex back-end systems,\nensuring that even non-technical users can interact with the model\nresults efficiently. The same interface is also used for live\nvideo streaming, where suspects are identified in a live video feed\nfrom a webcam.\n\n![Dashboard](Dashboard_UI.png)\n\n# References\n\n-   Dong, C., Loy, C. C., He, K., \u0026 Tang, X. (2015). *Image\n    Super-Resolution Using Deep Convolutional Networks (SRCNN)*. 
IEEE\n    Transactions on Pattern Analysis and Machine Intelligence.\n\n-   Kupyn, O., Martyniuk, T., Wu, J., \u0026 Wang, Z. (2019). *DeblurGAN-v2:\n    Deblurring (Orders-of-Magnitude) Faster and Better*. Proceedings of\n    the IEEE International Conference on Computer Vision (ICCV).\n\n-   [Keras VGGFace GitHub\n    Repository](https://github.com/rcmalli/keras-vggface).\n\n-   Guo, X., Li, Y., \u0026 Ling, H. (2017). LIME: Low-Light Image\n    Enhancement via Illumination Map Estimation. IEEE Transactions on\n    Image Processing, 26(2), 982--993. [LIME Implementation GitHub\n    Repository](https://github.com/estija/LIME).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwho-else-but-arjun%2Fethos_round2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwho-else-but-arjun%2Fethos_round2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwho-else-but-arjun%2Fethos_round2/lists"}