{"id":21144172,"url":"https://github.com/aim-harvard/faceage","last_synced_at":"2025-04-13T10:23:22.965Z","repository":{"id":241714690,"uuid":"438267816","full_name":"AIM-Harvard/FaceAge","owner":"AIM-Harvard","description":"Decoding biological age from face photographs using deep learning. ","archived":false,"fork":false,"pushed_at":"2024-05-29T07:47:50.000Z","size":4294,"stargazers_count":5,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-27T01:47:33.995Z","etag":null,"topics":["age-estimation","biological-age","cnn","data-analysis","deep-learning","survival-analysis"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AIM-Harvard.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-14T13:41:59.000Z","updated_at":"2024-12-29T01:17:59.000Z","dependencies_parsed_at":"2024-05-29T20:30:07.295Z","dependency_job_id":null,"html_url":"https://github.com/AIM-Harvard/FaceAge","commit_stats":null,"previous_names":["aim-harvard/faceage"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AIM-Harvard%2FFaceAge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AIM-Harvard%2FFaceAge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AIM-Harvard%2FFaceAge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AIM-Harvard%2FFaceAge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AIM-Harvard","download_url":"https://codeload.github.com/AIM-Harvard/FaceAge/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248696312,"owners_count":21147102,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["age-estimation","biological-age","cnn","data-analysis","deep-learning","survival-analysis"],"created_at":"2024-11-20T08:08:52.150Z","updated_at":"2025-04-13T10:23:22.931Z","avatar_url":"https://github.com/AIM-Harvard.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FaceAge\n\nDecoding Biological Age from Face Photographs using Deep Learning\n\nIf you use code or parts of this code in your work, please cite our publication:\n\n_Dennis Bontempi*, Osbert Zalay*, Danielle S. Bitterman, Nicolai Birkbak, Jack M. Qian, Hannah Roberts, Subha Perni, Andre Dekker, Tracy Balboni, Laura Warren, Monica Krishan, Benjamin H. Kann, Charles Swanton, Dirk De Ruysscher, Raymond H. Mak*, Hugo J.W.L. Aerts* - Decoding biological age from face photographs using deep learning_ (submitted).\n\n\n## Table of Contents\n\n- [FaceAge](#faceage)\n    - [Repository Structure](#repository-structure)\n    - [Environment Setup and Dependencies](#environment-setup-and-dependencies)\n        - [Running the Pipeline on a Machine Equipped With a Gpu](#running-the-pipeline-on-a-machine-equipped-with-a-gpu)\n        - [Running the Pipeline on a Machine Without a Gpu](#running-the-pipeline-on-a-machine-without-a-gpu)\n    - [Acknowledgements](#acknowledgements)\n    - [Disclaimer](#disclaimer)\n- [Data](#data)\n    - [Augmentation and Rebalancing](#augmentation-and-rebalancing)\n    - [Database Processing](#database-processing)\n    - [Database Curation](#database-curation)\n    - [Survey Data Processing](#survey-data-processing)\n    - [Manual Quality Assurance](#manual-quality-assurance)\n        - [Example of Acceptable Image](#example-of-acceptable-image)\n        - [Example of Disqualified Image](#example-of-disqualified-image)\n    - [Link to the Shared Data](#link-to-the-shared-data)\n- [Models](#models)\n    - [Model Description](#model-description)\n    - [Model Training and Performance](#model-training-and-performance)\n    - [Model Validation](#model-validation)\n        - [Comments on Fine-tuning and Bias](#comments-on-fine-tuning-and-bias)\n- [Output](#output)\n- [Notebooks](#notebooks)\n    - [Data Processing Demo](#data-processing-demo)\n    - [Extended Data Plots Demo](#extended-data-plots-demo)\n- [Source Code](#source-code)\n    - [Testing Code](#testing-code)\n- [Statistical Analysis Code](#statistical-analysis-code)\n\n\n## Repository Structure\n\nThis repository is structured as follows:\n\n* The `src` folder stores the code used to train and test the pipeline;\n* The `stats` folder stores the code used in the statistical analysis, and to export the plots in the Main Manuscript and in the Extended Data;\n* The `models` folder stores the pre-trained weights for the FaceAge model;\n* The `data` folder stores the code used for data processing (from the database curation to the processing of some of the results, e.g., the survey in Figure 4 of the Main Manuscript), along with the links to retrieve the data shared with the publication;\n* The `outputs` folder stores a sample of the output from the pipeline;\n* The `notebooks` folder stores some demo resources useful to understand how the pipeline works and reproduce the first figures in the Extended Data.\n\nAdditional details on the content of the subdirectories and their structure can be found in the markdown files stored in each of the subdirectories.\n\n\n## Environment Setup and Dependencies\n\nThe original code was developed and tested using Python 3.6.5 on Ubuntu 18.04 with Tensorflow-gpu 1.8.0 and CUDA 9.1.85. An online version of the pipeline was also tested using the environment described in the `environment-gpu.yaml` file (Python 3.6.13, CUDA 11.3.1, and libcudnn 8.2.1 - on Ubuntu 18.04), the environment described in `environment-cpu.yaml` (i.e., the same packages - but without GPU acceleration), and using the freely accessible Google Colab notebook (Python 3.7.12, CUDA 11.1, and libcudnn 7.6.5 - on Ubuntu 18.04). We recommend running the pipeline on an up-to-date environment to avoid [known problems](https://github.com/ipazc/mtcnn/issues/87) with the `MTCNN` library. The statistical analysis was conducted using R (Version 3.6.3) in an RStudio environment (Version 1.4.1106).\n\nFor the code to run as intended, the all the packages under one of the environment files should be installed. In order not to break previous installations and ensure full compatibility, it's highly recommended to create a Conda environment to run the FaceAge pipeline in. Here follows two examples of the set-up procedure using Conda and one of the provided YAML environment files.\n\n### Running the Pipeline on a Machine Equipped With a Gpu\n\nGPU acceleration can speed up the FaceAge estimation dramatically. If your machine is equipped with a GPU that supports CUDA compute capability, you can set up the Conda environment running the following:\n\n```\n# set-up Conda faceage-test environment\nconda env create --file environment-gpu.yaml\n\n# activate the conda environment\nconda activate faceage-gpu\n```\n\nAt this point, `(faceage-gpu)` should be displayed at the start of each bash line. Furthermore, the command `which python` should return a path similar to `$PATH_TO_CONDA_FOLDER/faceage-gpu/bin/python`. You can check your GPU is correctly identified by the environment (and thus, the GPU acceleration available) by spawning a `python` shell (after activating the Conda environment), and running:\n\n```\nimport tensorflow as tf\ntf.config.list_physical_devices('GPU')\n```\n\nIf the set-up was successful, you should get an output similar to the following:\n\n```\n\u003e\u003e\u003e tensorflow.config.list_physical_devices('GPU')\n[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]\n```\n\nEverything should now be ready for the data to be processed by the FaceAge pipeline.\n\nThe virtual environment can be deactivated by running:\n\n```\nconda deactivate\n```\n\n### Running the Pipeline on a Machine Without a Gpu\n\nRunning the FaceAge pipeline without GPU acceleration is possible, although this might increase considerably the processing times. If your machine is not equipped with a GPU that supports CUDA compute capability, you can set up the Conda environment running the following:\n\n```\n# set-up Conda faceage-test environment\nconda env create --file environment-cpu.yaml\n\n# activate the conda environment\nconda activate faceage-cpu\n```\n\nAt this point, `(faceage-cpu)` should be displayed at the start of each bash line. Furthermore, the command `which python` should return a path similar to `$PATH_TO_CONDA_FOLDER/faceage-cpu/bin/python`. Everything should now be ready for the data to be processed by the FaceAge pipeline.\n\nThe virtual environment can be deactivated by running:\n\n```\nconda deactivate\n```\n\n\n## Acknowledgements\n\nCode development, testing, and documentation: Osbert Zalay and Dennis Bontempi.\n\n\n## Disclaimer\n\nThe code and data of this repository are provided to promote reproducible research. They are not intended for clinical care or commercial use.\n\nThe software is provided \"as is\", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.\n\n\n\n# Data\n\nThe `data` folder contains data processing and quality assurance files, used to prepare the clinical (HARVARD) and model development datasets. A description of each file is provided below. This README also contains the links to retrieve the data shared with the publication.\n\n## Augmentation and Rebalancing\n\nThe script takes the source database of age-labelled photographs and creates a balanced dataset for model training and development via augmentation and randomized rebalancing. The user defines a target number of samples per age category, age range of the processed dataset, and batch size for augmentation. The augmentation is done using the ImageDataGenerator functionality in Keras:5\n\n```\n# example instantiation of ImageDataGenerator object for augmentation\ndatagen = ImageDataGenerator(\n        # paramaters for augmentation (shifts, rotations, flips, shear, etc.)\n        samplewise_center = False,\n        samplewise_std_normalization = False,\n        rotation_range=20,\n        width_shift_range=0.1,\n        height_shift_range=0.1,\n        shear_range=0.2,\n        zoom_range=0.2,\n        horizontal_flip=True,\n        fill_mode='constant')\n```\n\nThe randomized rebalancing randomly shuffles samples from each each category to supplement the augmented samples until the target number of samples per age category is achieved, in order to achieve an approximate uniform distribution\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/data-training.png\" alt=\"data-training\" width=\"800\"/\u003e\n\u003c/p\u003e\n\n\n## Database Processing\n\nThe script takes `logfile.csv`, which contains the clinical image hashes pertaining to patient photographs stored on the hospital EMR, and maps them to the clinical database of respective patient prognostic factors and outcomes collectively prospectively; e.g. stored on `REDcap` clinical data e-repository at Harvard MGH/BWH. No clinical information is provided in this repo (including the contents of `logfile.csv`) due to privacy conditions.\n\n\n## Database Curation\n\nThis script applies inclusion and exclusion criteria (as described in the manuscript) to the output of `Database_Processing.py` and calculates survival times, follow-up times and events in order to obtain the curated database used for statistical and survival analyses.\n\n\n## Survey Data Processing\n\nThis script was used to import the human survey data from part 1 and 2 (stored on `REDcap` clinical data e-repository at Harvard MGH/BWH; see manuscript for further details) and map those to respective patient prognostic factors and outcomes, to create an output `.csv` file used for statistical analyses of survey results that compares human performance to machine performance in survival prediction for palliative cancer patients.\n\n\n## Manual Quality Assurance\n\nThis script was used to manually cull images that met the photo QA exclusion criteria listed in the manuscript.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/data-manual_qa.png\" alt=\"data-manual_qa\" width=\"550\"/\u003e\n\u003c/p\u003e\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n### Example of Acceptable Image\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/data-acceptable_image.png\" alt=\"data-acceptable_image\" width=\"600\"/\u003e\n\u003c/p\u003e\n\n\n\u003cbr\u003e\n\n### Example of Disqualified Image\n\nExclusion reason: the original image, and therefore the cropped face, is of too-low quality.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/data-disqualified_image.png\" alt=\"data-disqualified_image\" width=\"600\"/\u003e\n\u003c/p\u003e\n\n\n## Link to the Shared Data\n\nA link to the data (all the data used for training purposes, and the curated portion of the UTK dataset used for validation purposes) will be shared here upon publication. We plan to make these resources openly available to the community through [Zenodo.org](https://zenodo.org).\n\nFor the time being, to ease the review process, the datasets are available at [this Google Drive link](https://drive.google.com/drive/folders/1C2n7KekfbthDFt0t8vbVpD_lEh5lYFyj?usp=sharing). To ensure the review process stays confidential, we ask reviewers to visit this webpage in private/incognito mode, or log-out of their Google account beforehand. In this way, it is impossible for us to monitor who accesses the Google Drive folders.\n\n\n# Models\n\nThe pre-trained weights will be made available upon publication of the study via [Zenodo](https://zenodo.org).\n\nFor the time being, to ease the review process, the model weights are available at [this Google Drive link](https://drive.google.com/file/d/1KBZBMKbeqDH95KMbPKJ6vFqnf3ze2aIn/view?usp=sharing). To ensure the review process stays confidential, we ask reviewers to visit this webpage in private/incognito mode, or log-out of their Google account beforehand. In this way, it is impossible for us to monitor who accesses the Google Drive folders.\n\n## Model Description\n\nThe FaceAge deep learning pipeline comprises two stages: a face localization and extraction stage, and a feature embedding stage with output linear regressor that provides the continuous estimate of biological age.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/models-diagram.png\" alt=\"models-diagram\" width=\"800\"/\u003e\n\u003c/p\u003e\n\nThe first stage pre-processes the input data by locating the face within the photograph and defines a bounding box around it. The image is then cropped, resized, and pixel values standard-normalized across all RGB channels. Face extraction is accomplished using [a multi-task cascaded CNN (MTCNN)](https://github.com/ipazc/mtcnn) implemented by Zhang et al. (IEEE Signal Processing Letters 23, 1499–1503, 2016). The extraction network is comprised of three sub-networks, namely a proposal network (`P-net`) that creates an initial set of bounding box candidates, of which similar boxes are merged then further refined (`R-net`) using bounding box regression and face landmark localization, then the third stage (`O-net`) makes more stringent use of face landmarks to optimize the final bounding box, achieving a cited test accuracy of 95%. For additional details, please refer to the project repository.\n\nThe second stage of the pipeline takes the extracted face image and feeds it into a convolutional neural network (CNN) that encodes image features which through regression yield a continuous FaceAge prediction as the output. This model (whose pre-trained weights will be made available upon publication) uses an `Inception-ResNet v1` architecture, whose layers progressively embed higher-order facial features into lower-dimensional representations, yielding a 128-dimensional face embedding that is then fed through a linear regression layer to produce the continuous biological age estimate as the final output. The original `Inception-ResNet v1` [CNN weights were pre-trained on the problem of facial recognition by Sandberg](https://github.com/davidsandberg/facenet), whereby it achieved a test accuracy of 98% on the [`Local Faces in the Wild` database](http://vis-www.cs.umass.edu/lfw/results.html) (see project repository for details).\n\nThe network was adapted to biological age estimation by removing the output classification layer and replacing it with densely connected layers feeding a fully-connected output layer with linear activation function to perform the regression. Transfer learning was then applied to tune the weights of the last 281 of the 426 `Inception-ResNet v1` layers (Inception Block B1 onward) in addition to the fully-connected output layers, using the augmented and randomly rebalanced training dataset of N = 56,304 age-labelled face images derived from the `IMDb-Wiki` database (see [Augmentation and Rebalancing](#augmentation-and-rebalancing)). The photographs age-labeled from 60 years old and onward were manually curacted to ensure image quality control and reduce error from noisy labels for this clinically-relevant age group (see [Manual Quality Assurance](#manual-quality-assurance)). The following CONSORT-style diagram shows how the training dataset was constructed:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/models-consort.png\" alt=\"models-consort\" width=\"400\"/\u003e\n\u003c/p\u003e\n\n\n## Model Training and Performance\n\nTraining was carried out on paired GPUs using `Keras` with `Tensorflow` backend, applying stochastic gradient descent with momentum for backpropagation, minimizing mean absolute error (MAE), with batch size of 256, batch normalization, dropout for regularization, and initial learning rate of 0.001 with incremental reduction on plateauing of error rate. The model development set was subdivided using random partitioning into 90% training, 10% testing. A description of the source code can be found in the dedicated README file.\n\nModel performance was good for the age range that underwent manual curation and quality assurance (MAE = 4.09 years). Overall, MAE = 5.87 years for the entire dataset. This was deemed acceptable because the intent was on obtaining a better fit for ages 60 or older, being most clinically relevant to oncology populations, at the expense of not fitting the younger age range \u003c 40 as well. Therefore, we accepted that images pertaining to the younger age labels contained higher heterogeneity, poorer image quality and some noisy or erroneous labels. Nevertheless, mean age difference was approximately zero for ages \u003e 40, demonstrating the absence of an age bias in this clinically-relevant age range.\n\n![model-development-performance](assets/FaceAge-Model-Dev-Performance.SVG)\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n## Model Validation\n\nExample model validation on an independent curated dataset derived from the `UTK` database [can be found here](https://drive.google.com/drive/folders/1N__o3t96KQr3PYOd_5XnVYfJUdhyQYQp?usp=sharing). The notebooks enable the user to reproduce some of the figures in the Extended Data section of the manuscript.\n\n\n### Comments on Fine-tuning and Bias\n\nThe FaceAge model as implemented in the research study on clinical cancer populations did not have finetuning of model parameters. The reasons for foregoing finetuning are manifold, but most importantly we discerned introduction of age bias and overfitting when performing finetuning. Although finetuned FaceAge models could very accurately predict chronologic age for the test sample population of healthy controls, often with MAE \u003c 3 years, when a finetuned model was applied to clinical datasets, prognostic power to discern patient outcomes typically diminished compared to the non-finetuned model:\n\n![fineutuning-FaceAge-on-ChaLearn-2015-dataset](assets/FaceAge-Finetuning-Apparent-Age.SVG)\n\n![fineutuning-effect-clinical-prognostication](assets/FaceAge-Finetuning-Effect-Prognostication.SVG)\n\nTherefore, it was preferrable to utilize the non-finetuned model for biological age estimation and outcomes prognostication, despite the base model being \"noisier\" (i.e. possessing larger dispersion in estimated age range) than the finetuned model. The greater magnitude of dispersion between individuals was found to correlate with relative survival outcomes between patients in the cohort, and thus with biological age, whereby finetuning was found to weaken or eliminate this relational prognostic component, likely in part due to overfitting to the smaller sample size of the finetuning cohort. Bias was also introduced by the idiosyncracies of the fine-tunining dataset, such as by the fact that when humans estimate apparent age, they tend to underestimate the ages of older individuals - precisely in the age-range relevant to clinical oncology populations - with the bias becoming more pronounced with increasing age:\n\n![bias-of-human-age-estimates-of-older-people](assets/Human-Age-Estimation-Bias.SVG)\n\nOf note, although we did not finetune the model for this study in order to avoid issues of overfitting and introducing age-related bias, this does not preclude us from doing so in future iterations of the model if we are able to mitigate the negative impacts of finetuning on clinical prognostication. This is an area of active research for our group.\n\n\n# Output\n\nThe `output` folder stores a sample of the output from the FaceAge pipeline (`utk_hi-res_qa_res.csv`). This CSV file was obtained by processing the data folder `data/utk_hi-res_qa` (the data will be shared upon publication via [Zenodo](https://zenodo.org), and are temporarily available [at this link](https://drive.google.com/drive/folders/1OkFc74Izf_2b1kxtGWR7LCiA83-XvJfd?usp=sharing); to ensure the review process stays confidential, we ask reviewers to visit this webpage in private/incognito mode, or log-out of their Google account beforehand: in this way, it is impossible for us to monitor who accesses the Google Drive folders) using the demo script `predict_folder_demo.py` (with the default configuration stored in `config_predict_folder_demo.yaml`), found under the `src/test` subdirectory.\n\nThe `subj_id` column contains the file name (without the extension), while the `faceage` column contains the estimation of the subject biologic age computed by our pipeline:\n\n|          subj_id          |  faceage  |\n|---------------------------|-----------|\n| 100_1_0_20170112213303693 | 94.32125  |\n| 100_1_0_20170117195420803 | 95.265175 |\n| 100_1_0_20170119212053665 | 98.09706  |\n| ...                       | ...       |\n| 51_0_0_20170117160754830  | 49.63042  |\n| 51_0_0_20170117160756480  | 56.181442 |\n| 51_0_0_20170117160807782  | 59.752483 |\n| ...                       | ...       |\n| 96_1_2_20170110182526540  | 91.92511  |\n| 99_1_0_20170120133837030  | 92.09139  |\n| 99_1_2_20170117195405372  | 95.592224 |\n\n\u003cbr\u003e\n\nThe CSV file `utk_hi-res_qa_res-ext_data.csv` is very similar to the aforementioned, but incorporates all the information needed to reproduce the first Extended Data figures. This CSV has been generated from the [Google Colab Notebook `data_processing_demo.ipynb`](https://colab.research.google.com/drive/15fUoyFsMF7yo--oQ_MZJe51J_RE9BjwE?usp=sharing):\n\n|          subj_id          |  faceage  |  age  |  gender  |  race  |\n|---------------------------|-----------|-------|----------|--------|\n|     20170110153238490     | 75.026596 |  74   |   1      |   0    |\n|     20170109213056053     | 28.754328 |  21   |   1      |   2    |\n|     20170117012906285     | 48.943573 |  38   |   0      |   1    |\n|     20170117151304315     | 30.146166 |  30   |   1      |   0    |\n|     20170116200714834     | 45.974422 |  26   |   0      |   1    |\n|     ...                   | ...       | ...   | ...      | ...    |\n\n\n# Notebooks\n\nThe `notebook` folder stores some tutorial notebooks we developed to allow for quick experimentation with our pipeline.\n\nThe notebooks can be downloaded and run locally (provided the right environment is set up - see the main `README` for detailed instructions), or opened directly in Google Colab by clicking the \"Open with Colab\" buttons below.\n\n## Data Processing Demo\n\n[This Google Colab notebook](https://colab.research.google.com/drive/15fUoyFsMF7yo--oQ_MZJe51J_RE9BjwE?usp=sharing) is designed to showcase how the pipeline is set-up, how to format the input to the pipeline, and what to expect as its output - as well as estimate the computational costs of the operation involved.\n\nThe data required to run the notebook will be made available through [Zenodo](https://zenodo) upon publication. For the time being, to ease the review process, the datasets are available at [this Google Drive link](https://drive.google.com/drive/folders/1C2n7KekfbthDFt0t8vbVpD_lEh5lYFyj?usp=sharing). To ensure the review process stays confidential, we ask reviewers to visit this webpage in private/incognito mode, or log-out of their Google account beforehand. In this way, it is impossible for us to monitor who accesses the Google Drive folders.\n\n\nHere follows a sample from the notebook:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/notebooks-data_processing.png\" alt=\"notebooks-data_processing\" width=\"500\"/\u003e\n\u003c/p\u003e\n\n```\nAge:      62\nFaceAge:  63.252663\n\nGender:   0 (Male)\nRace:     0 (White)\n```\n\n## Extended Data Plots Demo\n\n[This Google Colab notebook](https://colab.research.google.com/drive/1EPBcbD1nZ_WVJ-G7giXSIb4sr9GEV6M2?usp=sharing) allows the user to reproduce some of the figures in the Extended Data document.\n\nNote that any small difference in the quantitative results is due to a difference between the versions of the dependencies used to conduct the final analysis in the paper, and the version of such dependencies on Google Colab.\n\nHere follows a sample from the notebook:\n\n\u003cbr\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/notebooks-extended_data.png\" alt=\"notebooks-extended_data\" width=\"800\"/\u003e\n\u003c/p\u003e\n\n\n# Source Code\n\nThe `src` folder stores the code used to train and test the pipeline.\n\nA list of the pipeline dependencies, as well as a quick guide on how to set-up the environment to run the pipeline, can be found in the main `README`.\n\n\n# Testing Code\n\nIn the `test` folder, we provide a simple script (`predict_folder_demo.py`) to process all the `.jpg` and `.png` files in a given directory.\n\nBy default, the script reads the images to process from a folder whose name is specified in the `config_predict_folder_demo.yaml` configuration file (`input_folder_name`). The absolute path to the folder is determined from `base_path` and `data_folder_name` as well (by default, the script will look for a folder at `$base_path/$data_folder_name/$input_folder_name`):\n\n```\n$base_path\n└── $data_folder_name\n    └── $input_folder_name\n        ├── subj1.jpg\n        ├── subj2.jpg\n        ├── subj3.jpg\n        ...\n```\n\nThe folder can contain files that are not `.png` and `.jpg`, as only these formats will be read and processed. Here follows an example of how a folder should look like:\n\n```\n~/git/FaceAge\n└── data\n    └── utk_hi-res_qa\n        ├── 100_1_0_20170112213303693.jpg\n        ├── 100_1_0_20170117195420803.jpg\n        ├── 100_1_0_20170119212053665.jpg\n        ├── 20_0_0_20170104230051977.jpg\n        ├── 20_0_0_20170110232156775.jpg\n        ├── 20_0_0_20170117134213422.jpg\n        ├── 20_0_0_20170117140056058.jpg\n        ...\n```\nThe `config_predict_folder_demo.yaml` configuration file is parsed by the script to determine also the path to the pre-trained model file (`$base_path/$models_folder_name/$model_name`), and the path to the folder where outputs (predictions) will be stored (`$base_path/$outputs_folder_name` - by default, stored under `${input_folder_name}.csv`).\n\nA documented [Google Colab notebook implementing very similar operations](https://colab.research.google.com/drive/15fUoyFsMF7yo--oQ_MZJe51J_RE9BjwE?usp=sharing) is provided as part of the repository. The code can be easily adapted to suit the users need.\n\nHere follows a sample output from the `predict_folder_demo.py` script, executed on a machine equipped with a NVIDIA TITAN RTX, after setting up from scratch the `faceage-gpu` Conda environment:\n\n```\n(faceage-gpu) dennis@R2-D2:~/git/FaceAge/src/test$ python -W ignore predict_folder_demo.py\nPython version     :  3.6.13 |Anaconda, Inc.| (default, Jun  4 2021, 14:25:59)\nTensorFlow version :  2.6.2\nKeras version      :  2.6.0\nNumpy version      :  1.19.5\n\nPredicting FaceAge for 2547 subjects at: '/home/dennis/git/FaceAge/data/utk_hi-res_qa'\n\n(2547/2547) Running the face localization step for \"28_0_4_20170117202427928.jpg\"\"\n... Done in 2013.39 seconds.\n\n(2547/2547) Running the age estimation step for \"28_0_4_20170117202427928\"\"\n... Done in 95.157 seconds.\n\nSaving predictions at: '/home/dennis/git/FaceAge/outputs/utk_hi-res_qa_res.csv'... Done.\n```\n\nFinally, here follows the output of `predict_folder_demo.py` ran on the same machine, without exploiting the GPU (rather, using an AMD Ryzen 7 3800X 8-Core; the system is equipped with 64 GBs of RAM). Given the system specifications, the nature of the operations involved and the lightweight models, the performances during the inference phase are comparable:\n\n```\n(faceage-cpu) dennis@R2-D2:~/git/FaceAge/src/test$ python -W ignore predict_folder_demo.py\nPython version     :  3.6.13 |Anaconda, Inc.| (default, Jun  4 2021, 14:25:59)\nTensorFlow version :  2.6.2\nKeras version      :  2.6.0\nNumpy version      :  1.19.5\n\nPredicting FaceAge for 2547 subjects at: '/home/dennis/git/FaceAge/data/utk_hi-res_qa'\n\n(2547/2547) Running the face localization step for \"28_0_4_20170117202427928.jpg\"\"\n... Done in 1773.17 seconds.\n\n(2547/2547) Running the age estimation step for \"28_0_4_20170117202427928\"\"\n... Done in 108.407 seconds.\n\nSaving predictions at: '/home/dennis/git/FaceAge/outputs/utk_hi-res_qa_res.csv'... Done.\n```\n\n# Statistical Analysis Code\n\nThe `stats` folder contains all the code used to generate the plots from the Main Manuscript (`main`) and the Extended Data (`extended`).\n\nStatistical and survival analyses were performed in Python 3.6.5 using the [Lifelines library](https://github.com/CamDavidsonPilon/lifelines), as well as NumPy and SciPy libraries, and in the open-source statistical software platform, R.\n\nThe clinical endpoint was overall survival (OS). Actuarial survival curves for stratification of risk groups by overall survival were plotted using the Kaplan Meier (KM) approach, right-censoring patients who did not have the event or were lost to follow-up. All hypothesis testing performed in the study was two-sided, and paired tests were implemented when evaluating model predictions against the performance of a comparator for the same data samples. Differences in KM survival curves between risk groups were assessed using the log-rank test. Univariable and multivariable analysis via the Cox proportional hazards (PH) model was carried out to adjust for the effect of clinical covariates such as gender, disease site, smoking status, performance status and treatment intent. ROC analysis for area-under-the-curve (AUC) and concordance index (C-index) were also performed to assess model performance, and to compare against human performance in predicting survival outcomes.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faim-harvard%2Ffaceage","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faim-harvard%2Ffaceage","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faim-harvard%2Ffaceage/lists"}