{"id":15716379,"url":"https://github.com/anilbas/3dmmasstn","last_synced_at":"2025-04-13T05:28:30.295Z","repository":{"id":68777595,"uuid":"101192221","full_name":"anilbas/3DMMasSTN","owner":"anilbas","description":"MatConvNet implementation for incorporating a 3D Morphable Model (3DMM) into a Spatial Transformer Network (STN)","archived":false,"fork":false,"pushed_at":"2018-04-15T13:53:28.000Z","size":2721,"stargazers_count":280,"open_issues_count":6,"forks_count":50,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-03-24T10:19:57.419Z","etag":null,"topics":["3d-morphable-models","3dmm","basel-face-model","computer-vision","convolutional-neural-networks","dagnn","deep-learning","deep-neural-networks","face","machine-learning","matconvnet","matlab","siamese-network","spatial-transformer-network","stn","vgg-face-matconvnet"],"latest_commit_sha":null,"homepage":"","language":"Matlab","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/anilbas.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-08-23T14:52:29.000Z","updated_at":"2025-01-24T05:28:01.000Z","dependencies_parsed_at":null,"dependency_job_id":"dce4f068-2476-48df-86ee-5fb2d9aa412b","html_url":"https://github.com/anilbas/3DMMasSTN","commit_stats":{"total_commits":22,"total_committers":2,"mean_commits":11.0,"dds":"0.045454545454545414","last_synced_commit":"c6562b5fda5c2f742a27dc1b4a7ff15ec5e83837"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/anilbas%2F3DMMasSTN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anilbas%2F3DMMasSTN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anilbas%2F3DMMasSTN/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anilbas%2F3DMMasSTN/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/anilbas","download_url":"https://codeload.github.com/anilbas/3DMMasSTN/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248669012,"owners_count":21142786,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-morphable-models","3dmm","basel-face-model","computer-vision","convolutional-neural-networks","dagnn","deep-learning","deep-neural-networks","face","machine-learning","matconvnet","matlab","siamese-network","spatial-transformer-network","stn","vgg-face-matconvnet"],"created_at":"2024-10-03T21:45:18.993Z","updated_at":"2025-04-13T05:28:30.264Z","avatar_url":"https://github.com/anilbas.png","language":"Matlab","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 3D Morphable Models as Spatial Transformer Networks\n\n#### Update: A simple gradient descent method is added to show how the layers work. Please see the [demo.m](https://github.com/anilbas/3DMMasSTN/blob/master/demo.m).\n\nThis page shows how to use a 3D morphable model as a spatial transformer within a convolutional neural network (CNN). 
It is an extension of the original spatial transformer network in that we are able to interpret and normalise 3D pose changes and self-occlusions. The network (specifically, the localiser part of the network) learns to fit a 3D morphable model to a single 2D image without needing labelled examples of fitted models.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/anilbas/3DMMasSTN/blob/master/img/average/elon_musk_34.jpg\" \n       alt=\"Elon Musk (34)\" title=\"Elon Musk (34)\" width=\"19.4%\"\u003e\n  \u003cimg src=\"https://github.com/anilbas/3DMMasSTN/blob/master/img/average/christian_bale_51.jpg\" \n       alt=\"Christian Bale (51)\" title=\"Christian Bale (51)\" width=\"19.4%\"\u003e\n  \u003cimg src=\"https://github.com/anilbas/3DMMasSTN/blob/master/img/average/elisha_cuthbert_53.jpg\" \n       alt=\"Elisha Cuthbert (53)\" title=\"Elisha Cuthbert (53)\" width=\"19.4%\"\u003e\n  \u003cimg src=\"https://github.com/anilbas/3DMMasSTN/blob/master/img/average/clint_eastwood_62.jpg\" \n       alt=\"Clint Eastwood (62)\" title=\"Clint Eastwood (62)\" width=\"19.4%\"\u003e\n  \u003cimg src=\"https://github.com/anilbas/3DMMasSTN/blob/master/img/average/emma_watson_73.jpg\" \n       alt=\"Emma Watson (73)\" title=\"Emma Watson (73)\" width=\"19.4%\"\u003e\n  \u003cimg src=\"https://github.com/anilbas/3DMMasSTN/blob/master/img/average/chuck_palahniuk_48.jpg\" \n       alt=\"Chuck Palahniuk (48)\" title=\"Chuck Palahniuk (48)\" width=\"19.4%\"\u003e\n  \u003cimg src=\"https://github.com/anilbas/3DMMasSTN/blob/master/img/average/nelson_mandela_52.jpg\"\n       alt=\"Nelson Mandela (52)\" title=\"Nelson Mandela (52)\" width=\"19.4%\"\u003e\n  \u003cimg src=\"https://github.com/anilbas/3DMMasSTN/blob/master/img/average/kim_jong-un_60.jpg\" \n       alt=\"Kim Jong-un (60)\" title=\"Kim Jong-un (60)\" width=\"19.4%\"\u003e\n  \u003cimg src=\"https://github.com/anilbas/3DMMasSTN/blob/master/img/average/ben_affleck_66.jpg\"\n       alt=\"Ben Affleck 
(66)\" title=\"Ben Affleck (66)\" width=\"19.4%\"\u003e\n  \u003cimg src=\"https://github.com/anilbas/3DMMasSTN/blob/master/img/average/courteney_cox_127.jpg\" \n       alt=\"Courteney Cox (127)\" title=\"Courteney Cox (127)\" width=\"19.4%\"\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\nA set of mean flattened images that are obtained by applying the 3DMM-STN to multiple images of the same person from the \u003ca href=\"http://www.umdfaces.io\"\u003eUMDFaces Dataset\u003c/a\u003e.\u003cbr\u003e\u003ci\u003e(Please hover over the image to see the subject's name and the number of images used for averaging)\u003c/i\u003e \n\u003c/p\u003e\n\nThe proposed architecture is based on a purely geometric approach in which only the shape component of a 3DMM is used to geometrically normalise an image. Our method can be trained in an unsupervised fashion, and thus does not depend on synthetic training data or the fitting results of an existing algorithm.\n\nIn contrast to all previous 3DMM fitting networks, the output of our 3DMM-STN is a 2D resampling of the original image which contains all of the high frequency, discriminating detail in a face rather than a model-based reconstruction which only captures the gross, low frequency aspects of appearance that can be explained by a 3DMM.\n\n## Citation\n\nPlease cite the [following paper](http://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w17/Bas_3D_Morphable_Models_ICCV_2017_paper.pdf) ([DOI](http://dx.doi.org/10.1109/ICCVW.2017.110)) if you use this work in your research:\n\nA. Bas, P. Huber, W.A.P. Smith, M. Awais and J. Kittler. \"3D Morphable Models as Spatial Transformer Networks\". In Proc. ICCV Workshop on Geometry Meets Deep Learning, pp. 904-912, 2017.\n\n## Usage \u0026 Training\n\nWe train our network using the [MatConvNet](http://www.vlfeat.org/matconvnet/) library. 
Please refer to the [installation page](http://www.vlfeat.org/matconvnet/install/) for instructions.\n\nIn order to start the training, you need to create the resampled expression model first. To do that, you need (1) the [Basel Face Model](http://faces.cs.unibas.ch/bfm), `01_MorphableModel.mat` and (2) the [3DDFA Expression Model](http://www.cbsr.ia.ac.cn/users/xiangyuzhu/projects/3DDFA/Code/3DDFA.zip), `Model_Expression.mat`. Set the paths accordingly and run the `prepareExpressionBFM` function in the prepareModel folder to build a resampled expression model.\n\nFinally, run the `dagnn_3dmmasstn.m` script to start the training.\n\n\u003cimg src=\"https://github.com/anilbas/3DMMasSTN/blob/master/img/fig1.png\" alt=\"Overview of the 3DMM-STN\" width=\"50%\"\u003e\u003cimg src=\"https://github.com/anilbas/3DMMasSTN/blob/master/img/fig2.png\" alt=\"The grid generator network within a 3DMM-STN\" width=\"50%\"\u003e\n\n#### Localiser Network\n\nThe localiser network is a CNN that takes an image as input and regresses the pose and shape parameters, theta (*θ* = **r**, **t**, *logs*, **α**). For our localiser network, we use the pre-trained [VGGFaces](http://www.robots.ox.ac.uk/~vgg/software/vgg_face/) architecture, delete the classification layer and add a new fully connected layer with 6 + *D* outputs. The pre-trained models can be downloaded from the MatConvNet [model repository](http://www.vlfeat.org/matconvnet/pretrained/).\n\n#### Grid Generator Network\nOur grid generator combines a linear statistical model with a scaled orthographic projection. We apply a 3D transformation and projection to a 3D mesh that comes from the morphable model. The intensities sampled from the source image are then assigned to the corresponding points in a flattened 2D grid.\n\n## UV texture space embedding for Basel Face Model\nThe output of our 3DMM-STN is a resampled image in a flattened 2D texture space in which the images are in dense, pixel-wise correspondence. 
In other words, the output grid is a texture space flattening of the 3DMM mesh. Specifically, we compute a Tutte embedding using conformal Laplacian weights and with the mesh boundary mapped to a square. To ensure a symmetric embedding, we map the symmetry line to the symmetry line of the square, flatten only one side of the mesh and obtain the flattening of the other half by reflection. \n\nYou can find the UV coordinates in the [BFM_UV.mat file](https://github.com/anilbas/3DMMasSTN/blob/master/util/BFM_UV.mat) in the util folder.\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://github.com/anilbas/3DMMasSTN/blob/master/img/UV.png\" alt=\"The output grid visualisation using the mean texture\" width=\"25%\"\u003e\u003cimg src=\"https://github.com/anilbas/3DMMasSTN/blob/master/img/geometry.png\" alt=\"The mean shape as a geometry image\" width=\"25%\"\u003e\n\u003c/p\u003e\n\n## Customised Layers\n\nIn this section, we summarise our customised layers and loss functions. Please refer to the [paper](http://arxiv.org/abs/1708.07199) for more details.\n\n* **3D morphable model layer** generates a shape **X**, comprising *N* 3D vertices, by taking a linear combination of the principal components stored in the matrix and the mean shape, according to the shape parameters **α**.\n* **Axis-angle to rotation matrix layer** converts an axis-angle representation of a rotation, **r**, into a rotation matrix **R**.\n* **3D rotation layer** takes as input a rotation matrix **R** and *N* 3D points **X**, and applies the rotation.\n* **Orthographic projection layer** takes as input a set of *N* 3D points **X'** and outputs *N* 2D points **Y** by applying an orthographic projection along the *z* axis.\n* **Scaling layers** scale the 2D points **Y** by the scale *s*, after the log scale *logs* has been transformed to the scale *s*.\n* **Translation layer** generates the 2D sample points by adding a 2D translation **t** to each of the scaled points.\n* **Grid layer** takes as input 2x*N* points and 
produces a 2x*H'W'* grid using the re-sampled 3DMM, which has *N=H'W'* vertices, where each vertex *i* has an associated UV coordinate. To understand how to compute the re-sampled model over [a uniform grid in the UV space](#uv-texture-space-embedding-for-basel-face-model), please refer to the `resampleModel` function and the sampling section of the paper.\n* **Bilinear sampler** layer is exactly as in the original STN.\n* **Visibility (self-occlusions) layer** takes as input the rotation matrix **R** and the shape parameters **α** and outputs a binary occlusion mask **M**.\n* **Masking layer** combines the sampled image and the visibility map via pixel-wise products.\n\n#### Geometric Loss Functions\n\n* **Bilateral symmetry loss** measures asymmetry of the sampled face texture over visible pixels.\n* **Siamese multi-view fitting loss** penalises differences between multiple images of the same face in different poses.\n* **Landmark loss** minimises the Euclidean distance between observed and predicted 2D points.\n* **Statistical prior loss** minimises an appearance error, regularising the statistical shape prior (we scale the shape basis vectors such that the shape parameters follow a standard multivariate normal distribution).\n\n## Dependencies\n\n- **map_tddfa_to_basel.mat** file is supplied by James Booth. \n\n- **Basel Face Model** is freely available upon signing a license agreement via the [website](http://faces.cs.unibas.ch/bfm) of the [Graphics and Vision Research Group, University of Basel](http://gravis.dmi.unibas.ch).\n\n- **The expression model** uses the correspondence to the Basel Model provided by [3DDFA](http://www.cbsr.ia.ac.cn/users/xiangyuzhu/projects/3DDFA/main.htm). 
The components originally come from [FaceWarehouse](http://gaps-zju.org/facewarehouse/).\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanilbas%2F3dmmasstn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanilbas%2F3dmmasstn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanilbas%2F3dmmasstn/lists"}