{"id":19569536,"url":"https://github.com/opentrack/neuralnet-tracker-traincode","last_synced_at":"2025-04-27T03:30:51.643Z","repository":{"id":74507916,"uuid":"366970737","full_name":"opentrack/neuralnet-tracker-traincode","owner":"opentrack","description":"Training of machine learning models for the tracker","archived":false,"fork":false,"pushed_at":"2024-12-07T09:45:17.000Z","size":76902,"stargazers_count":29,"open_issues_count":1,"forks_count":2,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-04T20:36:17.262Z","etag":null,"topics":["3d-face-alignment","deep-learning","deeplearning","face-alignment","facealignment","pose-estimation"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/opentrack.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"license.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-05-13T07:33:39.000Z","updated_at":"2025-04-03T08:30:51.000Z","dependencies_parsed_at":"2024-09-09T20:41:17.869Z","dependency_job_id":"5503bf20-8691-4c63-9f4a-2567bee065ed","html_url":"https://github.com/opentrack/neuralnet-tracker-traincode","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opentrack%2Fneuralnet-tracker-traincode","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opentrack%2Fneuralnet-tracker-traincode/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opentrack%2Fneuralnet-tracker-traincode/releases","m
anifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opentrack%2Fneuralnet-tracker-traincode/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/opentrack","download_url":"https://codeload.github.com/opentrack/neuralnet-tracker-traincode/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251085147,"owners_count":21533821,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-face-alignment","deep-learning","deeplearning","face-alignment","facealignment","pose-estimation"],"created_at":"2024-11-11T06:10:26.470Z","updated_at":"2025-04-27T03:30:46.632Z","avatar_url":"https://github.com/opentrack.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"OpenTrack \"NeuralNet Tracker\" Training \u0026 Evaluation\n===================================================\n\n/ [**\"OpNet: On the power of data augmentation for head pose estimation\"**](https://arxiv.org/abs/2407.05357)\n\nIf you are looking for the code for the publication, please note the [`paper` branch](https://github.com/opentrack/neuralnet-tracker-traincode/tree/paper),\nwhich is a specially tailored snapshot for the publication.\n\nThis repository contains the code to train the neural nets for the NeuralNet tracker plugin of [Opentrack](https://github.com/opentrack/opentrack). It allows head tracking with a simple webcam.\n\n\nOverview\n--------\n\nThe tracker plugin is based on deep learning, i.e. 
neural network models optimized using data to perform their tasks.\nThere are two parts: a localizer network and the actual pose estimation network.\nThe localizer tries to find a single face and generates a bounding box around it, from which a crop is extracted for the pose network to analyze.\n\nThe steps below outline how to reproduce the networks\ndelivered with OpenTrack, including training and evaluation. However, the instructions currently focus on the pose estimator. At the end there is a section on the localizer.\n\n\nInstall\n-------\n\nSet up a Python environment with a recent PyTorch. Tested with Python 3.11\nand PyTorch 2.3.0. Using Anaconda:\n\n```bash\n# Create and activate python environment\nconda create -p \u003cpath\u003e python=3.11\nconda activate \u003cpath\u003e\n\n# Install dependencies\nconda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia\nconda install -c conda-forge numpy scipy opencv kornia matplotlib tqdm h5py onnx onnxruntime strenum tabulate\n\n# Install `trackertraincode` from this repo in developer mode (as symlink)\ncd \u003cthis repo\u003e\npip install -e .\n```\n\nThis should support training and evaluation. To generate the datasets you also need `pytorch-minimize facenet-pytorch scikit-learn trimesh pyrender`.\n\nSet at least the `DATADIR` environment variable.\n```bash\nexport DATADIR=\u003cpath to preprocessing outputs\u003e\nexport NUM_WORKERS=\u003cnumber of cpu cores\u003e # For the data loader\n```\n\nEvaluation\n----------\n\nDownload AFLW2000-3D from http://www.cbsr.ia.ac.cn/users/xiangyuzhu/projects/3DDFA/main.htm.\n\nBiwi can be obtained from Kaggle https://www.kaggle.com/datasets/kmader/biwi-kinect-head-pose-database. 
I couldn't find a better source that is still accessible.\n\nDownload a PyTorch model checkpoint:\n\n* Baseline Ensemble: https://drive.google.com/file/d/19LrssD36COWzKDp7akxtJFeVTcltNVlR/view?usp=sharing\n* Additionally trained on Face Synthetics (BL+FS): https://drive.google.com/file/d/19zN8KICVEbLnGFGB5KkKuWrPjfet-jC8/view?usp=sharing\n* Labeling Ensemble (RA-300W-LP from Table 3): https://drive.google.com/file/d/13LSi6J4zWSJnEzEXwZxr5UkWndFXdjcb/view?usp=sharing\n\n### Option 1 (AFLW2000 3D)\n\nRun `scripts/AFLW20003dEvaluation.ipynb`.\nIt should give results close to those in the paper. The face crop selection is different, though, so the results won't be exactly the same.\n\n### Option 2\n\nRun the preprocessing and then the evaluation script.\n\n```bash\n# Preprocess the data. The output filename \"aflw2k.h5\" must match the hardcoded value in \"pipelines.py\"\npython scripts/dsprocess_aflw2k.py \u003cpath to\u003e/AFLW2000-3D.zip $DATADIR/aflw2k.h5\n\n# Will look in $DATADIR for aflw2k.h5.\npython scripts/evaluate_pose_network.py --ds aflw2k3d \u003cpath to model(.onnx|.ckpt)\u003e\n```\n\nIt supports ONNX conversions as well as PyTorch checkpoints. For PyTorch, the script must be adapted to the concrete model configuration of the checkpoint. If you wish to process the outputs further, e.g. for averaging as in the paper, there is an option to generate JSON files.\n\nEvaluation on the Biwi benchmark works similarly. However, we use the annotations file from https://github.com/pcr-upm/opal23_headpose in order to adhere to the experimental protocol. 
It can be found under https://github.com/pcr-upm/opal23_headpose/blob/main/annotations/biwi_ann.txt.\n```bash\n# Preprocess the data.\npython scripts/dsprocess_biwi.py --opal-annotation \u003cpath to\u003e/biwi_ann.txt \u003cpath to\u003e/biwi.zip $DATADIR/biwi-v3.h5\n\n# Will look in $DATADIR for biwi-v3.h5.\npython scripts/evaluate_pose_network.py --ds biwi --roi-expansion 0.8 --perspective-correction \u003cpath to model(.onnx|.ckpt)\u003e\n```\nYou want `--perspective-correction` for SOTA results. It corrects the orientation obtained from the face crop for camera perspective, since with the Kinect's field of view the assumption of orthographic projection no longer holds. I.e. the pose from the crop is transformed into the global coordinate frame, and in this frame it is compared with the original labels. Without the correction, the pose from the crop is taken directly for comparison with the labels.\nSetting `--roi-expansion 0.8` makes the cropped area smaller relative to the bounding box annotation. That is also necessary for good results because the annotations have much larger bounding boxes than the networks were trained with.\n\n\nIntegration in OpenTrack\n------------------------\n\nChoose the \"Neuralnet\" tracker plugin. 
It currently comes with some older models which don't\nachieve the same SOTA benchmark results but are a little more noise resistant and invariant\nto eye movements.\n\nTraining\n--------\n\nRough guidelines for reproduction follow.\n\n### Datasets\n\n#### 300W-LP \u0026 AFLW2000-3D\n\nThere should be download links for `300W-LP.zip` and `AFLW2000-3D.zip` on http://www.cbsr.ia.ac.cn/users/xiangyuzhu/projects/3DDFA/main.htm.\n\n#### 300W-LP Reproduction\nMy version of 300W-LP with custom out-of-plane rotation augmentation applied.\nIncludes \"closed-eyes\" augmentation as well as directional illumination.\nOn Google Drive https://drive.google.com/file/d/1uEqba5JCGQMzrULnPHxf4EJa04z_yHWw/view?usp=drive_link.\n\n#### LaPa Megaface 3D Labeled \"Large Pose\" Extension\nMy pseudo / semi-automatically labeled subset of the Megaface frames from LaPa.\nOn Google Drive https://drive.google.com/file/d/1K4CQ8QqAVXj3Cd-yUt3HU9Z8o8gDmSEV/view?usp=drive_link.\n\n#### WFLW 3D Labeled \"Large Pose\" Extension\nMy pseudo / semi-automatically labeled subset.\nOn Google Drive https://drive.google.com/file/d/1SY33foUF8oZP8RUsFmcEIjq5xF5m3oJ1/view?usp=drive_link.\n\n#### Face Synthetics\nThere should be a download link on https://github.com/microsoft/FaceSynthetics for the 100k samples variant `dataset_100000.zip`.\n\n### Preprocessing\n\n```bash\npython scripts/dsprocess_aflw2k.py AFLW2000-3D.zip $DATADIR/aflw2k.h5\n\n# Optional, for training on the original 300W-LP:\npython scripts/dsprocess_300wlp.py --reconstruct-head-bbox 300W-LP.zip $DATADIR/300wlp.h5\n\n# Face Synthetics\npython scripts/dsprocess_synface.py dataset_100000.zip $DATADIR/microsoft_synface_100000-v1.1.h5\n\n# Custom datasets\nunzip lapa-megaface-augmented-v2.zip -d ../$DATADIR/\nunzip wflw_augmented_v4.zip -d ../$DATADIR/\nunzip reproduction_300wlp-v12.zip -d ../$DATADIR/\n```\n\nThe processed files can be inspected in the notebook `DataVisualization.ipynb`.\n\n### Training Process\n\nNow training should be 
possible. For the baseline it should be:\n```bash\npython scripts/train_poseestimator.py --lr 1.e-3 --epochs 1500 --ds \"repro_300_wlp+lapa_megaface_lp:20000+wflw_lp\" \\\n    --save-plot train.pdf \\\n    --with-swa \\\n    --with-nll-loss \\\n    --backbone mobilenetv1 \\\n    --no-onnx \\\n    --roi-override original \\\n    --no-blurpool \\\n    --outdir \u003coutput folder\u003e\n```\n\nIt will look at the environment variable `DATADIR` to find the datasets. Notable flags and settings are the following:\n\n```bash\n--with-nll-loss # Enables NLL losses\n--backbone resnet18 # Use ResNet backbone\n--no-blurpool # Disable use of Blur Pool instead of learnable strided conv.\n--no-imgaug # Disable image intensity augmentation\n--no-pointhead # Disable landmark predictions\n--raug \u003cvalue in degree\u003e # Set the maximum angle of in-plane rotation augmentation. Zero disables it.\n\n\n--ds \"300wlp\" # Train on original 300W-LP\n--ds \"300wlp:92+synface:8\" # Train on Face Synthetics + 300W-LP\n--ds \"repro_300_wlp_woextra\" # Train on the 300W-LP reproduction without the eye + illumination variations. (Needs unpublished dataset :-/)\n--ds \"repro_300_wlp\" # Train only on the 300W-LP reproduction\n--ds \"repro_300_wlp+lapa_megaface_lp+wflw_lp+synface\" # Train the \"BL + FS\" case which should give best performing models.\n```\n### Deployment\n\nI use ONNX for deployment and most evaluation purposes. There is a script for conversion. WARNING: it is necessary to adapt its code to the model configuration. :-/ It is easy though. Only one statement where the model is instantiated needs to be changed.  The script has two modes. For exports for OpenTrack use\n```bash\npython scripts/export_model.py --posenet \u003cmodel.ckpt\u003e\n```\nIt omits the landmark predictions and renames the output tensors (for historical reasons). 
The script performs sanity checks to ensure the outputs from ONNX are almost equal to the PyTorch outputs.\nTo use the model in OpenTrack, find the directory with the other `.onnx` models and copy the new one there. Then in OpenTrack, in the tracker settings, there is a button to select the model file.\n\nFor evaluation use\n```\npython scripts/export_model.py --full --posenet \u003cmodel.ckpt\u003e\n```\nThe model created in this way includes all outputs.\n\nCreation of 3D Labeled WFLW \u0026 LaPa Large Pose Expansions\n--------------------------------------------------------\n\n* Preprocess LaPa and Megaface using the scripts in `scripts/`.\n* Download the pseudo labeling ensemble.\n* Generate pseudo labels.\n* Find the GitHub repository face-3d-rotation-augmentation. Install the package in it with pip.\n* Use the notebooks (in this repo) `scripts/DsLapaMegafaceFitFaceModel.ipynb`, `scripts/DsLapaMegafaceLargePoseCreation.ipynb`, `scripts/DsWflwFitFaceModel.ipynb` and `scripts/DsWflwLargePoseCreation.ipynb`.\n\n\nMiscellaneous\n-------------\n\n### Coordinate System\n\nIt's a right-handed system. Seen from the front, X is right, Y is down and Z is into the screen.\nThis coordinate system is used for world space and screen space. It is also used as the local coordinate system\nof the head, although the directions as described apply only at zero rotation.\n\n### File format\n\nLabels are stored in an HDF5 format. Input images may be stored separately or integrated in the same file. 
Here is a dump of a\nfile with images included, where N is the number of samples:\n\n```\n/coords shape=(N, 3), dtype=float32\n    ATTRIBUTE category: \"xys\" (str)\n/images shape=(N,), dtype=object\n    ATTRIBUTE category: \"img\" (str)\n    ATTRIBUTE lossy: \"True\" (bool_)\n    ATTRIBUTE storage: \"varsize_image_buffer\" (str)\n/pt3d_68 shape=(N, 68, 3), dtype=float32\n    ATTRIBUTE category: \"pts\" (str)\n/quats shape=(N, 4), dtype=float32\n    ATTRIBUTE category: \"q\" (str)\n/rois shape=(N, 4), dtype=float32\n    ATTRIBUTE category: \"roi\" (str)\n/shapeparams shape=(N, 50), dtype=float16\n    ATTRIBUTE category: \"\" (str)\n```\n\nAs you can see, the top level has several HDF5 Datasets (DS) with label data. `images` is the DS with the images.\nThe DS have attributes with metadata. There is the `category`, which indicates the kind of information stored in the DS.\nThe `images` DS has a `storage` attribute which tells whether the images are stored inline or externally. `varsize_image_buffer`\nmeans that the data type is a variable-sized buffer which holds the image. When `lossy` is true, the images are\nencoded as JPG, else as PNG. When `storage` is set to `image_filename`, the DS contains relative paths to external\nfiles. The other fields are label data and should be relatively self-explanatory.\n\nRelevant code for reading and writing those files can be found in `trackertraincode/datasets/dshdf5.py`, \n`trackertraincode/datasets/dshdf5pose.py` and the preprocessing scripts `scripts/dsprocess_*.py`.\n\nLocalizer Network\n-----------------\n\nThere is an old notebook to train this network.\n\nThe training data is a processed version of the Wider Face dataset. The processing accounts for the fact that Wider Face contains images with potentially many faces. Therefore, sections which contain only one face or none are extracted.\n\nThe localizer network is trained to generate a \"heatmap\" with a peak where it suspects the center of a face. 
In addition, it outputs the parameters of a bounding box.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopentrack%2Fneuralnet-tracker-traincode","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopentrack%2Fneuralnet-tracker-traincode","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopentrack%2Fneuralnet-tracker-traincode/lists"}