Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/shrubb/latent-pose-reenactment
The authors' implementation of the "Neural Head Reenactment with Latent Pose Descriptors" (CVPR 2020) paper.
https://github.com/shrubb/latent-pose-reenactment
avatar deep-learning face-reenactment facial-landmarks generative-model head-avatar head-reenactment landmark-detection pose-estimation pytorch self-supervised-learning talking-head voxceleb voxceleb2
Last synced: 3 days ago
JSON representation
The authors' implementation of the "Neural Head Reenactment with Latent Pose Descriptors" (CVPR 2020) paper.
- Host: GitHub
- URL: https://github.com/shrubb/latent-pose-reenactment
- Owner: shrubb
- License: apache-2.0
- Created: 2020-10-09T13:01:44.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-03-14T10:56:39.000Z (almost 2 years ago)
- Last Synced: 2024-12-27T14:39:00.688Z (19 days ago)
- Topics: avatar, deep-learning, face-reenactment, facial-landmarks, generative-model, head-avatar, head-reenactment, landmark-detection, pose-estimation, pytorch, self-supervised-learning, talking-head, voxceleb, voxceleb2
- Language: Python
- Homepage: https://shrubb.github.io/research/latent-pose-reenactment/
- Size: 1.59 MB
- Stars: 181
- Watchers: 5
- Forks: 34
- Open Issues: 6
-
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-ai-talking-heads - Neural Head Reenactment with Latent Pose Descriptors
README
# Neural Head Reenactment with Latent Pose Descriptors
![](https://user-images.githubusercontent.com/9570420/94962966-0a8bb900-0500-11eb-90ee-3315368019b8.png)
Burkov, E., Pasechnik, I., Grigorev, A., & Lempitsky V. (2020, June). **Neural Head Reenactment with Latent Pose Descriptors**. *IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*.
See the [project page](https://shrubb.github.io/research/latent-pose-reenactment/) for an overview.
## Prerequisites
For fine-tuning a pre-trained model, you'll need an NVIDIA GPU, preferably with 8+ GB VRAM. To train from scratch, we recommend a total of 40+ GB VRAM.
Set up your environment as described [here](INSTALL.md).
## Running the pretrained model
* Collect images of the person to reenact.
* Run [`utils/preprocess_dataset.sh`](utils/preprocess_dataset.sh) to preprocess them. Read inside for instructions.
* Download the meta-model [checkpoint](https://drive.google.com/file/d/14-FYaz6YhTX5M_P3-rm2ITcxGljmWl-F/view?usp=share_link).
* Run the below to fine-tune the meta-model to your person, first setting the top variables. If you want, also launch a TensorBoard at `"$OUTPUT_PATH"` to view progress, preferably with the [`--samples_per_plugin "scalars=1000,images=100"`](https://stackoverflow.com/questions/57669234/how-to-display-more-than-10-images-in-tensorboard) option; mainly check the "images" tab to find out at which iteration the identity gap becomes small enough.```bash
# in this example, your images should be "$DATASET_ROOT/images-cropped/$IDENTITY_NAME/*.jpg"
DATASET_ROOT="/where/is/your/data"
IDENTITY_NAME="identity/name"
MAX_BATCH_SIZE=8 # pick the largest possible, start with 8 and decrease until it fits in VRAM
CHECKPOINT_PATH="/where/is/checkpoint.pth"
OUTPUT_PATH="outputs/" # a directory for outputs, will be created
RUN_NAME="tony_hawk_take_1" # give your run a name if you want# Important. See the note below
TARGET_NUM_ITERATIONS=230# Don't change these
NUM_IMAGES=`ls -1 "$DATASET_ROOT/images-cropped/$IDENTITY_NAME" | wc -l`
BATCH_SIZE=$((NUM_IMAGES30`**. But your concrete case may be different. If you have a lot of disk space, pass a flag to save checkpoints every so often (e.g. `--save_frequency 4` will save a checkpoint every `4 * NUM_IMAGES` iterations), then drive (see below how) each of them and thus find the iteration where the best tradeoff happens for your avatar.* Take your driving video and crop it with `python3 utils/crop_as_in_dataset.py`. Run with `--help` to learn how. Or, equivalently, just reuse [`utils/preprocess_dataset.sh`](utils/preprocess_dataset.sh) with `COMPUTE_SEGMENTATION=false`.
* Organize the cropped images from the previous step as `"/images-cropped//*.jpg"`.
* Use them to drive your fine-tuned model (the checkpoint is at `"$OUTPUT_PATH/$RUN_NAME/checkpoints"`) with `python3 drive.py`. Run with `--help` to learn how.## Training (meta-learning) your own model
You'll need a training configuration (aka config) file. Start with `"configs/default.yaml"` or just edit that. These files specify various training options which you can find in code as `argparse` parameters. Any of these options can be specified both in the config file and on the command line (e.g. `--batch_size=7`), and are resolved as follows (any source here overrides all the preceding ones):
* `argparse` defaults — these are specified in the code directly;
* those saved in a loaded checkpoint (if starting from a checkpoint);
* your `--config` file;
* command line.The command is
```bash
python3 train.py --config=config_name [any extra arguments ...]
```Or, with multiple GPUs,
```bash
python3 -um torch.distributed.launch --nproc_per_node= train.py --config=config_name [any extra arguments ...]
```## Reference
Consider citing us if you use the code:
```bibtex
@InProceedings{Burkov_2020_CVPR,
author = {Burkov, Egor and Pasechnik, Igor and Grigorev, Artur and Lempitsky, Victor},
title = {Neural Head Reenactment with Latent Pose Descriptors},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
```