{"id":49451712,"url":"https://github.com/universome/stylegan-v","last_synced_at":"2026-05-16T15:00:43.776Z","repository":{"id":37750359,"uuid":"442907669","full_name":"universome/stylegan-v","owner":"universome","description":"[CVPR 2022] StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2","archived":false,"fork":false,"pushed_at":"2023-04-19T12:28:11.000Z","size":85,"stargazers_count":282,"open_issues_count":19,"forks_count":32,"subscribers_count":21,"default_branch":"master","last_synced_at":"2023-11-07T15:13:14.409Z","etag":null,"topics":["gans","generative-adversarial-networks","pytorch","stylegan","video-generation"],"latest_commit_sha":null,"homepage":"https://universome.github.io/stylegan-v","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/universome.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-12-29T22:43:09.000Z","updated_at":"2023-11-02T13:02:39.000Z","dependencies_parsed_at":"2022-07-12T16:44:49.098Z","dependency_job_id":null,"html_url":"https://github.com/universome/stylegan-v","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"purl":"pkg:github/universome/stylegan-v","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/universome%2Fstylegan-v","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/universome%2Fstylegan-v/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/universome%2Fstylegan-v/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/universome%2Fstylegan-v/manifests","owner_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/owners/universome","download_url":"https://codeload.github.com/universome/stylegan-v/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/universome%2Fstylegan-v/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33107564,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-16T04:41:52.686Z","status":"ssl_error","status_checked_at":"2026-05-16T04:41:52.009Z","response_time":115,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gans","generative-adversarial-networks","pytorch","stylegan","video-generation"],"created_at":"2026-04-30T03:00:32.664Z","updated_at":"2026-05-16T15:00:43.770Z","avatar_url":"https://github.com/universome.png","language":"Python","funding_links":[],"categories":["Video \u0026 Animation"],"sub_categories":[],"readme":"# StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2\n### [CVPR 2022] Official pytorch implementation\n[[Project website]](https://universome.github.io/stylegan-v)\n[[Paper]](https://kaust-cair.s3.amazonaws.com/stylegan-v/stylegan-v-paper.pdf)\n[[Casual GAN papers summary]](https://www.casualganpapers.com/text_guided_video_editing_hd_video_generation/StyleGAN-V-explained.html?query=stylegan-v)\n\n\u003cdiv 
style=\"text-align:center\"\u003e\n\u003cimg src=\"https://user-images.githubusercontent.com/3128824/161441271-09fa5cfe-a2ae-4a7f-b5ca-ad90f5e0287e.gif\" alt=\"Content/Motion decomposition for Face Forensics 256x256\"/\u003e\n\u003c/div\u003e\n\n\u003cdiv style=\"text-align:center\"\u003e\n\u003cimg src=\"https://user-images.githubusercontent.com/3128824/161441278-c7c3a43d-a3cd-417b-98c5-6b889ac32935.gif\" alt=\"Content/Motion decomposition for Sky Timelapse 256x256\"/\u003e\n\u003c/div\u003e\n\nCode release TODO:\n- [x] Installation guide\n- [x] Training code\n- [x] Data preprocessing scripts\n- [ ] CLIP editing scripts (50% done)\n- [ ] Jupyter notebook demos\n- [x] [Pre-trained checkpoints](https://disk.yandex.ru/d/v7MS7zu4mmZxXw)\n\n## Installation\nTo install and activate the environment, run the following commands:\n```\nconda env create -f environment.yaml -p env\nconda activate ./env\n```\nFor CLIP editing, you will need to install [StyleCLIP](https://github.com/orpatashnik/StyleCLIP) and `clip`.\nThis repo is built on top of [INR-GAN](https://github.com/universome/inr-gan), so make sure that it runs on your system.\n\nIf you have Ampere GPUs (A6000, A100 or RTX-3090), then use `environment-ampere.yaml` instead because it is based on CUDA 11 and newer PyTorch versions.\n\n## System requirements\n\nOur codebase has the same system requirements as StyleGAN2-ADA: see them [here](https://github.com/NVlabs/stylegan2-ada-pytorch#requirements).\nWe trained all the 256x256 models on 4 V100s with 32 GB each for ~2 days.\nIt is very similar in training time to [StyleGAN2-ADA](https://github.com/NVlabs/stylegan2-ada-pytorch) (even a bit faster).\n\n## Training\n### Dataset structure\nThe dataset should be either a `.zip` archive (the default setting) or a directory structured as:\n```\ndataset/\n    video1/\n        - frame1.jpg\n        - frame2.jpg\n        - ...\n    video2/\n        - frame1.jpg\n        - frame2.jpg\n        - ...\n    ...\n```\nWe use such a 
frame-wise structure because it makes loading faster for sparse training.\n\nBy default, we assume that the data is packed into a `.zip` archive since this representation avoids additional overhead when copying data between machines on a cluster.\nYou can also train from a directory: for this, just remove the `.zip` suffix from the `dataset.path` property in `configs/dataset/base.yaml`.\n\nIf you want to train on a custom dataset, create a config for it at `configs/dataset/my_dataset_config_name.yaml` (see `configs/dataset/ffs.yaml` as an example).\nThe `fps` parameter is needed for visualization purposes only; videos typically have a value of 25 or 30 FPS.\n\n### Training StyleGAN-V\nTo train on FaceForensics 256x256, run:\n```\npython src/infra/launch.py hydra.run.dir=. exp_suffix=my_experiment_name env=local dataset=ffs dataset.resolution=256 num_gpus=4\n```\n\nTo train on SkyTimelapse 256x256, run:\n```\npython src/infra/launch.py hydra.run.dir=. exp_suffix=my_experiment_name env=local dataset=sky_timelapse dataset.resolution=256 num_gpus=4 model.generator.time_enc.min_period_len=256\n```\nFor SkyTimelapse 256x256, we increased the period length for the motion time encoder since the motions in this dataset are much slower/smoother than in FaceForensics.\nIn practice, this parameter (and its accompanying `model.generator.motion.motion_z_distance`) influences the motion quality (but not the image quality!) 
the most.\n\nIf you do not want `hydra` to create log directories (typically, you don't), add the following arguments: `hydra.output_subdir=null hydra/job_logging=disabled hydra/hydra_logging=disabled`.\n\nIf [slurm](https://slurm.schedmd.com/documentation.html) is installed on your system, you can submit the above training as a slurm job by adding the `slurm=true` parameter.\nSbatch arguments are specified in `configs/infra.yaml`; you can update them with the ones you require.\nAlso note that you can create your own environment in `configs/env`.\n\nOn older GPUs (anything other than V100 or newer), custom CUDA kernels (bias_act and upfirdn2d) might fail to compile. The following two lines can help:\n```\nexport TORCH_CUDA_ARCH_LIST=\"7.0\"\nexport TORCH_EXTENSIONS_DIR=/tmp/torch_extensions\n```\n\n### Resume training\nIf you shut down your experiment at some point and would like to fully recover training (i.e., with the optimizer parameters, logging, etc.), then add the `training.resume=latest` argument to your launch script, e.g.:\n```\npython src/infra/launch.py hydra.run.dir=. 
exp_suffix=my_experiment_name env=local dataset=ffs dataset.resolution=256 num_gpus=4 training.resume=latest\n```\nIt will locate the given experiment directory (note that the git hash and the `exp_suffix` must be the same) and resume the training from it.\n\n### Inference\nTo sample from the model, launch the following command:\n```\npython src/scripts/generate.py --network_pkl /path/to/network-snapshot.pkl --num_videos 25 --as_grids true --save_as_mp4 true --fps 25 --video_len 128 --batch_size 25 --outdir /path/to/output/dir --truncation_psi 0.9\n```\nThis will sample 25 videos at 25 FPS as a 5x5 grid with a truncation factor of 0.9.\nEach video consists of 128 frames.\nAdjust the corresponding arguments to change the settings.\n\nAlternatively, instead of specifying `--network_pkl`, you can specify `--networks_dir`, which should point to a directory containing the checkpoints and the `metric-fvd2048_16f.json` metrics JSON file (it is generated automatically during training).\nIt will then select the best checkpoint based on the metrics, so you do not have to search for the best checkpoint of an experiment manually.\n\nTo sample content/motion decomposition grids, use `--moco_decomposition 1` by running the following command:\n```\npython src/scripts/generate.py --networks_dir PATH_TO_EXPERIMENT/output --num_videos 25 --as_grids true --save_as_mp4 true --fps 25 --video_len 128 --batch_size 25 --outdir tmp --truncation_psi 0.8 --moco_decomposition 1\n```\n\n### Training MoCoGAN + SG2 backbone\nTo train the `MoCoGAN+SG2` model, just use the `mocogan.yaml` model config with uniform sampling:\n```\npython src/infra/launch.py hydra.run.dir=. 
+exp_suffix=my_experiment env=local dataset=sky_timelapse dataset.resolution=256 num_gpus=4 model=mocogan sampling=uniform sampling.max_dist_between_frames=1\n```\n\n### Training other baselines\nTo train other baselines, used in the paper, we used their original implementations:\n- [MoCoGAN](https://github.com/sergeytulyakov/mocogan)\n- [MoCoGAN-HD](https://github.com/snap-research/MoCoGAN-HD)\n- [DIGAN](https://github.com/sihyun-yu/digan)\n- [VideoGPT](https://github.com/wilson1yan/VideoGPT)\n\n## Data\nDatasets can be downloaded here:\n- SkyTimelapse: https://github.com/weixiong-ur/mdgan\n- UCF: https://www.crcv.ucf.edu/data/UCF101.php\n- FaceForensics: https://github.com/ondyari/FaceForensics\n- RainbowJelly: https://www.youtube.com/watch?v=P8Bit37hlsQ\n- MEAD: https://wywu.github.io/projects/MEAD/MEAD.html\n\nWe resize all the datasets to the 256x256 resolution (except for MEAD, which we resize to 1024x1024).\nFFS was preprocessed with `src/scripts/preprocess_ffs.py` to extract face crops.\nFor MEAD, we used only the front views.\n\nFor `RainbowJelly`, download the youtube video, save it as `rainbow_jelly.mp4` and convert into the dataset by running:\n```\npython src/scripts/convert_video_to_dataset.py -s /path/to/rainbow_jelly.mp4 -t /path/to/desired/directory --target_size 256 -sf 150 -cs 512\n```\n\n## Evaluation\nIn this repo, we re-implemented two popular evaluation measures for video generation:\n- [Frechet Video Distance](https://arxiv.org/abs/1812.01717). For this, we re-implemented *perfectly* (up to numerical precision) the original Tensorflow version of the I3D model trained on Kinetics-400 and converted it to TorchScript. This is a precise implementation of the official one and we set up [this comparison repo](https://github.com/universome/fvd-comparison) to demonstrate this.\n- [Inception Score](https://arxiv.org/abs/1611.06624) (used only for UCF101). 
For this, we re-implemented *perfectly* (up to numerical precision) the original [Chainer version of the UCF101-finetuned C3D model](https://github.com/pfnet-research/tgan2) in PyTorch and converted it to TorchScript.\n\nIn practice, we found that neither Frechet Video Distance nor Inception Score works reliably for catching motion artifacts.\nThis creates the need for better evaluation measures.\n\nAdvantages of our metrics implementation compared to the original ones:\n- It is much faster due to TorchScript and parallelization across several GPUs\n- It can be launched on top of both a generator checkpoint and off-the-shelf samples\n- It is implemented in a very recent PyTorch version (v1.9.0) instead of deprecated TensorFlow 1.14 or Chainer 6.0\n- It is directly incorporated into training to track progress online without the need to launch the evaluation separately\n- For FVD, our implementation is *complete*, while the original one provides evaluation for a single batch of already processed videos only\n- Our FVD implementation supports different subsampling strategies and a variable number of frames in a video\n\nTo compute FVD between two datasets, run the following command:\n```\npython src/scripts/calc_metrics_for_dataset.py --real_data_path /path/to/dataset_a.zip --fake_data_path /path/to/dataset_b.zip --mirror 1 --gpus 4 --resolution 256 --metrics fvd2048_16f,fvd2048_128f,fvd2048_128f_subsample8f,fid50k_full --verbose 0 --use_cache 0\n```\n\nTo compute FVD for a trained model, run `src/scripts/calc_metrics.py` instead.\n\nBoth datasets should be in the format specified above.\nThey can be either zip archives or normal directories.\nThis will compute several metrics:\n- `fid50k_full` - Frechet Inception Distance\n- `fvd2048_16f` - Frechet Video Distance with 16 frames\n- `fvd2048_128f` - Frechet Video Distance with 128 frames\n- `fvd2048_128f_subsample8f` - Frechet Video Distance with 16 frames, but sampled with an 8-frame interval\n\n*Note*. 
If you face any trouble running the above evaluation scripts, please do not hesitate to contact us!\n\n## Projection and CLIP editing\nThis section is still under construction.\nWe will update it shortly.\n\nThese two files provide projection and editing scripts:\n- `src/scripts/project.py`\n- `src/scripts/clip_edit.py`\n\n## Infrastructure and visualization\nYou will find some useful scripts for data processing and visualization in `src/scripts`.\n\n## Troubleshooting\nMake sure that [INR-GAN](https://github.com/universome/inr-gan) and [StyleGAN2-ADA](https://github.com/nvlabs/stylegan2-ada) are runnable on your system.\nWe do not use any additional CUDA kernels or any exotic dependencies.\n\nIf this didn't help, then it's likely there is some dependency version mismatch.\nCheck the versions of your installed dependencies with:\n```\npip freeze\n```\nand compare them with the ones specified in `environment.yaml`/`environment-ampere.yaml`.\n\nIf this didn't help, open an issue: it's likely that the problem is on our side.\n\n## License\nThis repo is built on top of [INR-GAN](https://github.com/universome/inr-gan), which is likely to be restricted by the [NVIDIA license](https://nvlabs.github.io/stylegan2-ada-pytorch/license.html) since it's built on top of [StyleGAN2-ADA](https://github.com/nvlabs/stylegan2-ada).\nIf that's the case, then this repo is also restricted by it.\n\n\n## Bibtex\n```\n@misc{stylegan_v,\n    title={StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2},\n    author={Ivan Skorokhodov and Sergey Tulyakov and Mohamed Elhoseiny},\n    journal={arXiv preprint arXiv:2112.14683},\n    year={2021}\n}\n\n@inproceedings{digan,\n    title={Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks},\n    author={Sihyun Yu and Jihoon Tack and Sangwoo Mo and Hyunsu Kim and Junho Kim and Jung-Woo Ha and Jinwoo Shin},\n    booktitle={International Conference on Learning Representations},\n    
year={2022},\n    url={https://openreview.net/forum?id=Czsdv-S4-w9}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funiversome%2Fstylegan-v","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funiversome%2Fstylegan-v","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funiversome%2Fstylegan-v/lists"}