{"id":26511573,"url":"https://luckyhzt.github.io/lvcd","last_synced_at":"2025-03-21T03:02:01.009Z","repository":{"id":259869876,"uuid":"864849999","full_name":"luckyhzt/LVCD","owner":"luckyhzt","description":"The official code of paper \"LVCD: Reference-based Lineart Video Colorization with Diffusion Models\"","archived":false,"fork":false,"pushed_at":"2025-01-06T04:39:52.000Z","size":20847,"stargazers_count":166,"open_issues_count":9,"forks_count":18,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-01-06T05:24:39.686Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/luckyhzt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-29T10:33:49.000Z","updated_at":"2025-01-06T04:39:56.000Z","dependencies_parsed_at":null,"dependency_job_id":"d49b18ce-13a5-4cbe-bd74-0cf43abdc3c1","html_url":"https://github.com/luckyhzt/LVCD","commit_stats":null,"previous_names":["luckyhzt/lvcd"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luckyhzt%2FLVCD","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luckyhzt%2FLVCD/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luckyhzt%2FLVCD/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luckyhzt%2FLVCD/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/luckyhzt","download_url":"https://codeload.github.com/luckyhzt/LVCD/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244728214,"owners_count":20500023,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-21T03:00:34.664Z","updated_at":"2025-03-21T03:02:01.002Z","avatar_url":"https://github.com/luckyhzt.png","language":"Python","funding_links":[],"categories":["Colorization"],"sub_categories":[],"readme":"# *LVCD:* Reference-based Lineart Video Colorization with Diffusion Models\n\n## ACM Transactions on Graphics \u0026 SIGGRAPH Asia 2024\n\n[Project page](https://luckyhzt.github.io/lvcd) | [arXiv](https://arxiv.org/abs/2409.12960)\n\nZhitong Huang $^1$, Mohan Zhang $^2$, [Jing Liao](https://scholars.cityu.edu.hk/en/persons/jing-liao(45757c38-f737-420d-8a7f-73b58d30c1fd).html) $^{1*}$\n\n\u003cfont size=\"1\"\u003e $^1$: City University of Hong Kong, Hong Kong SAR, China \u0026nbsp;\u0026nbsp; $^2$: WeChat, Tencent Inc., Shenzhen, China \u003c/font\u003e \\\n\u003cfont size=\"1\"\u003e $^*$: Corresponding author \u003c/font\u003e\n\n## Abstract\nWe propose the first video diffusion framework for reference-based lineart video colorization. Unlike previous works that rely solely on image generative models to colorize lineart frame by frame, our approach leverages a large-scale pretrained video diffusion model to generate colorized animation videos. This approach leads to more temporally consistent results and is better equipped to handle large motions. 
Firstly, we introduce \u003cem\u003eSketch-guided ControlNet\u003c/em\u003e, which provides additional control for finetuning an image-to-video diffusion model for controllable video synthesis, enabling the generation of animation videos conditioned on lineart. We then propose \u003cem\u003eReference Attention\u003c/em\u003e to facilitate the transfer of colors from the reference frame to other frames containing fast and expansive motions. Finally, we present a novel scheme for sequential sampling, incorporating the \u003cem\u003eOverlapped Blending Module\u003c/em\u003e and \u003cem\u003ePrev-Reference Attention\u003c/em\u003e, to extend the video diffusion model beyond its original fixed-length limitation for long video colorization. Both qualitative and quantitative results demonstrate that our method significantly outperforms state-of-the-art techniques in terms of frame and video quality, as well as temporal consistency. Moreover, our method is capable of generating high-quality, long, temporally consistent animation videos with large motions, which previous works cannot achieve.\n\n# Installation\n\n```shell\nconda create -n lvcd python=3.10.0\nconda activate lvcd\npip3 install -r requirements/pt2.txt\n```\n\n# Download pretrained models\n1. Download the pretrained [SVD weights](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/resolve/main/svd.safetensors) and save them as `./checkpoints/svd.safetensors`.\n2. Download the finetuned weights for [Sketch-guided ControlNet](https://huggingface.co/luckyhzt/lvcd_pretrained_models/resolve/main/lvcd.ckpt) and save them as `./checkpoints/lvcd.ckpt`.\n\n# Inference\nAll inference code is located under `./inference/`; the Jupyter notebook `sample.ipynb` demonstrates how to sample videos. 
Two test clips are also provided.\n\n# Training\n## Dataset preparation\nDownload the training set from [Hugging Face](https://huggingface.co/datasets/luckyhzt/Animation_video), including the split archive parts (`.zip` and `.z01` to `.z07`) and the `train_clips_hist.json` file.\n\nExtract the multi-part archive (all parts are required) and place the JSON file in the root directory of the dataset as `.../Animation_video/train_clips_hist.json`.\n\nRun `data_preprocess/encode_latents.py` to encode all frames into VAE latents. The script runs multiple worker processes; edit the `devices` variable in the script to spread the encoding across multiple GPUs.\n\nThe encoded latents are saved as `.pt` files in the dataset directory.\n\n## Run training\nThen train the model with:\n```shell\npython main.py --train --base configs/lvcd.yaml\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/luckyhzt.github.io%2Flvcd","html_url":"https://awesome.ecosyste.ms/projects/luckyhzt.github.io%2Flvcd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/luckyhzt.github.io%2Flvcd/lists"}