{"id":16162415,"url":"https://github.com/sunsided/vae-style-transfer","last_synced_at":"2025-10-24T03:32:13.343Z","repository":{"id":141993015,"uuid":"81339271","full_name":"sunsided/vae-style-transfer","owner":"sunsided","description":"An experiment in VAE-based artistic style transfer by embedding fiddling.","archived":true,"fork":false,"pushed_at":"2019-02-05T22:29:56.000Z","size":653,"stargazers_count":36,"open_issues_count":1,"forks_count":4,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-10-27T20:31:26.424Z","etag":null,"topics":["artificial-intelligence","autoencoder","cadl","deep-learning","experiment","generative-adversarial-network","generative-art","image-processing","kadenze","neural-network","online-course","tensorflow","vae","vaegan","variational-inference"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sunsided.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-02-08T14:39:47.000Z","updated_at":"2024-05-20T15:36:11.000Z","dependencies_parsed_at":null,"dependency_job_id":"391de8f4-9449-435d-89af-2f464f0febc1","html_url":"https://github.com/sunsided/vae-style-transfer","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunsided%2Fvae-style-transfer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunsided%2Fvae-style-transfer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunsided%2Fvae-style-transfer/releases","manifests_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunsided%2Fvae-style-transfer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sunsided","download_url":"https://codeload.github.com/sunsided/vae-style-transfer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237910078,"owners_count":19385830,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","autoencoder","cadl","deep-learning","experiment","generative-adversarial-network","generative-art","image-processing","kadenze","neural-network","online-course","tensorflow","vae","vaegan","variational-inference"],"created_at":"2024-10-10T02:30:04.306Z","updated_at":"2025-10-24T03:32:13.000Z","avatar_url":"https://github.com/sunsided.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Convolutional VAE Style Transfer\n\nThe project was created as part of the [Creative Applications of Deep Learning with TensorFlow](kadenze.com/courses/creative-applications-of-deep-learning-with-tensorflow-iv)\n(CADL) Kadenze course's final assignment. 
It is an experimental attempt to transfer artistic style\nlearned from a series of paintings \"live\" onto a video sequence by fitting\na variational autoencoder with 512 codes to both paintings and video\nframes, isolating the mean feature-space embeddings and modifying the\nvideo's embeddings to be closer to those of the paintings.\n\nBecause the general visual quality of the VAE's decoded output is relatively low,\na convolutional post-processing network based on residual convolutions was trained \nwith the purpose of making the resulting image less similar\nto the VAE's generated output and more similar to the original input images.\nThe basic idea was to have an upsampling network here, but it quickly turned out\nto be a very naive idea at this point of development. Instead, it now\ndownsizes the input, learns filters in a residual network and then samples back\nup to the input frame size; I would have liked to perform convolutions directly\non the input, but memory limitations prevented the use of a useful number of\nfeature maps.\n\nThe combined network makes the processing pipeline consist of an encoder, a\nvariational embedding, a decoder and a generator; sort of a three-quarter deep \nconvolutional VAEGAN architecture minus the adversarial training. No cross-validation, \ndropout or regularization procedures have been used, in order to get the network \nto fit the inputs as closely as possible. \n\nImage frame size was fixed to 320x180 because of said memory limitations;\nthe VAE uses 6 layers with increasing feature map sizes in an attempt\nto make up for this. 
Training the whole network took about three days\non an Nvidia GTX 980 Ti.\n\n## Training inputs\n\nThe paintings and videos used are:\n\n* [Leonid Afremov](http://leonidafremov.deviantart.com/gallery/)'s DeviantArt gallery\n* [Disclosure - Magnets](https://www.youtube.com/watch?v=b_KfnGBtVeA) music video\n* [Wim - See You Hurry](https://vimeo.com/22328077) music video\n\nBecause more video frames were available than paintings, only every tenth\nvideo frame was used. Black borders have been cropped and the frames\nwere resampled to 320x180, RGB.\n\nFinally, the trained VAE was used on a video unrelated to the training process:\n\n* [Pentatonix - Daft Punk](https://www.youtube.com/watch?v=3MteSlpxCpo)\n\n## Resulting media\n\nThe evaluation script creates video-only MP4 files to which I added back the music\nfrom the original videos using `ffmpeg`. Some videos have been uploaded\nto Vimeo. For copyright reasons, the videos are protected with the password\n\n    cadl\n\nYou can find them here:\n\n* [CADL VAE Style Transfer on Pentatonix - Daft Punk](https://vimeo.com/202984113)\n* [CADL VAE Style Transfer on Wim - See You Hurry](https://vimeo.com/202979720)\n* [CADL VAE Style Transfer on Disclosure - Magnets](https://vimeo.com/202991439)\n\nThe _Daft Punk_ video is, in my opinion, by far the most interesting: because the\ninput was never seen during training, the network had to make things up on its own.\nFor _See You Hurry_ and _Magnets_, movement is a bit choppy due to the\nreduced number of frames the network was trained on. It can also be seen that faster\nmotion tends to correlate with a more colorful rendition, whereas the fog\nin the _See You Hurry_ video doesn't do the video any favors at all.\n\n## Training process\n\nData is extracted using `extract_tiles.py` and written to `.tfrecord.gz`\nfiles for later usage. 
The `preview_tiles.py` script is used to\nvalidate that the extracted data is correct.\n\n`train_vae.py` performs the actual training based on the TFRecord files.\nThe network was pre-trained using See You Hurry and the Afremov paintings,\nto which I later added the Magnets video frames (that part went well).\nTraining the VAE is _very_ slow, although adding another training set\ndid not appear to make it worse. I stopped training the VAE after\napproximately 30 hours and left the refining network running for about\n12 hours, at which point improvement was noticeable, yet very subtle.\n\nThe `export_graphs.py` script takes the network checkpoints produced by\nTensorFlow's Supervisor and exports them as reusable protocol buffer\nfiles. The `evaluate*.py` scripts load these files in order to perform inference\nand some tests on the latent embedding vectors.\n\nFinally, `mogrify_video.py` is used to process videos using the network.\n\n### Impressions from the VAE training process\n\nAfter the VAE training reached a certain point, convergence slowed down.\nThe following graph depicts the change of loss over about 31 hours, \nwhere the bump/spike at 3pm (12 hours in) marks the moment I added\nthe second video to the learning process. Note that the loss scale is logarithmic.\n\n![Loss on the VAE learning process](doc/vae-loss.jpg)\n\nThis is a screenshot about nineteen hours into the learning process ...\n\n![VAE training](doc/preview-20170205-231757.jpg)\n\n... 
while this is about four hours later.\n\n![VAE training, four hours later](doc/preview-20170206-013356.jpg)\n\nA video of the learning progress over about 9000 batch iterations is\nassembled here:\n\n[![9000 iterations of a Variational Autoencoder learning](doc/youtube-vae9000.jpg)](https://www.youtube.com/watch?v=dTUmzAW4t3A \"9000 iterations of a Variational Autoencoder learning\")\n\n### Impressions from the refinement network training process\n\nThe following graph depicts the change of loss over about 12 hours,\nagain with logarithmic loss scale:\n\n![Loss on the refinement learning process](doc/refinement-loss.jpg)\n\nThe below screenshot shows the output of the VAE on the top and the\nrefined images on the bottom after training; note that the images appear to feature\nsharper edges and smoother areas.\n\n![Refinement network after 10 hours](doc/preview-refine-20170207-144602.jpg)\n\n## Further experiments\n\nIn order to get a cleaner outcome, I assume a real VAEGAN approach might\nbe more fruitful.\n\n## Copyright and Licenses\n\nThe original videos and paintings are copyrighted by their respective\ncopyright holders and used only in a _fair use_ context.\nThe VAE implementation and related utilities are copyrighted by Parag Mital (2016) and can be\nfound on his [CADL repository](https://github.com/pkmital/CADL).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsunsided%2Fvae-style-transfer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsunsided%2Fvae-style-transfer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsunsided%2Fvae-style-transfer/lists"}