{"id":19070278,"url":"https://github.com/amogh7joshi/media-vision","last_synced_at":"2026-04-17T04:32:31.724Z","repository":{"id":138929872,"uuid":"355672478","full_name":"amogh7joshi/media-vision","owner":"amogh7joshi","description":"🎥 Visual media restoration using deep-learning-based colorization, upscaling, interpolation, and more.","archived":false,"fork":false,"pushed_at":"2021-05-12T18:23:15.000Z","size":48,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-02T15:49:54.711Z","etag":null,"topics":["colorization","computer-vision","deep-learning","interpolation","neural-networks","pytorch","super-resolution","upscaling"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amogh7joshi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-04-07T20:21:44.000Z","updated_at":"2022-11-11T13:12:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"b54d9c61-21db-453d-b386-3b071c36c9b1","html_url":"https://github.com/amogh7joshi/media-vision","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amogh7joshi%2Fmedia-vision","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amogh7joshi%2Fmedia-vision/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amogh7joshi%2Fmedia-vision/releases","manifests_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/amogh7joshi%2Fmedia-vision/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amogh7joshi","download_url":"https://codeload.github.com/amogh7joshi/media-vision/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240122574,"owners_count":19751142,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["colorization","computer-vision","deep-learning","interpolation","neural-networks","pytorch","super-resolution","upscaling"],"created_at":"2024-11-09T01:17:53.216Z","updated_at":"2026-04-17T04:32:26.705Z","avatar_url":"https://github.com/amogh7joshi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MediaVision\n\nA toolbox for reconstructing visual media, primarily images and videos, through operations including\ncolorizing, upscaling, interpolating, and more. 
This project builds entirely on open-source papers and \ncode implementations, which are re-implemented here, and the neural networks load pre-trained weights \n(the toolbox is implemented almost exclusively in PyTorch).\n\n**Note**: While the goal of MediaVision is to reconstruct media, the features added to images\nor video, such as artificial coloring or frame interpolation (adding artificial frames), are not necessarily \nhistorically accurate; they simply *aim to provide a plausible historical interpretation*.\n\n## Usage\n\n### Install from Source\n\nCurrently, MediaVision is only available as a toolkit downloadable from source (though both a Colab implementation\nand a pip package are in development). To install it, first clone the repository:\n\n```shell\ngit clone https://github.com/amogh7joshi/media-vision.git\n```\n\nThen download the pretrained weight files for the modules whose \nnetworks have their weights hosted on Google Drive, using the following list:\n\n1. **Interpolation** (RIFE): Follow the instructions at [https://github.com/hzwer/arXiv2020-RIFE](https://github.com/hzwer/arXiv2020-RIFE).\n2. **Upscaling** (ESRGAN): Follow the instructions at [https://github.com/xinntao/ESRGAN](https://github.com/xinntao/ESRGAN).\n\nAll of the pretrained weights should be placed in the `mediavision.models` directory. 
After you have \ndone this, enter the top-level directory and execute the following: \n\n```shell\nmake build\n```\n \nThe provided Makefile will configure the `models` directory as well as the `mediavision.weights` module\nfor easy usage, in addition to installing all system requirements.\n\n## Core Features\n\nMediaVision contains a number of modules for visual media processing and reconstruction.\n\n### Colorization\n\n**API**: `mediavision.colorize()`\n\nCreates a colorized version of a grayscale image, trying to emulate traditional colors and \nemphasize vibrancy and realism. In essence, the input image is colorized based on the neural network's\nunderstanding of modern and historical colors.\n\nThe current module uses the [Colorful Image Colorization](https://arxiv.org/abs/1603.08511) \napproach, with a direct feed-forward CNN (which contains some branches), and the code is sourced from the \n[official implementation](https://github.com/richzhang/colorization).\n\n### Interpolation\n\n**API**: `mediavision.interpolate()`\n\nPerforms Video Frame Interpolation (VFI) to increase a video's FPS by generating intermediate\nframes between the existing ones, improving fluidity and smoothness.\n\nThe current module uses the [RIFE](https://arxiv.org/abs/2011.06294) approach, with\nintermediate flow estimation and three semi-sequential models, and the code is sourced from the \n[official implementation](https://github.com/hzwer/arXiv2020-RIFE).\n\n### Upscaling\n\n**API**: `mediavision.upscale()`\n\nUpscales images by enlarging them to a greater resolution while limiting quality loss, attempting to\ngenerate realistic textures and maintain visual quality.\n\nThe current module uses the [ESRGAN](https://arxiv.org/abs/1809.00219) approach,\nwith Residual-in-Residual Dense Blocks and features taken before activation, and the code is sourced from\nthe [official 
implementation](https://github.com/xinntao/ESRGAN).\n\n## Additional Features\n\n### Image/Video Visualization\n\n**API**: `imagevision.visualize`\n\nA collection of image and video visualization methods to facilitate easy viewing of core processing \nresults and also aid in debugging, in certain cases. All the visualizations are constructed using\neither some form of matplotlib or OpenCV backend.\n\n\n## References\n\n### Media Colorization\n\n```bibtex\n@inproceedings{zhang2016colorful,\n  title={Colorful Image Colorization},\n  author={Zhang, Richard and Isola, Phillip and Efros, Alexei A},\n  booktitle={ECCV},\n  year={2016}\n}\n\n@article{zhang2017real,\n  title={Real-Time User-Guided Image Colorization with Learned Deep Priors},\n  author={Zhang, Richard and Zhu, Jun-Yan and Isola, \n          Phillip and Geng, Xinyang and Lin, Angela S and Yu, \n          Tianhe and Efros, Alexei A},\n  journal={ACM Transactions on Graphics (TOG)},\n  volume={9},\n  number={4},\n  year={2017},\n  publisher={ACM}\n}\n```\n\n### Video Frame Interpolation\n\n```bibtex\n@article{huang2020rife,\n  title={RIFE: Real-Time Intermediate Flow Estimation \n               for Video Frame Interpolation},\n  author={Huang, Zhewei and Zhang, Tianyuan and Heng, \n          Wen and Shi, Boxin and Zhou, Shuchang},\n  journal={arXiv preprint arXiv:2011.06294},\n  year={2020}\n}\n```\n\n### Media Resolution Upscaling\n\n```bibtex\n@InProceedings{wang2018esrgan,\n    author = {Wang, Xintao and Yu, Ke and Wu, Shixiang and Gu, Jinjin and Liu, Yihao and Dong, Chao and Qiao, Yu and Loy, Chen Change},\n    title = {ESRGAN: Enhanced super-resolution generative adversarial networks},\n    booktitle = {The European Conference on Computer Vision Workshops (ECCVW)},\n    month = {September},\n    year = 
{2018}\n}\n```\n\n\n\n\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famogh7joshi%2Fmedia-vision","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famogh7joshi%2Fmedia-vision","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famogh7joshi%2Fmedia-vision/lists"}