{"id":22366299,"url":"https://github.com/microsoft/video_call_mos","last_synced_at":"2025-07-30T17:30:43.724Z","repository":{"id":152162625,"uuid":"551626544","full_name":"microsoft/Video_Call_MOS","owner":"microsoft","description":"A video quality MOS prediction model for videoconferencing calls that takes temporal distortions into account","archived":false,"fork":false,"pushed_at":"2024-08-30T23:43:38.000Z","size":10890,"stargazers_count":37,"open_issues_count":3,"forks_count":7,"subscribers_count":9,"default_branch":"main","last_synced_at":"2024-12-04T17:49:30.264Z","etag":null,"topics":["machine-learning","qoe","video-quality","video-quality-assessment","videoconferencing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc-by-4.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null}},"created_at":"2022-10-14T19:15:08.000Z","updated_at":"2024-11-16T01:54:22.000Z","dependencies_parsed_at":"2023-06-07T01:00:31.192Z","dependency_job_id":null,"html_url":"https://github.com/microsoft/Video_Call_MOS","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FVideo_Call_MOS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FVideo_Call_MOS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FVideo_Call_MOS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FVideo_Call_MOS/manifests","owner_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/microsoft","download_url":"https://codeload.github.com/microsoft/Video_Call_MOS/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228164524,"owners_count":17879084,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","qoe","video-quality","video-quality-assessment","videoconferencing"],"created_at":"2024-12-04T18:09:28.540Z","updated_at":"2024-12-04T18:09:29.460Z","avatar_url":"https://github.com/microsoft.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Video Call MOS\nThis repository provides the code and dataset for the Video Call MOS (VCM) prediction model, accepted at ICASSP 2023.\nThe model predicts the perceived video quality of videos that were transmitted via videoconferencing calls.\nIn contrast to other state-of-the-art video MOS models it is able to take temporal distortions, such as video freezes, into account.\nWe further provide a dataset with live Microsoft Teams video recordings and crowdsourced subjective quality ratings using [P.910 Crowd](https://github.com/microsoft/P.910). \nThe prediction is performed with the following steps:\n\n 1. Time-alignment of reference video via QR-code marker detection\n 2. VMAF Computation\n 3. Frame freeze feature computation based on time-alignment indices\n 4. 
Predict MOS with Video Call MOS LSTM, using VMAF and frame freeze features as input\n\nLink to paper: [Gabriel Mittag, Babak Naderi, Vishak Gopal and Ross Cutler, “LSTM-based Video Quality Prediction Accounting for Temporal Distortions in Videoconferencing Calls,” accepted at ICASSP 2023, 2023.](https://arxiv.org/pdf/2303.12761v1.pdf)\n\n## Performance\nIn comparison to VMAF, the proposed VCM model performs better on videos with temporal distortions. The following figure shows how VMAF overestimates the quality for multiple samples in the validation dataset:\n\u003cbr\u003e\u003cbr\u003e\u003cimg src=\"imgs/results.png\" width=\"500\" \u003e\n\nThe following example shows the per-frame predictions for a video that is impaired by a single freeze of around 1 second. According to the crowdsourced ratings, the ground truth video quality MOS is 2.95. Because VMAF accounts only for the reduced resolution / bitrate and not for the temporal freeze, it overestimates the quality with a score of 3.52. In contrast, the proposed VCM model reduces the predictions during frozen frames, resulting in an overall MOS close to the ground truth.\n\u003cbr\u003e\u003cbr\u003e\u003cimg src=\"imgs/example_1.png\" width=\"500\" \u003e\n\nThe next figure shows a similar effect, but with multiple shorter frame freezes:\n\u003cbr\u003e\u003cbr\u003e\u003cimg src=\"imgs/example_2.png\" width=\"500\" \u003e\n\n\nPlease refer to the [paper](https://arxiv.org/pdf/2303.12761v1.pdf) for more detailed results.\n\n## Requirements\nThe code in this repository was tested on Ubuntu. 
Adjustments to the FFMPEG commands may be necessary when running on Windows.\nTo perform reference video alignment and VMAF computation, FFMPEG with VMAF support is required, which can be installed on Ubuntu via the following steps (optional for training and evaluation on the VCM dataset, as pre-computed VMAF features are available in CSV files).\nSee also https://www.johnvansickle.com/ffmpeg/faq for more info on the FFMPEG installation.\n\n```bash\napt-get update -y\napt-get install -y libzbar0 libgl1 # needed for reading QR-codes\nwget -q https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz\ntar xf ffmpeg-git-amd64-static.tar.xz\nmv ffmpeg-git-*-amd64-static/ffmpeg ffmpeg-git-*-amd64-static/ffprobe /usr/local/bin/\n```\n\nIt is recommended to create a new virtual or conda environment dedicated to the project. Use the following command to install the required Python packages via pip.\n\n```bash\npip install -r requirements.txt\n```\n\n## Dataset\nBefore running the code, it is necessary to download the Video Call MOS dataset. Please note that the dataset is a subset of the one used in the [paper](https://arxiv.org/pdf/2303.12761v1.pdf). It can be found here:\n\nhttps://challenge.blob.core.windows.net/video-call-mos/video_call_mos_set.zip\n\nThe dataset contains 10 reference videos and 1467 degraded videos. The videos were transmitted via Microsoft Teams calls in 83 different network conditions and contain various typical videoconferencing impairments. It also includes [P.910 Crowd](https://github.com/microsoft/P.910) subjective video MOS ratings (see the [paper](https://arxiv.org/pdf/2303.12761v1.pdf) for more info).\n\n## Evaluating\nTo evaluate the default VCM or a newly trained model, the following script can be run. It also plots correlation diagrams and per-frame MOS predictions and compares the results to VMAF (it should reproduce exactly the same results as shown above in [Performance](#Performance)). 
The path variables `data_dir` and `csv_file` within the script need to be updated before executing. \n\n```bash\npython run_evaluation_and_plotting.py   \n```\n\nThe script uses the pre-computed VMAF features and alignment indices loaded from CSV files as inputs to the VCM model. For a new dataset, new CSV files can be written by using the `run_video_call_mos_on_dataset.py` script (see [Video Quality Prediction](#Video-Quality-Prediction)).\n\n## Video Quality Prediction\nTo predict the MOS score of a single video file, the following command can be used:\n```bash\npython run_video_call_mos.py --deg_video /path/to/video_call_mos_set/data/deg_0001.mp4 --ref_video /path/to/video_call_mos_set/data/ref_01.mp4 --results_dir /path/to/video_call_mos_set/results --tmp_dir /path/to/video_call_mos_set/tmp\n```\nThis command takes longer to run because it performs the inference end-to-end, including QR-code detection, reference alignment, VMAF computation, and the Video Call MOS LSTM model. Note that the code expects 1920x1080 MP4 video files, and that the reference and degraded videos need to have QR-code markers drawn onto them (see [Draw QR-code markers](#Draw-QR-code-markers)).\n\nTo run the Video Call MOS model on a dataset provided via CSV file, the following script can be used (the paths within the script need to be updated):\n```bash\npython run_video_call_mos_on_dataset.py\n```\n\n## Training\nTo train a new Video Call MOS model, the following script can be used. It uses pre-computed VMAF features and alignment indices loaded from CSV files as inputs. For a new dataset, new CSV files can be written by using the `run_video_call_mos_on_dataset.py` script (see [Video Quality Prediction](#Video-Quality-Prediction)). The path variables within the script need to be updated before running the script. 
The training parameters, such as which input features to use, the number of epochs, the number of LSTM layers, and the hidden unit size, may be adjusted as well.\n\n```bash\npython run_training.py   \n```\n\n## Draw QR-code Markers\nBecause videos received during a video call are prone to frame freezes, skips and playback rate changes, it is necessary to align the degraded videos to the clean reference video. In order to allow for a robust time alignment, we apply QR-code markers to the source videos. The reference videos in the Video Call MOS dataset are already prepared with QR-code markers. To draw markers on new reference videos, the following script can be used. The paths and parameters within the script need to be updated. Please note that the script expects 1920x1080 MP4 video files but could be adjusted for other formats.\n\n```bash\npython run_draw_qr_codes.py   \n```\n\n## Citation\nIf you use the code or dataset in a publication, please cite the following [paper](https://arxiv.org/pdf/2303.12761v1.pdf):\n\n```BibTex\n@inproceedings{vcm_icassp,\n  title={LSTM-based Video Quality Prediction Accounting for Temporal Distortions in Videoconferencing Calls},\n  author={Mittag, Gabriel and Naderi, Babak and Gopal, Vishak and Cutler, Ross},\n  booktitle={accepted at ICASSP 2023},\n  year={2023}\n}\n```\n\n# Contributing\nThis project welcomes contributions and suggestions.  Most contributions require you to agree to a\nContributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us\nthe rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.\n\nWhen you submit a pull request, a CLA bot will automatically determine whether you need to provide a\nCLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions\nprovided by the bot. 
You will only need to do this once across all repos using our CLA.\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).\nFor more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or\ncontact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\n\n# Legal Notices\nMicrosoft and any contributors grant you a license to the Microsoft documentation and other content\nin this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode),\nsee the [LICENSE](LICENSE) file, and grant you a license to any code in the repository under the [MIT License](https://opensource.org/licenses/MIT), see the\n[LICENSE-CODE](LICENSE-CODE) file.\n\nMicrosoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the\ndocumentation may be either trademarks or registered trademarks of Microsoft in the United States\nand/or other countries. The licenses for this project do not grant you rights to use any Microsoft\nnames, logos, or trademarks. Microsoft's general trademark guidelines can be found at\nhttp://go.microsoft.com/fwlink/?LinkID=254653.\n\nPrivacy information can be found at https://privacy.microsoft.com/en-us/privacystatement.\n\nMicrosoft and any contributors reserve all other rights, whether under their respective copyrights, patents,\nor trademarks, whether by implication, estoppel or otherwise.\n\n## Dataset licenses\nMICROSOFT PROVIDES THE DATASETS ON AN \"AS IS\" BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. 
TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.\n\nThe dataset is provided under the original terms that Microsoft received the source dataset. The Terms of Use of the Microsoft Learn videos, which are used as source videos in the Video Call MOS dataset, can be found at https://learn.microsoft.com/en-us/legal/termsofuse.\n\n## Code license\nMIT License\n\nCopyright (c) Microsoft Corporation.\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Fvideo_call_mos","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrosoft%2Fvideo_call_mos","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Fvideo_call_mos/lists"}