{"id":28513493,"url":"https://github.com/livepeer/verification-classifier","last_synced_at":"2025-08-21T17:26:28.460Z","repository":{"id":37594840,"uuid":"172597245","full_name":"livepeer/verification-classifier","owner":"livepeer","description":"Metrics-based Verification Classifier","archived":false,"fork":false,"pushed_at":"2022-11-21T21:54:16.000Z","size":138006,"stargazers_count":8,"open_issues_count":19,"forks_count":7,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-06-30T02:49:06.471Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/livepeer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-02-25T22:43:04.000Z","updated_at":"2021-06-26T11:53:48.000Z","dependencies_parsed_at":"2023-01-21T12:49:04.427Z","dependency_job_id":null,"html_url":"https://github.com/livepeer/verification-classifier","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/livepeer/verification-classifier","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livepeer%2Fverification-classifier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livepeer%2Fverification-classifier/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livepeer%2Fverification-classifier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livepeer%2Fverification-classifier/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/livepeer","download_url":"https://cod
eload.github.com/livepeer/verification-classifier/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/livepeer%2Fverification-classifier/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263467772,"owners_count":23471126,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-09T01:07:13.629Z","updated_at":"2025-08-21T17:26:28.444Z","avatar_url":"https://github.com/livepeer.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Context\n\nThis contribution involves research and attempts to tackle the problem of verifying that the transcoded content itself is a reasonable match for the original source, given a good-faith effort at transcoding.\n\nThe mission consists of developing a verification classifier that gives a pass / fail output, along with a confidence score, for a given segment of a given asset's rendition.\n\nA series of articles on the topic can be found [here](https://medium.com/@epiclabs.io/assessing-metrics-for-video-quality-verification-in-livepeers-ecosystem-f66f724b2aea) and [here](https://medium.com/@epiclabs.io/assessing-metrics-for-video-quality-verification-in-livepeers-ecosystem-ii-6827d093a380).\n\nAn up-to-date verifier implementation is exposed through the [API](api/README.md). The implementation section below documents its design. 
\n\n# Implementation\nThis section gives readers a high-level understanding of the currently implemented verification process without going into too much detail.\n## Interface\nA REST API is the main interface for video verification. It is implemented with Flask and runs on a Gunicorn server inside a Docker container. The API is documented [here](api/README.md). \n## Verification process\nThe verification process consists of the following steps:\n### 1. Preparation\nThe source video and each rendition video are made accessible through the file system.\n### 2. Pre-verification\nMetadata attributes, such as width, height and frame rate, are read from the video file headers and compared among the source video, the renditions and the assumed values passed in the API call. This step is handled by the [Verifier](verifier/verifier.py) class.\n### 3. Frame matching\nThe goal of the frame matching algorithm is to pair source and rendition frames that are closest by presentation timestamp (PTS). Once the PTS values are extracted, the task is trivial if the source and rendition frame rates are the same. If the frame rates don't match, the algorithm works as follows: \n1. An excess number of frames is uniformly sampled from the source video. The number is determined as MAX(N_SAMPLES, N_SAMPLES * MAX{SOURCE_FPS/RENDITION_FPS}). This increases the probability of finding the best matching timestamps when the rendition FPS is lower than the source FPS. \n2. The presentation timestamps of the rendition video frames are iterated to find the closest matching frame for each source (master) frame. If the timestamp difference for a given pair exceeds 1/(2*SOURCE_FPS), the pair is discarded.\n3. The resulting set of frame pairs is returned for metrics computation.  \n\nImplemented in the [VideoAssetProcessor](scripts/asset_processor/video_asset_processor.py) class. \n### 4. 
Metrics computation\nAt the rendition video level, the following numerical metric is computed:\n- size_dimension_ratio  \n\nFor each frame pair, the following numerical metrics are computed:\n- temporal_dct\n- temporal_gaussian_mse\n- temporal_gaussian_difference\n- temporal_threshold_gaussian_difference\n- temporal_histogram_distance\n\nOne important thing to note about the frame-level metrics is that all of them, except temporal_histogram_distance, are applied to the V channel of the HSV-converted frame image. Without a full-channel metric, it would be trivial for an attacker to craft a very obviously tampered video that would still pass verification.\n     \nThe code for metric computation is located [here](scripts/asset_processor/video_metrics.py).\n### 5. Metrics aggregation\nEach per-frame-pair metric is aggregated across frame pairs to get a single value for the source-rendition pair in question. Currently, the aggregation function is a simple mean for each metric.\n### 6. Metrics scaling\nThe final feature-preparation step is to scale the metrics according to video resolution. The result is a set of features that can be fed to the models.\n### 7. Classification\nDetermining whether a video is tampered is treated as a binary classification task. The Positive class (1) is assigned to tampered videos, while the Negative class (0) designates untampered renditions, which accurately represent the source video.  \nOnce features are extracted for the selected source-rendition video pair, they are fed to the following models:\n- One-Class Support Vector Machine (OCSVM)  \nThis is an anomaly detection model. It was fit to untampered renditions to learn the 'normal' distribution of features and detect outliers. It is characterized by a lower number of False Positives, but is somewhat less sensitive to tampered videos. Being an unsupervised model, it is expected to generalize well to novel data.\n- CatBoost binary classifier.  
\nThis supervised model is trained on a labeled dataset and typically achieves higher accuracy than the OCSVM model.\n### 8. Meta-model application\nTo make a final prediction, the following rule is applied to the outputs of the classification models:\n- if the OCSVM prediction is \"Untampered\", return \"Untampered\"\n- otherwise, return the CatBoost model prediction\n\nThe goal is to reduce the number of False Positives to avoid wrongfully penalizing transcoder nodes. The OCSVM model is expected to have higher precision (fewer False Positives) on novel data. If OCSVM predicts that the observation is an inlier, we go with it; otherwise, we use the supervised model's output. \n\n# Repository structure\n## 1. Bulk video data generation: YT8M_Downloader\n\nWe use 10-second chunks of videos from the YouTube-8M Dataset, available [here](https://research.google.com/youtube8m/).\nPrevious work with this dataset can be found [here](https://github.com/epiclabs-io/YT8M).\n\nAll the information and the scripts to create the assets reside in the [YT8M_downloader](YT8M_downloader) folder and are explained in [this](YT8M_downloader/README.md) document.\n\n## 2. Video data analysis: data_analytics\n\nFrom the raw video dataset, we obtain different features by analyzing it with different tools.\n\n### 2.1. Generation of renditions\nAs part of the feature extraction, we generate different variations of the videos, including different renditions, flipped videos, etc. Some of these variations constitute the bulk of what we label as \"attacks\". Others constitute \"good\" renditions, which include no distortions.\n\nTo obtain the different \"attacks\", we provide a separate script for each variation.\n\nAll the information and the scripts can be found in the scripts folder [here](scripts/README.md).\n\nSection 1 of the [Tools.ipynb](feature_engineering/notebooks/Tools.ipynb) notebook explains their usage in case a notebook is preferred as a means of interaction.\n\n\n### 2.2. 
Metrics computation with external tools\n\nSeveral standard metrics (VMAF, MS-SSIM, SSIM and PSNR) are provided by external tools (ffmpeg and libav) and can be run from the Tools.ipynb notebook in the data-analysis/notebooks folder. Usage information is given in the notebook and also in the scripts folder [here](/scripts/README.md).\n\nSection 2 of the [Tools.ipynb](feature_engineering/notebooks/Tools.ipynb) notebook explains their usage in case a notebook is preferred as a means of interaction.\n\nAlternatively, the scripts can be run separately as bash scripts.\n\n### 2.3. Data analysis with Jupyter notebooks\n\nAt this step, we should have the required data in the form of video assets and attacks, as well as the metrics extracted with the external tools, which may be required by some of the notebooks.\n\nFurther information about these notebooks can be found [here](feature_engineering/README.md).\n\n## 3. Interfaces: CLI and API\n\nOnce models are trained and available, a [CLI](https://github.com/livepeer/verification-classifier/tree/master/cli) and a [RESTful API](https://github.com/livepeer/verification-classifier/tree/master/api) are provided to interact with them and obtain predictions.\nThe bash scripts launch_cli.sh and launch_api.sh can be run from the root folder of the project.\n\n## 4. Common usage scripts: scripts\n\nSeveral utility scripts are hosted in this folder for convenience. They are needed at different stages of the process and for different Docker instances.\n\n## 5. Unit Tests\n\nUnit tests are located in the testing/tests folder. Some tests use data included in the repository (under testing/tests/data, machine_learning/output/models, etc.), while others require the following assets to be downloaded and extracted into the ../data directory:\n1. [Dataset CSV](https://storage.cloud.google.com/feature_dataset/yt8m-large.tar.gz)\n2. [YT8M renditions mini dataset](https://storage.cloud.google.com/feature_dataset/renditions-mini.tar) \n3. 
[Small dataset for CI/CD](https://storage.cloud.google.com/feature_dataset/renditions-nano.tar.gz)\n\nTo run tests:\n- Install prerequisites\n```\nsudo apt install ffmpeg\npip install -r requirements.txt\n```\n- Run tests\n```\npython -m pytest testing/tests\n``` \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flivepeer%2Fverification-classifier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flivepeer%2Fverification-classifier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flivepeer%2Fverification-classifier/lists"}