{"id":15937066,"url":"https://github.com/zimmerrol/emomatch-pytorch","last_synced_at":"2025-04-03T19:44:10.815Z","repository":{"id":98201799,"uuid":"150695845","full_name":"zimmerrol/emomatch-pytorch","owner":"zimmerrol","description":"Unsupervised Audio + Video Network Pretraining using PyTorch","archived":false,"fork":false,"pushed_at":"2019-04-21T15:05:25.000Z","size":271,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-09T08:17:04.570Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zimmerrol.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-09-28T06:31:37.000Z","updated_at":"2020-08-27T15:39:10.000Z","dependencies_parsed_at":"2023-03-08T12:15:42.137Z","dependency_job_id":null,"html_url":"https://github.com/zimmerrol/emomatch-pytorch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zimmerrol%2Femomatch-pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zimmerrol%2Femomatch-pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zimmerrol%2Femomatch-pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zimmerrol%2Femomatch-pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zimmerrol","download_url":"htt
ps://codeload.github.com/zimmerrol/emomatch-pytorch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247070780,"owners_count":20878581,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-07T04:42:02.909Z","updated_at":"2025-04-03T19:44:10.797Z","avatar_url":"https://github.com/zimmerrol.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# EmoMatch Task\nA simple transfer-learning task based on the VoxCeleb dataset for pretraining networks that operate on videos (audio + video).\nThis code requires you to download the VoxCeleb dataset and extract it (both audio and video).\n\nThe idea of this approach is based on the paper [Look, Listen, Learn](https://arxiv.org/abs/1705.08168), in which audio and video information was used to pretrain an image encoder network for image classification tasks.\n\nThis project tries to extend this approach: rather than training only an image encoder, it pretrains a network that can process both audio and video information. The task the network is meant to solve is rather simple: given an audio sequence and a video sequence, decide whether the two match (i.e. have the same origin).\n\n\nThe structure of the EmoMatch training procedure is shown in the image below.\n![](avnet_model.png)\nThe left side shows the data preparation, while the right side illustrates the data flow through the network. In the data preparation, video recordings are split into their video and audio tracks. 
These tracks are then fed into a VNet (video) and an ANet (audio), respectively. These networks serve as encoders that generate features for a classifier network. The classifier then detects whether the audio track originates from the same recording as the video track (Match) or from a different recording (No Match).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzimmerrol%2Femomatch-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzimmerrol%2Femomatch-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzimmerrol%2Femomatch-pytorch/lists"}