{"id":19178603,"url":"https://github.com/cloud-cv/vilbert-multi-task","last_synced_at":"2025-05-07T20:45:42.544Z","repository":{"id":39720293,"uuid":"259434223","full_name":"Cloud-CV/vilbert-multi-task","owner":"Cloud-CV","description":":eyes: :speaking_head: :memo:12-in-1: Multi-Task Vision and Language Representation Learning Web Demo","archived":false,"fork":false,"pushed_at":"2022-12-08T04:05:21.000Z","size":1334,"stargazers_count":35,"open_issues_count":12,"forks_count":5,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-05-07T20:45:36.222Z","etag":null,"topics":["channels","cnn","deep-learning","javascript","machine-learning","postgresql","python3","rabbitmq","redis","visual-question-answering","web-sockets"],"latest_commit_sha":null,"homepage":"https://vilbert.cloudcv.org/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Cloud-CV.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-27T19:41:42.000Z","updated_at":"2024-10-24T05:50:47.000Z","dependencies_parsed_at":"2023-01-24T09:45:11.076Z","dependency_job_id":null,"html_url":"https://github.com/Cloud-CV/vilbert-multi-task","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cloud-CV%2Fvilbert-multi-task","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cloud-CV%2Fvilbert-multi-task/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cloud-CV%2Fvilbert-multi-task/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Clo
ud-CV%2Fvilbert-multi-task/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Cloud-CV","download_url":"https://codeload.github.com/Cloud-CV/vilbert-multi-task/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252954144,"owners_count":21830895,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["channels","cnn","deep-learning","javascript","machine-learning","postgresql","python3","rabbitmq","redis","visual-question-answering","web-sockets"],"created_at":"2024-11-09T10:40:04.679Z","updated_at":"2025-05-07T20:45:42.526Z","avatar_url":"https://github.com/Cloud-CV.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# 12-in-1: Multi-Task Vision and Language Representation Learning Web Demo\n\nMuch of vision-and-language research focuses on a small but diverse set of independent tasks and supporting datasets often studied in isolation; however, the visually-grounded language understanding skills required for success at these tasks overlap significantly. In this work, we investigate these relationships between vision-and-language tasks by developing a large-scale, multi-task training regime. Our approach culminates in a single model on 12 datasets from four broad categories of task including visual question answering, caption-based image retrieval, grounding referring expressions, and multi-modal verification. 
Compared to independently trained single-task models, this represents a reduction from approximately 3 billion parameters to 270 million while simultaneously improving performance by 2.05 points on average across tasks. We use our multi-task framework to perform in-depth analysis of the effect of joint training diverse tasks. Further, we show that finetuning task-specific models from our single multi-task model can lead to further improvements, achieving performance at or above the state-of-the-art.\n\n**Arxiv Paper Link**: https://arxiv.org/abs/1912.02315\n\n**Demo Link**: https://vilbert.cloudcv.org/\n\nIf you have more questions about the project, you can email us at team@cloudcv.org  \n\n### Built \u0026 Maintained by -\n\n[Rishabh Jain](https://rishabhjain.xyz)\n\n### Acknowledgements\n\nWe thank Jiasen Lu for his help.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloud-cv%2Fvilbert-multi-task","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcloud-cv%2Fvilbert-multi-task","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloud-cv%2Fvilbert-multi-task/lists"}