Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cloud-cv/vilbert-multi-task
:eyes: :speaking_head: :memo: 12-in-1: Multi-Task Vision and Language Representation Learning Web Demo
channels cnn deep-learning javascript machine-learning postgresql python3 rabbitmq redis visual-question-answering web-sockets
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/cloud-cv/vilbert-multi-task
- Owner: Cloud-CV
- License: other
- Created: 2020-04-27T19:41:42.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T04:05:21.000Z (about 2 years ago)
- Last Synced: 2024-04-13T22:28:15.070Z (10 months ago)
- Topics: channels, cnn, deep-learning, javascript, machine-learning, postgresql, python3, rabbitmq, redis, visual-question-answering, web-sockets
- Language: Python
- Homepage: https://vilbert.cloudcv.org/
- Size: 1.27 MB
- Stars: 35
- Watchers: 5
- Forks: 4
- Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 12-in-1: Multi-Task Vision and Language Representation Learning Web Demo
Much of vision-and-language research focuses on a small but diverse set of independent tasks and supporting datasets often studied in isolation; however, the visually-grounded language understanding skills required for success at these tasks overlap significantly. In this work, we investigate these relationships between vision-and-language tasks by developing a large-scale, multi-task training regime. Our approach culminates in a single model trained on 12 datasets from four broad categories of task including visual question answering, caption-based image retrieval, grounding referring expressions, and multi-modal verification. Compared to independently trained single-task models, this represents a reduction from approximately 3 billion parameters to 270 million while simultaneously improving performance by 2.05 points on average across tasks. We use our multi-task framework to perform in-depth analysis of the effect of jointly training diverse tasks. Further, we show that finetuning task-specific models from our single multi-task model can lead to further improvements, achieving performance at or above the state-of-the-art.
**Arxiv Paper Link**: https://arxiv.org/abs/1912.02315
**Demo Link**: https://vilbert.cloudcv.org/
If you have more questions about the project, you can email us at [email protected]
### Built & Maintained by -
[Rishabh Jain](https://rishabhjain.xyz)
### Acknowledgements
We thank Jiasen Lu for his help.