ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Authors: Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee
Year: 2019
Journal: NeurIPS
DOI: 10.48550/arXiv.1908.02265
Publisher: https://arxiv.org/abs/1908.02265
Keywords: vilbert, vision-language
Abstract
We introduce ViLBERT, a model for learning task-agnostic joint representations of image content and natural language.
Cite this paper
bibtex
@misc{vilbert2019,
title = {ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks},
author = {Jiasen Lu and Dhruv Batra and Devi Parikh and Stefan Lee},
year = {2019},
journal = {NeurIPS},
doi = {10.48550/arXiv.1908.02265},
url = {https://doi.org/10.48550/arXiv.1908.02265},
}