BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Authors: Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi
Year: 2023
Journal: ICML
DOI: 10.48550/arXiv.2301.12597
Publisher: https://arxiv.org/abs/2301.12597
Keywords: blip-2, vision-language
Abstract
We propose BLIP-2, a generic and efficient pre-training strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models.
Cite this paper
bibtex
@misc{blip22022,
title = {BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models},
author = {Junnan Li and Dongxu Li and Silvio Savarese and Steven Hoi},
year = {2023},
journal = {ICML},
doi = {10.48550/arXiv.2301.12597},
url = {https://doi.org/10.48550/arXiv.2301.12597},
}