BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

Authors: Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi
Year: 2023
Journal: ICML
DOI: 10.48550/arXiv.2301.12597
Publisher: https://arxiv.org/abs/2301.12597

Keywords: blip-2, vision-language

Abstract

We propose BLIP-2, a generic and efficient pre-training strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models.

Cite this paper

bibtex
@misc{blip22022,
  title  = {BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models},
  author = {Junnan Li and Dongxu Li and Silvio Savarese and Steven Hoi},
  year   = {2023},
  journal = {ICML},
  doi    = {10.48550/arXiv.2301.12597},
  url    = {https://doi.org/10.48550/arXiv.2301.12597},
}
