SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing

Authors: Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei
Year: 2023
Journal: arXiv
DOI: 10.48550/arXiv.2110.07205
Publisher: https://arxiv.org/abs/2110.07205

Keywords: speecht5, multimodal

Abstract

SpeechT5 is a unified-modal encoder-decoder framework, pre-trained on both speech and text, for spoken language processing tasks.

Cite this paper

bibtex
@misc{speecht52023,
  title  = {SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing},
  author = {Junyi Ao and Rui Wang and Long Zhou and Chengyi Wang and Shuo Ren and Yu Wu and Shujie Liu and Tom Ko and Qing Li and Yu Zhang and Zhihua Wei and Yao Qian and Jinyu Li and Furu Wei},
  year   = {2023},
  journal = {arXiv},
  doi    = {10.48550/arXiv.2110.07205},
  url    = {https://doi.org/10.48550/arXiv.2110.07205},
}

Released under the MIT License.