SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing
Authors: Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei
Year: 2023
Journal: arXiv
DOI: 10.48550/arXiv.2110.07205
Publisher: https://arxiv.org/abs/2110.07205
Keywords: speecht5, multimodal
Abstract
SpeechT5 is a unified-modal encoder-decoder model, pre-trained on both speech and text, for spoken language processing tasks.
Cite this paper
bibtex
@misc{speecht52023,
title = {SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing},
author = {Junyi Ao and Rui Wang and Long Zhou and Chengyi Wang and Shuo Ren and Yu Wu and Shujie Liu and Tom Ko and Qing Li and Yu Zhang and Zhihua Wei and Yao Qian and Jinyu Li and Furu Wei},
year = {2023},
journal = {arXiv},
doi = {10.48550/arXiv.2110.07205},
url = {https://doi.org/10.48550/arXiv.2110.07205},
}