LLaST: Improved End-to-end Speech Translation System Leveraged by Large
Language Models
We introduce LLaST, a framework for building high-performance speech-to-text translation systems based on large language models (LLMs). We address the limitations of end-to-end speech translation (E2E ST) models by exploring model architecture design and optimization techniques tailored for LLMs. Our approach includes an LLM-based speech translation architecture, ASR-augmented training, multilingual data augmentation, and …