DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
Recent speech language models (SLMs) typically incorporate pre-trained speech models to extend the capabilities of large language models (LLMs). In this paper, we propose a Descriptive Speech-Text Alignment approach that leverages speech captioning to bridge the gap between speech and text modalities, enabling SLMs to interpret and generate comprehensive natural …