DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment

Recent speech language models (SLMs) typically incorporate pre-trained speech models to extend the capabilities of large language models (LLMs). In this paper, we propose a Descriptive Speech-Text Alignment approach that leverages speech captioning to bridge the gap between speech and text modalities, enabling SLMs to interpret and generate comprehensive natural …