Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data

Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs) by incorporating pre-trained speech models. However, these SLMs often undergo extensive speech instruction-tuning to bridge the gap between speech and text modalities. Such tuning requires significant annotation effort and risks catastrophic forgetting of the original …