SpiRit-LM: Interleaved Spoken and Written Language Model

We introduce SpiRit-LM, a foundation multimodal language model that freely mixes text and speech. Our model is based on a pretrained text language model that we extend to the speech modality by continuously training it on text and speech units. Speech and text sequences are concatenated as a single set …