Direct Acoustics-to-Word Models for English Conversational Speech Recognition

Recent work on end-to-end automatic speech recognition (ASR) has shown that the connectionist temporal classification (CTC) loss can be used to convert acoustics to phone or character sequences. Such systems are used with a dictionary and a separately trained language model (LM) to produce word sequences. However, they are not truly end-to-end in the …