Voice Quality and Pitch Features in Transformer-Based Speech Recognition

Jitter and shimmer measurements have been shown to carry voice quality and prosodic information that enhances performance on tasks such as speaker recognition, diarization, and automatic speech recognition (ASR). However, such features have seldom been used in neural-based ASR, where spectral features often prevail. In this work, we …