Voice Quality and Pitch Features in Transformer-Based Speech Recognition
Voice Quality and Pitch Features in Transformer-Based Speech Recognition
Jitter and shimmer measurements have shown to be carriers of voice quality and prosodic information which enhance the performance of tasks like speaker recognition, diarization or automatic speech recognition (ASR).However, such features have been seldom used in the context of neural-based ASR, where spectral features often prevail.In this work, we …