SCaLa: Supervised Contrastive Learning for End-to-End Speech Recognition

End-to-end Automatic Speech Recognition (ASR) models are usually trained to optimize the loss of the whole token sequence, while neglecting explicit phonemic-granularity supervision. This could result in recognition errors due to similar-phoneme confusion or phoneme reduction. To alleviate this problem, we propose a novel framework based on Supervised Contrastive Learning (SCaLa) to …