SCaLa: Supervised Contrastive Learning for End-to-End Speech Recognition
End-to-end Automatic Speech Recognition (ASR) models are usually trained to optimize the loss of the whole token sequence, while neglecting explicit phonemic-granularity supervision. This could result in recognition errors due to similar-phoneme confusion or phoneme reduction. To alleviate this problem, we propose a novel framework based on Supervised Contrastive Learning (SCaLa) to …