Ask AI a math question

End-to-end Automatic Speech Recognition (ASR) models are usually trained to optimize the loss of the whole token sequence, while neglecting explicit phonemic-granularity supervision.This could result in recognition errors due to similarphoneme confusion or phoneme reduction.To alleviate this problem, we propose a novel framework based on Supervised Contrastive Learning (SCaLa) to …

Ask a Question