Type: Article
Publication Date: 2022-09-16
Citations: 25
DOI: https://doi.org/10.21437/interspeech.2022-10652
Figure 1: HuBERT: sum of attention weights each frame receives from other frames. Ours (VG-HuBERT3): attention weights each frame receives from the [CLS A] token. Attention weights from different attention heads are coded with different colors.