Attentive Statistics Pooling for Deep Speaker Embedding
This paper proposes attentive statistics pooling for deep speaker embedding in text-independent speaker verification. In conventional speaker embedding, frame-level features are averaged over all the frames of a single utterance to form an utterance-level feature. Our method utilizes an attention mechanism to give different weights to different frames and generates …
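
The contrast between conventional averaging and attention-weighted pooling can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The scoring function (a single tanh layer with hypothetical parameters `W`, `b`, `v`), the softmax normalization over frames, and the choice of a weighted mean concatenated with a weighted standard deviation as the pooled "statistics" are all assumptions made for illustration, as are the function names.

```python
import numpy as np


def average_pooling(frames):
    """Conventional pooling: unweighted mean over all frames.

    frames: array of shape (T, D), T frame-level features of dimension D.
    """
    return frames.mean(axis=0)


def attentive_statistics_pooling(frames, W, b, v, eps=1e-8):
    """Attention-weighted mean and standard deviation over frames.

    W (D, H), b (H,), and v (H,) are hypothetical attention parameters;
    in a real system they would be learned jointly with the network.
    Returns the pooled vector of shape (2 * D,) and the weights (T,).
    """
    # One scalar score per frame (assumed form: v^T tanh(W^T h_t + b)).
    scores = np.tanh(frames @ W + b) @ v
    # Softmax over the time axis, shifted by the max for numerical stability.
    scores = scores - scores.max()
    alpha = np.exp(scores) / np.exp(scores).sum()
    # Weighted first- and second-order statistics.
    mean = alpha @ frames
    var = alpha @ (frames ** 2) - mean ** 2
    std = np.sqrt(np.clip(var, eps, None))
    return np.concatenate([mean, std]), alpha


# Usage with random stand-in data: 50 frames of 4-dimensional features.
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 4))
W = rng.normal(size=(4, 8))
b = np.zeros(8)
v = rng.normal(size=8)
pooled, alpha = attentive_statistics_pooling(frames, W, b, v)
```

If every frame receives the same score (for example, when `v` is all zeros), the weights become uniform and the weighted mean reduces to the conventional average, which makes the sketch easy to sanity-check.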