Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning
We introduce a simple neural encoder architecture that can be trained with an unsupervised contrastive learning objective whose positive samples come from data-augmented k-Nearest Neighbors search. We show that, when built on top of recent self-supervised audio representations [1, 2, 3], this method can be applied iteratively and yield competitive …
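As a rough illustration of the objective described above, the following NumPy sketch mines positives by k-nearest-neighbor search over a set of embeddings and scores them with an InfoNCE-style contrastive loss. It is not the authors' implementation: the function names (`knn_positives`, `info_nce`), the use of cosine similarity, the temperature value, and the choice of in-batch negatives are all assumptions made for illustration, and the data augmentation step is omitted.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def knn_positives(embeddings, k):
    """For each row, return the indices of its k most similar other rows.

    Hypothetical helper: similarity is cosine, and a row is never
    returned as its own neighbor.
    """
    z = l2_normalize(embeddings)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-matches
    return np.argsort(-sim, axis=1)[:, :k]


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    positives[i] is the positive for anchors[i]; every other row of
    `positives` acts as a negative for that anchor.
    """
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = (a @ p.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Toy data: two tight clusters of two points each.
x = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
neighbors = knn_positives(x, k=1)   # nearest neighbor of each point
positives = x[neighbors[:, 0]]      # use it as the positive sample
loss = info_nce(x, positives)
```

In an iterative setup of the kind the abstract mentions, one would presumably re-embed the data with the trained encoder and repeat the neighbor search, but that loop is left out here because the source text does not spell it out.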