VGGSound: A Large-Scale Audio-Visual Dataset
Our goal is to collect a large-scale audio-visual dataset with low label noise from videos "in the wild" using computer vision techniques. The resulting dataset can be used for training and evaluating audio recognition models. We make three contributions. First, we propose a scalable pipeline based on computer vision techniques …