Related Paper

VGGSound: A Large-Scale Audio-Visual Dataset

Our goal is to collect a large-scale audio-visual dataset with low label noise from videos "in the wild" using computer vision techniques. The resulting dataset can be used for training and evaluating audio recognition models. We make three contributions. First, we propose a scalable pipeline based on computer vision techniques …