Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation
Humans perceive a rich auditory experience from the distinct sounds heard by each ear. Videos recorded with binaural audio, in particular, simulate how humans receive ambient sound. However, a large number of videos have monaural audio only, which degrades the user experience due to the lack of ambient information. To address this …