Multi-Modulation Network for Audio-Visual Event Localization

Type: Preprint

Publication Date: 2021-01-01

Citations: 3

DOI: https://doi.org/10.48550/arxiv.2108.11773


Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • Audio-Visual Event Localization in Unconstrained Videos (2018). Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, Chenliang Xu
  • Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention (2021). Bin Duan, Hao Tang, Wei Wang, Ziliang Zong, Guowei Yang, Yan Yan
  • Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization (2024). Ling Xing, Hongyu Qu, Rui Yan, Xiangbo Shu, Jinhui Tang
  • CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization (2024). X. He, Xiangxi Liu, Yang Li, Dongcheng Zhao, Guobin Shen, Qingqun Kong, Xin Yang, Yi Zeng
  • Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention (2020). Bin Duan, Hao Tang, Wei Wang, Ziliang Zong, Guowei Yang, Yan Yan
  • MPN: Multimodal Parallel Network for Audio-Visual Event Localization (2021). Jiashuo Yu, Ying Cheng, Rui Feng
  • MPN: Multimodal Parallel Network for Audio-Visual Event Localization (2021). Jiashuo Yu, Ying Cheng, Rui Feng
  • Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration (2024). Ziheng Zhou, Jinxing Zhou, Wei Qian, Shengeng Tang, Xiaojun Chang, Dan Guo
  • Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (2023). Tiantian Geng, Teng Wang, Jinming Duan, Runmin Cong, Feng Zheng
  • Dual Normalization Multitasking for Audio-Visual Sounding Object Localization (2021). Tokuhiro Nishikawa, Daiki Shimada, Jerry Jun Yokono
  • Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (2023). Tiantian Geng, Teng Wang, Jinming Duan, Runmin Cong, Feng Zheng
  • AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization (2022). Tanvir Mahmud, Diana Marculescu
  • AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization (2023). Tanvir Mahmud, Diana Marculescu
  • Audio-visual scene classification via contrastive event-object alignment and semantic-based fusion (2022). Yuanbo Hou, Bo Soo Kang, Dick Botteldooren
  • Audio-visual scene classification via contrastive event-object alignment and semantic-based fusion (2022). Yuanbo Hou, Bo Kang, Dick Botteldooren
  • Label-anticipated Event Disentanglement for Audio-Visual Video Parsing (2024). Jinxing Zhou, Dan Guo, Yuxin Mao, Yiran Zhong, Xiaojun Chang, Meng Wang
  • Towards Open-Vocabulary Audio-Visual Event Localization (2024). Jinxing Zhou, Dan Guo, Ruohao Guo, Yuxin Mao, Jingjing Hu, Yiran Zhong, Xiaojun Chang, Meng Wang
  • Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing (2024). Pengcheng Zhao, Jinxing Zhou, Dan Guo, Yang Zhao, Yanxiang Chen
  • Dual-modality Seq2Seq Network for Audio-visual Event Localization (2019). Yan-Bo Lin, Yu-Jhe Li, Yu-Chiang Frank Wang
  • Dual-modality seq2seq network for audio-visual event localization (2019). Yan-Bo Lin, Yu-Jhe Li, Yu-Chiang Frank Wang

Cited by (0)


Citing (28)

  • ImageNet Large Scale Visual Recognition Challenge (2015). Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein
  • Ambient Sound Provides Supervision for Visual Learning (2016). Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba
  • CNN architectures for large-scale audio classification (2017). Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, Robert C. Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold
  • Temporal Convolutional Networks for Action Segmentation and Detection (2017). Colin Lea, M. D. Flynn, René Vidal, Austin Reiter, Gregory D. Hager
  • Co-Training of Audio and Video Representations from Self-Supervised Temporal Synchronization (2018). Bruno Korbar, Du Tran, Lorenzo Torresani
  • Deep Co-Clustering for Unsupervised Audiovisual Learning (2018). Di Hu, Feiping Nie, Xuelong Li
  • X2Face: A Network for Controlling Face Generation Using Images, Audio, and Pose Codes (2018). Olivia Wiles, A. Sophia Koepke, Andrew Zisserman
  • Self-supervised Audio-visual Co-segmentation (2019). Andrew Rouditchenko, Hang Zhao, Chuang Gan, Josh H. McDermott, Antonio Torralba
  • Learning modality-invariant representations for speech and images (2017). Kenneth Leidal, David Harwath, James Glass
  • Very Deep Convolutional Networks for Large-Scale Image Recognition (2014). Karen Simonyan, Andrew Zisserman
  • Improved Speech Reconstruction from Silent Video (2017). Ariel Ephrat, Tavi Halperin, Bezalel Peleg
  • Learning to Separate Object Sounds by Watching Unlabeled Video (2018). Ruohan Gao, Rogério Feris, Kristen Grauman
  • End-to-End Learning of Action Detection from Frame Glimpses in Videos (2016). Serena Yeung, Olga Russakovsky, Greg Mori, Li Fei-Fei
  • Attention is All you Need (2017). Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
  • Learning to Localize Sound Source in Visual Scenes (2018). Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, In So Kweon
  • Visual to Sound: Generating Natural Sound for Videos in the Wild (2018). Yipin Zhou, Zhaowen Wang, Fang Chen, Trung Bui, Tamara L. Berg
  • Cascaded Boundary Regression for Temporal Action Detection (2017). Jiyang Gao, Zhenheng Yang, Ram Nevatia
  • Audio-Visual Event Localization in Unconstrained Videos (2018). Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, Chenliang Xu
  • The Conversation: Deep Audio-Visual Speech Enhancement (2018). Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman
  • Seeing Through Noise: Visually Driven Speaker Separation And Enhancement (2018). Aviv Gabbay, Ariel Ephrat, Tavi Halperin, Bezalel Peleg
  • Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs (2016). Zheng Shou, Dongang Wang, Shih-Fu Chang
  • Temporal Action Detection with Structured Segment Networks (2017). Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin
  • Visually Indicated Sounds (2016). Andrew Owens, Phillip Isola, Josh H. McDermott, Antonio Torralba, Edward H. Adelson, William T. Freeman
  • BMN: Boundary-Matching Network for Temporal Action Proposal Generation (2019). Tianwei Lin, Xiao Liu, Xin Li, Errui Ding, Shilei Wen
  • Graph Convolutional Networks for Temporal Action Localization (2019). Runhao Zeng, Wenbing Huang, Chuang Gan, Mingkui Tan, Yu Rong, Peilin Zhao, Junzhou Huang
  • Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention (2020). Bin Duan, Hao Tang, Wei Wang, Ziliang Zong, Guowei Yang, Yan Yan
  • Audio to Body Dynamics (2018). Eli Shlizerman, Lucio M. Dery, Hayden Schoen, Ira Kemelmacher-Shlizerman
  • Looking to listen at the cocktail party: a speaker-independent audio-visual model for speech separation (2018). Ariel Ephrat, Inbar Mosseri, Oran Lang, Tali Dekel, Kevin Wilson, Avinatan Hassidim, William T. Freeman, Michael Rubinstein