Visual Translation Embedding Network for Visual Relation Detection

Type: Preprint

Publication Date: 2017-07-01

Citations: 537

DOI: https://doi.org/10.1109/cvpr.2017.331


Abstract

Visual relations, such as person ride bike and bike next to car, offer a comprehensive scene understanding of an image, and have already shown their great utility in connecting computer vision and natural language. However, due to the challenging combinatorial complexity of modeling subject-predicate-object relation triplets, very little work has been done to localize and predict visual relations. Inspired by the recent advances in relational representation learning of knowledge bases and convolutional object detection networks, we propose a Visual Translation Embedding network (VTransE) for visual relation detection. VTransE places objects in a low-dimensional relation space where a relation can be modeled as a simple vector translation, i.e., subject + predicate ≈ object. We propose a novel feature extraction layer that enables object-relation knowledge transfer in a fully-convolutional fashion that supports training and inference in a single forward/backward pass. To the best of our knowledge, VTransE is the first end-to-end relation detection network. We demonstrate the effectiveness of VTransE over other state-of-the-art methods on two large-scale datasets: Visual Relationship and Visual Genome. Note that even though VTransE is a purely visual model, it is still competitive with Lu's multi-modal model with language priors [27].
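
The translation idea in the abstract (subject + predicate ≈ object) can be illustrated with a toy sketch. This is not the authors' network or training objective: the 2-D vectors, the Euclidean distance score, and the helper names `translate_distance` and `rank_predicates` are all assumptions made up for illustration. The sketch only shows how a predicate could be picked as the translation vector that best carries the subject embedding onto the object embedding.

```python
import math

def translate_distance(subject, predicate, obj):
    """Euclidean distance between (subject + predicate) and obj.

    A small distance means the triplet fits subject + predicate ≈ object.
    """
    return math.sqrt(
        sum((s + p - o) ** 2 for s, p, o in zip(subject, predicate, obj))
    )

def rank_predicates(subject, obj, predicates):
    """Return predicate names sorted from best to worst fit (hypothetical helper)."""
    return sorted(
        predicates,
        key=lambda name: translate_distance(subject, predicates[name], obj),
    )

# Toy 2-D relation space; every vector here is invented for illustration.
person = [1.0, 0.0]
bike = [1.0, -1.0]
predicates = {
    "ride": [0.0, -1.0],     # person + ride lands exactly on bike
    "next to": [1.0, 0.0],   # person + next to lands elsewhere
}

ranking = rank_predicates(person, bike, predicates)
print(ranking)
```

In this toy space the "ride" vector translates `person` exactly onto `bike`, so it ranks first. A learned model would obtain the object embeddings from visual features and fit the predicate vectors during training.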

Locations

  • arXiv (Cornell University)

Similar Works

  • Visual Translation Embedding Network for Visual Relation Detection (2017). Hanwang Zhang, Zawlin Kyaw, Shih‐Fu Chang, Tat‐Seng Chua
  • Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation (2020). Zih-Siou Hung, Arun Mallya, Svetlana Lazebnik
  • Visual Relationship Detection With Visual-Linguistic Knowledge From Multimodal Representations (2021). Meng-Jiun Chiou, Roger Zimmermann, Jiashi Feng
  • Visual Relationship Detection with Language Priors (2016). Cewu Lu, Ranjay Krishna, Michael S. Bernstein, Li Fei-Fei
  • RelTransformer: Balancing the Visual Relationship Detection from Local Context, Scene and Memory (2021). Jun Chen, Aniket Agarwal, Sherif Abdelkarim, Deyao Zhu, Mohamed Elhoseiny
  • Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection (2019). Nikolaos Gkanatsios, Vassilis Pitsikalis, Petros Koutras, Athanasia Zlatintsi, Petros Maragos
  • Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation (2017). Ruichi Yu, Ang Li, Vlad I. Morariu, Larry S. Davis
  • Unified Visual Relationship Detection with Vision and Language Models (2023). L. Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu
  • Detecting Unseen Visual Relations Using Analogies (2019). Julia Peyre, Josef Sivic, Ivan Laptev, Cordelia Schmid
  • Detecting unseen visual relations using analogies (2018). Julia Peyre, Ivan Laptev, Cordelia Schmid, Josef Šivic
  • Improving Visual Relationship Detection using Semantic Modeling of Scene Descriptions (2018). Stephan Baier, Yunpu Ma, Volker Tresp
  • RelationVLM: Making Large Vision-Language Models Understand Visual Relations (2024). Zhipeng Huang, Zhizheng Zhang, Zheng-Jun Zha, Yan Lü, Baining Guo
  • Look, Learn and Leverage (L$^3$): Mitigating Visual-Domain Shift and Discovering Intrinsic Relations via Symbolic Alignment (2024). Hanchen Xie, Jiageng Zhu, Mahyar Khayatkhoei, Jiazhi Li, Wael AbdAlmageed
  • RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning (2022). Xiaojian Ma, Weili Nie, Zhiding Yu, Huaizu Jiang, Chaowei Xiao, Yuke Zhu, Song‐Chun Zhu, Anima Anandkumar
  • Visual Relationship Detection with Language prior and Softmax (2019). Jaewon Jung, Jongyoul Park
  • Visual Relationship Detection with Language prior and Softmax (2018). Jaewon Jung, Jongyoul Park
  • Visual Relationship Detection Based on Guided Proposals and Semantic Knowledge Distillation (2018). François Plesse, Alexandru Ginsca, Bertrand Delezoide, Françoise Prêteux

Works That Cite This (258)

  • REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter (2022). Hanbo Zhang, Deyu Yang, Han Wang, Binglei Zhao, Xuguang Lan, Jishiyu Ding, Nanning Zheng
  • Generative Compositional Augmentations for Scene Graph Prediction (2021). B. A. Knyazev, Harm de Vries, Cătălina Cangea, Graham W. Taylor, Aaron Courville, Eugene Belilovsky
  • Probabilistic Modeling of Semantic Ambiguity for Scene Graph Generation (2021). Gengcong Yang, Jingyi Zhang, Yong Zhang, Baoyuan Wu, Yujiu Yang
  • A Comprehensive Survey of Scene Graphs: Generation and Application (2021). Xiaojun Chang, Pengzhen Ren, Pengfei Xu, Zhihui Li, Xiaojiang Chen, Alex Hauptmann
  • RelTR: Relation Transformer for Scene Graph Generation (2023). Yuren Cong, Michael Ying Yang, Bodo Rosenhahn
  • Temporal Reasoning Graph for Activity Recognition (2020). Jingran Zhang, Fumin Shen, Xing Xu, Heng Tao Shen
  • Improving Scene Graph Generation with Superpixel-Based Interaction Learning (2023). Jingyi Wang, Can Zhang, Jinfa Huang, Botao Ren, Zhidong Deng
  • Visual Distant Supervision for Scene Graph Generation (2021). Yuan Yao, Ao Zhang, Xu Han, Mengdi Li, Cornelius Weber, Zhiyuan Liu, Stefan Wermter, Maosong Sun
  • Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks (2017). Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, Tat‐Seng Chua
  • Detecting Unseen Visual Relations Using Analogies (2019). Julia Peyre, Josef Sivic, Ivan Laptev, Cordelia Schmid

Works Cited by This (25)

  • A Review of Relational Machine Learning for Knowledge Graphs (2015). Maximilian Nickel, Kevin Murphy, Volker Tresp, Evgeniy Gabrilovich
  • Very Deep Convolutional Networks for Large-Scale Image Recognition (2014). Karen Simonyan, Andrew Zisserman
  • Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models (2015). Bryan A. Plummer, Liwei Wang, Chris M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, Svetlana Lazebnik
  • DRAW: A Recurrent Neural Network For Image Generation (2015). Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra
  • Deep visual-semantic alignments for generating image descriptions (2015). Andrej Karpathy, Li Fei-Fei
  • VQA: Visual Question Answering (2015). Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh
  • Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation (2014). Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik
  • Deep Residual Learning for Image Recognition (2016). Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
  • Deep Compositional Question Answering with Neural Module Networks (2015). Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein
  • Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures (2016). Raffaella Bernardi, Ruket Çakıcı, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazlı İkizler-Cinbiş, Frank Keller, Adrian Muscat, Barbara Plank