Visual Translation Embedding Network for Visual Relation Detection

Hanwang Zhang, Zawlin Kyaw, Shih-Fu Chang, Tat‐Seng Chua

Type: Preprint

Publication Date: 2017-07-01

Citations: 537

DOI: https://doi.org/10.1109/cvpr.2017.331

Abstract

Visual relations, such as person ride bike and bike next to car, offer a comprehensive scene understanding of an image, and have already shown their great utility in connecting computer vision and natural language. However, due to the challenging combinatorial complexity of modeling subject-predicate-object relation triplets, very little work has been done to localize and predict visual relations. Inspired by the recent advances in relational representation learning of knowledge bases and convolutional object detection networks, we propose a Visual Translation Embedding network (VTransE) for visual relation detection. VTransE places objects in a low-dimensional relation space where a relation can be modeled as a simple vector translation, i.e., subject + predicate ≈ object. We propose a novel feature extraction layer that enables object-relation knowledge transfer in a fully-convolutional fashion that supports training and inference in a single forward/backward pass. To the best of our knowledge, VTransE is the first end-to-end relation detection network. We demonstrate the effectiveness of VTransE over other state-of-the-art methods on two large-scale datasets: Visual Relationship and Visual Genome. Note that even though VTransE is a purely visual model, it is still competitive with Lu's multi-modal model with language priors [27].
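
The translation idea in the abstract (subject + predicate ≈ object) can be illustrated with a small sketch. This is not the VTransE network itself: the 2-D vectors, the Euclidean distance, and the helper names `translation_distance` and `rank_predicates` are all assumptions made up for illustration, and the paper's learned features and training objective are not reproduced here.

```python
import math

def translation_distance(subject, predicate, obj):
    """Euclidean distance between (subject + predicate) and object.

    A small distance means the predicate vector translates the subject
    close to the object, i.e. subject + predicate ≈ object.
    """
    return math.sqrt(sum((s + p - o) ** 2 for s, p, o in zip(subject, predicate, obj)))

def rank_predicates(subject, obj, predicates):
    """Return predicate names sorted from best fit to worst fit."""
    return sorted(
        predicates,
        key=lambda name: translation_distance(subject, predicates[name], obj),
    )

# Toy 2-D embeddings, invented for illustration only (hypothetical values).
person = [1.0, 0.0]
bike = [1.0, -1.0]
predicates = {
    "ride": [0.0, -1.0],     # person + ride lands exactly on bike
    "next to": [1.0, 0.0],   # person + next to lands elsewhere
}

ranking = rank_predicates(person, bike, predicates)
print(ranking)
```

With these toy values, the predicate whose vector moves "person" closest to "bike" is ranked first. A real system would learn the object and predicate vectors from image features rather than fixing them by hand.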

Locations

arXiv (Cornell University)

Similar Works

Title	Year	Authors
Visual Translation Embedding Network for Visual Relation Detection	2017	Hanwang Zhang Zawlin Kyaw Shih‐Fu Chang Tat‐Seng Chua
Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation	2020	Zih-Siou Hung Arun Mallya Svetlana Lazebnik
Visual Relationship Detection With Visual-Linguistic Knowledge From Multimodal Representations	2021	Meng-Jiun Chiou Roger Zimmermann Jiashi Feng
Visual Relationship Detection with Language Priors	2016	Cewu Lu Ranjay Krishna Michael S. Bernstein Li Fei-Fei
RelTransformer: Balancing the Visual Relationship Detection from Local Context, Scene and Memory.	2021	Jun Chen Aniket Agarwal Sherif Abdelkarim Deyao Zhu Mohamed Elhoseiny
Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection	2019	Nikolaos Gkanatsios Vassilis Pitsikalis Petros Koutras Athanasia Zlatintsi Petros Maragos
Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation	2017	Ruichi Yu Ang Li Vlad I. Morariu Larry S. Davis
Unified Visual Relationship Detection with Vision and Language Models	2023	L. Zhao Liangzhe Yuan Boqing Gong Yin Cui Florian Schroff Ming–Hsuan Yang Hartwig Adam Ting Liu
Detecting Unseen Visual Relations Using Analogies	2019	Julia Peyre Josef Sivic Ivan Laptev Cordelia Schmid
Detecting unseen visual relations using analogies	2018	Julia Peyre Ivan Laptev Cordelia Schmid Josef Šivic
Improving Visual Relationship Detection using Semantic Modeling of Scene Descriptions	2018	Stephan Baier Yunpu Ma Volker Tresp
RelationVLM: Making Large Vision-Language Models Understand Visual Relations	2024	Zhipeng Huang Zhizheng Zhang Zheng-Jun Zha Yan Lü Baining Guo
Look, Learn and Leverage (L$^3$): Mitigating Visual-Domain Shift and Discovering Intrinsic Relations via Symbolic Alignment	2024	Hanchen Xie Jiageng Zhu Mahyar Khayatkhoei Jiazhi Li Wael AbdAlmageed
RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning	2022	Xiaojian Ma Weili Nie Zhiding Yu Huaizu Jiang Chaowei Xiao Yuke Zhu Song‐Chun Zhu Anima Anandkumar
Visual Relationship Detection with Language prior and Softmax	2019	Jaewon Jung Jongyoul Park
Visual Relationship Detection with Language prior and Softmax	2018	Jaewon Jung Jongyoul Park
Visual Relationship Detection Based on Guided Proposals and Semantic Knowledge Distillation	2018	François Plesse Alexandru Ginsca Bertrand Delezoide Françoise Prêteux

Works That Cite This (258)

Title	Year	Authors
REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter	2022	Hanbo Zhang Deyu Yang Han Wang Binglei Zhao Xuguang Lan Jishiyu Ding Nanning Zheng
Generative Compositional Augmentations for Scene Graph Prediction	2021	B. A. Knyazev Harm de Vries Cătălina Cangea Graham W. Taylor Aaron Courville Eugene Belilovsky
Probabilistic Modeling of Semantic Ambiguity for Scene Graph Generation	2021	Gengcong Yang Jingyi Zhang Yong Zhang Baoyuan Wu Yujiu Yang
A Comprehensive Survey of Scene Graphs: Generation and Application	2021	Xiaojun Chang Pengzhen Ren Pengfei Xu Zhihui Li Xiaojiang Chen Alex Hauptmann
RelTR: Relation Transformer for Scene Graph Generation	2023	Yuren Cong Michael Ying Yang Bodo Rosenhahn
Temporal Reasoning Graph for Activity Recognition	2020	Jingran Zhang Fumin Shen Xing Xu Heng Tao Shen
Improving Scene Graph Generation with Superpixel-Based Interaction Learning	2023	Jingyi Wang Can Zhang Jinfa Huang Botao Ren Zhidong Deng
Visual Distant Supervision for Scene Graph Generation	2021	Yuan Yao Ao Zhang Xu Han Mengdi Li Cornelius Weber Zhiyuan Liu Stefan Wermter Maosong Sun
Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks	2017	Jun Xiao Hao Ye Xiangnan He Hanwang Zhang Fei Wu Tat‐Seng Chua
Detecting Unseen Visual Relations Using Analogies	2019	Julia Peyre Josef Sivic Ivan Laptev Cordelia Schmid

Works Cited by This (25)

Title	Year	Authors
A Review of Relational Machine Learning for Knowledge Graphs	2015	Maximilian Nickel Kevin Murphy Volker Tresp Evgeniy Gabrilovich
Very Deep Convolutional Networks for Large-Scale Image Recognition	2014	Karen Simonyan Andrew Zisserman
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models	2015	Bryan A. Plummer Liwei Wang Chris M. Cervantes Juan C. Caicedo Julia Hockenmaier Svetlana Lazebnik
DRAW: A Recurrent Neural Network For Image Generation	2015	Karol Gregor Ivo Danihelka Alex Graves Danilo Jimenez Rezende Daan Wierstra
Deep visual-semantic alignments for generating image descriptions	2015	Andrej Karpathy Li Fei-Fei
VQA: Visual Question Answering	2015	Stanislaw Antol Aishwarya Agrawal Jiasen Lu Margaret Mitchell Dhruv Batra C. Lawrence Zitnick Devi Parikh
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation	2014	Ross Girshick Jeff Donahue Trevor Darrell Jitendra Malik
Deep Residual Learning for Image Recognition	2016	Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun
Deep Compositional Question Answering with Neural Module Networks	2015	Jacob Andreas Marcus Rohrbach Trevor Darrell Dan Klein
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures	2016	Raffaella Bernardi Ruket Çakıcı Desmond Elliott Aykut Erdem Erkut Erdem Nazlı İkizler-Cinbiş Frank Keller Adrian Muscat Barbara Plank