InterFusion: Text-Driven Generation of 3D Human-Object Interaction

Type: Preprint

Publication Date: 2024-03-22

Citations: 1

DOI: https://doi.org/10.48550/arxiv.2403.15612

Abstract

In this study, we tackle the task of generating 3D human-object interactions (HOI) from textual descriptions in a zero-shot text-to-3D manner. We identify and address two key challenges: direct text-to-3D methods produce unsatisfactory results for HOI, largely because paired text-interaction data is lacking, and it is inherently difficult to generate multiple concepts with complex spatial relationships simultaneously. To address these issues, we present InterFusion, a two-stage framework designed specifically for HOI generation. InterFusion uses human poses estimated from text as geometric priors, which simplifies the text-to-3D conversion process and introduces additional constraints for accurate object generation. In the first stage, InterFusion extracts 3D human poses from a synthesized image dataset depicting a wide range of interactions and maps these poses to interaction descriptions. The second stage builds on recent advances in text-to-3D generation to produce realistic, high-quality 3D HOI scenes. This is achieved through a local-global optimization process in which the human body and the object are optimized separately and jointly refined through a global optimization of the entire scene, ensuring seamless and contextually coherent integration. Our experimental results show that InterFusion significantly outperforms existing state-of-the-art methods in 3D HOI generation.
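
The local-global optimization mentioned in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the abstract does not specify losses, representations, or update rules, so scalars stand in for the human and object representations, the quadratic losses are invented, and the name `local_global_optimize` and its parameters are hypothetical. It shows only the general pattern of alternating separate per-component updates with a joint scene-level update, not InterFusion's implementation.

```python
# Toy illustration only: scalar "human" and "object" parameters stand in for
# 3D representations, and the quadratic losses are assumptions, not the paper's.

def local_global_optimize(human, obj, human_target, obj_target, gap,
                          steps=200, lr=0.1):
    """Alternate separate (local) updates with a joint (global) update."""
    for _ in range(steps):
        # Local step: each component descends its own loss independently.
        human -= lr * 2.0 * (human - human_target)  # gradient of (h - h*)^2
        obj -= lr * 2.0 * (obj - obj_target)        # gradient of (o - o*)^2

        # Global step: both components descend one shared scene-level loss,
        # (obj - human - gap)^2, which encodes their spatial relationship.
        residual = obj - human - gap
        human -= lr * (-2.0 * residual)  # d/d(human) of the shared loss
        obj -= lr * (2.0 * residual)     # d/d(obj) of the shared loss
    return human, obj


if __name__ == "__main__":
    # Hypothetical targets chosen so the local and global terms agree.
    print(local_global_optimize(5.0, -3.0, 0.0, 1.0, 1.0))
```

When the per-component targets and the relational term are consistent, as in the usage above, the alternating updates settle at a point that satisfies both. When they conflict, the result is a compromise between the local and global terms.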

Locations

  • arXiv (Cornell University)

Similar Works

  • OOD-HOI: Text-Driven 3D Whole-Body Human-Object Interactions Generation Beyond Training Domains (2024). Yixuan Zhang, Hui Yang, Chuanchen Luo, Junran Peng, Yuxi Wang, Zhaoxiang Zhang
  • CG-HOI: Contact-Guided 3D Human-Object Interaction Generation (2023). Christian Diller, Angela Dai
  • GenZI: Zero-Shot 3D Human-Scene Interaction Generation (2023). Lei Li, Angela Dai
  • Diverse 3D Human Pose Generation in Scenes based on Decoupled Structure (2024). Bowen Dang, Xi Zhao
  • InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction (2024). Sirui Xu, Ziyin Wang, Yu-Xiong Wang, Liang-Yan Gui
  • HOI-Diff: Text-Driven Synthesis of 3D Human-Object Interactions using Diffusion Models (2023). Xiaogang Peng, Yiming Xie, Zizhao Wu, Varun Jampani, Deqing Sun, Huaizu Jiang
  • AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation (2024). Yukang Cao, Liang Pan, Kai Han, Kenneth K. Wong, Ziwei Liu
  • THOR: Text to Human-Object Interaction Diffusion via Relation Intervention (2024). Qianyang Wu, Shi Ye, Xiaoshui Huang, Jingyi Yu, Lan Xu, Jingya Wang
  • Contact-aware Human Motion Generation from Textual Descriptions (2024). Sihan Ma, Qiong Cao, Jing Zhang, Dacheng Tao
  • Text-guided 3D Human Generation from 2D Collections (2023). Tsu-Jui Fu, Wenhan Xiong, Yixin Nie, Jingyu Liu, Barlas Oğuz, William Wang
  • HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects (2024). Xintao Lv, Liang Xu, Yichao Yan, Xin Jin, Congsheng Xu, Shuwen Wu, Yifan Liu, Lincheng Li, Mengxiao Bi, Wenjun Zeng
  • Generating Human Motion in 3D Scenes from Text Descriptions (2024). Zhi Cen, Huaijin Pi, Sida Peng, Zehong Shen, Minghui Yang, Shuai Zhu, Hujun Bao, Xiaowei Zhou
  • TIPS: Text-Induced Pose Synthesis (2022). Prasun Kumar Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal, Michael Blumenstein
  • Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction (2024). Junuk Cha, Jihyeon Kim, Jae Shin Yoon, Seungryul Baek
  • GraspDiffusion: Synthesizing Realistic Whole-body Hand-Object Interaction (2024). Patrick Kwon, Hanbyul Joo
  • CHORUS: Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images (2023). Sookwan Han, Hanbyul Joo
  • HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes (2022). Zan Wang, Yixin Chen, Tengyu Liu, Yixin Zhu, Wei Liang, Siyuan Huang
  • DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors (2024). Tieyuan Zhu, Ruining Li, Tomáš Jakab

Works That Cite This (0)


Works Cited by This (0)
