CamCtrl3D: Single-Image Scene Exploration with Precise 3D Camera Control

Type: Preprint

Publication Date: 2025-01-10

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2501.06006

Abstract

We propose a method for generating fly-through videos of a scene from a single image and a given camera trajectory. We build upon an image-to-video latent diffusion model and condition its UNet denoiser on the camera trajectory using four techniques. (1) We condition the UNet's temporal blocks on raw camera extrinsics, similar to MotionCtrl. (2) We use images containing camera rays and directions, similar to CameraCtrl. (3) We reproject the initial image to subsequent frames and use the resulting video as a condition. (4) We use 2D↔3D transformers to introduce a global 3D representation, which implicitly conditions on the camera poses. We combine all conditions in a ControlNet-style architecture. We then propose a metric that evaluates overall video quality and the ability to preserve details under view changes, which we use to analyze the trade-offs of individual and combined conditions. Finally, we identify an optimal combination of conditions. We calibrate camera positions in our datasets for scale consistency across scenes, and we train our scene exploration model, CamCtrl3D, demonstrating state-of-the-art results.
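Technique (2) in the abstract conditions the denoiser on images of camera rays and directions, in the style of CameraCtrl. As an illustration only (not the paper's implementation), the sketch below builds such a per-pixel ray image from assumed 3×3 intrinsics `K` and a 4×4 camera-to-world matrix `c2w`, encoding each ray in Plücker coordinates (moment, direction) to give a 6-channel conditioning map:

```python
import numpy as np

def camera_ray_image(K, c2w, height, width):
    """Per-pixel Plücker ray embedding: a (H, W, 6) conditioning image.
    K: 3x3 intrinsics; c2w: 4x4 camera-to-world extrinsics (assumed inputs)."""
    # Pixel-center grid in image coordinates.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)        # (H, W, 3)
    # Unproject through the inverse intrinsics to camera-space directions.
    dirs_cam = pix @ np.linalg.inv(K).T                     # (H, W, 3)
    # Rotate into world space and normalize to unit directions.
    dirs_world = dirs_cam @ c2w[:3, :3].T
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)
    # Every ray originates at the camera center.
    origins = np.broadcast_to(c2w[:3, 3], dirs_world.shape)
    # Plücker coordinates: moment = origin x direction, then the direction.
    moments = np.cross(origins, dirs_world)
    return np.concatenate([moments, dirs_world], axis=-1)   # (H, W, 6)
```

Because the Plücker moment depends on the camera position as well as its orientation, a sequence of such images encodes the full trajectory frame by frame, which is what makes it usable as a dense spatial condition for the denoiser.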

Locations

  • arXiv (Cornell University) - PDF

Similar Works

  • CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models (2024) - Rundi Wu, Ruiqi Gao, Ben Poole, Alex Trevithick, Changxi Zheng, Jonathan T. Barron, Aleksander Holynski
  • Wonderland: Navigating 3D Scenes from a Single Image (2024) - Hanwen Liang, Jun-Li Cao, Vidit Goel, Guocheng Qian, S.P. Korolev, Demetri Terzopoulos, Konstantinos N. Plataniotis, Sergey Tulyakov, Jian Ren
  • Latent-Reframe: Enabling Camera Control for Video Diffusion Model without Training (2024) - Zhenghong Zhou, Jie An, Jiebo Luo
  • DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models (2023) - Shengqu Cai, Eric Ryan Chan, Songyou Peng, Mohamad Shahbazi, Anton Obukhov, Luc Van Gool, Gordon Wetzstein
  • MultiDiff: Consistent Novel View Synthesis from a Single Image (2024) - Norman Müller, K Schwarz, Barbara Roessle, Lorenzo Porzi, Samuel Rota Bulò, Matthias Nießner, Peter Kontschieder
  • ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models (2023) - Jeong-gi Kwak, Erqun Dong, Yuhe Jin, Hanseok Ko, Shweta Mahajan, Kwang Moo Yi
  • ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model (2024) - Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, Yueqi Duan
  • DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos (2024) - Wen-Hsuan Chu, Lei Ke, Katerina Fragkiadaki
  • LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors (2024) - Yabo Chen, Yang Chen, Jiemin Fang, Xiaopeng Zhang, Lingxi Xie, Weiming Shen, Wenrui Dai, Hongkai Xiong, Qi Tian
  • Controlling Space and Time with Diffusion Models (2024) - Daniel Watson, Saurabh Saxena, Lala Li, Andrea Tagliasacchi, David J. Fleet
  • Diffusion Priors for Dynamic View Synthesis from Monocular Videos (2024) - Chaoyang Wang, Peiye Zhuang, Aliaksandr Siarohin, Jun-Li Cao, Guocheng Qian, Hsin-Ying Lee, Sergey Tulyakov
  • World-consistent Video Diffusion with Explicit 3D Modeling (2024) - Qihang Zhang, Shuangfei Zhai, Miguel Ángel Bautista, Ke-Xuan Miao, Alexander Toshev, Joshua M. Susskind, Jiatao Gu
  • AutoDecoding Latent 3D Diffusion Models (2023) - Evangelos Ntavelis, Aliaksandr Siarohin, Kyle Olszewski, Chaoyang Wang, Luc Van Gool, Sergey Tulyakov
  • CAT3D: Create Anything in 3D with Multi-View Diffusion Models (2024) - Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martin-Brualla, Pratul P. Srinivasan, Jonathan T. Barron, Ben Poole
  • Dynamic Scene Novel View Synthesis via Deferred Spatio-temporal Consistency (2021) - Beatrix-Emőke Fülöp-Balogh, Eleanor Tursman, James Tompkin, Julie Digne, Nicolas Bonneel
  • Training-free Camera Control for Video Generation (2024) - Chen Hou, Guoqiang Wei, Yan Zeng, Zhibo Chen
  • Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views (2024) - Zi-Xin Zou, Weihao Cheng, Yan-Pei Cao, Shi-Sheng Huang, Ying Shan, Song-Hai Zhang
  • Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image (2023) - Liao Shen, Xingyi Li, Huiqiang Sun, Juewen Peng, Ke Xian, Zhiguo Cao, Guosheng Lin

Works That Cite This (0)


Works Cited by This (0)
