CamCtrl3D: Single-Image Scene Exploration with Precise 3D Camera Control

Type: Preprint

Publication Date: 2025-01-10

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2501.06006

Abstract

We propose a method for generating fly-through videos of a scene from a single image and a given camera trajectory. We build upon an image-to-video latent diffusion model and condition its UNet denoiser on the camera trajectory using four techniques. (1) We condition the UNet's temporal blocks on raw camera extrinsics, similar to MotionCtrl. (2) We use images containing camera rays and directions, similar to CameraCtrl. (3) We reproject the initial image to subsequent frames and use the resulting video as a condition. (4) We use 2D↔3D transformers to introduce a global 3D representation, which implicitly conditions on the camera poses. We combine all conditions in a ControlNet-style architecture. We then propose a metric that evaluates overall video quality and the ability to preserve details under view changes, which we use to analyze the trade-offs of individual and combined conditions. Finally, we identify an optimal combination of conditions. We calibrate camera positions in our datasets for scale consistency across scenes, and we train our scene exploration model, CamCtrl3D, demonstrating state-of-the-art results.
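Technique (2) in the abstract conditions the denoiser on images of camera rays and directions, in the style of CameraCtrl. As an illustration only (not the paper's implementation), the sketch below builds such a per-pixel ray image from assumed 3×3 intrinsics `K` and a 4×4 camera-to-world matrix `c2w`, encoding each ray in Plücker coordinates (moment, direction) to give a 6-channel conditioning map:

```python
import numpy as np

def camera_ray_image(K, c2w, height, width):
    """Per-pixel Plücker ray embedding: a (H, W, 6) conditioning image.
    K: 3x3 intrinsics; c2w: 4x4 camera-to-world extrinsics (assumed inputs)."""
    # Pixel-center grid in image coordinates.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)        # (H, W, 3)
    # Unproject through the inverse intrinsics to camera-space directions.
    dirs_cam = pix @ np.linalg.inv(K).T                     # (H, W, 3)
    # Rotate into world space and normalize to unit directions.
    dirs_world = dirs_cam @ c2w[:3, :3].T
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)
    # Every ray originates at the camera center.
    origins = np.broadcast_to(c2w[:3, 3], dirs_world.shape)
    # Plücker coordinates: moment = origin x direction, then the direction.
    moments = np.cross(origins, dirs_world)
    return np.concatenate([moments, dirs_world], axis=-1)   # (H, W, 6)
```

Because the Plücker moment depends on the camera position as well as its orientation, a sequence of such images encodes the full trajectory frame by frame, which is what makes it usable as a dense spatial condition for the denoiser.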

Locations

  • arXiv (Cornell University) - PDF

Similar Works

  • CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models (2024) - Rundi Wu, Ruiqi Gao, Ben Poole, Alex Trevithick, Changxi Zheng, Jonathan T. Barron, Aleksander Holynski
  • Wonderland: Navigating 3D Scenes from a Single Image (2024) - Hanwen Liang, Jun-Li Cao, Vidit Goel, Guocheng Qian, S.P. Korolev, Demetri Terzopoulos, Konstantinos N. Plataniotis, Sergey Tulyakov, Jian Ren
  • Latent-Reframe: Enabling Camera Control for Video Diffusion Model without Training (2024) - Zhenghong Zhou, Jie An, Jiebo Luo
  • DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models (2023) - Shengqu Cai, Eric Ryan Chan, Songyou Peng, Mohamad Shahbazi, Anton Obukhov, Luc Van Gool, Gordon Wetzstein
  • MultiDiff: Consistent Novel View Synthesis from a Single Image (2024) - Norman Müller, K Schwarz, Barbara Roessle, Lorenzo Porzi, Samuel Rota Bulò, Matthias Nießner, Peter Kontschieder
  • ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models (2023) - Jeong-gi Kwak, Erqun Dong, Yuhe Jin, Hanseok Ko, Shweta Mahajan, Kwang Moo Yi
  • ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model (2024) - Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, Yueqi Duan
  • DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos (2024) - Wen-Hsuan Chu, Lei Ke, Katerina Fragkiadaki
  • LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors (2024) - Yabo Chen, Yang Chen, Jiemin Fang, Xiaopeng Zhang, Lingxi Xie, Weiming Shen, Wenrui Dai, Hongkai Xiong, Qi Tian
  • Controlling Space and Time with Diffusion Models (2024) - Daniel Watson, Saurabh Saxena, Lala Li, Andrea Tagliasacchi, David J. Fleet
  • Diffusion Priors for Dynamic View Synthesis from Monocular Videos (2024) - Chaoyang Wang, Peiye Zhuang, Aliaksandr Siarohin, Jun-Li Cao, Guocheng Qian, Hsin-Ying Lee, Sergey Tulyakov
  • World-consistent Video Diffusion with Explicit 3D Modeling (2024) - Qihang Zhang, Shuangfei Zhai, Miguel Ángel Bautista, Ke-Xuan Miao, Alexander Toshev, Joshua M. Susskind, Jiatao Gu
  • AutoDecoding Latent 3D Diffusion Models (2023) - Evangelos Ntavelis, Aliaksandr Siarohin, Kyle Olszewski, Chaoyang Wang, Luc Van Gool, Sergey Tulyakov
  • CAT3D: Create Anything in 3D with Multi-View Diffusion Models (2024) - Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martin-Brualla, Pratul P. Srinivasan, Jonathan T. Barron, Ben Poole
  • Dynamic Scene Novel View Synthesis via Deferred Spatio-temporal Consistency (2021) - Beatrix-Emőke Fülöp-Balogh, Eleanor Tursman, James Tompkin, Julie Digne, Nicolas Bonneel
  • Training-free Camera Control for Video Generation (2024) - Chen Hou, Guoqiang Wei, Yan Zeng, Zhibo Chen
  • Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views (2024) - Zi-Xin Zou, Weihao Cheng, Yan-Pei Cao, Shi-Sheng Huang, Ying Shan, Song-Hai Zhang
  • Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image (2023) - Liao Shen, Xingyi Li, Huiqiang Sun, Juewen Peng, Ke Xian, Zhiguo Cao, Guosheng Lin

Works That Cite This (0)


Works Cited by This (0)
