Self-supervised Pretraining and Finetuning for Monocular Depth and
Visual Odometry
For simultaneous monocular depth and visual odometry estimation, we propose learning self-supervised transformer-based models in two steps. The first step is a generic pretraining that learns 3D geometry using the cross-view completion objective (CroCo); the second is self-supervised finetuning on non-annotated videos. We show that our self-supervised models …