PointCLIP: Point Cloud Understanding by CLIP

Type: Article

Publication Date: 2022-06-01

Citations: 200

DOI: https://doi.org/10.1109/cvpr52688.2022.00836

Abstract

Recently, zero-shot and few-shot learning via Contrastive Vision-Language Pre-training (CLIP) have shown inspirational performance on 2D visual recognition; CLIP learns to match images with their corresponding texts in open-vocabulary settings. However, it remains underexplored whether CLIP, pre-trained on large-scale 2D image-text pairs, can be generalized to 3D recognition. In this paper, we show that such a setting is feasible by proposing PointCLIP, which conducts alignment between CLIP-encoded point clouds and 3D category texts. Specifically, we encode a point cloud by projecting it onto multi-view depth maps and aggregate the view-wise zero-shot predictions in an end-to-end manner, which achieves efficient knowledge transfer from 2D to 3D. We further design an inter-view adapter to better extract the global feature and adaptively fuse the few-shot knowledge learned in 3D into CLIP pre-trained in 2D. By fine-tuning only the adapter under few-shot settings, the performance of PointCLIP can be largely improved. In addition, we observe that PointCLIP and classical 3D-supervised networks carry complementary knowledge. Via a simple ensemble during inference, PointCLIP further improves the performance of state-of-the-art 3D networks. Therefore, PointCLIP is a promising alternative for effective 3D point cloud understanding in the low-data regime with marginal resource cost. We conduct thorough experiments on ModelNet10, ModelNet40 and ScanObjectNN to demonstrate the effectiveness of PointCLIP. Code is available at https://github.com/ZrrSkywalker/PointCLIP.
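The zero-shot pipeline summarized above (project a point cloud onto multi-view depth maps, encode each view, compare it with category-text features, and aggregate the view-wise predictions) can be illustrated with a small NumPy sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: the six-view orthographic projection, the random linear maps standing in for CLIP's frozen image and text encoders, the uniform view weights, and the `ensemble` helper with its `alpha` parameter are all hypothetical, and the few-shot inter-view adapter is not modeled.

```python
import numpy as np

RES = 32   # depth-map side length (hypothetical)
DIM = 64   # shared embedding width (hypothetical)
rng = np.random.default_rng(0)


def rot_x(theta: float) -> np.ndarray:
    """Rotation matrix about the x-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(theta: float) -> np.ndarray:
    """Rotation matrix about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


# Six hypothetical views: four around the y-axis, plus top and bottom.
VIEWS = [rot_y(k * np.pi / 2) for k in range(4)] + [rot_x(np.pi / 2), rot_x(-np.pi / 2)]


def l2norm(x: np.ndarray) -> np.ndarray:
    """Normalize rows to unit length (guarding against zero vectors)."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12)


def project_depth_maps(points: np.ndarray, res: int = RES) -> np.ndarray:
    """Orthographically project an (N, 3) cloud inside the unit sphere to (V, res, res) depth maps."""
    maps = np.full((len(VIEWS), res, res), np.inf)
    for v, rot in enumerate(VIEWS):
        p = points @ rot.T
        # Map x and y from [-1, 1] to integer pixel coordinates.
        cols = np.clip(np.round((p[:, 0] + 1) / 2 * (res - 1)).astype(int), 0, res - 1)
        rows = np.clip(np.round((p[:, 1] + 1) / 2 * (res - 1)).astype(int), 0, res - 1)
        # Camera at z = +1 looking toward -z: keep the nearest point per pixel.
        np.minimum.at(maps[v], (rows, cols), 1.0 - p[:, 2])
    maps[np.isinf(maps)] = 0.0  # pixels hit by no point become background
    return maps


# Hypothetical stand-in for CLIP's frozen image encoder: a fixed random linear map.
W_IMG = rng.standard_normal((RES * RES, DIM))


def encode_views(maps: np.ndarray) -> np.ndarray:
    """Encode each depth map into a unit-norm feature, shape (V, DIM)."""
    return l2norm(maps.reshape(len(maps), -1) @ W_IMG)


def zero_shot_logits(points, text_feats, view_weights):
    """Per-view cosine similarity with each category text, then a weighted sum over views."""
    per_view = encode_views(project_depth_maps(points)) @ text_feats.T  # (V, C)
    return view_weights @ per_view                                     # (C,)


def ensemble(logits_3d, logits_clip, alpha=0.5):
    """Hypothetical inference-time ensemble with a 3D-supervised network's logits."""
    return logits_3d + alpha * logits_clip


# Usage: a random cloud inside the unit sphere and 10 hypothetical category-text features.
points = l2norm(rng.standard_normal((1024, 3))) * rng.uniform(0.0, 1.0, size=(1024, 1))
text_feats = l2norm(rng.standard_normal((10, DIM)))
view_weights = np.full(len(VIEWS), 1.0 / len(VIEWS))
logits = zero_shot_logits(points, text_feats, view_weights)
pred = int(np.argmax(logits))
print("predicted category index:", pred)
```

Meaningful predictions would require replacing `W_IMG` and `text_feats` with real CLIP image and text embeddings of each category's text (with depth maps resized to the encoder's input size); the weighted sum over views is kept explicit so that the view weights could be tuned or learned.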

Locations

  • arXiv (Cornell University)
  • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Similar Works

  • PointCLIP: Point Cloud Understanding by CLIP (2021). Renrui Zhang, Z. J. Guo, Wei Zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, Hongsheng Li.
  • CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP (2023). Runnan Chen, Youquan Liu, Lingdong Kong, Xinge Zhu, Yuexin Ma, Yikang Li, Yuenan Hou, Yu Qiao, Wenping Wang.
  • CLIP²: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data (2023). Yihan Zeng, Chenhan Jiang, Jiageng Mao, Jianhua Han, Chaoqiang Ye, Qingqiu Huang, Dit-Yan Yeung, Zhen Yang, Xiaodan Liang, Hang Xu.
  • CALIP: Zero-Shot Enhancement of CLIP with Parameter-Free Attention (2023). Z. J. Guo, Renrui Zhang, Longtian Qiu, Xianzheng Ma, Xupeng Miao, Xuming He, Bin Cui.
  • ESP-Zero: Unsupervised enhancement of zero-shot classification for Extremely Sparse Point cloud (2024). Jiayi Han, Zidi Cao, Weibo Zheng, Xiangguo Zhou, Xiangjian He, Yuanfang Zhang, Daisen Wei.
  • CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention (2022). Z. J. Guo, Renrui Zhang, Longtian Qiu, Xianzheng Ma, Xupeng Miao, Xuming He, Bin Cui.
  • CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training (2022). Tianyu Huang, Bowen Dong, Yunhan Yang, Xiaoshui Huang, Rynson W. H. Lau, Wanli Ouyang, Wangmeng Zuo.
  • PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning (2022). Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyao Zeng, Shanghang Zhang, Peng Gao.
  • PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning (2023). Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Ziyao Zeng, Zipeng Qin, Shanghang Zhang, Peng Gao.
  • CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition (2023). Deepti Hegde, Jeya Maria Jose Valanarasu, Vishal M. Patel.
  • CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-Training (2023). Tianyu Huang, Bowen Dong, Yunhan Yang, Xiaoshui Huang, Rynson W. H. Lau, Wanli Ouyang, Wangmeng Zuo.
  • EPCL: Frozen CLIP Transformer is An Efficient Point Cloud Encoder (2022). Xiaoshui Huang, Sheng Li, Wentao Qu, Tong He, Yifan Zuo, Wanli Ouyang.
  • ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding (2022). Le Xue, Mingfei Gao, Xing Chen, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese.
  • ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding (2023). Le Xue, Mingfei Gao, Xing Chen, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese.
  • Joint Representation Learning for Text and 3D Point Cloud (2023). Rui Huang, Xuran Pan, Henry Zheng, Haojun Jiang, Zhifeng Xie, Shiji Song, Gao Huang.
  • OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding (2023). Minghua Liu, Ruoxi Shi, Kaiming Kuang, Yinhao Zhu, Xuanlin Li, Shizhong Han, Hong Cai, Fatih Porikli, Hao Su.
  • CLIP-FO3D: Learning Free Open-world 3D Scene Representations from 2D Dense CLIP (2023). Junbo Zhang, Runpei Dong, Kaisheng Ma.