CapsFusion: Rethinking Image-Text Data at Scale

Type: Preprint

Publication Date: 2023-01-01

Citations: 2

DOI: https://doi.org/10.48550/arxiv.2310.20550

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • Improving Multimodal Datasets with Image Captioning (2023). Thao D. Nguyen, Samir Yitzhak Gadre, Gabriel Ilharco, Sewoong Oh, Ludwig Schmidt
  • Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis (2024). D. Bucciarelli, Nicholas Moratelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
  • AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities (2023). Zhongzhi Chen, Guang Liu, Bowen Zhang, Qinghong Yang, Ledell Wu
  • Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models (2024). Zhengfeng Lai, Vasileios Saveris, Chen Chen, Hong-You Chen, Haotian Zhang, Bowen Zhang, Juan Lao Tebar, Wenze Hu, Zhe Gan, Peter Grasch
  • CoCa: Contrastive Captioners are Image-Text Foundation Models (2022). Jiahui Yu, Zirui Wang, Vijay K. Vasudevan, Legg Yeung, Mojtaba Seyedhosseini, Yonghui Wu
  • SIEVE: Multimodal Dataset Pruning Using Image Captioning Models (2023). Anas Mahmoud, Mostafa Elhoushi, Amro Abbas, Yang Yu, Newsha Ardalani, Hugh Leather, Ari S. Morcos
  • BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions (2024). Anas Awadalla, Le Xue, Manli Shu, An Yan, Jun Wang, Senthil Purushwalkam, Sheng Shen, Hannah Lee, Oscar Lo, Jae Sung Park
  • CLIPS: An Enhanced CLIP Framework for Learning with Synthetic Captions (2024). Yanqing Liu, Xianhang Li, Zeyu Wang, Bingchen Zhao, Cihang Xie
  • SITTA: A Semantic Image-Text Alignment for Image Captioning (2023). Fabian Paischer, Thomas Adler, Markus Hofmarcher, Sepp Hochreiter
  • CompCap: Improving Multimodal Large Language Models with Composite Captions (2024). Xiaohong Chen, Satya Narayan Shukla, Mahmoud Azab, Ananya Singh, Qifan Wang, David Dawei Yang, Shengyun Peng, Hanchao Yu, Yan Shen, Xuewen Zhang
  • Improving Cross-Modal Alignment with Synthetic Pairs for Text-Only Image Captioning (2024). Zhiyue Liu, Jinyuan Liu, Fanrong Ma
  • Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning (2023). Zhiyue Liu, Jinyuan Liu, Fanrong Ma
  • BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (2022). Junnan Li, Dongxu Li, Caiming Xiong, Steven C. H. Hoi
  • ALIP: Adaptive Language-Image Pre-training with Synthetic Caption (2023). Kaicheng Yang, Jiankang Deng, Xiang An, Jiawei Li, Ziyong Feng, Jia Guo, Jing Yang, Tongliang Liu
  • Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner (2023). Zikang Liu, Sihan Chen, Longteng Guo, H. Li, Xingjian He, Jing Liu
  • ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation (2024). Moran Yanuka, Morris Alper, Hadar Averbuch‐Elor, Raja Giryes
  • From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions (2023). Zhengfeng Lai, Haotian Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen‐Nee Chuah, Yinfei Yang
  • NLIP: Noise-Robust Language-Image Pre-training (2023). Runhui Huang, Yanxin Long, Jianhua Han, Hang Xu, Xiwen Liang, Chunjing Xu, Xiaodan Liang

Works That Cite This (0)


Works Cited by This (0)
