ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models

Type: Article

Publication Date: 2024-03-24

Citations: 4

DOI: https://doi.org/10.1609/aaai.v38i13.29371

Abstract

The ability to understand visual concepts and to replicate and compose these concepts from images is a central goal for computer vision. Recent advances in text-to-image (T2I) models have led to high-fidelity, photorealistic image generation by learning from large databases of images and their descriptions. However, the evaluation of T2I models has focused on photorealism and limited qualitative measures of visual understanding. To quantify the ability of T2I models to learn and synthesize novel visual concepts (a.k.a. personalized T2I), we introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts. Along with the dataset, we propose an evaluation metric, Concept Confidence Deviation (CCD), that uses the confidence of oracle concept classifiers to measure the alignment between concepts generated by T2I generators and concepts contained in target images. We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions. Our human study shows that CCD is highly correlated with human understanding of concepts. Our results point to a trade-off between learning the concepts and preserving the compositionality, which existing approaches struggle to overcome. The data, code, and interactive demo are available at: https://conceptbed.github.io/
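The CCD idea described in the abstract can be sketched in a few lines. This is a minimal illustration, assuming CCD reduces to the gap between an oracle classifier's mean confidence on real images of a concept and its mean confidence on generated images of that concept; the paper's exact estimator may differ, and the function name here is hypothetical:

```python
import numpy as np

def concept_confidence_deviation(real_probs, gen_probs):
    """Sketch of a CCD-style score (hypothetical simplification).

    real_probs: oracle-classifier confidences for the target concept
                on real (ground-truth) images.
    gen_probs:  oracle-classifier confidences for the same concept
                on images produced by the T2I generator.
    """
    real_probs = np.asarray(real_probs, dtype=float)
    gen_probs = np.asarray(gen_probs, dtype=float)
    # Deviation of generated-image confidence from the real-image baseline:
    # 0 means the generator matches the oracle's confidence on real data;
    # larger values mean the learned concept drifts from the target.
    return float(real_probs.mean() - gen_probs.mean())
```

Under this reading, a well-personalized model keeps the oracle as confident on its generations as on the original concept images, so the deviation stays near zero.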

Locations

  • Proceedings of the AAAI Conference on Artificial Intelligence
  • arXiv (Cornell University)
  • Maryland Shared Open Access Repository (USMAI Consortium)

Similar Works

  • ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models (2023). Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang
  • ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty (2024). Xindi Wu, Dingli Yu, Yangsibo Huang, Olga Russakovsky, Sanjeev Arora
  • Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else (2023). Hazarapet Tunanyan, Dejia Xu, Shant Navasardyan, Shuicheng Yan, Humphrey Shi
  • Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models (2024). Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon‐Young Lee, Jong Chul Ye, Fabian Caba Heilbron
  • CusConcept: Customized Visual Concept Decomposition with Diffusion Models (2024). Zhi Xu, Shaozhe Hao, Kai Han
  • Non-confusing Generation of Customized Concepts in Diffusion Models (2024). Lin Wang, Jingyuan Chen, Jiaxin Shi, Yichen Zhu, Liang Chen, Junzhong Miao, Tao Jin, Zhou Zhao, Fei Wu, Shuicheng Yan
  • MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models (2024). Donghao Zhou, Jiancheng Huang, Jinbin Bai, Jiaze Wang, Hao Chen, Guangyong Chen, Xiaowei Hu, Pheng‐Ann Heng
  • Attention Calibration for Disentangled Text-to-Image Personalization (2024). Yanbing Zhang, Mengping Yang, Qin Zhou, Zhe Wang
  • GRADE: Quantifying Sample Diversity in Text-to-Image Models (2024). Royi Rassin, Aviv Slobodkin, Shauli Ravfogel, Yanai Elazar, Yoav Goldberg
  • Is What You Ask For What You Get? Investigating Concept Associations in Text-to-Image Models (2024). Salma Abdel Magid, Weiwei Pan, Simon Warchol, Grace Guo, Junsik Kim, Mahia Rahman, Hanspeter Pfister
  • Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation (2024). Junjie Shentu, Matthew Watson, Noura Al Moubayed
  • Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting (2024). Weili Zeng, Yichao Yan, Qi Zhu, Zhuo Chen, Pengzhi Chu, Weiming Zhao, Xiaokang Yang
  • DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback (2023). Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna
  • Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models (2023). Saman Motamed, Danda Pani Paudel, Luc Van Gool
  • T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts (2024). Ziwei Huang, Wanggui He, Qian Long, Yandi Wang, Haoyuan Li, Zhelun Yu, Fangxun Shu, Long Chen, Hao Jiang, Leilei Gan
  • Multi-Concept Customization of Text-to-Image Diffusion (2022). Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu
  • MultiBooth: Towards Generating All Your Concepts in an Image from Text (2024). Chenyang Zhu, Kai Li, Yue Ma, Chunming He, Xiu Li
  • Multi-Concept Customization of Text-to-Image Diffusion (2023). Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu
  • Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation (2024). Raphael Tang, Xinyu Zhang, Lixinyu Xu, Yao Lu, Wenyan Li, Pontus Stenetorp, Jimmy Lin, Ferhan Türe
  • Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark (2022). Vitali Petsiuk, Alexander E. Siemenn, Saisamrit Surbehera, Zad Chin, Keith Tyser, Gregory Hunter, Arvind Raghavan, Yann Hicke, Bryan A. Plummer, Ori Kerret

Works That Cite This (0)


Works Cited by This (29)

  • Deep Residual Learning for Image Recognition (2016). Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
  • Deeper, Broader and Artier Domain Generalization (2017). Da Li, Yongxin Yang, Yi-Zhe Song, Timothy M. Hospedales
  • The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision (2019). Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, Jiajun Wu
  • Neural Discrete Representation Learning (2017). Aäron van den Oord, Oriol Vinyals, Koray Kavukcuoglu
  • Concept Bottleneck Models (2020). Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang
  • The Open Images Dataset V4: Unified Image Classification, Object Detection, and Visual Relationship Detection at Scale (2020). Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov
  • ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision (2021). Wonjae Kim, Bokyung Son, Ildoo Kim
  • CLIPScore: A Reference-free Evaluation Metric for Image Captioning (2021). Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, Yejin Choi
  • Learning Transferable Visual Models From Natural Language Supervision (2021). Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark
  • DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (2022). Jaemin Cho, Abhay Zala, Mohit Bansal