Ask a Question

Prefer a chat interface with context about you and your work?

Understanding Multimodal Deep Neural Networks: A Concept Selection View

Understanding Multimodal Deep Neural Networks: A Concept Selection View

The multimodal deep neural networks, represented by CLIP, have generated rich downstream applications owing to their excellent performance, thus making understanding the decision-making process of CLIP an essential research topic. Due to the complex structure and the massive pre-training data, it is often regarded as a black-box model that is …