ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer
The recent surge of interest in comprehensive multimodal models has necessitated the unification of diverse modalities. However, this unification is hindered by disparate methodologies: continuous visual generation relies on a full-sequence diffusion-based approach, which diverges from the autoregressive modeling used in the text domain. We posit that autoregressive modeling, i.e., predicting the …