Kronecker Attention Networks

Attention operators have been applied to both 1-D data such as text and higher-order data such as images and videos. Applying attention operators to higher-order data requires flattening the spatial or spatio-temporal dimensions into a vector, which is assumed to follow a multivariate normal distribution. This not only incurs …