Multi-Head Attention with Disagreement Regularization

Multi-head attention is appealing for its ability to jointly attend to information from different representation subspaces at different positions. In this work, we introduce a disagreement regularization that explicitly encourages diversity among the multiple attention heads. Specifically, we propose three types of disagreement regularization, which respectively encourage the subspace, the …
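
To make the idea of a disagreement term concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the formulation from the paper: the abstract above does not give the formulas, so this sketch assumes that disagreement is measured as the negative mean pairwise cosine similarity between one vector per head, and that the term is subtracted from the task loss with a weight `lam`. The names `cosine`, `disagreement`, `regularized_loss`, and `lam` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def disagreement(heads):
    """Assumed measure: negative mean cosine similarity over all distinct
    pairs of head vectors. Larger values mean the heads differ more."""
    pairs = list(combinations(heads, 2))
    return -sum(cosine(u, v) for u, v in pairs) / len(pairs)


def regularized_loss(task_loss, heads, lam=1.0):
    """Assumed objective: subtract the weighted disagreement, so that
    minimizing the loss rewards more diverse heads."""
    return task_loss - lam * disagreement(heads)


# Two toy configurations with one 2-d vector per head (made-up values).
identical = [[1.0, 0.0], [1.0, 0.0]]    # heads point the same way
orthogonal = [[1.0, 0.0], [0.0, 1.0]]   # heads are unrelated

print(disagreement(identical))
print(disagreement(orthogonal))
print(regularized_loss(2.0, identical, lam=0.5))
```

Under these assumptions, identical heads receive the lowest possible disagreement and therefore the largest penalty, while orthogonal heads incur none. In a real model the vectors would come from each head's projections, attention weights, or outputs, depending on which aspect of the heads is being diversified.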