Scheduled DropHead: A Regularization Method for Transformer Models
We introduce DropHead, a structured dropout method designed specifically to regularize the multi-head attention mechanism, a key component of the transformer. Whereas conventional dropout randomly drops individual units or connections, DropHead drops entire attention heads during training to prevent the multi-head attention model from being …
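The core idea of dropping whole attention heads, rather than individual units, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `drop_head_mask` and `apply_drop_head`, the drop probability `p`, the rule that at least one head is always kept, and the rescaling of surviving heads are all assumptions made for illustration. Per-head outputs are represented as plain lists of floats to keep the example dependency-free.

```python
import random


def drop_head_mask(num_heads, p, rng):
    """Sample a 0/1 keep-mask over heads; each head is dropped with probability p."""
    mask = [0 if rng.random() < p else 1 for _ in range(num_heads)]
    if sum(mask) == 0:
        # Assumption: always keep at least one head so the layer output is never all zeros.
        mask[rng.randrange(num_heads)] = 1
    return mask


def apply_drop_head(head_outputs, p, rng, training=True):
    """Zero out entire heads during training.

    head_outputs: one output vector (list of floats) per attention head.
    """
    if not training or p <= 0.0:
        # At evaluation time every head is used unchanged.
        return [list(h) for h in head_outputs]
    num_heads = len(head_outputs)
    mask = drop_head_mask(num_heads, p, rng)
    # Assumption: rescale surviving heads by num_heads / kept so the summed
    # magnitude stays comparable to the no-drop case.
    scale = num_heads / sum(mask)
    return [[x * m * scale for x in h] for h, m in zip(head_outputs, mask)]


if __name__ == "__main__":
    rng = random.Random(0)
    heads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(apply_drop_head(heads, p=0.5, rng=rng))
```

The contrast with conventional dropout is in the shape of the mask: here one Bernoulli draw is made per head and shared by every element of that head's output, whereas conventional dropout would draw a separate value per element.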