Scheduled DropHead: A Regularization Method for Transformer Models

We introduce DropHead, a structured dropout method specifically designed to regularize the multi-head attention mechanism, a key component of the Transformer. Conventional dropout randomly drops individual units or connections. DropHead instead drops entire attention heads during training to prevent the multi-head attention model from being …
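The sketch below makes the contrast with conventional dropout concrete. It is a minimal NumPy illustration of head-level dropout and is not the authors' implementation.

It assumes the per-head attention outputs are available as an array of shape (batch, num_heads, seq_len, head_dim) before they are concatenated. Several details are illustrative assumptions not taken from the paper:

- the function name `drop_head` and the drop probability `p`;
- sampling the head mask independently for each example;
- inverted-dropout rescaling by 1/(1 - p).

The drop-rate schedule implied by the title is not modeled here.

```python
import numpy as np

def drop_head(head_outputs, p, training=True, rng=None):
    """Zero out entire attention heads at random (a DropHead-style sketch).

    head_outputs: array of shape (batch, num_heads, seq_len, head_dim),
        i.e. per-head attention outputs before concatenation (assumed layout).
    p: probability of dropping each head (hypothetical parameter name).
    """
    if not training or p == 0.0:
        # At inference time (or with p == 0) every head is kept unchanged.
        return head_outputs
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    rng = np.random.default_rng() if rng is None else rng
    batch, num_heads = head_outputs.shape[:2]
    # One Bernoulli draw per (example, head): the whole head is kept or dropped,
    # unlike standard dropout, which samples every unit independently.
    keep = rng.random((batch, num_heads)) >= p
    # Broadcast the head mask over the sequence and feature dimensions.
    mask = keep[:, :, None, None].astype(head_outputs.dtype)
    # Assumption: inverted-dropout rescaling keeps the expected output unchanged.
    return head_outputs * mask / (1.0 - p)


# Usage: 2 examples, 8 heads, 5 tokens, 16 features per head.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 5, 16))
y = drop_head(x, p=0.25, rng=rng)
```

A single random draw decides the fate of each whole head. A dropped head therefore contributes nothing to that example's output, and the remaining heads must carry the computation on their own. With standard unit-level dropout, by contrast, a head is only partially thinned.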