Towards Non-Saturating Recurrent Units for Modelling Long-Term Dependencies
Modelling long-term dependencies is a challenge for recurrent neural networks, primarily because gradients vanish during training as the sequence length increases. Gradients can be attenuated by transition operators and are attenuated or dropped by activation functions. Canonical architectures like LSTM alleviate this issue by …
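As a concrete illustration of the attenuation described above, the following sketch (not taken from the paper) backpropagates through a one-dimensional recurrent unit with a saturating tanh non-linearity. The recurrent weight, input scale, and sequence length are arbitrary values chosen purely for illustration.

```python
import numpy as np

def tanh_grad(x):
    # Derivative of tanh: bounded by 1 and close to 0 once |x| is large (saturation).
    return 1.0 - np.tanh(x) ** 2

rng = np.random.default_rng(0)
T = 100            # sequence length (illustrative choice)
w = 1.0            # scalar recurrent weight, i.e. the transition operator in 1-D
h = 0.0            # hidden state
pre_activations = []

# Forward pass of a 1-D recurrent unit: h_t = tanh(w * h_{t-1} + x_t)
for t in range(T):
    x_t = rng.normal(scale=2.0)   # larger inputs push tanh further into saturation
    a_t = w * h + x_t
    pre_activations.append(a_t)
    h = np.tanh(a_t)

# Backward pass: d h_T / d h_0 is a product of per-step factors w * tanh'(a_t).
grad = 1.0
for a_t in reversed(pre_activations):
    grad *= w * tanh_grad(a_t)

print(f"gradient of h_T w.r.t. h_0 after {T} steps: {grad:.3e}")
# Typically prints a value many orders of magnitude below 1: each saturating
# step multiplies the gradient by a factor below 1, so it vanishes with depth.
```

Because every per-step factor is at most 1 (and much smaller whenever the pre-activation is in the saturated regime), the product shrinks roughly exponentially with the sequence length, which is the vanishing-gradient behaviour the abstract refers to.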