Towards More Robust Interpretation via Local Gradient Alignment
Neural network interpretation methods, particularly feature attribution methods, are known to be fragile with respect to adversarial input perturbations. To address this, several methods that enhance the local smoothness of the gradient during training have been proposed to attain robust feature attributions. However, the lack of consideration of the normalization of …
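Although the abstract is truncated, the class of techniques it names, regularizing the local smoothness of input gradients during training, can be sketched concretely. Below is a minimal, hypothetical PyTorch illustration: the helper names (`input_gradient`, `alignment_penalty`, `training_step`) and the specific regularizer, a cosine-distance penalty between the input gradient at x and at a randomly perturbed neighbor, are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def input_gradient(model, x, y):
    # Gradient of the per-example loss w.r.t. the input.
    # create_graph=True so the penalty stays differentiable
    # w.r.t. the model parameters (a second-order term).
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, x, create_graph=True)
    return grad

def alignment_penalty(model, x, y, eps=0.05):
    # Illustrative local-alignment regularizer (assumption, not the
    # paper's formulation): penalize cosine misalignment between the
    # input gradient at x and at a nearby randomly perturbed point.
    g_clean = input_gradient(model, x, y)
    g_pert = input_gradient(model, x + eps * torch.randn_like(x), y)
    cos = F.cosine_similarity(g_clean.flatten(1), g_pert.flatten(1), dim=1)
    return (1.0 - cos).mean()

def training_step(model, optimizer, x, y, lam=1.0):
    # Standard classification loss plus the alignment regularizer.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + lam * alignment_penalty(model, x, y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Penalizing (1 - cos) rather than an L2 gap makes the regularizer sensitive to the direction of the gradient, which is what normalized attribution maps visualize; this is one plausible reading of "gradient alignment" in the title, under the stated assumptions.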