ReAGent: Towards A Model-agnostic Feature Attribution Method for
Generative Language Models
Feature attribution methods (FAs), such as gradients and attention, are widely used to estimate the importance of each input feature to a model's predictions. Existing work in natural language processing has mostly focused on developing and testing FAs for encoder-only language models (LMs) in classification tasks. However, it is …