A causal framework for explaining the predictions of black-box sequence-to-sequence models

We interpret the predictions of any black-box structured-input, structured-output model around a specific input-output pair. Our method returns an "explanation" consisting of groups of input-output tokens that are causally related. These dependencies are inferred by querying the model with perturbed inputs, generating a graph over tokens from the responses, …
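
The perturb-and-query step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes perturbations are made by randomly dropping input tokens, and it scores each input-output token pair by how much more often the output token appears when the input token is kept than when it is dropped. The names `toy_model` and `dependency_graph`, the parameters `n_samples` and `drop_prob`, and the scoring rule are all hypothetical choices made for illustration. The sketch stops at the weighted graph over tokens and does not attempt the steps elided from the abstract.

```python
import random
from collections import defaultdict


def toy_model(tokens):
    """Hypothetical black box standing in for a real sequence-to-sequence model."""
    return [t.upper() for t in tokens]


def dependency_graph(model, source, n_samples=200, drop_prob=0.3, seed=0):
    """Estimate input-output token dependencies by querying `model` on perturbed inputs.

    Assumption (not from the source): perturbation = random token deletion.
    Returns the unperturbed output and a dict mapping (input_idx, output_idx)
    to a score in [-1, 1].
    """
    rng = random.Random(seed)
    target = model(source)

    # n[(i, kept)]: number of samples in which input token i was kept / dropped.
    n = {(i, kept): 0 for i in range(len(source)) for kept in (True, False)}
    # hits[(i, j, kept)]: of those samples, how many still produced output token j.
    hits = defaultdict(int)

    for _ in range(n_samples):
        mask = [rng.random() >= drop_prob for _ in source]
        perturbed = [tok for tok, keep in zip(source, mask) if keep]
        output = set(model(perturbed))  # one black-box query per perturbation
        for i, kept in enumerate(mask):
            n[(i, kept)] += 1
            for j, out_tok in enumerate(target):
                if out_tok in output:
                    hits[(i, j, kept)] += 1

    # Edge weight: P(output j appears | input i kept) - P(output j appears | input i dropped).
    graph = {}
    for i in range(len(source)):
        for j in range(len(target)):
            p_kept = hits[(i, j, True)] / max(n[(i, True)], 1)
            p_dropped = hits[(i, j, False)] / max(n[(i, False)], 1)
            graph[(i, j)] = p_kept - p_dropped
    return target, graph


if __name__ == "__main__":
    target, graph = dependency_graph(toy_model, ["the", "cat", "sat"])
    for (i, j), weight in sorted(graph.items()):
        print(i, j, round(weight, 2))
```

With the toy model, each input token scores highest against its own uppercased copy, because that output token appears only when the corresponding input token is kept. A real model would need a perturbation scheme that keeps inputs plausible, which random deletion does not guarantee.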