Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals

Type: Preprint

Publication Date: 2023-01-01

Citations: 3

DOI: https://doi.org/10.48550/arXiv.2310.00603

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • Causal Proxy Models for Concept-Based Model Explanations (2022): Zhengxuan Wu, Karel D’Oosterlinck, Atticus Geiger, Amir Zur, Christopher Potts
  • Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models (2024): Wei Jie Yeo, Ranjan Satapthy, Erik Cambria
  • Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models (2021): Tongshuang Wu, Marco Túlio Ribeiro, Jeffrey Heer, Daniel S. Weld
  • Towards Faithful Model Explanation in NLP: A Survey (2024): Qing Lyu, Marianna Apidianaki, Chris Callison-Burch
  • Towards Faithful Model Explanation in NLP: A Survey (2022): Qing Lyu, Marianna Apidianaki, Chris Callison-Burch
  • Large Language Models As Faithful Explainers (2024): Yu-Neng Chuang, Guanchu Wang, Chia-Yuan Chang, Ruixiang Tang, Fan Yang, Mengnan Du, Xuanting Cai, Xia Hu
  • Quantifying Uncertainty in Natural Language Explanations of Large Language Models (2023): Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju
  • CREST: A Joint Framework for Rationalization and Counterfactual Text Generation (2023): Marcos Treviso, Alexis Ross, Ricardo Rei, André F. T. Martins
  • Explainability for Large Language Models: A Survey (2023): Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Mengnan Du
  • Natural Language Counterfactual Explanations for Graphs Using Large Language Models (2024): Flavio Giorgi, Cesare Campagnano, Fabrizio Silvestri, Gabriele Tolomei
  • Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI (2022): Suzanna Sia, Anton Belyy, Amjad Almahairi, Madian Khabsa, Luke Zettlemoyer, Lambert Mathias
  • Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI (2023): Suzanna Sia, Anton Belyy, Amjad Almahairi, Madian Khabsa, Luke Zettlemoyer, Lambert Mathias
  • Inference to the Best Explanation in Large Language Models (2024): Dhairya Dalal, Marco Valentino, André Freitas, Paul Buitelaar
  • Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models (2024): Chirag Agarwal, Sree Harsha Tanneru, Himabindu Lakkaraju
  • Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations (2023): Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Yu Zhou, Kathleen McKeown
  • The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models (2024): Noah Siegel, Oana-Maria Camburu, Nicolas Heess, María Pérez-Ortiz
  • Are Large Language Models Post Hoc Explainers? (2023): Nicholas Kroeger, Dan Ley, Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju
  • Using LLMs for Explaining Sets of Counterfactual Examples to Final Users (2024): Arturo Fredes, Jordi Vitrià
  • Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework (2023): R. P. Zhao, Xingxuan Li, Shafiq Joty, Chengwei Qin, Lidong Bing

Works That Cite This (0)

  (none)

Works Cited by This (0)

  (none)