MUSE: Machine Unlearning Six-Way Evaluation for Language Models

Type: Preprint

Publication Date: 2024-07-08

Citations: 1

DOI: https://doi.org/10.48550/arXiv.2407.06460

Abstract

Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content. Data owners may request the removal of their data from a trained model due to privacy or copyright concerns. However, exactly unlearning only these datapoints (i.e., retraining with the data removed) is intractable in modern-day models. This has led to the development of many approximate unlearning algorithms. Evaluation of these algorithms has traditionally been narrow in scope, failing to precisely quantify their success and practicality from the perspectives of both model deployers and data owners. We address this issue by proposing MUSE, a comprehensive machine unlearning evaluation benchmark that enumerates six diverse desirable properties for unlearned models: (1) no verbatim memorization, (2) no knowledge memorization, (3) no privacy leakage, (4) utility preservation on data not intended for removal, (5) scalability with respect to the size of removal requests, and (6) sustainability over sequential unlearning requests. Using these criteria, we benchmark how effectively eight popular unlearning algorithms can unlearn Harry Potter books and news articles from 7B-parameter LMs. Our results demonstrate that most algorithms can prevent verbatim memorization and knowledge memorization to varying degrees, but only one algorithm avoids severe privacy leakage. Furthermore, existing algorithms fail to meet deployers' expectations: they often degrade general model utility and cannot sustainably accommodate successive unlearning requests or large-scale content removal. Our findings identify key issues with the practicality of existing unlearning algorithms on language models, and we release our benchmark to facilitate further evaluations: muse-bench.github.io
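As a concrete illustration of criterion (1), verbatim memorization is typically scored by prompting a model with the opening tokens of a passage it was asked to forget and comparing its continuation against the true continuation, for instance with ROUGE-L. The sketch below is a minimal, dependency-free Python version of that idea, not the benchmark's released implementation; generate_fn, passages, and prompt_len are illustrative names chosen for this sketch, and the actual evaluation code is linked from muse-bench.github.io.

    def lcs_length(a, b):
        """Length of the longest common subsequence of two token lists."""
        dp = [0] * (len(b) + 1)
        for tok_a in a:
            prev = 0
            for j, tok_b in enumerate(b, start=1):
                cur = dp[j]
                dp[j] = prev + 1 if tok_a == tok_b else max(dp[j], dp[j - 1])
                prev = cur
        return dp[-1]

    def rouge_l_f1(candidate, reference):
        """ROUGE-L F1 between a model continuation and the true continuation."""
        c, r = candidate.split(), reference.split()
        lcs = lcs_length(c, r) if c and r else 0
        if lcs == 0:
            return 0.0
        precision, recall = lcs / len(c), lcs / len(r)
        return 2 * precision * recall / (precision + recall)

    def verbatim_memorization(generate_fn, passages, prompt_len=32):
        """Average ROUGE-L F1 of model continuations vs. true continuations.

        generate_fn is any callable mapping a prompt string to a completion
        string (e.g. a wrapper around a 7B LM); scores near zero suggest the
        passages are no longer reproduced verbatim.
        """
        scores = []
        for text in passages:
            tokens = text.split()
            prompt = " ".join(tokens[:prompt_len])
            reference = " ".join(tokens[prompt_len:])
            scores.append(rouge_l_f1(generate_fn(prompt), reference))
        return sum(scores) / max(len(scores), 1)

A low score alone is not sufficient evidence of successful unlearning: the benchmark calibrates such metrics against a model retrained from scratch without the forget set, treating exact retraining as the gold standard that approximate algorithms try to match.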

Locations

  • arXiv (Cornell University)

Similar Works

  • Position: LLM Unlearning Benchmarks are Weak Measures of Progress (2024). Pratiksha Thaker, Shengyuan Hu, N. R. Kale, Yash Maurya, Zhiwei Steven Wu, Virginia Smith
  • To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models (2024). Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, Ningyu Zhang
  • Machine Unlearning for Traditional Models and Large Language Models: A Short Survey (2024). Yi Xu
  • Eight Methods to Evaluate Robust Unlearning in LLMs (2024). Aengus Lynch, Phillip Guo, Aidan Ewart, Stephen T. Casper, Dylan Hadfield-Menell
  • Digital Forgetting in Large Language Models: A Survey of Unlearning Methods (2024). Alberto Blanco-Justicia, Najeeb Moharram Jebreel, Benet Manzanares, David Sánchez, Josep Domingo‐Ferrer, Guillem Collell, Kuan Eeik Tan
  • The Frontier of Data Erasure: Machine Unlearning for Large Language Models (2024). Youyang Qu, Ming Ding, Nian X. Sun, Kanchana Thilakarathna, Tianqing Zhu, Dusit Niyato
  • A Closer Look at Machine Unlearning for Large Language Models (2024). Xiaojian Yuan, Tianyu Pang, Chao Du, Kejiang Chen, Weiming Zhang, Min Lin
  • RESTOR: Knowledge Recovery through Machine Unlearning (2024). Keivan Rezaei, Khyathi Raghavi Chandu, Soheil Feizi, Yejin Choi, Faeze Brahman, Abhilasha Ravichander
  • Offset Unlearning for Large Language Models (2024). James Y. Huang, Wenxuan Zhou, Wang Fei, Fred Morstatter, Sheng Zhang, Hoifung Poon, Muhao Chen
  • RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models (2024). Zhuoran Jin, Pengfei Cao, Chenhao Wang, Zhitao He, Hongbang Yuan, Jiachun Li, Yubo Chen, Kang Liu, Jun Zhao
  • Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge (2024). Zhiwei Zhang, Fali Wang, Xiaomin Li, Zongyu Wu, Xianfeng Tang, Hui Liu, Qi He, Wenpeng Yin, Suhang Wang
  • Digital forgetting in large language models: a survey of unlearning methods (2025). Alberto Blanco-Justicia, Najeeb Moharram Jebreel, Benet Manzanares-Salor, David Sánchez, Josep Domingo‐Ferrer, Guillem Collell, Kuan Eeik Tan
  • Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning (2024). Chongyu Fan, Liu Jian-cheng, Licong Lin, Jinghan Jia, Ruiqi Zhang, Mei Song, Sijia Liu
  • Mitigating Memorization In Language Models (2024). Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Nathaniel Hudson, Caleb Geniesse, Kyle Chard, Yaoqing Yang, Ian Foster, Michael W. Mahoney
  • Unlearn What You Want to Forget: Efficient Unlearning for LLMs (2023). Jiaao Chen, Diyi Yang
  • Unlearning Reveals the Influential Training Data of Language Models (2024). Masaru Isonuma, Ivan Titov
  • MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts (2024). Tianle Gu, Kexin Huang, R.C. Luo, Yuanqi Yao, Yujiu Yang, Teng Yan, Yingchun Wang
  • LLM Unlearning via Loss Adjustment with Only Forget Data (2024). Yaxuan Wang, Jiaheng Wei, Chris Yuhao Liu, Jinlong Pang, Quan Liu, Ankit Parag Shah, Yujia Bao, Yang Liu, Wei Wei
  • Machine Unlearning in Large Language Models (2024). Saaketh Koundinya Gundavarapu, Shreya Agarwal, Arushi Arora, Chandana Thimmalapura Jagadeeshaiah

Works That Cite This (1)

  • Reputation Management in the ChatGPT Era (2024). Reuben Binns, Lilian Edwards

Works Cited by This (0)
