MUSE: Machine Unlearning Six-Way Evaluation for Language Models

Type: Preprint

Publication Date: 2024-07-08

Citations: 1

DOI: https://doi.org/10.48550/arXiv.2407.06460

Abstract

Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content. Data owners may request the removal of their data from a trained model due to privacy or copyright concerns. However, exactly unlearning only these datapoints (i.e., retraining with the data removed) is intractable in modern-day models. This has led to the development of many approximate unlearning algorithms. Evaluation of these algorithms has traditionally been narrow in scope, failing to precisely quantify their success and practicality from the perspectives of both model deployers and data owners. We address this issue by proposing MUSE, a comprehensive machine unlearning evaluation benchmark that enumerates six diverse desirable properties for unlearned models: (1) no verbatim memorization, (2) no knowledge memorization, (3) no privacy leakage, (4) utility preservation on data not intended for removal, (5) scalability with respect to the size of removal requests, and (6) sustainability over sequential unlearning requests. Using these criteria, we benchmark how effectively eight popular unlearning algorithms can unlearn Harry Potter books and news articles from 7B-parameter LMs. Our results demonstrate that most algorithms can prevent verbatim memorization and knowledge memorization to varying degrees, but only one algorithm avoids severe privacy leakage. Furthermore, existing algorithms fail to meet deployers' expectations: they often degrade general model utility and cannot sustainably accommodate successive unlearning requests or large-scale content removal. Our findings identify key issues with the practicality of existing unlearning algorithms on language models, and we release our benchmark to facilitate further evaluations: muse-bench.github.io
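As a concrete illustration of criterion (1), verbatim memorization is typically scored by prompting a model with the opening tokens of a passage it was asked to forget and comparing its continuation against the true continuation, for instance with ROUGE-L. The sketch below is a minimal, dependency-free Python version of that idea, not the benchmark's released implementation; generate_fn, passages, and prompt_len are illustrative names chosen for this sketch, and the actual evaluation code is linked from muse-bench.github.io.

    def lcs_length(a, b):
        """Length of the longest common subsequence of two token lists."""
        dp = [0] * (len(b) + 1)
        for tok_a in a:
            prev = 0
            for j, tok_b in enumerate(b, start=1):
                cur = dp[j]
                dp[j] = prev + 1 if tok_a == tok_b else max(dp[j], dp[j - 1])
                prev = cur
        return dp[-1]

    def rouge_l_f1(candidate, reference):
        """ROUGE-L F1 between a model continuation and the true continuation."""
        c, r = candidate.split(), reference.split()
        lcs = lcs_length(c, r) if c and r else 0
        if lcs == 0:
            return 0.0
        precision, recall = lcs / len(c), lcs / len(r)
        return 2 * precision * recall / (precision + recall)

    def verbatim_memorization(generate_fn, passages, prompt_len=32):
        """Average ROUGE-L F1 of model continuations vs. true continuations.

        generate_fn is any callable mapping a prompt string to a completion
        string (e.g. a wrapper around a 7B LM); scores near zero suggest the
        passages are no longer reproduced verbatim.
        """
        scores = []
        for text in passages:
            tokens = text.split()
            prompt = " ".join(tokens[:prompt_len])
            reference = " ".join(tokens[prompt_len:])
            scores.append(rouge_l_f1(generate_fn(prompt), reference))
        return sum(scores) / max(len(scores), 1)

A low score alone is not sufficient evidence of successful unlearning: the benchmark calibrates such metrics against a model retrained from scratch without the forget set, treating exact retraining as the gold standard that approximate algorithms try to match.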

Locations

  • arXiv (Cornell University)

Similar Works

  • Position: LLM Unlearning Benchmarks are Weak Measures of Progress (2024). Pratiksha Thaker, Shengyuan Hu, N. R. Kale, Yash Maurya, Zhiwei Steven Wu, Virginia Smith
  • To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models (2024). Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, Ningyu Zhang
  • Machine Unlearning for Traditional Models and Large Language Models: A Short Survey (2024). Yi Xu
  • Eight Methods to Evaluate Robust Unlearning in LLMs (2024). Aengus Lynch, Phillip Guo, Aidan Ewart, Stephen T. Casper, Dylan Hadfield-Menell
  • Digital Forgetting in Large Language Models: A Survey of Unlearning Methods (2024). Alberto Blanco-Justicia, Najeeb Moharram Jebreel, Benet Manzanares, David Sánchez, Josep Domingo‐Ferrer, Guillem Collell, Kuan Eeik Tan
  • The Frontier of Data Erasure: Machine Unlearning for Large Language Models (2024). Youyang Qu, Ming Ding, Nian X. Sun, Kanchana Thilakarathna, Tianqing Zhu, Dusit Niyato
  • A Closer Look at Machine Unlearning for Large Language Models (2024). Xiaojian Yuan, Tianyu Pang, Chao Du, Kejiang Chen, Weiming Zhang, Min Lin
  • RESTOR: Knowledge Recovery through Machine Unlearning (2024). Keivan Rezaei, Khyathi Raghavi Chandu, Soheil Feizi, Yejin Choi, Faeze Brahman, Abhilasha Ravichander
  • Offset Unlearning for Large Language Models (2024). James Y. Huang, Wenxuan Zhou, Wang Fei, Fred Morstatter, Sheng Zhang, Hoifung Poon, Muhao Chen
  • RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models (2024). Zhuoran Jin, Pengfei Cao, Chenhao Wang, Zhitao He, Hongbang Yuan, Jiachun Li, Yubo Chen, Kang Liu, Jun Zhao
  • Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge (2024). Zhiwei Zhang, Fali Wang, Xiaomin Li, Zongyu Wu, Xianfeng Tang, Hui Liu, Qi He, Wenpeng Yin, Suhang Wang
  • Digital forgetting in large language models: a survey of unlearning methods (2025). Alberto Blanco-Justicia, Najeeb Moharram Jebreel, Benet Manzanares-Salor, David Sánchez, Josep Domingo‐Ferrer, Guillem Collell, Kuan Eeik Tan
  • Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning (2024). Chongyu Fan, Liu Jian-cheng, Licong Lin, Jinghan Jia, Ruiqi Zhang, Mei Song, Sijia Liu
  • Mitigating Memorization In Language Models (2024). Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Nathaniel Hudson, Caleb Geniesse, Kyle Chard, Yaoqing Yang, Ian Foster, Michael W. Mahoney
  • Unlearn What You Want to Forget: Efficient Unlearning for LLMs (2023). Jiaao Chen, Diyi Yang
  • Unlearning Reveals the Influential Training Data of Language Models (2024). Masaru Isonuma, Ivan Titov
  • MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts (2024). Tianle Gu, Kexin Huang, R.C. Luo, Yuanqi Yao, Yujiu Yang, Teng Yan, Yingchun Wang
  • LLM Unlearning via Loss Adjustment with Only Forget Data (2024). Yaxuan Wang, Jiaheng Wei, Chris Yuhao Liu, Jinlong Pang, Quan Liu, Ankit Parag Shah, Yujia Bao, Yang Liu, Wei Wei
  • Machine Unlearning in Large Language Models (2024). Saaketh Koundinya Gundavarapu, Shreya Agarwal, Arushi Arora, Chandana Thimmalapura Jagadeeshaiah

Works That Cite This (1)

  • Reputation Management in the ChatGPT Era (2024). Reuben Binns, Lilian Edwards

Works Cited by This (0)
