Ignore Previous Prompt: Attack Techniques For Language Models

Type: Preprint

Publication Date: 2022-11-17

Citations: 46

DOI: https://doi.org/10.48550/arxiv.2211.09527

Locations

  • arXiv (Cornell University)
  • DataCite API
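
A record like this can be pulled programmatically from the DataCite REST API listed above. A minimal Python sketch, assuming the public api.datacite.org endpoint and its standard JSON:API response shape (requests is the only dependency):

    import requests

    DOI = "10.48550/arxiv.2211.09527"

    # Fetch the DataCite record for this DOI; the payload is a JSON:API
    # document with the bibliographic fields under data.attributes.
    resp = requests.get(f"https://api.datacite.org/dois/{DOI}", timeout=30)
    resp.raise_for_status()
    attrs = resp.json()["data"]["attributes"]

    print(attrs["titles"][0]["title"])  # "Ignore Previous Prompt: ..."
    print(attrs["publicationYear"])     # 2022
    print(attrs["url"])                 # landing page (arXiv)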

Similar Works

  • Jailbreaking and Mitigation of Vulnerabilities in Large Language Models (2024). Benji Peng, Ziqian Bi, Qian Niu, Ming Liu, Pohsun Feng, Tianyang Wang, Lingzhi Yan, Yizhu Wen, Yichao Zhang, Caitlyn Heqi Yin
  • Jailbreaking as a Reward Misspecification Problem (2024). Zhihui Xie, Jiahui Gao, Lei Li, Zhenguo Li, Qi Liu, Lingpeng Kong
  • AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models (2023). Xiaogeng Liu, Nan Xu, Muhao Chen, Chaowei Xiao
  • Robustness-aware Automatic Prompt Optimization (2024). Zhihua Shi, Zhenting Wang, Yongye Su, Weidi Luo, Fan Yang, Yongfeng Zhang
  • Automatic and Universal Prompt Injection Attacks against Large Language Models (2024). Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, Chaowei Xiao
  • Enhancing Adversarial Attacks through Chain of Thought (2024). Juan Su
  • Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment (2024). Jason Vega, Junsheng Huang, Guoqing Zhang, Hangoo Kang, Minjia Zhang, Mayank Singh
  • Tastle: Distract Large Language Models for Automatic Jailbreak Attack (2024). Zeguan Xiao, Yan Yang, Guanhua Chen, Yun Chen
  • Jailbreak Attacks and Defenses Against Large Language Models: A Survey (2024). Sibo Yi, Yule Liu, Zhen Sun, Tianshuo Cong, Xinlei He, Jiaxing Song, Ke Xu, Qi Li
  • JailPO: A Novel Black-box Jailbreak Framework via Preference Optimization against Aligned LLMs (2024). Hongyi Li, J. S. Ye, Jie Wu, Tianjie Yan, Chu Wang, Zhixin Li
  • Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation (2023). Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, Danqi Chen
  • Dagger Behind Smile: Fool LLMs with a Happy Ending Story (2025). X. Y. Song, Zhenglei Xie, Shuo Huai, J. L. C. Kong, Jun Luo
  • Open Sesame! Universal Black-Box Jailbreaking of Large Language Models (2024). Raz Lapid, Ron Langberg, Moshe Sipper
  • EEG-Defender: Defending against Jailbreak through Early Exit Generation of Large Language Models (2024). Chuanxi Zhao, Zhihao Dou, Kaizhu Huang
  • Attack Prompt Generation for Red Teaming and Defending Large Language Models (2023). Boyi Deng, Wenjie Wang, Fuli Feng, Yang Deng, Qifan Wang, Xiangnan He
  • LinkPrompt: Natural and Universal Adversarial Attacks on Prompt-based Language Models (2024). Yue Xu, Wenjie Wang
  • Open Sesame! Universal Black Box Jailbreaking of Large Language Models (2023). Raz Lapid, Ron Langberg, Moshe Sipper
  • Goal-guided Generative Prompt Injection Attack on Large Language Models (2024). Chong Zhang, Mingyu Jin, Qinkai Yu, Chengzhi Liu, Haochen Xue, Xiao-Bo Jin
  • Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks (2023). Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pedram Zaree, Yue Dong, Nael Abu-Ghazaleh
  • ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming (2024). Simone Tedeschi, Felix Friedrich, Patrick Schramowski, Kristian Kersting, Roberto Navigli, Huu Du Nguyen, Bo Li

Works That Cite This (23; 10 listed)

  • Target-driven Attack for Large Language Models (2024). Chong Zhang, Mingyu Jin, Shu Dong, T. Wang, Dongfang Liu, Xiao-Bo Jin
  • Building Privacy-Preserving and Secure Geospatial Artificial Intelligence Foundation Models (Vision Paper) (2023). Jinmeng Rao, Song Gao, Gengchen Mai, Krzysztof Janowicz
  • MASTERKEY: Automated Jailbreaking of Large Language Model Chatbots (2024). Gelei Deng, Yi Liu, Yuekang Li, Kailong Wang, Ying Zhang, Zefeng Li, Haoyu Wang, Tianwei Zhang, Yang Liu
  • Multi-step Jailbreaking Privacy Attacks on ChatGPT (2023). Haoran Li, Dadi Guo, Wei Fan, Mingshi Xu, Jie Huang, Fanpu Meng, Yangqiu Song
  • Attack Prompt Generation for Red Teaming and Defending Large Language Models (2023). Boyi Deng, Wenjie Wang, Fuli Feng, Yang Deng, Qifan Wang, Xiangnan He
  • Unveiling the Implicit Toxicity in Large Language Models (2023). Jiaxin Wen, Pei Ke, Hao Sun, Zhexin Zhang, Chengfei Li, Jinfeng Bai, Minlie Huang
  • ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models (2023). Alex Mei, Sharon Levy, William Yang Wang
  • A Survey on ChatGPT: AI-Generated Contents, Challenges, and Solutions (2023). Yuntao Wang, Yanghe Pan, Miao Yan, Zhou Su, Tom H. Luan
  • Embedding Democratic Values into Social Media AIs via Societal Objective Functions (2023). Chenyan Jia, Michelle S. Lam, Minh Triet Chau, Jeffrey T. Hancock, Michael S. Bernstein
  • Embedding Democratic Values into Social Media AIs via Societal Objective Functions (2024). Chenyan Jia, Michelle S. Lam, Minh Triet Chau, Jeffrey T. Hancock, Michael S. Bernstein
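
Only ten of the 23 citing works are listed above. The full citing set can be paged out of a citation index; a hedged Python sketch using the public OpenAlex API (the choice of index is an assumption; any index exposing a cites filter would work):

    import requests

    DOI = "10.48550/arxiv.2211.09527"

    # Resolve the DOI to an OpenAlex work record, then page through the
    # works whose reference lists include it.
    work = requests.get(f"https://api.openalex.org/works/doi:{DOI}", timeout=30).json()
    citing = requests.get(
        "https://api.openalex.org/works",
        params={"filter": f"cites:{work['id']}", "per-page": 50},
        timeout=30,
    ).json()

    for w in citing["results"]:
        print(w["publication_year"], w["display_name"])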

Works Cited by This (0)
