Jailbreaking Proprietary Large Language Models using Word Substitution Cipher
Large Language Models (LLMs) are aligned to moral and ethical guidelines but remain susceptible to creative prompts, called jailbreaks, that can bypass the alignment process. However, most jailbreaking prompts contain the harmful questions in natural language (mainly English), which the LLMs can themselves detect. In this paper, we …