Jailbreaking Proprietary Large Language Models using Word Substitution Cipher

Large Language Models (LLMs) are aligned to moral and ethical guidelines but remain susceptible to creative prompts, called jailbreaks, that can bypass the alignment process. However, most jailbreaking prompts contain harmful questions in natural language (mainly English), which can be detected by the LLMs themselves. In this paper, we …
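
To illustrate the general idea of a word substitution cipher in this setting, the sketch below encodes a question by replacing selected words with benign substitutes so that the surface text no longer reads as a harmful English query, and then inverts the mapping to recover the original wording. This is a minimal, hypothetical illustration; the substitution table, example sentence, and prompt format are assumptions and are not taken from the paper.

```python
# Illustrative sketch of a word substitution cipher (not the paper's exact method).
# The mapping and the example question below are hypothetical.

substitutions = {
    "attack": "banana",
    "password": "kettle",
    "steal": "paint",
}


def encode(text: str, mapping: dict[str, str]) -> str:
    """Replace each mapped word with its substitute (case-insensitive, word-level)."""
    return " ".join(mapping.get(w.lower(), w) for w in text.split())


def decode(text: str, mapping: dict[str, str]) -> str:
    """Invert the substitution to recover the original wording."""
    inverse = {v: k for k, v in mapping.items()}
    return " ".join(inverse.get(w.lower(), w) for w in text.split())


if __name__ == "__main__":
    question = "how do I steal a password"
    ciphered = encode(question, substitutions)
    print(ciphered)                          # "how do I paint a kettle"
    print(decode(ciphered, substitutions))   # round-trips to the original question
```

In such an attack, only the ciphered text (plus the substitution table) would appear in the prompt, which is why keyword- or intent-based filters operating on the surface English may fail to flag it.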