Toxic Subword Pruning for Dialogue Response Generation on Large Language
Models
Defending large language models (LLMs) against generating toxic content is an important research area. Yet most research has focused on model training techniques that remediate LLMs by updating their weights; a typical example is safety alignment. This, however, is often costly and tedious and can expose …
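The general idea named in the title can be illustrated at decode time: rather than retraining the model, toxic subwords are barred from the output vocabulary during generation. Below is a minimal sketch using the `bad_words_ids` option of Hugging Face transformers' `generate`; the model name and the toy toxic word list are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of pruning toxic subwords at decode time (no weight updates).
# The model and the toxic word list below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical toxic vocabulary; a real system would curate this list.
toxic_words = ["idiot", "stupid"]
bad_words_ids = [
    tokenizer(w, add_special_tokens=False).input_ids for w in toxic_words
]

prompt = "Reply to the user politely:"
inputs = tokenizer(prompt, return_tensors="pt")
# `bad_words_ids` masks the listed subword sequences during decoding,
# so the banned tokens can never be sampled.
output = model.generate(
    **inputs,
    bad_words_ids=bad_words_ids,
    max_new_tokens=40,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because the pruning happens entirely in the decoding loop, it avoids the cost of retraining that the abstract attributes to alignment-based defenses.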