Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness

Deep learning models often suffer from a lack of interpretability due to polysemanticity, where individual neurons are activated by multiple unrelated semantics, resulting in unclear attributions of model behavior. Recent advances in monosemanticity, where neurons correspond to consistent and distinct semantics, have significantly improved interpretability but are commonly believed to …