Reducing the Teacher-Student Gap via Spherical Knowledge Distillation

Knowledge distillation aims to obtain a compact and effective model by learning the mapping function of a much larger one. Because of its limited capacity, the student tends to underfit the teacher, so student performance can unexpectedly drop when distilling from an oversized teacher, a phenomenon termed the capacity gap …
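
For context, here is a minimal sketch of the standard (Hinton-style) knowledge distillation objective that the abstract builds on. This is not the paper's spherical variant; the temperature `T` and mixing weight `alpha` are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style knowledge distillation loss (vanilla KD, not SKD).

    Mixes a KL term between temperature-softened teacher and student
    distributions with the usual cross-entropy on ground-truth labels.
    """
    # Soften both distributions with temperature T; scale the KL term by
    # T^2 so its gradients keep the same magnitude as the hard-label term.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Example usage on one batch of random logits (shapes are illustrative).
student_logits = torch.randn(32, 100)        # small student's outputs
teacher_logits = torch.randn(32, 100) * 3.0  # larger teacher, sharper logits
labels = torch.randint(0, 100, (32,))
loss = kd_loss(student_logits, teacher_logits, labels)
```

An oversized teacher typically produces sharper, higher-magnitude logits than the student can match under this objective, which is one way to read the capacity gap the abstract describes.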