Snapshot Distillation: Teacher-Student Optimization in One Generation
Optimizing a deep neural network is a fundamental task in computer vision, yet direct training methods often suffer from over-fitting. Teacher-student optimization aims to provide complementary cues from a previously trained model, but these approaches are often considerably slow because the pipeline trains several generations in sequence, …
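As a minimal sketch of the teacher-student objective such pipelines optimize, the following assumes the standard Hinton-style distillation loss (a weighted sum of hard-label cross-entropy and a temperature-softened KL term); the function names and default hyperparameters are illustrative, not the paper's exact formulation.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T yields a softer distribution.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Teacher-student objective (sketch): alpha-weighted cross-entropy on the
    hard labels plus KL(teacher || student) on temperature-softened outputs."""
    p_s = softmax(student_logits, T)
    p_t = softmax(teacher_logits, T)
    # KL divergence between softened distributions, scaled by T^2
    # so its gradient magnitude stays comparable across temperatures.
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean() * T * T
    # Ordinary cross-entropy against the ground-truth labels.
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    return alpha * hard + (1.0 - alpha) * kl
```

When the teacher and student produce identical logits, the KL term vanishes and only the hard-label term remains, which is a quick sanity check on any implementation of this loss.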