Recurrent neural network training with dark knowledge transfer

Type: Preprint

Publication Date: 2016-03-01

Citations: 106

DOI: https://doi.org/10.1109/icassp.2016.7472809

Abstract

Recurrent neural networks (RNNs), particularly long short-term memory (LSTM) networks, have gained much attention in automatic speech recognition (ASR). Although some success stories have been reported, training RNNs remains highly challenging, especially with limited training data. Recent research found that a well-trained model can be used as a teacher to train other child models, using the predictions generated by the teacher model as supervision. This knowledge transfer learning has been employed to train simple neural nets with a complex one, so that the final performance can reach a level that is infeasible to obtain by regular training. In this paper, we employ the knowledge transfer learning approach to train RNNs (specifically LSTMs) using a deep neural network (DNN) model as the teacher. This is different from most existing research on knowledge transfer learning, since the teacher (DNN) is assumed to be weaker than the child (RNN); however, our experiments on an ASR task show that it works fairly well: without applying any tricks to the learning scheme, this approach can train RNNs successfully even with limited training data.
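
For illustration, below is a minimal PyTorch-style sketch of the DNN-to-RNN dark knowledge transfer the abstract describes: a frozen feed-forward teacher produces frame-level posteriors that supervise an LSTM student. The network sizes, feature dimension, temperature, and optimizer settings are illustrative assumptions, not the paper's exact configuration.

    # Sketch of DNN-to-RNN dark knowledge transfer (not the paper's exact recipe).
    # Assumed setup: frame-level acoustic features, a pre-trained feed-forward
    # teacher DNN, an LSTM student, and temperature-softened teacher posteriors
    # used as the supervision signal.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    NUM_TARGETS = 3000   # illustrative output dimension (e.g. tied-state targets)
    FEAT_DIM = 40        # illustrative filterbank feature dimension
    TEMPERATURE = 2.0    # softening temperature; a hyper-parameter assumption

    class TeacherDNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(FEAT_DIM, 1024), nn.ReLU(),
                nn.Linear(1024, 1024), nn.ReLU(),
                nn.Linear(1024, NUM_TARGETS),
            )
        def forward(self, x):          # x: (batch, time, feat)
            return self.net(x)         # frame-level logits

    class StudentLSTM(nn.Module):
        def __init__(self):
            super().__init__()
            self.lstm = nn.LSTM(FEAT_DIM, 512, num_layers=2, batch_first=True)
            self.proj = nn.Linear(512, NUM_TARGETS)
        def forward(self, x):
            h, _ = self.lstm(x)
            return self.proj(h)        # frame-level logits

    def distillation_loss(student_logits, teacher_logits, T=TEMPERATURE):
        """KL divergence between temperature-softened teacher and student posteriors."""
        log_p_student = F.log_softmax(student_logits / T, dim=-1)
        p_teacher = F.softmax(teacher_logits / T, dim=-1)
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

    teacher, student = TeacherDNN().eval(), StudentLSTM()
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

    feats = torch.randn(8, 100, FEAT_DIM)       # dummy batch: 8 utterances, 100 frames
    with torch.no_grad():                        # teacher is fixed; its outputs are soft labels
        teacher_logits = teacher(feats)
    optimizer.zero_grad()
    student_logits = student(feats)
    loss = distillation_loss(student_logits, teacher_logits)
    loss.backward()
    optimizer.step()

In practice the teacher would first be trained on hard labels, and the soft targets can be interpolated with the original hard labels; the sketch shows only the transfer step.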

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • Knowledge Transfer Pre-training (2015) - Zhiyuan Tang, Dong Wang, Yiqiao Pan, Zhiyong Zhang
  • Knowledge Distillation for Recurrent Neural Network Language Modeling with Trust Regularization (2019) - Yangyang Shi, Mei-Yuh Hwang, Xin Lei, Haoyu Sheng
  • Robust Transfer Learning with Pretrained Language Models through Adapters (2021) - Wenjuan Han, Bo Pang, Ying Wu
  • BridgeNets: Student-Teacher Transfer Learning Based on Recursive Neural Networks and Its Application to Distant Speech Recognition (2018) - Jaeyoung Kim, Mostafa El‐Khamy, Jungwon Lee
  • BridgeNets: Student-Teacher Transfer Learning Based on Recursive Neural Networks and Its Application to Distant Speech Recognition (2017) - Jaeyoung Kim, Mostafa El‐Khamy, Jungwon Lee
  • Technical Report: Combining Knowledge from Transfer Learning during Training and Wide Resnets (2022) - Wolfgang Fuhl
  • Transferring Knowledge from a RNN to a DNN (2015) - William Chan, Nan Rosemary Ke, Ian Lane
  • Generative Transfer Learning between Recurrent Neural Networks (2016) - Sungho Shin, Kyuyeon Hwang, Wonyong Sung
  • TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition (2021) - Ji Won Yoon, Hyeonseung Lee, Hyung Yong Kim, Won Ik Cho, Nam Soo Kim
  • Knowledge Distillation for Small-Footprint Highway Networks (2017) - Liang Lu, Michelle Guo, Steve Renals
  • Knowledge Distillation for Small-Footprint Highway Networks (2016) - Liang Lu, Michelle Guo, Steve Renals
  • Train Your Classifier First: Cascade Neural Networks Training from Upper Layers to Lower Layers (2021) - Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals
  • Recurrent Neural Network Regularization (2014) - Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals
  • Distilling HuBERT with LSTMs via Decoupled Knowledge Distillation (2024) - Danilo de Oliveira, Timo Gerkmann
  • Progressive Label Distillation: Learning Input-Efficient Deep Neural Networks (2019) - Zhong Qiu Lin, Alexander Wong
  • Distilling HuBERT with LSTMs via Decoupled Knowledge Distillation (2023) - Danilo de Oliveira, Timo Gerkmann
  • An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models (2019) - Alexandra Chronopoulou, Christos Baziotis, Alexandros Potamianos