Recurrent neural network training with dark knowledge transfer

Type: Preprint

Publication Date: 2016-03-01

Citations: 106

DOI: https://doi.org/10.1109/icassp.2016.7472809

Abstract

Recurrent neural networks (RNNs), particularly long short-term memory (LSTM) networks, have gained much attention in automatic speech recognition (ASR). Although some success stories have been reported, training RNNs remains highly challenging, especially with limited training data. Recent research found that a well-trained model can be used as a teacher to train other child models, using the predictions generated by the teacher model as supervision. This knowledge transfer learning has been employed to train simple neural nets with a complex one, so that the final performance can reach a level that is infeasible to obtain by regular training. In this paper, we employ the knowledge transfer learning approach to train RNNs (specifically LSTMs) using a deep neural network (DNN) model as the teacher. This is different from most existing research on knowledge transfer learning, since the teacher (DNN) is assumed to be weaker than the child (RNN); however, our experiments on an ASR task show that it works fairly well: without applying any tricks to the learning scheme, this approach can train RNNs successfully even with limited training data.
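
For illustration, below is a minimal PyTorch-style sketch of the DNN-to-RNN dark knowledge transfer the abstract describes: a frozen feed-forward teacher produces frame-level posteriors that supervise an LSTM student. The network sizes, feature dimension, temperature, and optimizer settings are illustrative assumptions, not the paper's exact configuration.

    # Sketch of DNN-to-RNN dark knowledge transfer (not the paper's exact recipe).
    # Assumed setup: frame-level acoustic features, a pre-trained feed-forward
    # teacher DNN, an LSTM student, and temperature-softened teacher posteriors
    # used as the supervision signal.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    NUM_TARGETS = 3000   # illustrative output dimension (e.g. tied-state targets)
    FEAT_DIM = 40        # illustrative filterbank feature dimension
    TEMPERATURE = 2.0    # softening temperature; a hyper-parameter assumption

    class TeacherDNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(FEAT_DIM, 1024), nn.ReLU(),
                nn.Linear(1024, 1024), nn.ReLU(),
                nn.Linear(1024, NUM_TARGETS),
            )
        def forward(self, x):          # x: (batch, time, feat)
            return self.net(x)         # frame-level logits

    class StudentLSTM(nn.Module):
        def __init__(self):
            super().__init__()
            self.lstm = nn.LSTM(FEAT_DIM, 512, num_layers=2, batch_first=True)
            self.proj = nn.Linear(512, NUM_TARGETS)
        def forward(self, x):
            h, _ = self.lstm(x)
            return self.proj(h)        # frame-level logits

    def distillation_loss(student_logits, teacher_logits, T=TEMPERATURE):
        """KL divergence between temperature-softened teacher and student posteriors."""
        log_p_student = F.log_softmax(student_logits / T, dim=-1)
        p_teacher = F.softmax(teacher_logits / T, dim=-1)
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

    teacher, student = TeacherDNN().eval(), StudentLSTM()
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

    feats = torch.randn(8, 100, FEAT_DIM)       # dummy batch: 8 utterances, 100 frames
    with torch.no_grad():                        # teacher is fixed; its outputs are soft labels
        teacher_logits = teacher(feats)
    optimizer.zero_grad()
    student_logits = student(feats)
    loss = distillation_loss(student_logits, teacher_logits)
    loss.backward()
    optimizer.step()

In practice the teacher would first be trained on hard labels, and the soft targets can be interpolated with the original hard labels; the sketch shows only the transfer step.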

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • Knowledge Transfer Pre-training (2015) - Zhiyuan Tang, Dong Wang, Yiqiao Pan, Zhiyong Zhang
  • Knowledge Distillation for Recurrent Neural Network Language Modeling with Trust Regularization (2019) - Yangyang Shi, Mei-Yuh Hwang, Xin Lei, Haoyu Sheng
  • Robust Transfer Learning with Pretrained Language Models through Adapters (2021) - Wenjuan Han, Bo Pang, Ying Wu
  • BridgeNets: Student-Teacher Transfer Learning Based on Recursive Neural Networks and Its Application to Distant Speech Recognition (2018) - Jaeyoung Kim, Mostafa El‐Khamy, Jungwon Lee
  • BridgeNets: Student-Teacher Transfer Learning Based on Recursive Neural Networks and Its Application to Distant Speech Recognition (2017) - Jaeyoung Kim, Mostafa El‐Khamy, Jungwon Lee
  • Technical Report: Combining Knowledge from Transfer Learning during Training and Wide Resnets (2022) - Wolfgang Fuhl
  • Transferring Knowledge from a RNN to a DNN (2015) - William Chan, Nan Rosemary Ke, Ian Lane
  • Generative Transfer Learning between Recurrent Neural Networks (2016) - Sungho Shin, Kyuyeon Hwang, Wonyong Sung
  • TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition (2021) - Ji Won Yoon, Hyeonseung Lee, Hyung Yong Kim, Won Ik Cho, Nam Soo Kim
  • Knowledge Distillation for Small-Footprint Highway Networks (2017) - Liang Lu, Michelle Guo, Steve Renals
  • Knowledge Distillation for Small-Footprint Highway Networks (2016) - Liang Lu, Michelle Guo, Steve Renals
  • Train Your Classifier First: Cascade Neural Networks Training from Upper Layers to Lower Layers (2021) - Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals
  • Recurrent Neural Network Regularization (2014) - Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals
  • Distilling HuBERT with LSTMs via Decoupled Knowledge Distillation (2024) - Danilo de Oliveira, Timo Gerkmann
  • Progressive Label Distillation: Learning Input-Efficient Deep Neural Networks (2019) - Zhong Qiu Lin, Alexander Wong
  • Distilling HuBERT with LSTMs via Decoupled Knowledge Distillation (2023) - Danilo de Oliveira, Timo Gerkmann
  • An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models (2019) - Alexandra Chronopoulou, Christos Baziotis, Alexandros Potamianos