Lifelong Language Knowledge Distillation

Type: Article

Publication Date: 2020-01-01

Citations: 28

DOI: https://doi.org/10.18653/v1/2020.emnlp-main.233

Locations

  • arXiv (Cornell University)

Similar Works

  • Lifelong Language Knowledge Distillation (2020) by Yung-Sung Chuang, Shang-Yu Su, Yun-Nung Chen
  • GOLD: Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation (2024) by Mohsen Gholami, Mohammad Akbari, Cindy Hu, Vaden Masrani, Z. Jane Wang, Yong Zhang
  • Generation-Distillation for Efficient Natural Language Understanding in Low-Data Settings (2019) by Luke Melas-Kyriazi, George Han, Celine Liang
  • Generation-Distillation for Efficient Natural Language Understanding in Low-Data Settings (2020) by Luke Melas-Kyriazi, George Han, Celine Liang
  • LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning (2020) by Kaitao Song, Hao Sun, Xu Tan, Tao Qin, Jianfeng Lu, Hongzhi Liu, Tie-Yan Liu
  • Lifelong Language Pretraining with Distribution-Specialized Experts (2023) by Wuyang Chen, Yanqi Zhou, Nan Du, Yanping Huang, James Laudon, Zhifeng Chen, C. C. Iurașcu
  • L3 Ensembles: Lifelong Learning Approach for Ensemble of Foundational Language Models (2024) by Aidin Shiri, Kaushik Roy, Amit Sheth, Manas Gaur
  • ERNIE 3.0 Tiny: Frustratingly Simple Method to Improve Task-Agnostic Distillation Generalization (2023) by Weixin Liu, Xuyi Chen, Jiaxiang Liu, Shikun Feng, Yu Sun, Hao Tian, Hua Wu
  • L3 Ensembles: Lifelong Learning Approach for Ensemble of Foundational Language Models (2023) by Aidin Shiri, Kaushik Roy, Amit Sheth, Manas Gaur
  • Knowledge Inheritance for Pre-trained Language Models (2021) by Yujia Qin, Yankai Lin, Jing Yi, Jiajie Zhang, Xu Han, Zhengyan Zhang, Yusheng Su, Zhiyuan Liu, Peng Li, Maosong Sun
  • LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5 (2021) by Chengwei Qin, Shafiq Joty
  • Data-Free Distillation of Language Model by Text-to-Text Transfer (2023) by Zheyuan Bai, Xinduo Liu, Hailin Hu, Tianyu Guo, Qinghua Zhang, Yunhe Wang
  • Dynamic Self-Distillation via Previous Mini-batches for Fine-tuning Small Language Models (2024) by Yao Fu, Yu Yin, Xuewang Han, Ruiteng Li, Xianxuan Long, Haoran Yu, Pan Li
  • MKD: A Multi-Task Knowledge Distillation Approach for Pretrained Language Models (2019) by Linqing Liu, Huan Wang, Jimmy Lin, Richard Socher, Caiming Xiong
  • Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes (2023) by Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister
  • Preserving Knowledge in Large Language Model: A Model-Agnostic Self-Decompression Approach (2024) by Zilun Zhang, Yutao Sun, Tiancheng Zhao, Leigang Sha, R. Xu, Kyusong Lee, Jianwei Yin
  • Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale (2024) by Flavio Di Palo, Pratibha Singhi, Bilal Fadlallah
  • ELLE: Efficient Lifelong Pre-training for Emerging Data (2022) by Yujia Qin, Jiajie Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
  • ZeroGen: Efficient Zero-shot Learning via Dataset Generation (2022) by Jiacheng Ye, Jiahui Gao, Qintong Li, Hang Xu, Jiangtao Feng, Zhiyong Wu, Changyuan Yu, Lingpeng Kong