ELLE: Efficient Lifelong Pre-training for Emerging Data
ELLE: Efficient Lifelong Pre-training for Emerging Data
Current pre-trained language models (PLM) are typically trained with static data, ignoring that in real-world scenarios, streaming data of various sources may continuously grow. This requires PLMs to integrate the information from all the sources in a lifelong manner. Although this goal could be achieved by exhaustive pre-training on all …