Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems
Do all instances need inference through the big models for a correct prediction? Perhaps not; some instances are easy and can be answered correctly even by small-capacity models. This creates an opportunity to improve the computational efficiency of systems. In this work, we present an exploratory study on 'model cascading', …