Generalizable Mixed-Precision Quantization via Attribution Rank Preservation
In this paper, we propose a generalizable mixed-precision quantization (GMPQ) method for efficient inference. Conventional methods require the dataset used for bitwidth search to be the same as the one used for model deployment in order to guarantee policy optimality, which leads to heavy search cost on challenging large-scale datasets in realistic applications. In contrast, our GMPQ searches …