Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Type: Article

Publication Date: 2019-10-01

Citations: 365

DOI: https://doi.org/10.1109/iccv.2019.00495

Abstract

Hardware-friendly network quantization (e.g., binary or uniform quantization) can efficiently accelerate inference and reduce the memory consumption of deep neural networks, which is crucial for deploying models on resource-limited devices such as mobile phones. However, due to the discreteness of low-bit quantization, existing quantization methods often suffer from unstable training and severe performance degradation. To address this problem, we propose Differentiable Soft Quantization (DSQ) to bridge the gap between full-precision and low-bit networks. DSQ evolves automatically during training to gradually approximate standard quantization. Owing to its differentiability, DSQ yields more accurate gradients in the backward pass and, with an appropriate clipping range, reduces the quantization error in the forward pass. Extensive experiments on several popular network architectures show that training low-bit networks with DSQ consistently outperforms state-of-the-art quantization methods. Moreover, our first efficient implementation of 2- to 4-bit DSQ on devices with ARM architecture achieves up to a 1.7× speedup over NCNN [31], an open-source high-performance 8-bit inference framework.
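The tanh-based soft quantizer the abstract describes can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function name, the default clipping range `[lo, hi]`, and the fixed temperature `k` are assumptions for demonstration (in DSQ the clipping range and the curve sharpness are learned during training).

```python
import numpy as np

def dsq(x, bits=2, lo=-1.0, hi=1.0, k=5.0):
    """Differentiable soft quantization (forward-pass sketch).

    Values are clipped to [lo, hi] and split into 2^bits - 1 uniform
    intervals; within each interval a scaled tanh replaces the hard
    step, so the mapping stays differentiable everywhere.  As k grows,
    the curve approaches the standard staircase quantizer.
    """
    n = 2 ** bits - 1                 # number of quantization intervals
    delta = (hi - lo) / n             # interval width
    x = np.clip(x, lo, hi)
    i = np.minimum(np.floor((x - lo) / delta), n - 1)  # interval index
    m = lo + (i + 0.5) * delta        # interval midpoint
    s = 1.0 / np.tanh(0.5 * k * delta)                 # scales tanh to span [-1, 1]
    phi = s * np.tanh(k * (x - m))                     # soft step within the interval
    # map the soft step in [-1, 1] back to the interval's value range
    return lo + delta * (i + 0.5 * (phi + 1.0))

# usage: a small k keeps the curve smooth; a large k recovers hard quantization
print(dsq(np.linspace(-1.2, 1.2, 7), bits=2, k=5.0))
```

With a large `k`, the output snaps to the 2^bits uniform levels, which is the sense in which DSQ "gradually approximates the standard quantization" as training sharpens the curve.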

Locations

  • arXiv (Cornell University) - View - PDF
  • 2019 IEEE/CVF International Conference on Computer Vision (ICCV) - View

Similar Works

Action Title Year Authors
+ Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks 2019 Ruihao Gong
Xianglong Liu
Shenghu Jiang
Tianxiang Li
Peng Hu
Jiazhen Lin
Fengwei Yu
Junjie Yan
+ Joint Training of Low-Precision Neural Network with Quantization Interval Parameters 2018 Sangil Jung
Changyong Son
Seohyung Lee
Jinwoo Son
Youngjun Kwak
Jae‐Joon Han
Changkyu Choi
+ PDF Chat Distance-aware Quantization 2021 Dohyung Kim
Junghyup Lee
Bumsub Ham
+ Learnable Companding Quantization for Accurate Low-bit Neural Networks 2021 Kohei Yamamoto
+ Quantization Networks 2019 Jiwei Yang
Xu Shen
Jun Xing
Xinmei Tian
Houqiang Li
Bing Deng
Jianqiang Huang
Xian-Sheng Hua
+ Differentiable Fine-grained Quantization for Deep Neural Network Compression 2018 Hsin-Pai Cheng
Yuanjun Huang
Xuyang Guo
Feng Yan
Yifei Huang
Wei Wen
Hai Li
Yiran Chen
+ Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization 2017 Yinpeng Dong
Renkun Ni
Jianguo Li
Yurong Chen
Jun Zhu
Hang Su
+ Adaptive Binary-Ternary Quantization 2019 Ryan Razani
Grégoire Morin
Vahid Partovi Nia
Eyyüb Sari
+ SQuAT: Sharpness- and Quantization-Aware Training for BERT 2022 Zheng Wang
Juncheng B Li
Shuhui Qu
Florian Metze
Emma Strubell
+ PDF Chat Adaptive Binary-Ternary Quantization 2021 Ryan Razani
Grégoire Morin
Eyyüb Sari
Vahid Partovi Nia
+ A Survey on Methods and Theories of Quantized Neural Networks 2018 Yunhui Guo
+ DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference 2023 Jiajun Zhou
Jiajun Wu
Yizhao Gao
Yuhao Ding
Chaofan Tao
Boyu Li
Fengbin Tu
Kwang‐Ting Cheng
Hayden Kwok‐Hay So
Ngai Wong
+ PDF Chat Low-bit Quantization of Neural Networks for Efficient Inference 2019 Yoni Choukroun
Eli Kravchik
Fan Yang
Pavel Kisilev
+ PDF Chat Adaptive Loss-Aware Quantization for Multi-Bit Networks 2020 Zhongnan Qu
Zimu Zhou
Yun Cheng
Lothar Thiele
+ Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss 2018 Sangil Jung
Changyong Son
Seohyung Lee
Jinwoo Son
Youngjun Kwak
Jae‐Joon Han
Sung Ju Hwang
Changkyu Choi
+ SDQ: Stochastic Differentiable Quantization with Mixed Precision 2022 Xijie Huang
Zhiqiang Shen
Shichao Li
Zechun Liu
Xianghong Hu
Jeffry Wicaksana
Eric P. Xing
Kwang‐Ting Cheng

Works That Cite This (196)

Action Title Year Authors
+ PDF Chat Robust knowledge distillation based on feature variance against backdoored teacher model 2024 Jinyin Chen
Xiaoming Zhao
Haibin Zheng
Xiao Li
Sheng Xiang
Haifeng Guo
+ DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference 2023 Jiajun Zhou
Jiajun Wu
Yizhao Gao
Yuhao Ding
Chaofan Tao
Boyu Li
Fengbin Tu
Kwang‐Ting Cheng
Hayden Kwok‐Hay So
Ngai Wong
+ SEAM: Searching Transferable Mixed-Precision Quantization Policy through Large Margin Regularization 2023 Chen Tang
Kai Ouyang
Zenghao Chai
Yunpeng Bai
Yuan Meng
Zhi Wang
Wenwu Zhu
+ SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation 2020 Yang Zhao
Xiaohan Chen
Yue Wang
Chaojian Li
Haoran You
Yonggan Fu
Yuan Xie
Zhangyang Wang
Yingyan Lin
+ PDF Chat Pruning and quantization for deep neural network acceleration: A survey 2021 Tailin Liang
John Glossner
Lei Wang
Shi Shao-bo
Xiaotong Zhang
+ PDF Chat OMPQ: Orthogonal Mixed Precision Quantization 2023 Yuexiao Ma
Taisong Jin
Xiawu Zheng
Yan Wang
Huixia Li
Yongjian Wu
Guannan Jiang
Wei Zhang
Rongrong Ji
+ Distance-aware Quantization 2021 Dohyung Kim
Junghyup Lee
Bumsub Ham
+ SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices 2023 Zhengang Li
Geng Yuan
Tomoharu Yamauchi
Masoud Zabihi
Yanyue Xie
Peiyan Dong
Xulong Tang
Nobuyuki Yoshikawa
Devesh Tiwari
Yanzhi Wang
+ PDF Chat Adaptive Loss-Aware Quantization for Multi-Bit Networks 2020 Zhongnan Qu
Zimu Zhou
Yun Cheng
Lothar Thiele