Scaling Neural Network Performance through Customized Hardware Architectures on Reconfigurable Logic

Type: Article

Publication Date: 2017-11-01

Citations: 7

DOI: https://doi.org/10.1109/iccd.2017.73

Abstract

Convolutional Neural Networks (CNNs) have improved dramatically in recent years, surpassing human accuracy on certain problems and outperforming traditional computer vision algorithms. While the compute pattern itself is relatively simple, significant compute and memory challenges remain, as CNNs may contain millions of floating-point parameters and require billions of floating-point operations to process a single image. These computational requirements, combined with storage footprints that exceed typical cache sizes, pose a significant performance and power challenge for modern compute architectures. One promising opportunity for scaling performance and power efficiency is to leverage reduced-precision representations for all activations and weights, as this scales compute capability and reduces weight and feature-map buffering requirements as well as energy consumption. Although a small reduction in accuracy is incurred, these Quantized Neural Networks (QNNs) have been shown to achieve state-of-the-art accuracy on standard benchmark datasets such as MNIST, CIFAR-10, SVHN, and even ImageNet, and thus offer highly attractive design trade-offs. Current research has focused mainly on implementing extreme variants with full binarization of weights and/or activations, typically on smaller input images. In this paper, we investigate the scalability of dataflow architectures with respect to supporting various precisions for both weights and activations, larger image dimensions, and increasing numbers of feature-map channels. Key contributions are a formalized approach to understanding the scalability of the existing hardware architecture via cost models, and a performance prediction as a function of the target device size. We provide validating experimental results for ImageNet classification on a server-class platform, namely the AWS F1 node.
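
To make the reduced-precision arithmetic described above concrete, the sketch below quantizes weights and activations to a configurable bit width and evaluates a dot product, the core operation of a convolution, entirely in integer arithmetic. This is a minimal illustration of uniform symmetric quantization, not the paper's FINN-style hardware mapping; the `quantize` helper and the toy vectors are assumptions made for the example.

```python
import numpy as np

def quantize(x, bits):
    """Symmetric uniform quantization to signed `bits`-bit integers.
    Returns the integer codes and the scale factor for dequantization."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8-bit, 1 for 2-bit
    scale = np.abs(x).max() / qmax          # map the largest magnitude onto qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)  # stand-in weight vector
a = rng.standard_normal(1024).astype(np.float32)  # stand-in activation vector

exact = float(np.dot(w, a))
for bits in (8, 4, 2):
    qw, sw = quantize(w, bits)
    qa, sa = quantize(a, bits)
    # The accumulation runs purely in integer arithmetic -- the part an
    # FPGA datapath would implement -- and one rescale recovers the value.
    approx = int(np.dot(qw, qa)) * sw * sa
    print(f"{bits}-bit dot product: {approx:+.3f} (fp32 reference: {exact:+.3f})")
```

Shrinking the bit width cuts both the per-operation multiplier cost and the on-chip buffering per parameter, which is the scaling lever that the paper's cost models formalize.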

Locations

  • arXiv (Cornell University)

Similar Works

  • Scaling Neural Network Performance through Customized Hardware Architectures on Reconfigurable Logic (2018). Michaela Blott, Thomas B. Preußer, Nicholas J. Fraser, Giulio Gambardella, Kenneth M. O'Brien, Yaman Umuroglu, Miriam Leeser
  • Scaling Binarized Neural Networks on Reconfigurable Logic (2017). Nicholas J. Fraser, Yaman Umuroglu, Giulio Gambardella, Michaela Blott, Philip H. W. Leong, Magnus Jahre, Kees Vissers
  • Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs (2018). Philip Colangelo, Nasibeh Nasiri, Asit K. Mishra, Eriko Nurvitadhi, Martin Margala, Kevin Nealis
  • Hardware-Software Codesign of Accurate, Multiplier-free Deep Neural Networks (2017). Hokchhay Tann, Soheil Hashemi, R. Iris Bahar, Sherief Reda
  • Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation (2023). Stylianos I. Venieris, Javier Fernández-Marqués, Nicholas D. Lane
  • AddNet: Deep Neural Networks Using FPGA-Optimized Multipliers (2019). Julian Faraone, Martin Kumm, Martin Hardieck, Peter Zipf, Xueyuan Liu, David Boland, Philip H. W. Leong
  • Ax-BxP: Approximate Blocked Computation for Precision-Reconfigurable Deep Neural Network Acceleration (2022). R. Elangovan, Shubham Jain, Anand Raghunathan
  • Ax-BxP: Approximate Blocked Computation for Precision-Reconfigurable Deep Neural Network Acceleration (2020). R. Elangovan, Shubham Jain, Anand Raghunathan
  • Accuracy to Throughput Trade-offs for Reduced Precision Neural Networks on Reconfigurable Logic (2018). Su Jiang, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Gianluca Durelli, David B. Thomas, Philip H. W. Leong, Peter Y. K. Cheung
  • LUTNet: Learning FPGA Configurations for Highly Efficient Neural Network Inference (2020). Erwei Wang, James J. Davis, Peter Y. K. Cheung, George A. Constantinides
  • Streaming Architecture for Large-Scale Quantized Neural Networks on an FPGA-Based Dataflow Platform (2018). Chaim Baskin, Natan Liss, Evgenii Zheltonozhskii, Alex Bronstein, Avi Mendelson
  • Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine (2019). Renzo Andri, Lukas Cavigelli, Davide Rossi, Luca Benini
  • FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations (2020). Yichi Zhang, Junhao Pan, Xinheng Liu, Hongzheng Chen, Deming Chen, Zhiru Zhang