Scaling Neural Network Performance through Customized Hardware Architectures on Reconfigurable Logic

Type: Article

Publication Date: 2017-11-01

Citations: 7

DOI: https://doi.org/10.1109/iccd.2017.73

Abstract

Convolutional Neural Networks (CNNs) have improved dramatically in recent years, surpassing human accuracy on certain problems and outperforming traditional computer vision algorithms. While the compute pattern itself is relatively simple, significant compute and memory challenges remain, as CNNs may contain millions of floating-point parameters and require billions of floating-point operations to process a single image. These computational requirements, combined with storage footprints that exceed typical cache sizes, pose a significant performance and power challenge for modern compute architectures. One promising opportunity for scaling performance and power efficiency is to leverage reduced-precision representations for all activations and weights, as this scales compute capability and reduces weight and feature-map buffering requirements as well as energy consumption. Although a small reduction in accuracy is incurred, these Quantized Neural Networks (QNNs) have been shown to achieve state-of-the-art accuracy on standard benchmark datasets such as MNIST, CIFAR-10, SVHN, and even ImageNet, and thus offer highly attractive design trade-offs. Current research has focused mainly on implementing extreme variants with full binarization of weights and/or activations, typically on smaller input images. In this paper, we investigate the scalability of dataflow architectures with respect to supporting various precisions for both weights and activations, larger image dimensions, and increasing numbers of feature-map channels. Key contributions are a formalized approach to understanding the scalability of the existing hardware architecture via cost models, and a performance prediction as a function of the target device size. We provide validating experimental results for ImageNet classification on a server-class platform, namely the AWS F1 node.
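
To make the reduced-precision arithmetic described above concrete, the sketch below quantizes weights and activations to a configurable bit width and evaluates a dot product, the core operation of a convolution, entirely in integer arithmetic. This is a minimal illustration of uniform symmetric quantization, not the paper's FINN-style hardware mapping; the `quantize` helper and the toy vectors are assumptions made for the example.

```python
import numpy as np

def quantize(x, bits):
    """Symmetric uniform quantization to signed `bits`-bit integers.
    Returns the integer codes and the scale factor for dequantization."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8-bit, 1 for 2-bit
    scale = np.abs(x).max() / qmax          # map the largest magnitude onto qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)  # stand-in weight vector
a = rng.standard_normal(1024).astype(np.float32)  # stand-in activation vector

exact = float(np.dot(w, a))
for bits in (8, 4, 2):
    qw, sw = quantize(w, bits)
    qa, sa = quantize(a, bits)
    # The accumulation runs purely in integer arithmetic -- the part an
    # FPGA datapath would implement -- and one rescale recovers the value.
    approx = int(np.dot(qw, qa)) * sw * sa
    print(f"{bits}-bit dot product: {approx:+.3f} (fp32 reference: {exact:+.3f})")
```

Shrinking the bit width cuts both the per-operation multiplier cost and the on-chip buffering per parameter, which is the scaling lever that the paper's cost models formalize.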

Locations

  • arXiv (Cornell University)

Similar Works

  • Scaling Neural Network Performance through Customized Hardware Architectures on Reconfigurable Logic (2018). Michaela Blott, Thomas B. Preußer, Nicholas J. Fraser, Giulio Gambardella, Kenneth M. O'Brien, Yaman Umuroglu, Miriam Leeser
  • Scaling Binarized Neural Networks on Reconfigurable Logic (2017). Nicholas J. Fraser, Yaman Umuroglu, Giulio Gambardella, Michaela Blott, Philip H. W. Leong, Magnus Jahre, Kees Vissers
  • Exploration of Low Numeric Precision Deep Learning Inference Using Intel FPGAs (2018). Philip Colangelo, Nasibeh Nasiri, Asit K. Mishra, Eriko Nurvitadhi, Martin Margala, Kevin Nealis
  • Hardware-Software Codesign of Accurate, Multiplier-free Deep Neural Networks (2017). Hokchhay Tann, Soheil Hashemi, R. Iris Bahar, Sherief Reda
  • Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation (2023). Stylianos I. Venieris, Javier Fernández-Marqués, Nicholas D. Lane
  • AddNet: Deep Neural Networks Using FPGA-Optimized Multipliers (2019). Julian Faraone, Martin Kumm, Martin Hardieck, Peter Zipf, Xueyuan Liu, David Boland, Philip H. W. Leong
  • Ax-BxP: Approximate Blocked Computation for Precision-Reconfigurable Deep Neural Network Acceleration (2022). R. Elangovan, Shubham Jain, Anand Raghunathan
  • Ax-BxP: Approximate Blocked Computation for Precision-Reconfigurable Deep Neural Network Acceleration (2020). R. Elangovan, Shubham Jain, Anand Raghunathan
  • Accuracy to Throughput Trade-offs for Reduced Precision Neural Networks on Reconfigurable Logic (2018). Su Jiang, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Gianluca Durelli, David B. Thomas, Philip H. W. Leong, Peter Y. K. Cheung
  • LUTNet: Learning FPGA Configurations for Highly Efficient Neural Network Inference (2020). Erwei Wang, James J. Davis, Peter Y. K. Cheung, George A. Constantinides
  • Streaming Architecture for Large-Scale Quantized Neural Networks on an FPGA-Based Dataflow Platform (2018). Chaim Baskin, Natan Liss, Evgenii Zheltonozhskii, Alex Bronstein, Avi Mendelson
  • Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine (2019). Renzo Andri, Lukas Cavigelli, Davide Rossi, Luca Benini
  • FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations (2020). Yichi Zhang, Junhao Pan, Xinheng Liu, Hongzheng Chen, Deming Chen, Zhiru Zhang