Strassen's Algorithm for Tensor Contraction

Type: Article

Publication Date: 2018-01-01

Citations: 9

DOI: https://doi.org/10.1137/17m1135578

Abstract

Tensor contraction (TC) is an important computational kernel widely used in numerous applications; it is a multidimensional generalization of matrix multiplication (GEMM). While Strassen's algorithm for GEMM is well studied in theory and practice, extending it to accelerate TC has not been previously pursued. We believe this to be the first work to demonstrate how one can, in practice, speed up TC with Strassen's algorithm. By adopting a block-scatter-matrix format, a novel matrix-centric tensor layout, we can conceptually view TC as GEMM with general stride storage and an implicit tensor-to-matrix transformation. This insight enables us to tailor a recent state-of-the-art implementation of Strassen's algorithm to a recent state-of-the-art TC implementation, avoiding explicit transpositions (permutations) and extra workspace, and reducing the memory-movement overhead that Strassen's algorithm otherwise incurs. Performance benefits are demonstrated with a performance model as well as in practice on modern single-core, multicore, and distributed-memory parallel architectures, achieving up to $1.3 \times$ speedup. The resulting implementations can serve as drop-in replacements in various applications, with significant speedup.
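For background on the abstract's core idea: Strassen's algorithm replaces the eight half-sized products of a blocked GEMM with seven, trading them for extra matrix additions and lowering the asymptotic operation count to about $O(n^{2.807})$. Below is a minimal, illustrative NumPy sketch of the classical recursion; it is not the paper's implementation, which operates on block-scatter-matrix views of the tensor operands and avoids these explicit temporaries and copies.

```python
import numpy as np

def strassen(A, B, leaf=64):
    """Classical Strassen multiply for square matrices (illustrative sketch).

    Recurses while the dimension is even and larger than `leaf`,
    falling back to an ordinary GEMM at the leaves.
    """
    n = A.shape[0]
    if n <= leaf or n % 2:
        return A @ B  # classical GEMM at the base case

    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # Seven recursive products instead of eight.
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)

    # Combine the seven products into the four quadrants of C.
    C = np.empty_like(M1, shape=(n, n))
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

A contraction such as $C_{abcd} = \sum_k A_{akb} B_{kcd}$ (a hypothetical example, not one from the paper) reduces to exactly this GEMM after permuting and reshaping $A$ into an $(ab) \times k$ matrix and $B$ into a $k \times (cd)$ matrix; the paper's block-scatter layout obtains the same matrix view implicitly, without the explicit permutation or workspace.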

Locations

  • SIAM Journal on Scientific Computing
  • arXiv (Cornell University)

Similar Works

  • Strassen's Algorithm for Tensor Contraction (2017) - Jianyu Huang, Devin A. Matthews, Robert A. van de Geijn
  • Design of a high-performance GEMM-like Tensor-Tensor Multiplication (2016) - P. T. Springer, Paolo Bientinesi
  • Design of a high-performance GEMM-like Tensor-Tensor Multiplication (2016) - Paul Springer, Paolo Bientinesi
  • Design of a High-Performance GEMM-like Tensor–Tensor Multiplication (2018) - Paul Springer, Paolo Bientinesi
  • Tensor Contractions with Extended BLAS Kernels on CPU and GPU (2016) - Yang Shi, U. N. Niranjan, Animashree Anandkumar, Cris Cecka
  • High-Performance Tensor Contraction without Transposition (2016) - Devin A. Matthews
  • Exploring Data Layout for Sparse Tensor Times Dense Matrix on GPUs (2023) - Khalid Ahmad, Cris Cecka, Michael Garland, Mary Hall
  • High-Performance Tensor Contractions for GPUs (2016) - Ahmad Abdelfattah, Marc Baboulin, Veselin Dobrev, Jack Dongarra, Christopher Earl, Joël Falcou, Azzam Haidar, Ian Karlin, Tzanio Kolev, Ian Masliah
  • High-performance Tensor Contractions for GPUs (2016) - Ahmad Abdelfattah, Marc Baboulin, Veselin Dobrev, J. Dongarra, Chris Earl, Joël Falcou, Azzam Haidar, Ian Karlin, Tzanio Kolev, Ian Masliah
  • Design of a High-Performance Tensor-Vector Multiplication with BLAS (2019) - Cem Savaş Başsoy
  • cuTT: A High-Performance Tensor Transpose Library for CUDA Compatible GPUs (2017) - Antti‐Pekka Hynninen, Dmitry I. Lyakh
  • Swift: High-Performance Sparse Tensor Contraction for Scientific Applications (2024) - Andrew Ensinger, Gabriel Kulp, Victor Agostinelli, Dennis Lyakhov, Lizhong Chen
  • SIMD²: A Generalized Matrix Instruction Set for Accelerating Tensor Computation beyond GEMM (2022) - Yunan Zhang, Po-An Tsai, Hung‐Wei Tseng
  • High Performance Rearrangement and Multiplication Routines for Sparse Tensor Arithmetic (2018) - Adam P. Harrison, Dileepan Joseph
  • Efficient Utilization of Multi-Threading Parallelism on Heterogeneous Systems for Sparse Tensor Contraction (2024) - Guoqing Xiao, Chuanghui Yin, Yuedan Chen, Mingxing Duan, Kenli Li
  • GSpTC: High-Performance Sparse Tensor Contraction on CPU-GPU Heterogeneous Systems (2022) - Guoqing Xiao, Chuanghui Yin, Yuedan Chen, Mingxing Duan, Kenli Li
  • On Higher-performance Sparse Tensor Transposition (2023) - Luanzheng Guo, Gökçen Kestor
  • High Performance Rearrangement and Multiplication Routines for Sparse Tensor Arithmetic (2018) - Adam P. Harrison, Doug Joseph
  • BCB-SpTC: An Efficient Sparse High-Dimensional Tensor Contraction Employing Tensor Core Acceleration (2024) - 栄徳 宍戸, Haotian Wang, Wangdong Yang, Renqiu Ouyang, Keqin Li, Kenli Li

Works Cited by This (11)

  • Performance Optimization of Tensor Contraction Expressions for Many-Body Methods in Quantum Chemistry (2009) - Albert Hartono, Qingda Lu, Thomas Henretty, Sriram Krishnamoorthy, Huaijian Zhang, Gerald Baumgartner, David E. Bernholdt, Marcel Nooijen, Russell M. Pitzer, J. Ramanujam
  • An efficient matrix-matrix multiplication based antisymmetric tensor contraction engine for general order coupled cluster (2010) - Michael Hanrath, Anna Engels-Putzka
  • Tensor Decompositions and Applications (2009) - Tamara G. Kolda, Brett W. Bader
  • Symmetric Tensors and Symmetric Tensor Rank (2008) - Pierre Comon, Gene H. Golub, Lek‐Heng Lim, Bernard Mourrain
  • GEMMW: A Portable Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm (1994) - Craig C. Douglas, Michael A. Heroux, Gordon Slishman, Roger M. Smith
  • Towards an efficient use of the BLAS library for multilinear tensor contractions (2014) - Edoardo Di Napoli, Diego Fabregat‐Traver, Gregorio Quintana‐Ortí, Paolo Bientinesi
  • Efficient MATLAB Computations with Sparse and Factored Tensors (2007) - Brett W. Bader, Tamara G. Kolda
  • Improving the Numerical Stability of Fast Matrix Multiplication (2016) - Grey Ballard, Austin R. Benson, Alex Druinsky, Benjamin Lipshitz, Oded Schwartz
  • High-Performance Tensor Contraction without Transposition (2018) - Devin A. Matthews
  • Design of a High-Performance GEMM-like Tensor–Tensor Multiplication (2018) - Paul Springer, Paolo Bientinesi