Strassen's Algorithm for Tensor Contraction

Type: Article

Publication Date: 2018-01-01

Citations: 9

DOI: https://doi.org/10.1137/17m1135578

Abstract

Tensor contraction (TC) is an important computational kernel widely used in numerous applications; it is a multidimensional generalization of matrix multiplication (GEMM). While Strassen's algorithm for GEMM is well studied in theory and practice, extending it to accelerate TC has not been previously pursued. We believe this to be the first work to demonstrate how one can, in practice, speed up TC with Strassen's algorithm. By adopting a block-scatter-matrix format, a novel matrix-centric tensor layout, we can conceptually view TC as GEMM with general stride storage and an implicit tensor-to-matrix transformation. This insight enables us to tailor a recent state-of-the-art implementation of Strassen's algorithm to a recent state-of-the-art TC implementation, avoiding explicit transpositions (permutations) and extra workspace, and reducing the memory-movement overhead that Strassen's algorithm otherwise incurs. Performance benefits are demonstrated with a performance model as well as in practice on modern single-core, multicore, and distributed-memory parallel architectures, achieving up to $1.3 \times$ speedup. The resulting implementations can serve as drop-in replacements in various applications, with significant speedup.
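For background on the abstract's core idea: Strassen's algorithm replaces the eight half-sized products of a blocked GEMM with seven, trading them for extra matrix additions and lowering the asymptotic operation count to about $O(n^{2.807})$. Below is a minimal, illustrative NumPy sketch of the classical recursion; it is not the paper's implementation, which operates on block-scatter-matrix views of the tensor operands and avoids these explicit temporaries and copies.

```python
import numpy as np

def strassen(A, B, leaf=64):
    """Classical Strassen multiply for square matrices (illustrative sketch).

    Recurses while the dimension is even and larger than `leaf`,
    falling back to an ordinary GEMM at the leaves.
    """
    n = A.shape[0]
    if n <= leaf or n % 2:
        return A @ B  # classical GEMM at the base case

    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # Seven recursive products instead of eight.
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)

    # Combine the seven products into the four quadrants of C.
    C = np.empty_like(M1, shape=(n, n))
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

A contraction such as $C_{abcd} = \sum_k A_{akb} B_{kcd}$ (a hypothetical example, not one from the paper) reduces to exactly this GEMM after permuting and reshaping $A$ into an $(ab) \times k$ matrix and $B$ into a $k \times (cd)$ matrix; the paper's block-scatter layout obtains the same matrix view implicitly, without the explicit permutation or workspace.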

Locations

  • SIAM Journal on Scientific Computing
  • arXiv (Cornell University)

Similar Works

  • Strassen's Algorithm for Tensor Contraction (2017) - Jianyu Huang, Devin A. Matthews, Robert A. van de Geijn
  • Design of a high-performance GEMM-like Tensor-Tensor Multiplication (2016) - P. T. Springer, Paolo Bientinesi
  • Design of a high-performance GEMM-like Tensor-Tensor Multiplication (2016) - Paul Springer, Paolo Bientinesi
  • Design of a High-Performance GEMM-like Tensor–Tensor Multiplication (2018) - Paul Springer, Paolo Bientinesi
  • Tensor Contractions with Extended BLAS Kernels on CPU and GPU (2016) - Yang Shi, U. N. Niranjan, Animashree Anandkumar, Cris Cecka
  • High-Performance Tensor Contraction without Transposition (2016) - Devin A. Matthews
  • Exploring Data Layout for Sparse Tensor Times Dense Matrix on GPUs (2023) - Khalid Ahmad, Cris Cecka, Michael Garland, Mary Hall
  • High-Performance Tensor Contractions for GPUs (2016) - Ahmad Abdelfattah, Marc Baboulin, Veselin Dobrev, Jack Dongarra, Christopher Earl, Joël Falcou, Azzam Haidar, Ian Karlin, Tzanio Kolev, Ian Masliah
  • High-performance Tensor Contractions for GPUs (2016) - Ahmad Abdelfattah, Marc Baboulin, Veselin Dobrev, J. Dongarra, Chris Earl, Joël Falcou, Azzam Haidar, Ian Karlin, Tzanio Kolev, Ian Masliah
  • Design of a High-Performance Tensor-Vector Multiplication with BLAS (2019) - Cem Savaş Başsoy
  • cuTT: A High-Performance Tensor Transpose Library for CUDA Compatible GPUs (2017) - Antti‐Pekka Hynninen, Dmitry I. Lyakh
  • Swift: High-Performance Sparse Tensor Contraction for Scientific Applications (2024) - Andrew Ensinger, Gabriel Kulp, Victor Agostinelli, Dennis Lyakhov, Lizhong Chen
  • SIMD²: A Generalized Matrix Instruction Set for Accelerating Tensor Computation beyond GEMM (2022) - Yunan Zhang, Po-An Tsai, Hung‐Wei Tseng
  • High Performance Rearrangement and Multiplication Routines for Sparse Tensor Arithmetic (2018) - Adam P. Harrison, Dileepan Joseph
  • Efficient Utilization of Multi-Threading Parallelism on Heterogeneous Systems for Sparse Tensor Contraction (2024) - Guoqing Xiao, Chuanghui Yin, Yuedan Chen, Mingxing Duan, Kenli Li
  • GSpTC: High-Performance Sparse Tensor Contraction on CPU-GPU Heterogeneous Systems (2022) - Guoqing Xiao, Chuanghui Yin, Yuedan Chen, Mingxing Duan, Kenli Li
  • On Higher-performance Sparse Tensor Transposition (2023) - Luanzheng Guo, Gökçen Kestor
  • High Performance Rearrangement and Multiplication Routines for Sparse Tensor Arithmetic (2018) - Adam P. Harrison, Doug Joseph
  • BCB-SpTC: An Efficient Sparse High-Dimensional Tensor Contraction Employing Tensor Core Acceleration (2024) - 栄徳 宍戸, Haotian Wang, Wangdong Yang, Renqiu Ouyang, Keqin Li, Kenli Li

Works Cited by This (11)

  • Performance Optimization of Tensor Contraction Expressions for Many-Body Methods in Quantum Chemistry (2009) - Albert Hartono, Qingda Lu, Thomas Henretty, Sriram Krishnamoorthy, Huaijian Zhang, Gerald Baumgartner, David E. Bernholdt, Marcel Nooijen, Russell M. Pitzer, J. Ramanujam
  • An efficient matrix-matrix multiplication based antisymmetric tensor contraction engine for general order coupled cluster (2010) - Michael Hanrath, Anna Engels-Putzka
  • Tensor Decompositions and Applications (2009) - Tamara G. Kolda, Brett W. Bader
  • Symmetric Tensors and Symmetric Tensor Rank (2008) - Pierre Comon, Gene H. Golub, Lek‐Heng Lim, Bernard Mourrain
  • GEMMW: A Portable Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm (1994) - Craig C. Douglas, Michael A. Heroux, Gordon Slishman, Roger M. Smith
  • Towards an efficient use of the BLAS library for multilinear tensor contractions (2014) - Edoardo Di Napoli, Diego Fabregat‐Traver, Gregorio Quintana‐Ortí, Paolo Bientinesi
  • Efficient MATLAB Computations with Sparse and Factored Tensors (2007) - Brett W. Bader, Tamara G. Kolda
  • Improving the Numerical Stability of Fast Matrix Multiplication (2016) - Grey Ballard, Austin R. Benson, Alex Druinsky, Benjamin Lipshitz, Oded Schwartz
  • High-Performance Tensor Contraction without Transposition (2018) - Devin A. Matthews
  • Design of a High-Performance GEMM-like Tensor–Tensor Multiplication (2018) - Paul Springer, Paolo Bientinesi