High-Performance Tensor Contraction without Transposition
Tensor computations, in particular tensor contraction (TC), are important kernels in many scientific computing applications. Due to the fundamental similarity of TC to matrix multiplication and to the availability of optimized implementations such as the BLAS, tensor operations have traditionally been implemented in terms of BLAS operations, incurring both a performance and …
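To make the traditional BLAS-based route concrete, the sketch below shows one common form of it, often called transpose-transpose-GEMM-transpose (TTGT), for a single example contraction C[a][b][c] = sum over k of A[a][k][c] * B[k][b]. This is an illustrative sketch under stated assumptions, not the method proposed in this paper: the index layout, the function names (`gemm`, `contract_ttgt`, `contract_direct`), and the naive pure-Python `gemm` standing in for an optimized BLAS routine are all hypothetical choices made for illustration.

```python
from itertools import product


def gemm(X, Y):
    """Naive matrix multiply (hypothetical stand-in for a BLAS GEMM call).

    X is m x k and Y is k x n, both stored as lists of rows.
    """
    m, k, n = len(X), len(Y), len(Y[0])
    return [[sum(X[i][p] * Y[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def contract_ttgt(A, B):
    """C[a][b][c] = sum_k A[a][k][c] * B[k][b], computed via TTGT."""
    na, nk, nc = len(A), len(A[0]), len(A[0][0])
    nb = len(B[0])
    # Step 1 (transpose + reshape A): copy A into an explicit matrix whose
    # rows are indexed by the free pair (a, c) and whose columns are the
    # contracted index k. Row (a, c) lands at position a * nc + c.
    A_mat = [[A[a][k][c] for k in range(nk)]
             for a, c in product(range(na), range(nc))]
    # Step 2: B[k][b] is already a k x b matrix, so no copy is needed here.
    # Step 3 (GEMM): one matrix multiply does all of the arithmetic.
    M = gemm(A_mat, B)  # shape (na * nc) x nb
    # Step 4 (reshape + transpose back): M is laid out as (a, c, b);
    # reorder it into the requested output layout (a, b, c).
    return [[[M[a * nc + c][b] for c in range(nc)] for b in range(nb)]
            for a in range(na)]


def contract_direct(A, B):
    """Reference result: the same contraction written as explicit loops."""
    na, nk, nc = len(A), len(A[0]), len(A[0][0])
    nb = len(B[0])
    return [[[sum(A[a][k][c] * B[k][b] for k in range(nk))
              for c in range(nc)] for b in range(nb)] for a in range(na)]


# Usage: a 3 x 4 x 2 tensor contracted with a 4 x 5 matrix over k.
A = [[[a + 2 * k + 3 * c for c in range(2)] for k in range(4)] for a in range(3)]
B = [[k - b for b in range(5)] for k in range(4)]
assert contract_ttgt(A, B) == contract_direct(A, B)
```

The arithmetic is delegated entirely to a single GEMM call, but only after the operand is copied into a matrix layout (`A_mat`) and the result is reordered afterwards. These explicit reorganization steps are what the transposition-based approach adds on top of the matrix multiply itself.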