HPTT: a high-performance tensor transposition C++ library
We recently presented TTC, a domain-specific compiler for tensor transpositions. Although the generated code achieves nearly optimal performance, TTC works offline and therefore cannot be used in application codes in which the tensor sizes and the necessary tensor permutations are determined at …