Batched Triangular Dense Linear Algebra Kernels for Very Small Matrix Sizes on GPUs

Batched dense linear algebra kernels are becoming ubiquitous in scientific applications, ranging from tensor contractions in deep learning to data compression in hierarchical low-rank matrix approximation. Within a single API call, these kernels can launch up to thousands of similar matrix computations simultaneously, removing the expensive overhead of …