Multidimensional Intratile Parallelization for Memory-Starved Stencil Computations
Optimizing the performance of stencil algorithms has been the subject of intense research over the last two decades. Since many stencil schemes have low arithmetic intensity, most optimizations focus on increasing the temporal data access locality, thus reducing the data traffic through the main memory interface with the ultimate goal …