Vectorized and performance‐portable quicksort
Abstract
Recent work has shown that quicksort implementations using vector CPU instructions can outperform the non‐vectorized algorithms in widespread use. However, these implementations are typically single‐threaded, written for a particular instruction set, and restricted to a small set of key types. We lift these three restrictions: our proposed vqsort algorithm …