Balanced Sparsity for Efficient DNN Inference on GPU


In trained deep neural networks, unstructured pruning can remove redundant weights to reduce storage cost. However, it requires customized hardware to speed up practical inference. Another line of work accelerates sparse model inference on general-purpose hardware by adopting coarse-grained sparsity, pruning or regularizing consecutive weights for efficient computation. But …
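To make the contrast concrete, here is a minimal sketch of the two pruning styles the abstract mentions: global magnitude-based unstructured pruning versus a balanced, block-wise scheme in which every fixed-size block of a weight row keeps the same number of nonzeros. The function names, block size, and NumPy implementation are illustrative assumptions, not the paper's actual algorithm or code.

```python
import numpy as np

def unstructured_prune(w, sparsity):
    """Zero out the smallest-magnitude weights globally (unstructured pruning).

    Hypothetical helper for illustration; `sparsity` is the fraction of
    weights to remove.
    """
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # Global magnitude threshold: the k-th smallest absolute value.
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def balanced_prune(w, sparsity, block=4):
    """Zero out the smallest-magnitude weights within each fixed-size block
    of a row, so every block keeps the same number of nonzeros (a balanced,
    hardware-friendly sparsity pattern). `block` must divide the row length.
    """
    out = w.copy()
    keep = block - int(block * sparsity)  # nonzeros retained per block
    rows, cols = w.shape
    for r in range(rows):
        for c in range(0, cols, block):
            blk = out[r, c:c + block]
            # Indices of the smallest-magnitude entries in this block.
            drop = np.argsort(np.abs(blk))[:blk.size - keep]
            blk[drop] = 0.0
    return out
```

Because every block retains the same number of nonzeros, balanced pruning yields an even workload across GPU threads, whereas the irregular nonzero layout left by unstructured pruning is what forces hardware customization.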