Efficient and Effective Tail Latency Minimization in Multi-Stage Retrieval Systems.
Scalable web search systems typically employ multi-stage retrieval architectures, where an initial stage generates a set of candidate documents that are then pruned and re-ranked. Since subsequent stages typically exploit a multitude of features of varying costs using machine-learned models, reducing the number of documents that are considered at each …