Ask AI a math question

This paper introduces SpecInfer, a system that accelerates generative large language model (LLM) serving with treebased speculative inference and verification.The key idea behind SpecInfer is leveraging small speculative models to predict the LLM's outputs; the predictions are organized as a token tree, whose nodes each represent a candidate token sequence.The …

Ask a Question