SpotServe: Serving Generative Large Language Models on Preemptible Instances
The high computational and memory requirements of generative large language models (LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary cost of serving LLMs by leveraging preemptible GPU instances on modern clouds, which offer access to spare GPU resources at a much cheaper price …