SpotServe: Serving Generative Large Language Models on Preemptible Instances

The high computational and memory requirements of generative large language models (LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary cost of serving LLMs by leveraging preemptible GPU instances on modern clouds, which offer access to spare GPU resources at a much cheaper price …