Ensuring Fair LLM Serving Amid Diverse Applications
In a multi-tenant large language model (LLM) serving platform that hosts diverse applications, some users may submit an excessive number of requests, making the service unavailable to other users and creating unfairness. Existing fairness approaches do not account for variations in token lengths across applications and multiple LLM calls, …