SYMPHONY: Improving Memory Management for LLM Inference Workloads
Large Language Models (LLMs) are increasingly being deployed in applications such as chatbots, code editors, and conversational agents. A key feature of LLMs is their ability to engage in multi-turn interactions with humans or external tools, enabling a wide range of tasks. Each new request in a multi-turn interaction depends …