IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems

Type: Preprint

Publication Date: 2025-01-19

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2501.11067

Abstract

Large Language Models (LLMs) are transforming artificial intelligence, evolving into task-oriented systems capable of autonomous planning and execution. One of the primary applications of LLMs is conversational AI systems, which must navigate multi-turn dialogues, integrate domain-specific APIs, and adhere to strict policy constraints. However, evaluating these agents remains a significant challenge, as traditional methods fail to capture the complexity and variability of real-world interactions. We introduce IntellAgent, a scalable, open-source multi-agent framework designed to evaluate conversational AI systems comprehensively. IntellAgent automates the creation of diverse, synthetic benchmarks by combining policy-driven graph modeling, realistic event generation, and interactive user-agent simulations. This innovative approach provides fine-grained diagnostics, addressing the limitations of static and manually curated benchmarks with coarse-grained metrics. IntellAgent represents a paradigm shift in evaluating conversational AI. By simulating realistic, multi-policy scenarios across varying levels of complexity, IntellAgent captures the nuanced interplay of agent capabilities and policy constraints. Unlike traditional methods, it employs a graph-based policy model to represent relationships, likelihoods, and complexities of policy interactions, enabling highly detailed diagnostics. IntellAgent also identifies critical performance gaps, offering actionable insights for targeted optimization. Its modular, open-source design supports seamless integration of new domains, policies, and APIs, fostering reproducibility and community collaboration. Our findings demonstrate that IntellAgent serves as an effective framework for advancing conversational AI by addressing challenges in bridging research and deployment. The framework is available at https://github.com/plurai-ai/intellagent

Locations

  • arXiv (Cornell University)

Similar Works

  • Towards Objectively Benchmarking Social Intelligence for Language Agents at Action Level (2024). Chenxu Wang, Bin Dai, Huaping Liu, Baoyuan Wang
  • AgentBench: Evaluating LLMs as Agents (2023). Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu‐Cheng Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang
  • TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation (2024). Yaoxiang Wang, Zhiyong Wu, Junfeng Yao, Jinsong Su
  • CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents (2024). Tianqi Xu, Linyao Chen, Dai-Jie Wu, Yanjun Chen, Zecheng Zhang, Xiang Yao, Zhiqiang Xie, Yongchao Chen, Shilong Liu, Bochen Qian
  • Multi-Agent Large Language Models for Conversational Task-Solving (2024). J. K. Becker
  • AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios (2024). Xinyi Mou, Jingcong Liang, Jiayu Lin, Xinnong Zhang, Xiawei Liu, Shiyue Yang, Rong Ye, Lei Chen, Haoyu Kuang, Xuanjing Huang
  • clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents (2024). Anne Beyer, Kranti Chalamalasetti, Sherzod Hakimov, Brielen Madureira, Philipp Sadler, David Schlangen
  • Bel Esprit: Multi-Agent Framework for Building AI Model Pipelines (2024). Yunsu Kim, AhmedElmogtaba Abdelaziz, Thiago Castro Ferreira, Mohamed Al-Badrashiny, Hassan Sawaf
  • AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents (2024). Chang Ma, Junlei Zhang, Zhihao Zhu, Cheng Yang, Yujiu Yang, Yaohui Jin, Zhenzhong Lan, Lingpeng Kong, Junxian He
  • ReSpAct: Harmonizing Reasoning, Speaking, and Acting Towards Building Large Language Model-Based Conversational AI Agents (2024). Vardhan Dongre, Xiaocheng Yang, Emre Can Acikgoz, Suvodip Dey, Gökhan Tür, Dilek Hakkani-Tur
  • A Survey on Complex Tasks for Goal-Directed Interactive Agents (2024). Mareike Hartmann, Alexander Koller
  • DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents (2024). Jiho Kim, Woosog Chay, H.-M. Hwang, Daeun Kyung, Hyunseung Chung, Eunbyeol Cho, Yohan Jo, Edward Kwok Yiu Choi
  • Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence (2024). Weize Chen, Ziming You, Li Ran, Yitong Guan, Qian Chen, Chenyang Zhao, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun
  • Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents (2023). Yang Deng, Wenxuan Zhang, Wai Lam, See-Kiong Ng, Tat‐Seng Chua
  • AutoAgents: A Framework for Automatic Agent Generation (2024). Guangyao Chen, Siwei Dong, Shu Yu, Ge Zhang, Jaward Sesay, Börje F. Karlsson, Jie Fu, Yemin Shi
  • AutoAgents: A Framework for Automatic Agent Generation (2023). Guangyao Chen, Siwei Dong, Yu Shu, Ge Zhang, Jaward Sesay, Börje F. Karlsson, Jie Fu, Yemin Shi
  • Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning (2024). J. D.Z. Chen, Juhao Liang, Benyou Wang
  • LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models (2023). Marwa Abdulhai, Isadora White, Charlie Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine
  • Beyond Prompts: Dynamic Conversational Benchmarking of Large Language Models (2024). David Castillo-Bolado, Joseph K. Davidson, Fran Gray, M. Khairul Amri Rosa
  • Planning with Large Language Models for Conversational Agents (2024). Zhigen Li, Jianxiang Peng, Yanmeng Wang, Tianhao Shen, Minghui Zhang, Linxi Su, Shang Wu, Yihang Wu, Yuqian Wang, Ye Wang

Works That Cite This (0)

Works Cited by This (0)