Sparks of Large Audio Models: A Survey and Outlook

Type: Preprint

Publication Date: 2023-08

Citations: 8

DOI: https://doi.org/10.48550/arxiv.2308.12792

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT (2023). Jiaming Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma
  • Audio-Language Models for Audio-Centric Tasks: A survey (2025). Yi Su, Jisheng Bai, Qisheng Xu, Ke Xu, Yong Dou
  • Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models (2024). Yiming Chen, Xianghu Yue, Xiaoxue Gao, Chen Zhang, Luis Fernando D’Haro, Robby T. Tan, Haizhou Li
  • AudioBench: A Universal Benchmark for Audio Large Language Models (2024). Bin Wang, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen
  • A Survey on Speech Large Language Models (2024). Jing Peng, Yucheng Wang, Yu Xi, Li Xu, Xiaoyu Zhang, Kaiping Yu
  • Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation (2024). Siyin Wang, Wenyi Yu, Yudong Yang, Changli Tang, Yixuan Li, J. Zhuang, Xianzhao Chen, Xiaohai Tian, Jun Zhang, Guangzhi Sun
  • AudioPaLM: A Large Language Model That Can Speak and Listen (2023). Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov
  • Overview of the Amphion Toolkit (v0.2) (2025). Jiaqi Li, Xueyao Zhang, Yuancheng Wang, Haorui He, Chaoren Wang, Lijun Wang, Huan Liao, J. Ao, Zhihui Xie, Yiqiao Huang
  • Pengi: An Audio Language Model for Audio Tasks (2023). Soham Deshmukh, Benjamin Elizalde, Rita Singh, Huaming Wang
  • PAM: Prompting Audio-Language Models for Audio Quality Assessment (2024). Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, Huaming Wang
  • UniAudio: An Audio Foundation Model Toward Universal Audio Generation (2023). Dongchao Yang, Jinchuan Tian, Xu Tan, Rongjie Huang, Songxiang Liu, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian, Xixin Wu
  • Amphion: An Open-Source Audio, Music and Speech Generation Toolkit (2023). Xueyao Zhang, Liumeng Xue, Yuancheng Wang, Yicheng Gu, Xi Chen, Zihao Fang, Haopeng Chen, Lexiao Zou, Chaoren Wang, Jun Han
  • Whispy: Adapting STT Whisper Models to Real-Time Environments (2024). Antonio Bevilacqua, Paolo Saviano, Alessandro Amirante, Simon Pietro Romano
  • Recent Advances in Speech Language Models: A Survey (2024). Wenqian Cui, Dianzhi Yu, Xiaoqi Jiao, Ziqiao Meng, Guangyan Zhang, Qichao Wang, Yiwen Guo, Irwin King
  • UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner (2024). Dongchao Yang, Haohan Guo, Yuanyuan Wang, Rongjie Huang, Xiang Li, Xu Tan, Xixin Wu, Helen Meng
  • Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model (2024). Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu
  • CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech (2024). Jaehyeon Kim, Keon Lee, Seungjun Chung, Jaewoong Cho
  • A Whisper transformer for audio captioning trained with synthetic captions and transfer learning (2023). Marek Kadlčík, Adam Hájek, Jürgen Kieslich, Radosław Winiecki
  • Whisper-GPT: A Hybrid Representation Audio Large Language Model (2024). Prateek Verma

Works Cited by This (0)
