Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences

Hugo Malard, Salah Zaiem, Robin Algayres

Type: Preprint

Publication Date: 2023-01-01

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2309.12712

Locations

arXiv (Cornell University)
DataCite API

Similar Works

Title	Year	Authors
Whisper in Medusa's Ear: Multi-head Efficient Decoding for Transformer-based ASR	2024	Yael Segal, Aviv Shamsian, Aviv Navon, Gill Hetz, Joseph Keshet
Diet deep generative audio models with structured lottery	2020	Philippe Esling, Ninon Devis, Adrien Bitton, Antoine Caillon, Axel Chemla--Romeu-Santos, Constance Douwes
Whispy: Adapting STT Whisper Models to Real-Time Environments	2024	Antonio Bevilacqua, Paolo Saviano, Alessandro Amirante, Simon Pietro Romano
Don’t Be So Sure! Boosting ASR Decoding via Confidence Relaxation	2023	Tomer Wullach, Shlomo E. Chazan
Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation	2022	Tomer Wullach, Shlomo E. Chazan
Efficient Compression of Multitask Multilingual Speech Models	2024	Thomas Palmeira Ferraz
AutoMode-ASR: Learning to Select ASR Systems for Better Quality and Cost	2024	Ahmet Gündüz, Yunsu Kim, Kamer Ali Yüksel, Mohamed Al-Badrashiny, Thiago Castro Ferreira, Hassan Sawaf
Audio-Based Step-Count Estimation for Running - Windowing and Neural Network Baselines	2024	Philipp Wagner, Andreas Triantafyllopoulos, Alexander Gebhard, Björn Schuller
Houston we have a Divergence: A Subgroup Performance Analysis of ASR Models	2024	Alkis Koudounas, Flavio Giobergia
Heterogeneity over Homogeneity: Investigating Multilingual Speech Pre-Trained Models for Detecting Audio Deepfake	2024	Orchid Chetia Phukan, Gautam Siddharth Kashyap, Arun Balaji Buduru, Rajesh Sharma
DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition	2024	Alexander Polok, Dominik Klement, Martin Kocour, Jiangyu Han, Federico Landini, Bolaji Yusuf, Matthew Wiesner, Sanjeev Khudanpur, Jan Černocký, Lukáš Burget
One Model to Rule Them All ? Towards End-to-End Joint Speaker Diarization and Speech Recognition	2024	Samuele Cornell, Jee-weon Jung, Shinji Watanabe, Stefano Squartini
Investigating the Emergent Audio Classification Ability of ASR Foundation Models	2023	Rao Ma, Adian Liusie, Mark Gales, Kate Knill
Sparks of Large Audio Models: A Survey and Outlook	2023	Siddique Latif, Moazzam Shoukat, Fahad Shamshad, Muhammad Usama, Heriberto Cuayáhuitl, Björn Schuller
A Processing Framework to Access Large Quantities of Whispered Speech Found in ASMR	2023	Pablo Pérez Zarazaga, Gustav Eje Henter, Zofia Malisz
Aligner-Guided Training Paradigm: Advancing Text-to-Speech Models with Aligner Guided Duration	2024	Haowei Lou, Hye-Young Paik, Wenhao Hu, Lina Yao
Do You Listen with one or two Microphones? A Unified ASR Model for Single and Multi-Channel Audio	2022	Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas
Do You Listen with One or Two Microphones? A Unified ASR Model for Single and Multi-Channel Audio	2021	Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas
One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition	2023	Samuele Cornell, Jee-weon Jung, Shinji Watanabe, Stefano Squartini

Works That Cite This (0)

Works Cited by This (0)