Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences

Type: Preprint

Publication Date: 2023-01-01

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2309.12712

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

Title | Year | Authors
Whisper in Medusa's Ear: Multi-head Efficient Decoding for Transformer-based ASR | 2024 | Yael Segal, Aviv Shamsian, Aviv Navon, Gill Hetz, Joseph Keshet
Diet deep generative audio models with structured lottery | 2020 | Philippe Esling, Ninon Devis, Adrien Bitton, Antoine Caillon, Axel Chemla--Romeu-Santos, Constance Douwes
Whispy: Adapting STT Whisper Models to Real-Time Environments | 2024 | Antonio Bevilacqua, Paolo Saviano, Alessandro Amirante, Simon Pietro Romano
Don’t Be So Sure! Boosting ASR Decoding via Confidence Relaxation | 2023 | Tomer Wullach, Shlomo E. Chazan
Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation | 2022 | Tomer Wullach, Shlomo E. Chazan
Efficient Compression of Multitask Multilingual Speech Models | 2024 | Thomas Palmeira Ferraz
AutoMode-ASR: Learning to Select ASR Systems for Better Quality and Cost | 2024 | Ahmet Gündüz, Yunsu Kim, Kamer Ali Yüksel, Mohamed Al-Badrashiny, Thiago Castro Ferreira, Hassan Sawaf
Audio-Based Step-Count Estimation for Running - Windowing and Neural Network Baselines | 2024 | Philipp Wagner, Andreas Triantafyllopoulos, Alexander Gebhard, Björn Schüller
Houston we have a Divergence: A Subgroup Performance Analysis of ASR Models | 2024 | Alkis Koudounas, Flavio Giobergia
Heterogeneity over Homogeneity: Investigating Multilingual Speech Pre-Trained Models for Detecting Audio Deepfake | 2024 | Orchid Chetia Phukan, Gautam Siddharth Kashyap, Arun Balaji Buduru, Rajesh Sharma
DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition | 2024 | Alexander Polok, Dominik Klement, Martin Kocour, Jiangyu Han, Federico Landini, Bolaji Yusuf, Matthew Wiesner, Sanjeev Khudanpur, Jaň Černocký, Lukáš Burget
One Model to Rule Them All ? Towards End-to-End Joint Speaker Diarization and Speech Recognition | 2024 | Samuele Cornell, Jee-weon Jung, Shinji Watanabe, Stefano Squartini
Investigating the Emergent Audio Classification Ability of ASR Foundation Models | 2023 | Rao Ma, Adian Liusie, Mark Gales, Kate Knill
Sparks of Large Audio Models: A Survey and Outlook | 2023 | Siddique Latif, Moazzam Shoukat, Fahad Shamshad, Muhammad Usama, Heriberto Cuayáhuitl, Björn Schüller
A Processing Framework to Access Large Quantities of Whispered Speech Found in ASMR | 2023 | Pablo Pérez Zarazaga, Gustav Eje Henter, Zofia Malisz
A processing framework to access large quantities of whispered speech found in ASMR | 2023 | Pablo Pérez Zarazaga, Gustav Eje Henter, Zofia Malisz
Do You Listen with one or two Microphones? A Unified ASR Model for Single and Multi-Channel Audio | 2022 | Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas
Do You Listen with One or Two Microphones? A Unified ASR Model for Single and Multi-Channel Audio | 2021 | Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas
One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition | 2023 | Samuele Cornell, Jee-weon Jung, Shinji Watanabe, Stefano Squartini

Works That Cite This (0)

Works Cited by This (0)
