TorchAudio: Building Blocks for Audio and Speech Processing

Type: Article

Publication Date: 2022-04-27

Citations: 61

DOI: https://doi.org/10.1109/icassp43922.2022.9747236

Abstract

This document describes version 0.10 of TorchAudio, a set of building blocks for machine learning applications in the audio and speech processing domain. The objective of TorchAudio is to accelerate the development and deployment of machine learning applications for researchers and engineers by providing off-the-shelf building blocks. The building blocks are designed to be GPU-compatible, automatically differentiable, and production-ready. TorchAudio can be installed from the Python Package Index, and the source code is publicly available under a BSD-2-Clause license (as of September 2021) at https://github.com/pytorch/audio. In this document, we provide an overview of the design principles, functionalities, and benchmarks of TorchAudio. We benchmark our implementations of several audio and speech operations and models, and the results indicate that they are valid and perform similarly to other publicly available implementations.

Locations

  • arXiv (Cornell University)
  • ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Similar Works

Title, Year, Authors
  • TorchAudio: Building Blocks for Audio and Speech Processing (2021). Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Anjali Chourdia, Artyom Astafurov, Caroline Chen, Ching-Feng Yeh, Christian Puhrsch, David Pollack, Dmitriy Genzel
  • TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for PyTorch (2023). Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang
  • Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models (2024). Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-Wei Chang, Jiawei Du, Ke-Han Lu, Alexander H. Liu, Ho-Lam Chung, Yuan-Kuei Wu, Dongchao Yang
  • Lhotse: a speech data representation library for the modern deep learning ecosystem (2021). Piotr Żelasko, Daniel Povey, Jan Trmal, Sanjeev Khudanpur
  • Shennong: A Python toolbox for audio speech features extraction (2023). Mathieu Bernard, Maxime Poli, Julien Karadayi, Emmanuel Dupoux
  • AudioDec: An Open-Source Streaming High-Fidelity Neural Audio Codec (2023). Yi-Chiao Wu, Israel D. Gebru, Dejan Marković, Alexander Richard
  • Overview of the Amphion Toolkit (v0.2) (2025). Jiaqi Li, Xueyao Zhang, Yuancheng Wang, Haorui He, Chaoren Wang, Lijun Wang, Huan Liao, J Ão, Zhihui Xie, Yiqiao Huang
  • DeepSpectrumLite: A Power-Efficient Transfer Learning Framework for Embedded Speech and Audio Processing from Decentralised Data (2021). Shahin Amiriparian, Tobias Hübner, Maurice Gerczuk, Sandra Ottl, Björn Schüller
  • The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge (2024). Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, H. B. Li, Shuai Fan, Hui Zhang
  • The ICME 2025 Audio Encoder Capability Challenge (2025). Junbo Zhang, Heinrich Dinkel, Qiong Song, Helen Wang, Y. Niu, Cheng Si, Xiaofeng Xin, Ke Li, Wenwu Wang, Yujun Wang
  • ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech (2024). Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali
  • Low-complexity deep learning frameworks for acoustic scene classification (2022). Lam Pham, Dat Ngo, Anahid Jalali, Alexander Schindler
  • Transformer-based Sequence Labeling for Audio Classification based on MFCCs (2023). C. S. Sonali, B S Chinmayi, Ahana Balasubramanian
  • Audio-Language Datasets of Scenes and Events: A Survey (2024). Gijs Wijngaard, Elia Formisano, Michele Esposito, Michel Dumontier
  • The PyTorch-Kaldi Speech Recognition Toolkit (2018). Mirco Ravanelli, Titouan Parcollet, Yoshua Bengio
  • The PyTorch-Kaldi Speech Recognition Toolkit (2019). Mirco Ravanelli, Titouan Parcollet, Yoshua Bengio
  • EasyASR: A Distributed Machine Learning Platform for End-to-end Automatic Speech Recognition (2020). Chengyu Wang, Mengli Cheng, Hu Xu, Jun Huang
  • EasyASR: A Distributed Machine Learning Platform for End-to-end Automatic Speech Recognition (2021). Chengyu Wang, Mengli Cheng, Hu Xu, Jun Huang

Works That Cite This (23)

Title, Year, Authors
  • Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge (2024). Simon Leglaive, Matthieu Fraticelli, Hend Elghazaly, Léonie Borne, Mostafa Sadeghi, Scott Wisdom, Manuel Pariente, John R. Hershey, Daniel Pressnitzer, Jon Barker
  • GANStrument: Adversarial Instrument Sound Synthesis with Pitch-Invariant Instance Conditioning (2023). Gaku Narita, Junichi Shimizu, Taketo Akama
  • A Large-Scale Evaluation of Speech Foundation Models (2024). Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang
  • TorchGeo: Deep Learning With Geospatial Data (2024). Adam J. Stewart, Caleb Robinson, Isaac Corley, Anthony Ortiz, Juan Lavista Ferres, Arindam Banerjee
  • TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for PyTorch (2023). Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang
  • The Impact of Silence on Speech Anti-Spoofing (2023). Yuxiang Zhang, Zhuo Li, Jingze Lu, Hua Hua, Wenchao Wang, Pengyuan Zhang
  • ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit (2023). Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi
  • LPCSE: Neural Speech Enhancement through Linear Predictive Coding (2022). Yang Liu, Na Tang, Xiaoli Chu, Yang Yang, Jun Wang
  • Soft Label Coding for end-to-end Sound Source Localization with ad-hoc Microphone Arrays (2023). Linfeng Feng, Yijun Gong, Xiao-Lei Zhang
  • End-to-End Spoken Language Understanding Using Joint CTC Loss and Self-Supervised, Pretrained Acoustic Encoders (2023). Jixuan Wang, Martin Radfar, Kai Wei, Clement Chung

Works Cited by This (19)

Title, Year, Authors
  • Deep Speech: Scaling up end-to-end speech recognition (2014). Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates
  • Wav2Letter: an End-to-End ConvNet-based Speech Recognition System (2016). Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve
  • fairseq: A Fast, Extensible Toolkit for Sequence Modeling (2019). Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli
  • Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation (2019). Yi Luo, Nima Mesgarani
  • ESPnet: End-to-End Speech Processing Toolkit (2018). Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen
  • WaveGlow: A Flow-based Generative Network for Speech Synthesis (2019). Ryan Prenger, Rafael Valle, Bryan Catanzaro
  • The PyTorch-Kaldi Speech Recognition Toolkit (2019). Mirco Ravanelli, Titouan Parcollet, Yoshua Bengio
  • Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions (2018). Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan
  • PyTorch: An Imperative Style, High-Performance Deep Learning Library (2019). Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga
  • NeMo: a toolkit for building AI applications using Neural Modules (2019). Oleksii Kuchaiev, Jason Li, Huyen Nguyen, Oleksii Hrinchuk, R. Bret Leary, Boris Ginsburg, Samuel Kriman, Stanislav Beliaev, Vitaly Lavrukhin, Jack Cook