Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation

Type: Article

Publication Date: 2022-09-16

Citations: 4

DOI: https://doi.org/10.21437/interspeech.2022-11382

Abstract

Speech segmentation, which splits long speech into short segments, is essential for speech translation (ST). Popular VAD tools like WebRTC VAD have generally relied on pause-based segmentation. Unfortunately, pauses in speech do not necessarily match sentence boundaries, and sentences can be connected by a very short pause that is difficult to detect by VAD. In this study, we propose a speech segmentation method using a binary classification model trained using a segmented bilingual speech corpus. We also propose a hybrid method that combines VAD and the above speech segmentation method. Experimental results reveal that the proposed method is more suitable for cascade and end-to-end ST systems than conventional segmentation methods. The hybrid approach further improves the translation performance.
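
To make the idea of combining a boundary classifier with VAD concrete, here is a minimal Python sketch. It is not the authors' implementation: the function name `hybrid_segments`, the per-frame inputs, the 0.5 threshold, and the rule "a frame is a boundary if the classifier probability reaches the threshold or VAD reports non-speech" are all assumptions chosen for illustration. The abstract does not specify how the two signals are merged.

```python
from typing import List, Tuple


def hybrid_segments(
    boundary_prob: List[float],
    vad_speech: List[bool],
    threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Split frames into (start, end) spans, with end exclusive.

    boundary_prob: hypothetical per-frame output of a binary boundary classifier.
    vad_speech: hypothetical per-frame VAD decision (True means speech).
    """
    if len(boundary_prob) != len(vad_speech):
        raise ValueError("inputs must have one value per frame")

    segments: List[Tuple[int, int]] = []
    start = None  # first frame of the currently open segment, if any
    for i, (prob, speech) in enumerate(zip(boundary_prob, vad_speech)):
        # Assumed hybrid rule: either signal may close the open segment.
        is_boundary = prob >= threshold or not speech
        if is_boundary:
            if start is not None:
                segments.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        segments.append((start, len(boundary_prob)))
    return segments


probs = [0.1, 0.2, 0.9, 0.1, 0.1, 0.1, 0.2]
vad = [True, True, True, True, False, True, True]
print(hybrid_segments(probs, vad))
```

In this toy input the classifier fires at frame 2, where VAD still reports speech, which mirrors the case of sentences joined by a pause too short for VAD to detect. Setting `threshold` above 1.0 disables the classifier and leaves pause-based splitting only.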

Locations

  • arXiv (Cornell University)
  • Interspeech 2022

Similar Works

  • Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation (2022). Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura
  • Improving Speech Translation Accuracy and Time Efficiency with Fine-tuned wav2vec 2.0-based Speech Segmentation (2023). Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura
  • SHAS: Approaching optimal Segmentation for End-to-End Speech Translation (2022). Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa‐jussà
  • Lightweight Audio Segmentation for Long-form Speech Translation (2024). Jaesong Lee, So Yoon Kim, Hanbyul Kim, Joon Son Chung
  • Smart Speech Segmentation using Acousto-Linguistic Features with look-ahead (2022). Piyush Behre, Naveen Parihar, Sharman Tan, Amy Shah, Eva Sharma, Geoffrey Liu, Shuangyu Chang, Hosam Khalil, Chris Basoglu, Dev S. Pathak
  • End-to-End Simultaneous Speech Translation with Differentiable Segmentation (2023). Shaolei Zhang, Yang Feng
  • Beyond Voice Activity Detection: Hybrid Audio Segmentation for Direct Speech Translation (2021). Marco Gaido, Matteo Negri, Mauro Cettolo, Marco Turchi
  • Learning When to Translate for Streaming Speech (2021). Qianqian Dong, Yaoming Zhu, Mingxuan Wang, Lei Li
  • Learning When to Translate for Streaming Speech (2022). Qianqian Dong, Yaoming Zhu, Mingxuan Wang, Lei Li
  • Contextualized Translation of Automatically Segmented Speech (2020). Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Mauro Cettolo, Marco Turchi
  • SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations (2022). Ioannis Tsiamas, José A. R. Fonollosa, Marta R. Costa‐jussà
  • Long-Form End-to-End Speech Translation via Latent Alignment Segmentation (2023). Peter Polák, Ondřej Bojar
  • SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations (2023). Ioannis Tsiamas, José Fonollosa, Marta R. Costa‐jussà
  • Long-form Simultaneous Speech Translation: Thesis Proposal (2023). Peter Polák