Type: Article
Publication Date: 2022-09-16
Citations: 4
DOI: https://doi.org/10.21437/interspeech.2022-11382
Speech segmentation, which splits long speech into short segments, is essential for speech translation (ST). Popular voice activity detection (VAD) tools such as WebRTC VAD have generally relied on pause-based segmentation. Unfortunately, pauses in speech do not necessarily match sentence boundaries, and sentences can be connected by a very short pause that is difficult for VAD to detect. In this study, we propose a speech segmentation method that uses a binary classification model trained on a segmented bilingual speech corpus. We also propose a hybrid method that combines VAD with this segmentation method. Experimental results reveal that the proposed method is more suitable than conventional segmentation methods for both cascade and end-to-end ST systems. The hybrid approach further improves translation performance.
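The abstract does not say how the hybrid method combines VAD with the classifier. The following is a minimal sketch of one plausible scheme and is not the authors' method. It assumes that VAD yields a per-frame speech/non-speech flag and that the classifier yields, for each frame, a probability that a segment boundary falls just before that frame. The sketch stands in for both with plain lists. First it cuts at pauses of at least `min_pause` frames, so shorter pauses stay inside a segment, which mirrors the weakness described above. Then it splits any segment longer than `max_len` at its most likely boundary, provided that probability reaches `threshold`. All function names and parameters here are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # half-open frame range [start, end)


def vad_segments(is_speech: List[bool], min_pause: int) -> List[Segment]:
    """Pause-based segmentation: a run of >= min_pause non-speech frames ends a segment.

    Shorter pauses are absorbed into the current segment, so sentences joined
    by a brief pause are not separated at this stage.
    """
    segments: List[Segment] = []
    start = None
    silence = 0
    for i, speech in enumerate(is_speech):
        if speech:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_pause:
                # Segment ends at the first frame of the pause.
                segments.append((start, i - silence + 1))
                start, silence = None, 0
    if start is not None:
        # Drop any trailing silence shorter than min_pause.
        segments.append((start, len(is_speech) - silence))
    return segments


def split_long(seg: Segment, boundary_prob: List[float],
               max_len: int, threshold: float) -> List[Segment]:
    """Recursively split an over-long segment at its highest-probability boundary.

    boundary_prob[t] is a hypothetical classifier score for a boundary just
    before frame t.
    """
    start, end = seg
    if end - start <= max_len or end - start < 2:
        return [seg]
    # Candidate cut points lie strictly inside the segment.
    best = max(range(start + 1, end), key=lambda t: boundary_prob[t])
    if boundary_prob[best] < threshold:
        return [seg]  # no confident boundary: keep the long segment intact
    return (split_long((start, best), boundary_prob, max_len, threshold)
            + split_long((best, end), boundary_prob, max_len, threshold))


def hybrid_segment(is_speech: List[bool], boundary_prob: List[float],
                   min_pause: int, max_len: int, threshold: float) -> List[Segment]:
    """VAD first, then classifier-driven splitting of segments that are too long."""
    result: List[Segment] = []
    for seg in vad_segments(is_speech, min_pause):
        result.extend(split_long(seg, boundary_prob, max_len, threshold))
    return result


if __name__ == "__main__":
    # Two sentences joined by a 1-frame pause (frame 6), then a real 3-frame pause.
    speech = [True] * 6 + [False] + [True] * 5 + [False] * 3 + [True] * 4
    probs = [0.0] * len(speech)
    probs[7] = 0.9  # hypothetical classifier output: boundary before frame 7
    print(vad_segments(speech, min_pause=2))
    print(hybrid_segment(speech, probs, min_pause=2, max_len=8, threshold=0.5))
```

In this toy input, VAD alone returns `[(0, 12), (15, 19)]` because it merges the two sentences across the short pause. The hybrid pass returns `[(0, 7), (7, 12), (15, 19)]`. Bounding splits by `max_len` keeps VAD's cuts as the primary signal and calls on the classifier only for segments that are too long. This is one design choice among several that the abstract leaves open.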