Word Discovery in Visually Grounded, Self-Supervised Speech Models

Type: Article

Publication Date: 2022-09-16

Citations: 25

DOI: https://doi.org/10.21437/interspeech.2022-10652

Abstract

Figure 1: HuBERT: sum of attention weights each frame receives from other frames. Ours (VG-HuBERT3): attention weights each frame receives from the [CLS A] token. Attention weights from different attention heads are coded with different colors.

Locations

  • arXiv (Cornell University)
  • Interspeech 2022

Similar Works

  • Word Discovery in Visually Grounded, Self-Supervised Speech Models (2022). Puyuan Peng, David Harwath
  • Visually Grounded Speech Models for Low-resource Languages and Cognitive Modelling (2024). Leanne Nortje
  • Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model (2023). Puyuan Peng, Shang-Wen Li, Okko Räsänen, Abdelrahman Mohamed, David Harwath
  • Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model (2024). Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-yi Lee, David Harwath
  • Speech Representation Analysis based on Inter- and Intra-Model Similarities (2024). Yassine El Kheir, Ahmed Ali, Shammur Absar Chowdhury
  • Visually Grounded Models of Spoken Language: A Survey of Datasets, Architectures and Evaluation Techniques (2022). Grzegorz Chrupała
  • Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input (2019). David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba, James Glass
  • Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input (2018). David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba, James Glass
  • Attention-Based Keyword Localisation in Speech Using Visual Grounding (2021). Kayode Olaleye, Herman Kamper
  • Visually Grounded Learning of Keyword Prediction from Untranscribed Speech (2017). Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu
  • Semantic Speech Retrieval With a Visually Grounded Model of Untranscribed Speech (2018). Herman Kamper, Gregory Shakhnarovich, Karen Livescu
  • What Do Self-Supervised Speech Models Know About Words? (2024). Ankita Pasad, Chung-Ming Chien, Shane Settle, Karen Livescu
  • What Do Self-Supervised Speech Models Know About Words? (2023). Ankita Pasad, Chung-Ming Chien, Shane Settle, Karen Livescu
  • Models of Visually Grounded Speech Signal Pay Attention to Nouns: A Bilingual Experiment on English and Japanese (2019). William N. Havard, Jean‐Pierre Chevrot, Laurent Besacier

Works That Cite This (20)

  • Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples (2023). Hyeonggon Ryu, Arda Senocak, In So Kweon, Joon Son Chung
  • Generative Spoken Language Model based on continuous word-sized audio tokens (2023). Robin Algayres, Yossi Adi, Tu Anh Nguyen, Jade Copet, Gabriel Synnaeve, Benoît Sagot, Emmanuel Dupoux
  • Computational Insights to Acquisition of Phonemes, Words, and Word Meanings in Early Language: Sequential or Parallel Acquisition? (2023). Khazar Khorrami, María Andrea Cruz Blandón, Okko Räsänen
  • Visually Grounded Speech Models Have a Mutual Exclusivity Bias (2024). Leanne Nortje, Dan Oneață, Yevgen Matusevych, Herman Kamper
  • Towards Visually Prompted Keyword Localisation for Zero-Resource Spoken Languages (2023). Leanne Nortje, Herman Kamper
  • ConceptBeam (2022). Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada, Kunio Kashino
  • Word Segmentation on Discovered Phone Units With Dynamic Programming and Self-Supervised Scoring (2022). Herman Kamper
  • XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words (2023). Robin Algayres, Pablo Diego-Simón, Benoît Sagot, Emmanuel Dupoux
  • Self-Supervised Speech Representation Learning: A Review (2022). Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe
  • SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model (2023). Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath