SU\G(𝔸)/K·U

Ask a Question

Related Paper

Word Discovery in Visually Grounded, Self-Supervised Speech Models

Figure 1: HuBERT: sum of attention weights each frame receives from other frames. Ours (VG-HuBERT_3): attention weights each frame receives from the [CLS_A] token. Attention weights from different attention heads are coded with different colors.

Sugaku, Inc. Copyright 2024
