Representations of language in a model of visually grounded speech signal

We present a visually grounded model of speech perception that projects spoken utterances and images into a joint semantic space. We use a multi-layer recurrent highway network to model the temporal nature of speech, and show that it learns to extract both form-based and meaning-based linguistic knowledge from the …