Representations of language in a model of visually grounded speech signal
We present a visually grounded model of speech perception which projects spoken utterances and images to a joint semantic space. We use a multi-layer recurrent highway network to model the temporal nature of the speech signal, and show that it learns to extract both form- and meaning-based linguistic knowledge from the …
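To make the described architecture concrete, the following is a minimal sketch of a grounded speech model in PyTorch: a recurrent speech encoder and an image projection mapping into a shared space, trained with a margin-based ranking loss. All layer sizes are illustrative assumptions, and a stacked GRU stands in for the paper's recurrent highway network; this is not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeechEncoder(nn.Module):
    """Encodes a sequence of acoustic features (e.g. MFCC frames)
    into a fixed-size vector in the joint semantic space."""
    def __init__(self, feat_dim=13, hidden_dim=512, embed_dim=512, num_layers=4):
        super().__init__()
        # Multi-layer recurrent encoder; the paper uses a recurrent
        # highway network, approximated here by a stacked GRU.
        self.rnn = nn.GRU(feat_dim, hidden_dim, num_layers=num_layers,
                          batch_first=True)
        self.proj = nn.Linear(hidden_dim, embed_dim)

    def forward(self, frames):               # frames: (batch, time, feat_dim)
        _, h = self.rnn(frames)              # h: (num_layers, batch, hidden_dim)
        return F.normalize(self.proj(h[-1]), dim=-1)   # unit-norm embedding

class ImageEncoder(nn.Module):
    """Projects precomputed image features (e.g. CNN activations)
    into the same joint space."""
    def __init__(self, img_dim=4096, embed_dim=512):
        super().__init__()
        self.proj = nn.Linear(img_dim, embed_dim)

    def forward(self, feats):                # feats: (batch, img_dim)
        return F.normalize(self.proj(feats), dim=-1)

def contrastive_loss(speech_emb, image_emb, margin=0.2):
    """Margin-based ranking loss: matching utterance/image pairs should
    score higher (by cosine similarity) than mismatched pairs."""
    scores = speech_emb @ image_emb.t()       # (batch, batch) similarities
    pos = scores.diag().unsqueeze(1)          # scores of matching pairs
    cost_s = F.relu(margin - pos + scores)    # utterance vs. wrong images
    cost_i = F.relu(margin - pos.t() + scores)  # image vs. wrong utterances
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    cost_s = cost_s.masked_fill(mask, 0)      # zero out the diagonal
    cost_i = cost_i.masked_fill(mask, 0)
    return (cost_s + cost_i).mean()
```

At a high level, training minimizes `contrastive_loss` over batches of paired utterances and images, so that an utterance and the image it describes end up close in the joint space while mismatched pairs are pushed apart by at least the margin.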