Multilingual Jointly Trained Acoustic and Written Word Embeddings
Acoustic word embeddings (AWEs) are vector representations of spoken word segments. AWEs can be learned jointly with embeddings of character sequences to generate phonetically meaningful embeddings of written words, known as acoustically grounded word embeddings (AGWEs). Such embeddings have been used to improve speech retrieval, recognition, and spoken term discovery. In …