Lipreading with long short-term memory
Lipreading with long short-term memory
Lipreading, i.e. speech recognition from visual-only recordings of a speaker's face, can be achieved with a processing pipeline based solely on neural networks, yielding significantly better accuracy than conventional methods. Feedforward and recurrent neural network layers (namely Long Short-Term Memory; LSTM) are stacked to form a single structure which is …