Multistage linguistic conditioning of convolutional layers for speech emotion recognition
Introduction: The effective fusion of text and audio information for categorical and dimensional speech emotion recognition (SER) remains an open issue, especially given the vast potential of deep neural networks (DNNs) to provide a tighter integration of the two.
Methods: In this contribution, we investigate the effectiveness of deep fusion …