AST: Audio Spectrogram Transformer
In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels. To better capture long-range global context, a recent trend is to add a self-attention mechanism on …