+
|
Deep Residual Learning for Image Recognition
|
2016
|
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
|
4
|
+
|
Identity Mappings in Deep Residual Networks
|
2016
|
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
|
3
|
+
|
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
|
2019
|
Mingxing Tan
Quoc V. Le
|
2
|
+
|
End-to-End Speech Recognition from the Raw Waveform
|
2018
|
Neil Zeghidour
Nicolas Usunier
Gabriel Synnaeve
Ronan Collobert
Emmanuel Dupoux
|
2
|
+
|
VoxCeleb2: Deep Speaker Recognition
|
2018
|
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
|
2
|
+
|
Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System
|
2018
|
Weicheng Cai
Jinkun Chen
Ming Li
|
2
|
+
|
Attention Is All You Need
|
2017
|
Ashish Vaswani
Noam Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan N. Gomez
Łukasz Kaiser
Illia Polosukhin
|
2
|
+
|
Densely Connected Convolutional Networks
|
2017
|
Gao Huang
Zhuang Liu
Laurens van der Maaten
Kilian Q. Weinberger
|
2
|
+
|
Residual Attention Network for Image Classification
|
2017
|
Fei Wang
Mengqing Jiang
Chen Qian
Shuo Yang
Cheng Li
Honggang Zhang
Xiaogang Wang
Xiaoou Tang
|
2
|
+
|
GhostVLAD for Set-Based Face Recognition
|
2019
|
Yujie Zhong
Relja Arandjelović
Andrew Zisserman
|
2
|
+
|
Unified Hypersphere Embedding for Speaker Recognition
|
2018
|
Mahdi Hajibabaei
Dengxin Dai
|
2
|
+
|
CBAM: Convolutional Block Attention Module
|
2018
|
Sanghyun Woo
Jongchan Park
Joon-Young Lee
In So Kweon
|
2
|
+
|
VoxCeleb: A Large-Scale Speaker Identification Dataset
|
2017
|
Arsha Nagrani
Joon Son Chung
Andrew Zisserman
|
2
|
+
|
mixup: Beyond Empirical Risk Minimization
|
2017
|
Hongyi Zhang
Moustapha Cissé
Yann Dauphin
David López-Paz
|
2
|
+
|
ArcFace: Additive Angular Margin Loss for Deep Face Recognition
|
2019
|
Jiankang Deng
Jia Guo
Niannan Xue
Stefanos Zafeiriou
|
2
|
+
|
Utterance-level Aggregation for Speaker Recognition in the Wild
|
2019
|
Weidi Xie
Arsha Nagrani
Joon Son Chung
Andrew Zisserman
|
2
|
+
|
Self Multi-Head Attention for Speaker Recognition
|
2019
|
Miquel India
Pooyan Safari
Javier Hernando
|
2
|
+
|
Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification
|
2019
|
Lanhua You
Wu Guo
Li-Rong Dai
Jun Du
|
2
|
+
|
A Deep Neural Network for Short-Segment Speaker Recognition
|
2019
|
Amirhossein Hajavi
Ali Etemad
|
2
|
+
|
Attentive Statistics Pooling for Deep Speaker Embedding
|
2018
|
Koji Okabe
Takafumi Koshinaka
Koichi Shinoda
|
2
|
+
|
Trainable frontend for robust and far-field keyword spotting
|
2017
|
Yuxuan Wang
Pascal Getreuer
T. A. Hughes
Richard F. Lyon
Rif A. Saurous
|
1
|
+
|
CondenseNet: An Efficient DenseNet Using Learned Group Convolutions
|
2018
|
Gao Huang
Shichen Liu
Laurens van der Maaten
Kilian Q. Weinberger
|
1
|
+
|
Speaker Recognition from Raw Waveform with SincNet
|
2018
|
Mirco Ravanelli
Yoshua Bengio
|
1
|
+
|
PyTorch: An Imperative Style, High-Performance Deep Learning Library
|
2019
|
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
Gregory Chanan
Trevor Killeen
Zeming Lin
Natalia Gimelshein
Luca Antiga
|
1
|
+
|
Audio Tagging with Noisy Labels and Minimal Supervision
|
2019
|
Eduardo Fonseca
Manoj Plakal
Frederic Font
Daniel P. W. Ellis
Xavier Serra
|
1
|
+
|
A Simple Framework for Contrastive Learning of Visual Representations
|
2020
|
Ting Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
|
1
|
+
|
CGCNN: Complex Gabor Convolutional Neural Network on Raw Speech
|
2020
|
Paul-Gauthier Noé
Titouan Parcollet
Mohamed Morchid
|
1
|
+
|
VGGSound: A Large-Scale Audio-Visual Dataset
|
2020
|
Honglie Chen
Weidi Xie
Andrea Vedaldi
Andrew Zisserman
|
1
|
+
|
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
|
2020
|
Alexei Baevski
Henry Zhou
Abdelrahman Mohamed
Michael Auli
|
1
|
+
|
An Ensemble of Convolutional Neural Networks for Audio Classification
|
2020
|
Loris Nanni
Gianluca Maguolo
Sheryl Brahnam
Michelangelo Paci
|
1
|
+
|
FSD50K: an Open Dataset of Human-Labeled Sound Events
|
2020
|
Eduardo Fonseca
Xavier Favory
Jordi Pons
Frederic Font
Xavier Serra
|
1
|
+
|
Contrastive Learning of General-Purpose Audio Representations
|
2020
|
Aaqib Saeed
David Grangier
Neil Zeghidour
|
1
|
+
|
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
|
2020
|
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
|
1
|
+
|
Unsupervised Contrastive Learning of Sound Event Representations
|
2021
|
Eduardo Fonseca
Diego Ortego
Kevin McGuinness
Noel E. O’Connor
Xavier Serra
|
1
|
+
|
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
|
2021
|
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdelrahman Mohamed
|
1
|
+
|
BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation
|
2021
|
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
Kunio Kashino
|
1
|
+
|
Towards Learning Universal Audio Representations
|
2022
|
Luyu Wang
Pauline Luc
Yan Wu
Adrià Recasens
Lucas Smaira
Andrew Brock
Andrew Jaegle
Jean-Baptiste Alayrac
Sander Dieleman
João Carreira
|
1
|
+
|
LEAF: A Learnable Frontend for Audio Classification
|
2021
|
Neil Zeghidour
Olivier Teboul
Félix de Chaumont Quitry
Marco Tagliasacchi
|
1
|
+
|
Unified Hypersphere Embedding for Speaker Recognition
|
2018
|
Mahdi Hajibabaei
Dengxin Dai
|
1
|
+
|
Representation Learning with Contrastive Predictive Coding
|
2018
|
Aäron van den Oord
Yazhe Li
Oriol Vinyals
|
1
|
+
|
Attention Is All You Need
|
2017
|
Ashish Vaswani
Noam Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan N. Gomez
Łukasz Kaiser
Illia Polosukhin
|
1
|
+
|
Layer Normalization
|
2016
|
Jimmy Ba
Jamie Kiros
Geoffrey E. Hinton
|
1
|
+
|
Very Deep Convolutional Networks for Large-Scale Image Recognition
|
2014
|
Karen Simonyan
Andrew Zisserman
|
1
|
+
|
Attention-Based Models for Speech Recognition
|
2015
|
Jan Chorowski
Dzmitry Bahdanau
Dmitriy Serdyuk
Kyunghyun Cho
Yoshua Bengio
|
1
|
+
|
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
|
2015
|
Sergey Ioffe
Christian Szegedy
|
1
|
+
|
Deep Speech: Scaling up end-to-end speech recognition
|
2014
|
Awni Hannun
Carl Case
Jared Casper
Bryan Catanzaro
Greg Diamos
Erich Elsen
Ryan Prenger
Sanjeev Satheesh
Shubho Sengupta
Adam Coates
|
1
|
+
|
Deep Scattering Spectrum
|
2014
|
Joakim Andén
Stéphane Mallat
|
1
|
+
|
Rethinking the Inception Architecture for Computer Vision
|
2016
|
Christian Szegedy
Vincent Vanhoucke
Sergey Ioffe
Jon Shlens
Zbigniew Wojna
|
1
|
+
|
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
|
2015
|
Dario Amodei
Rishita Anubhai
Eric Battenberg
Carl Case
Jared Casper
Bryan Catanzaro
Jingdong Chen
Mike Chrzanowski
Adam Coates
Greg Diamos
|
1
|
+
|
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
|
2016
|
Christian Szegedy
Sergey Ioffe
Vincent Vanhoucke
Alexander A. Alemi
|
1
|