|
Attention Is All You Need
|
2017
|
Ashish Vaswani
Noam Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan N. Gomez
Łukasz Kaiser
Illia Polosukhin
|
2
|
|
On Vision Features in Multimodal Machine Translation
|
2022
|
Bei Li
Chuanhao Lv
Zefan Zhou
Tao Zhou
Tong Xiao
Anxiang Ma
Jingbo Zhu
|
2
|
|
How2: A Large-scale Dataset for Multimodal Language Understanding
|
2018
|
Ramon Sanabria
Ozan Çağlayan
Shruti Palaskar
Desmond Elliott
Loïc Barrault
Lucia Specia
Florian Metze
|
2
|
|
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
|
2020
|
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
Thomas Unterthiner
Mostafa Dehghani
Matthias Minderer
Georg Heigold
Sylvain Gelly
|
2
|
|
Imagination improves Multimodal Translation
|
2017
|
Desmond Elliott
Ákos Kádár
|
1
|
|
A Spelling Correction Model for End-to-end Speech Recognition
|
2019
|
Jinxi Guo
Tara N. Sainath
Ron J. Weiss
|
1
|
|
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
|
2019
|
Myle Ott
Sergey Edunov
Alexei Baevski
Angela Fan
Sam Gross
Nathan Ng
David Grangier
Michael Auli
|
1
|
|
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models
|
2018
|
Chung‐Cheng Chiu
Tara N. Sainath
Yonghui Wu
Rohit Prabhavalkar
Patrick Nguyen
Zhifeng Chen
Anjuli Kannan
Ron J. Weiss
Kanishka Rao
Ekaterina Gonina
|
1
|
|
Cross-lingual Visual Verb Sense Disambiguation
|
2019
|
Spandana Gella
Desmond Elliott
Frank Keller
|
1
|
|
Probing the Need for Visual Context in Multimodal Machine Translation
|
2019
|
Ozan Çağlayan
Pranava Madhyastha
Lucia Specia
Loïc Barrault
|
1
|
|
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
|
2017
|
João Carreira
Andrew Zisserman
|
1
|
|
A Call for Clarity in Reporting BLEU Scores
|
2018
|
Matt Post
|
1
|
|
Multi30K: Multilingual English-German Image Descriptions
|
2016
|
Desmond Elliott
Stella Frank
Khalil Sima’an
Lucia Specia
|
1
|
|
XLNet: Generalized Autoregressive Pretraining for Language Understanding
|
2019
|
Zhilin Yang
Zihang Dai
Yiming Yang
Jaime Carbonell
Ruslan Salakhutdinov
Quoc V. Le
|
1
|
|
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
|
2019
|
Mikel Artetxe
Holger Schwenk
|
1
|
|
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
|
2019
|
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Šivic
|
1
|
|
VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
|
2019
|
Xin Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan‐Fang Wang
William Yang Wang
|
1
|
|
SlowFast Networks for Video Recognition
|
2019
|
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
|
1
|
|
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-Modal Pre-Training
|
2020
|
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
|
1
|
|
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
|
2020
|
Di Qi
Lin Su
Jia Song
Edward Cui
Taroon Bharti
Arun Sacheti
|
1
|
|
A Simple Framework for Contrastive Learning of Visual Representations
|
2020
|
Ting Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
|
1
|
|
Revisiting Round-Trip Translation for Quality Estimation
|
2020
|
Jihyung Moon
Hyunchang Cho
Eunjeong L. Park
|
1
|
|
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation
|
2020
|
Yongjing Yin
Fandong Meng
Jinsong Su
Chulun Zhou
Zhengyuan Yang
Jie Zhou
Jiebo Luo
|
1
|
|
BLEURT: Learning Robust Metrics for Text Generation
|
2020
|
Thibault Sellam
Dipanjan Das
Ankur P. Parikh
|
1
|
|
Keyframe Segmentation and Positional Encoding for Video-guided Machine Translation Challenge 2020
|
2020
|
Tosho Hirasawa
Zhishen Yang
Mamoru Komachi
Naoaki Okazaki
|
1
|
|
End-to-End Named Entity Recognition from English Speech
|
2020
|
Hemant Yadav
Sreyan Ghosh
Yi Yu
Rajiv Ratn Shah
|
1
|
|
TED-LIUM 3: Twice as Much Data and Corpus Repartition for Experiments on Speaker Adaptation
|
2018
|
François Hernandez
Vincent Nguyen
Sahar Ghannay
Natalia Tomashenko
Yannick Estève
|
1
|
|
Dynamic Context-guided Capsule Network for Multimodal Machine Translation
|
2020
|
Huan Lin
Fandong Meng
Jinsong Su
Yongjing Yin
Zhengyuan Yang
Yubin Ge
Jie Zhou
Jiebo Luo
|
1
|
|
COMET: A Neural Framework for MT Evaluation
|
2020
|
Ricardo Rei
Craig Stewart
Ana C Farinha
Alon Lavie
|
1
|
|
Learning to Localize Actions from Moments
|
2020
|
Fuchen Long
Ting Yao
Zhaofan Qiu
Xinmei Tian
Jiebo Luo
Tao Mei
|
1
|
|
MultiSubs: A Large-scale Multimodal and Multilingual Dataset
|
2021
|
Josiah Wang
Pranava Madhyastha
Josiel Maimone de Figueiredo
Chiraag Lala
Lucia Specia
|
1
|
|
Cross-lingual Visual Pre-training for Multimodal Machine Translation
|
2021
|
Ozan Çağlayan
Menekşe Kuyu
Mustafa Sercan Amac
Pranava Madhyastha
Erkut Erdem
Aykut Erdem
Lucia Specia
|
1
|
|
NeurST: Neural Speech Translation Toolkit
|
2021
|
Chengqi Zhao
Mingxuan Wang
Qianqian Dong
Rong Ye
Lei Li
|
1
|
|
Vision Matters When It Should: Sanity Checking Multimodal Machine Translation Models
|
2021
|
Jiaoda Li
Duygu Ataman
Rico Sennrich
|
1
|
|
GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio
|
2021
|
Guoguo Chen
Shuzhou Chai
Guanbo Wang
Jiayu Du
Wei-Qiang Zhang
Chao Weng
Dan Su
Daniel Povey
Jan Trmal
Junbo Zhang
|
1
|
|
XGPT: Cross-modal Generative Pre-Training for Image Captioning
|
2021
|
Qiaolin Xia
Haoyang Huang
Nan Duan
Dongdong Zhang
Lei Ji
Zhifang Sui
Edward Cui
Taroon Bharti
Ming Zhou
|
1
|
|
VISA: An Ambiguous Subtitles Dataset for Visual Scene-Aware Machine Translation
|
2022
|
Yihang Li
Shuichiro Shimizu
Weiqi Gu
Chenhui Chu
Sadao Kurohashi
|
1
|
|
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations
|
2022
|
Dan Oneață
Horia Cucu
|
1
|
|
Flamingo: a Visual Language Model for Few-Shot Learning
|
2022
|
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
Yana Hasson
Karel Lenc
Arthur Mensch
Katie Millican
Malcolm Reynolds
|
1
|
|
Unsupervised Speech Recognition
|
2021
|
Alexei Baevski
Wei-Ning Hsu
Alexis Conneau
Michael Auli
|
1
|
|
Understanding the Behaviour of Contrastive Loss
|
2020
|
Feng Wang
Huaping Liu
|
1
|
|
On the Evaluation of Machine Translation for Terminology Consistency
|
2021
|
Md Mahfuz ibn Alam
Antonios Anastasopoulos
Laurent Besacier
James H. Cross
Matthias Gallé
Philipp Koehn
Vassilina Nikoulina
|
1
|
|
Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models
|
2022
|
Ninareh Mehrabi
Palash Goyal
Apurv Verma
Jwala Dhamala
Varun Kumar
Qian Hu
Kai-Wei Chang
Richard S. Zemel
Aram Galstyan
Rahul Gupta
|
1
|
|
Sub-word Level Lip Reading With Visual Attention
|
2022
|
K R Prajwal
Triantafyllos Afouras
Andrew Zisserman
|
1
|
|
Exploring Better Text Image Translation with Multimodal Codebook
|
2023
|
Zhibin Lan
Jiawei Yu
Xiang Li
Wen Zhang
Jian Luan
Bin Wang
Degen Huang
Jinsong Su
|
1
|
|
Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation
|
2023
|
Yaoming Zhu
Zewei Sun
Shanbo Cheng
Luyang Huang
Liwei Wu
Mingxuan Wang
|
1
|
|
Deep Residual Learning for Image Recognition
|
2016
|
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
|
1
|
|
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
|
2016
|
Shaoqing Ren
Kaiming He
Ross Girshick
Jian Sun
|
1
|
|
Multimodal Pivots for Image Caption Translation
|
2016
|
Julian Hitschler
Shigehiko Schamoni
Stefan Riezler
|
1
|
|
Mixed Precision Training of Convolutional Neural Networks using Integer Operations
|
2018
|
Dipankar Das
Naveen Mellempudi
Dheevatsa Mudigere
Dhiraj D. Kalamkar
Sasikanth Avancha
Kunal Banerjee
Srinivas Sridharan
K. Vaidyanathan
Bharat Kaul
Evangelos Georganas
|
1
|