VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
We present a new large-scale multilingual video description dataset, VATEX<sup>1</sup>, which contains over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions, there are over 206,000 English-Chinese parallel translation pairs. Compared to the widely used MSRVTT dataset [64], VATEX is multilingual, larger, …