Appearance-and-Relation Networks for Video Classification

Spatiotemporal feature learning in videos is a fundamental problem in computer vision. This paper presents a new architecture, termed the Appearance-and-Relation Network (ARTNet), that learns video representations in an end-to-end manner. ARTNets are constructed by stacking multiple generic building blocks, called SMART, whose goal is to simultaneously model appearance …