Video Representation Learning Using Discriminative Pooling
Video Representation Learning Using Discriminative Pooling
Popular deep models for action recognition in videos generate independent predictions for short clips, which are then pooled heuristically to assign an action label to the full video segment. As not all frames may characterize the underlying action-indeed, many are common across multiple actions-pooling schemes that impose equal importance on …