Rethinking the Faster R-CNN Architecture for Temporal Action Localization
We propose TAL-Net, an improved approach to temporal action localization in video that is inspired by the Faster R-CNN object detection framework. TAL-Net addresses three key shortcomings of existing approaches: (1) we improve receptive field alignment using a multi-scale architecture that can accommodate extreme variation in action durations; (2) we …