Figure 1. Overall procedure of the proposed algorithm. Our tracker exploits a pre-trained CNN for both image representation and target localization. Given a set of samples on the input frame, we first extract their features using the pre-trained CNN and classify them with the online SVM trained up to the previous time step. For each positive sample, we back-project the features relevant to the target through the network to obtain a saliency map of the sample. Finally, tracking is performed by sequential Bayesian filtering using the target-specific saliency map as the observation. To this end, a generative model is learned from target appearances in the previous saliency maps, and a dense likelihood map is computed by convolving the appearance model with the target-specific saliency map. Based on the tracking result of the current frame, the SVM and the generative model are updated for subsequent tracking.


We propose an online visual tracking algorithm that learns a discriminative saliency map using a Convolutional Neural Network (CNN). Given a CNN pre-trained offline on a large-scale image repository, our algorithm takes the outputs of the hidden layers of the network as feature descriptors, since they show excellent representation performance in various general visual recognition problems. These features are used to learn discriminative target appearance models with an online Support Vector Machine (SVM). In addition, we construct a target-specific saliency map by back-projecting CNN features under the guidance of the SVM, and obtain the final tracking result in each frame based on an appearance model constructed generatively from the saliency map. Since the saliency map reveals the spatial configuration of the target effectively, it improves target localization accuracy and enables pixel-level target segmentation. We verify the effectiveness of our tracking algorithm through extensive experiments on a challenging benchmark, where our method demonstrates outstanding performance compared to state-of-the-art tracking algorithms.
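The localization step described above can be sketched as a convolution (correlation) between a generative appearance model and the target-specific saliency map, whose peak gives the most likely target position. The snippet below is a minimal illustration of that idea only, not the paper's implementation; the mean-template appearance model and the toy saliency map are assumptions for the example.

```python
import numpy as np
from scipy.signal import correlate2d

def likelihood_map(saliency, appearance_model):
    """Dense likelihood map obtained by correlating a generative
    appearance model (here, a hypothetical mean target template
    accumulated from previous saliency maps) with the current
    target-specific saliency map. 'same' mode keeps the output
    aligned with the input frame coordinates."""
    return correlate2d(saliency, appearance_model, mode="same", boundary="fill")

# Toy example: a bright 3x3 target blob inside a 9x9 saliency map.
saliency = np.zeros((9, 9))
saliency[3:6, 3:6] = 1.0
model = np.ones((3, 3)) / 9.0   # hypothetical mean target template

lik = likelihood_map(saliency, model)
y, x = np.unravel_index(np.argmax(lik), lik.shape)  # peak = target centre
```

In this toy setup the likelihood peaks at the blob centre (4, 4), which is what the Bayesian filtering stage would treat as the most probable target location.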


Video 1. Qualitative results of our tracker in selected sequences

Figure 2. Average success plot (left) and precision plot (right) over the 50 sequences used in the benchmark [1].


Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network
Seunghoon Hong, Tackgeun You, Suha Kwak and Bohyung Han
ICML - International Conference on Machine Learning, 2015
[Paper] [Bibtex] [Supplementary]


Tracking results on the full benchmark of 100 sequences [2]


References & Links

[1] Yi Wu, Jongwoo Lim and Ming-Hsuan Yang. Online Object Tracking: A Benchmark. In CVPR, 2013.

[2] Visual Tracker Benchmark Dataset, Dataset Page