Modeling Local Behavior for Multi-Person Tracking





Multiple-pedestrian tracking in unconstrained environments is an important task that has received considerable attention from the computer vision community over the past two decades. Accurate multiple-pedestrian tracking can greatly improve activity recognition and the analysis of high-level events in a surveillance system.

Traditional approaches to pedestrian tracking build a motion prediction model to track the target. With improvements in object detection methods, recent approaches replace the motion prediction stage and track targets by selecting among the outputs of a detector. To combine the merits of traditional and recent approaches, we have developed a novel approach using an ensemble framework that, at each time step, optimally chooses the tracking result for each target from the outputs of independent trackers and a detector. The compound model selects the best candidate according to a scoring function that integrates detection confidence, appearance affinity, and smoothness constraints.
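The candidate selection described above can be illustrated with a minimal sketch. The abstract does not give the form of the scoring function, so everything here is an assumption made for illustration: the weighted sum, the exponential smoothness term, the constant-velocity extrapolation, and the names `Candidate`, `smoothness`, `score`, and `select_best` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    source: str                 # e.g. "detector" or the name of a tracker
    position: tuple             # (x, y) center of the candidate box
    detection_conf: float       # assumed to lie in [0, 1]
    appearance_affinity: float  # assumed to lie in [0, 1]


def smoothness(predicted_pos, cand_pos, scale=20.0):
    """Assumed smoothness term: decays with distance from the extrapolated position."""
    return math.exp(-math.dist(predicted_pos, cand_pos) / scale)


def score(cand, predicted_pos, weights=(1.0, 1.0, 1.0)):
    """Assumed scoring function: weighted sum of the three cues named in the text."""
    w_det, w_app, w_smooth = weights
    return (w_det * cand.detection_conf
            + w_app * cand.appearance_affinity
            + w_smooth * smoothness(predicted_pos, cand.position))


def select_best(candidates, prev_pos, prev_prev_pos):
    """Pick the highest-scoring candidate for one target at one time step."""
    # Constant-velocity extrapolation from the last two positions (an assumption).
    predicted = (2 * prev_pos[0] - prev_prev_pos[0],
                 2 * prev_pos[1] - prev_prev_pos[1])
    return max(candidates, key=lambda c: score(c, predicted))


candidates = [
    Candidate("detector", (21.0, 0.0), detection_conf=0.9, appearance_affinity=0.8),
    Candidate("tracker_a", (40.0, 0.0), detection_conf=0.5, appearance_affinity=0.9),
]
best = select_best(candidates, prev_pos=(10.0, 0.0), prev_prev_pos=(0.0, 0.0))
```

In this toy case the detector output lies close to the extrapolated position, so it wins despite the tracker's higher appearance affinity.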

To further improve tracking performance, we focus on the design of a novel motion prediction model. Interaction between people is known to play an important role in human motion. We present a novel tracking approach that exploits human collision-avoidance behavior and is motivated by the human visual system. The model predicts human motion by modeling the information a pedestrian perceives. An attention map is designed to mimic human reasoning by integrating both spatial and temporal information.

We also develop an enhanced tracker that models human group behavior using hierarchical group structures. The groups are identified by a bottom-up social group discovery method. The inter- and intra-group structures are modeled as a two-layer graph, and tracking is posed as optimization over the integrated structure.

Finally, we propose another novel tracking method that unifies multiple human behaviors. To investigate the effects of multiple potential social behaviors, we present an algorithm that decomposes the combined social behaviors into multiple basic interaction modes, such as attraction, repulsion, and no interaction. We integrate these social interaction modes into an interactive Markov Chain Monte Carlo tracker and demonstrate how the developed method translates into more informed motion prediction, resulting in robust tracking performance.
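As a rough illustration of how the three interaction modes named above could shape a motion prediction, the sketch below shifts a constant-velocity prediction toward or away from a neighbor. This is not the method of the thesis: the additive offset, the `strength` parameter, and the names `InteractionMode` and `predict_position` are hypothetical, and the Markov Chain Monte Carlo sampling itself is not shown.

```python
import math
from enum import Enum


class InteractionMode(Enum):
    ATTRACTION = "attraction"
    REPULSION = "repulsion"
    NONE = "none"


def predict_position(pos, vel, neighbor_pos, mode, strength=0.5):
    """Constant-velocity prediction, shifted according to the interaction mode.

    Assumed model: attraction adds an offset of length `strength` toward the
    neighbor, repulsion adds the same offset away from it, and no interaction
    leaves the constant-velocity prediction unchanged.
    """
    dx, dy = neighbor_pos[0] - pos[0], neighbor_pos[1] - pos[1]
    dist = math.hypot(dx, dy)

    if mode is InteractionMode.NONE or dist == 0.0:
        offset = (0.0, 0.0)
    else:
        sign = 1.0 if mode is InteractionMode.ATTRACTION else -1.0
        offset = (sign * strength * dx / dist, sign * strength * dy / dist)

    return (pos[0] + vel[0] + offset[0], pos[1] + vel[1] + offset[1])


# One pedestrian moving along x, with a neighbor directly above.
predictions = {
    mode: predict_position((0.0, 0.0), (1.0, 0.0), (0.0, 2.0), mode)
    for mode in InteractionMode
}
```

A sampling-based tracker could then weigh hypotheses generated under each mode; how the modes are inferred and combined is left open here.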



Multi-person Tracking, Local Behavior


Portions of the document appear in: X. Yan, I. A. Kakadiaris, and S. K. Shah, "Predicting Social Interactions for Visual Tracking," Proceedings of the British Machine Vision Conference (2011): 102.1-102.11. And in: X. Yan, X. Wu, I. A. Kakadiaris, and S. K. Shah, "To Track or To Detect? An Ensemble Framework for Optimal Selection," Proceedings of the European Conference on Computer Vision (2012): 594-607. And in: X. Yan, I. A. Kakadiaris, and S. K. Shah, "Modeling Local Behavior for Predicting Social Interactions for Visual Tracking," Pattern Recognition 47(4) (2014): 1626-1641. And in: X. Yan, A. Cheriyadat, and S. K. Shah, "Hierarchical Group Structures in Multi-Person Tracking," International Conference on Pattern Recognition (2014).