Contextual Information for Applications in Video Surveillance
Wei, Li 1988-
With a growing network of cameras being used for security applications, video-based monitoring that relies on human operators is ineffective and lacks reliability and scalability. In this thesis, I present automatic solutions that enable monitoring of humans in videos, such as identifying the same individuals across different cameras (human re-identification) and recognizing human activities. Analyzing videos using only individual-based features can be very challenging because of the significant appearance and motion variance caused by changing viewpoints, different lighting conditions, and occlusions. Motivated by the fact that people often form groups, it is feasible to model the interaction among group members to disambiguate the individual features in video analysis tasks. This thesis introduces features that leverage the human group as contextual information and demonstrates their performance on the tasks of human re-identification and activity recognition. Two descriptors are introduced for human re-identification. The Subject Centric Group (SCG) feature captures a person’s group appearance and shape information using estimates of people’s positions in 3D space. The metric is designed to consider both human appearance and group similarity. The Spatial Appearance Group (SAG) feature extracts group appearance and shape information directly from video frames. A random-forest model is trained to predict the group’s similarity score. For human activity recognition, I propose context features along with a deep model to recognize the individual subject’s activity in videos of real-world scenes. Besides the motion features of the person, I also utilize group context information and scene context information to improve the recognition performance. This thesis demonstrates the application of the proposed features to both problems.
Our experiments show that the proposed features reach state-of-the-art accuracy on challenging re-identification datasets that represent real-world scenarios, and outperform state-of-the-art human activity recognition methods on the 5-activity and 6-activity versions of the Collective Activities dataset.