Human Detection in the Wild
Abstract
Human detection remains a challenging task because of the problems caused by varying degrees of occlusion. Visible-body bounding boxes are typically used as an extra supervision signal to improve human detection performance. However, visible-body-assisted approaches produce a large number of false positives, which result from a lack of adequate and discriminative full-body contextual information. Because the face is the most discriminative feature of the head and of the human body, face detection has attracted much attention. Despite the great progress made in accurate face detection, detecting faces at multiple scales, especially small faces, remains a challenging problem. Existing approaches to multi-scale face detection can be categorized into two-stage and single-stage face detectors. Two-stage face detectors learn discriminative facial features at various scales by using input image pyramids or multi-scale feature maps, which give the network more facial information at each scale; however, these increase the training difficulty and complexity of the network. Single-stage face detectors use feature fusion and context aggregation to enrich contextual information; however, treating reliable information and noise equally can introduce considerable noise into the features fused from different levels. Moreover, dilated convolutions in the context aggregation module can cause gridding artifacts. The goal of this dissertation is to design, develop, and evaluate human detection algorithms that solve the above problems. The three contributions of this dissertation are summarized as follows: (i) A decoupled visible region network for human detection was designed, developed, and evaluated to overcome the occlusion challenge. The proposed human detector improved performance from MR