Human Detection in the Wild







Human detection remains a challenging task because of the variance introduced by occlusion. Visible-body bounding boxes are typically used as an extra supervision signal to improve human detection performance. However, visible-body-assisted approaches produce a large number of false positives because they lack adequate and discriminative full-body contextual information. Because the face is the most discriminative feature of the head and of the human body, face detection has attracted much attention. Despite great progress in accurate face detection, detecting multi-scale faces, especially small faces, remains a challenging problem. Existing approaches to multi-scale face detection can be categorized into two-stage and single-stage face detectors. Two-stage face detectors use input pyramids or multi-scale feature maps to give the network more facial information for learning discriminative features at various scales. However, these increase the training difficulty and complexity of the network. Single-stage face detectors use feature fusion and context aggregation to enrich contextual information. However, treating reliable information and noise equally can leave substantial noise in the fused features at different levels. Moreover, dilated convolutions in the context aggregation module can cause gridding artifacts. The goal of this dissertation is to design, develop, and evaluate human detection algorithms that solve the above problems. The three contributions of this dissertation are as follows: (i) A decoupled visible region network for human detection was designed, developed, and evaluated to overcome the occlusion challenge. On the CityPersons dataset, the proposed human detector improved MR−2 from 11.24 to 10.50 compared to Bi-box, which is inspired by our work. (ii) A two-stage face detector was designed, developed, and evaluated to overcome the scale challenge. It improved mAP by 12.1% compared to our baseline on the WIDER FACE dataset. (iii) A single-stage face detector was designed, developed, and evaluated to overcome the scale challenge. The proposed method achieved the best performance, with an mAP of 77.0%, on the UFDD dataset.
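The gridding artifacts attributed to dilated convolutions above can be illustrated with a minimal sketch. It is not taken from the dissertation: it simplifies to one dimension, assumes an odd kernel size, and the helper names `receptive_offsets` and `coverage` are hypothetical. It lists which input positions can influence a single output position after stacking convolutions with given dilation rates.

```python
def receptive_offsets(kernel_size, dilations):
    """Input offsets that can influence one output position after stacking
    1-D convolutions (odd kernel_size assumed) with the given dilation rates."""
    half = kernel_size // 2
    offsets = {0}
    for d in dilations:
        # Each layer samples its input at multiples of the dilation rate.
        taps = [k * d for k in range(-half, half + 1)]
        offsets = {o + t for o in offsets for t in taps}
    return sorted(offsets)


def coverage(kernel_size, dilations):
    """Fraction of positions inside the receptive-field span that are used."""
    offs = receptive_offsets(kernel_size, dilations)
    span = offs[-1] - offs[0] + 1
    return len(offs) / span


if __name__ == "__main__":
    # Two ordinary 3-tap layers: every position in the span contributes.
    print(receptive_offsets(3, [1, 1]), coverage(3, [1, 1]))
    # Two 3-tap layers with dilation 2: odd offsets never contribute.
    print(receptive_offsets(3, [2, 2]), coverage(3, [2, 2]))
```

With two stacked layers of dilation 2, only the even offsets from -4 to 4 are reachable, so neighboring input positions are skipped in a regular pattern. That checkerboard-like sparsity is what the term gridding refers to.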



Human detection, face detection


Portions of this document appear in: L. Shi, X. Xu, I. A. Kakadiaris. “SSFD+: A Robust Two-stage Face Detector,” IEEE Transactions on Biometrics, Behavior, and Identity Science, 1 (2019), 181-191; L. Shi, X. Xu, I. A. Kakadiaris. “SANet: Smoothed Attention Network for Single-stage Face Detector,” In Proceedings of IEEE International Conference on Biometrics, (Crete, Greece, 2019), pp. 11-20; L. Shi, X. Xu, I. A. Kakadiaris. “SSFD: A Face Detector using A Single-scale Feature Map,” In Proceedings of IEEE International Conference on Biometrics: Theory, Applications, and Systems, (LA, CA, 2018), pp. 1-10.