Virtual Multimodal Object Detection and Classification with Deep CNNs






In this thesis, we first present two powerful image enhancement methods, both originating from the domain of unlighting illumination normalization. Then, given a gray-scale image database (the physical image modality), we postulate that the images generated by either of the two image enhancement methods can be regarded as virtual imaging modalities. To verify this, we investigate whether the virtual imaging modalities can complement the information available from the physical image modality when the two are used in tandem as inputs to supervised machine learning tasks. We begin with a simple score fusion scheme, based on the OpenBR face recognition suite, that evaluates the similarity (match scores) between subject faces in different images. By combining the face matching estimates of systems trained independently on the physical and virtual image modalities, we obtain a significant improvement in overall face matching accuracy. Motivated by this, we design and implement a novel Convolutional Neural Network (CNN) architecture, based on the Faster R-CNN network, for multi-class object detection and classification. Our architecture combines deep feature representations of the input images, generated by networks trained independently on the physical and virtual imaging modalities. Furthermore, to evaluate adequately the overall performance of all multi-class object detectors used in our experiments, we introduce the Average Recall over Precision curve, an alternative metric that is more descriptive than the commonly used single-value Mean Average Precision (MeanAP) metric. The Average Recall metric allows us to compare in detail the expected accuracy of multi-class detection systems over all target object classes.
Using the Average Recall metric, we demonstrate that our multimodal Faster R-CNN architecture, based on artificial modalities, achieves higher accuracy on popular, challenging multi-class object detection and classification tasks relative to its physical-modality-only peer model.
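
As an illustration of the score fusion idea described above, the following minimal sketch combines the match scores of two independently trained face matchers, one operating on the physical modality and one on a virtual modality. The abstract does not state which fusion rule the thesis uses, so the min-max normalization, the weighted-sum rule, and the names `min_max_normalize`, `fuse_scores`, and `weight` are all assumptions made for illustration. The sketch does not call OpenBR.

```python
def min_max_normalize(scores):
    """Rescale a list of match scores to the [0, 1] range."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are identical, so they carry no ranking information.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(physical_scores, virtual_scores, weight=0.5):
    """Fuse two lists of match scores computed for the same image pairs.

    Assumption: a weighted sum of min-max normalized scores. `weight` is a
    hypothetical parameter giving the share of the physical-modality matcher.
    """
    if len(physical_scores) != len(virtual_scores):
        raise ValueError("both matchers must score the same image pairs")
    p = min_max_normalize(physical_scores)
    v = min_max_normalize(virtual_scores)
    return [weight * a + (1.0 - weight) * b for a, b in zip(p, v)]


# Made-up scores for three candidate face pairs, one list per matcher.
fused = fuse_scores([0.0, 1.0, 2.0], [10.0, 30.0, 20.0])
print(fused)
```

Normalizing before fusing matters here because the two matchers may report scores on different scales; without it, the matcher with the larger numeric range would dominate the sum.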



Deep neural networks, DNN, Convolutional Neural Networks, CNN, Deep learning, Multimodal deep learning, Deep feature fusion, Similarity score fusion, Face recognition, Face verification, Object detection, Object recognition, Region proposals, Multi-class object detection, Image enhancement, Illumination normalization, Faster R-CNN