Action Labeling in Images and Video






Deep learning models that attempt to categorize visual content can benefit from being trained with additional information that may or may not be available during deployment. To this end, this dissertation designed, developed, and evaluated methods inspired by the "Learning Using Privileged Information" (LUPI) framework, multimodal data fusion, and knowledge distillation to improve the performance of deep learning models. These methods were assessed on two problems: (i) recognizing carrying actions in visible-spectrum and near-infrared images, and (ii) detecting questionable content in online video. The experimental results demonstrated the effectiveness of the methods on four new datasets introduced in this work to address the challenges of these problems.



Deep learning, LUPI, Multimodal data fusion, Knowledge distillation


Portions of this document appear in: Smailis, Christos, Michalis Vrigkas, and Ioannis A. Kakadiaris. "Recaspia: Recognizing carrying actions in single images using privileged information." In 2019 IEEE International Conference on Image Processing (ICIP), pp. 26-30. IEEE, 2019; and in: Le, Ha, Christos Smailis, Lei Shi, and Ioannis Kakadiaris. "EDGE20: A cross spectral evaluation dataset for multiple surveillance problems." In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 2685-2694. 2020; and in: Shafaei, Mahsa, Christos Smailis, Ioannis Kakadiaris, and Thamar Solorio. "A Case Study of Deep Learning-Based Multi-Modal Methods for Labeling the Presence of Questionable Content in Movie Trailers." In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pp. 1297-1307. 2021.