Shah, Shishir Kirit
2019-09-18
August 2017

Portions of this document appear in: Mirsharif, Qazaleh, Sidharth Sadani, Shishir Shah, Hanako Yoshida, and Joseph Burling. "A Semi-Automated Method for Object Segmentation in Infant's Egocentric Videos to Study Object Perception." In Proceedings of International Conference on Computer Vision and Image Processing, pp. 59-69. Springer, Singapore, 2017.

https://hdl.handle.net/10657/4809

Understanding the pathway to the development of visual attention and the role of vision in object name learning during infancy has been a focus of developmental studies over the years. Head cameras are increasingly used in such studies because they provide a unique source of information about a child's momentary visual experiences by approximating the child's visual field, and they may yield new insights into what factors generate attention in infants. However, frame-by-frame analysis of such videos is cumbersome and time consuming, and several parameters that impact a child's visual attention, such as the constant motion of the camera, cannot be assessed by human analysis. In this thesis, we propose computer vision tools that help developmental scientists perform automated, fast, and accurate analysis of videos collected from child-parent tabletop toy play. The computer vision tools in this thesis are used to further our understanding of how children's visual attention to objects and gestures develops. In the first stage of this thesis, we propose a semi-automated method for object segmentation in a child's egocentric videos. The method is applied to a large volume of videos to obtain binary masks of the toy objects used during child-parent toy play. The object masks are then used to study how much of the time children visually attend to objects at progressive ages, and where those objects fall within their visual field.
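The core output of the segmentation stage is a binary mask per frame marking the pixels belonging to a toy object. As a minimal sketch of that idea (not the thesis's actual semi-automated pipeline, which involves human initialization and per-frame refinement), the toy example below produces a binary mask by thresholding pixel colors; the color bounds `lo` and `hi` are hypothetical parameters standing in for whatever appearance model the real method learns:

```python
import numpy as np

def segment_object(frame, lo, hi):
    """Return a binary mask of pixels whose RGB values fall within [lo, hi].

    A color-threshold sketch only; the thesis's semi-automated method
    refines human-initialized segmentations across frames.
    """
    mask = np.all((frame >= lo) & (frame <= hi), axis=-1)
    return mask.astype(np.uint8)

# Toy 4x4 RGB frame: a reddish "object" patch on a black background.
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[1:3, 1:3] = (200, 30, 30)

mask = segment_object(frame, lo=(150, 0, 0), hi=(255, 80, 80))
```

Once such masks exist for every frame, "how much of the time children visually attend to objects" reduces to counting frames with a non-empty mask, and "location within the visual field" to the mask's centroid per frame.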
In the second stage, we propose an automated tool for analyzing motion patterns and parents' gestures in videos of child-parent toy play recorded from third-person and bird's-eye views. The proposed method takes an unsupervised clustering approach: dense trajectories are extracted from the image sequences and grouped into multiple motion classes with k-means clustering. Each motion group is further explored to study potential correlations between motion patterns in parents' gestures and object saliency in the child's visual field. The proposed methods in this thesis enable developmental scientists to explore unknown patterns in the development of a child's visual attention by performing automated and accurate analysis on videos of child-parent toy play obtained from multiple views.

application/pdf
eng

The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. UH Libraries has secured permission to reproduce any and all previously published materials contained in the work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).

Object name learning
Infant visual attention
Head camera
Object segmentation
Motion analysis

A Computational Study of Visual Attention on Objects and Gestures during Infancy
2019-09-18
Thesis
born digital
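The unsupervised grouping step described in the abstract's second stage (dense trajectory descriptors partitioned with k-means) can be sketched in a few lines. The descriptor dimensionality, cluster count, and synthetic "slow" vs. "fast" motion data below are all illustrative assumptions, not the thesis's actual features:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means: assign each point to its nearest centroid,
    then recompute centroids as cluster means."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # skip empty clusters
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Hypothetical 8-D trajectory descriptors forming two motion groups,
# e.g. small vs. large frame-to-frame displacements.
rng = np.random.default_rng(1)
slow = rng.normal(0.0, 0.1, size=(20, 8))
fast = rng.normal(5.0, 0.1, size=(20, 8))
X = np.vstack([slow, fast])

labels, centroids = kmeans(X, k=2)
```

Each resulting cluster corresponds to one motion group, which can then be inspected for correlations with object saliency in the child's visual field.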