Automated Lecture Video Indexing With Text Analysis and Machine Learning

dc.contributor.advisor: Subhlok, Jaspal
dc.contributor.committeeMember: Verma, Rakesh M.
dc.contributor.committeeMember: Johnson, Olin
dc.contributor.committeeMember: Shah, Shishir Kirit
dc.contributor.committeeMember: Liu, Youmei
dc.creator: Tuna, Tayfun 1981-
dc.date.accessioned: 2017-06-13T19:26:51Z
dc.date.available: 2017-06-13T19:26:51Z
dc.date.created: May 2015
dc.date.issued: 2015-05
dc.date.submitted: May 2015
dc.date.updated: 2017-06-13T19:26:53Z
dc.description.abstract: Videos recorded during in-class teaching and made accessible online are a versatile resource on par with a textbook and the classroom itself. Nonetheless, the adoption of lecture videos has been limited, in large part due to the difficulty of quickly accessing the content of interest in a long video lecture. Video indexing, which divides the video into meaningful segments, can significantly improve accessibility. In this work, we present automatic text-based approaches and machine learning for indexing lecture videos to provide topic-based segmentation. Various text-based indexing algorithms were developed to identify topic transitions in a video. The indexing algorithms merge neighboring video segments with high text similarity to form topic segments, which are represented by index points. In general, it is not clear which feature in a video slide is important for detecting topic change. Therefore, we propose another video indexing approach using machine learning, which can use all possible features, such as the number of words in a slide, n-grams, the title, or text with a large font size. Among state-of-the-art machine learning algorithms, ensemble models such as Random Forest and Bagging were found to be efficient and practical to use. They also provide probability distributions, which enable the user to choose a desired number of index points. Evaluation was done on a set of twenty-five lecture videos from courses in Computer Science, Biology, and Earth and Atmospheric Science. The ground truth was established by asking the lecture instructor to manually identify topic transitions in the video. An information gain experiment with machine learning shows that words with a large font size, words that appear in the video for the first time, and n-gram frequency differences between video slides are important features for identifying the topic transitions in a lecture video.
Experimental results show that text-based indexing provides a significant improvement over a non-text-based approach and that indexing with machine learning provides approximately 80% indexing accuracy on average. An important observation was that there are significant differences when the topics are manually identified by multiple users who are very familiar with the content. Although further enhancements could improve the performance of video indexing, the performance gains are not expected to reach the ideal output because of the uncertain nature of the ground truth.
dc.description.department: Computer Science, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.citation: This work contains textual materials and illustrations that first appeared in: Tuna, T., Subhlok, J., and Shah, S. "Indexing and keyword search to ease navigation in lecture videos." Applied Imagery Pattern Recognition Workshop (AIPR), IEEE (Oct 2011), pp. 1-8. © 2011 IEEE.
dc.identifier.uri: http://hdl.handle.net/10657/1772
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. UH Libraries has secured permission to reproduce any and all previously published materials contained in the work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: Lecture videos
dc.subject: Topic-based Segmentation
dc.subject: Video Indexing
dc.subject: Machine learning
dc.title: Automated Lecture Video Indexing With Text Analysis and Machine Learning
dc.type.dcmi: text
dc.type.genre: Thesis
thesis.degree.college: College of Natural Sciences and Mathematics
thesis.degree.department: Computer Science, Department of
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
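
The two indexing ideas summarized in the abstract (merging neighboring segments with high text similarity, and letting the user pick a desired number of index points from per-segment probabilities) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the record does not say which similarity measure or threshold was used, so the bag-of-words cosine similarity, the 0.5 threshold, and the function names cosine_similarity, merge_similar_segments, and top_k_index_points are all assumptions made for illustration. The probabilities passed to top_k_index_points are assumed to come from some trained classifier.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two strings (assumed measure)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def merge_similar_segments(segment_texts, threshold=0.5):
    """Merge each segment into the running topic when its text is similar enough.

    Returns the indices of the segments that start a new topic (the index points).
    The threshold value is a hypothetical choice, not taken from the thesis.
    """
    if not segment_texts:
        return []
    index_points = [0]
    topic_text = segment_texts[0]
    for i in range(1, len(segment_texts)):
        if cosine_similarity(topic_text, segment_texts[i]) >= threshold:
            # Similar text: the segment joins the current topic.
            topic_text += " " + segment_texts[i]
        else:
            # Dissimilar text: the segment opens a new topic.
            index_points.append(i)
            topic_text = segment_texts[i]
    return index_points


def top_k_index_points(transition_probs, k):
    """Return, in video order, the k segments with the highest transition probability."""
    ranked = sorted(range(len(transition_probs)),
                    key=lambda i: transition_probs[i], reverse=True)
    return sorted(ranked[:k])


segments = [
    "binary search tree insert",
    "binary search tree delete",
    "photosynthesis light reaction",
    "photosynthesis dark reaction chloroplast",
]
print(merge_similar_segments(segments))            # [0, 2]
print(top_k_index_points([0.1, 0.8, 0.3, 0.9], 2))  # [1, 3]
```

In this toy input the first two segments share most of their words and are merged, while the third shares none with them and becomes a new index point. Ranking by probability rather than applying a fixed cutoff is what allows the number of index points to be chosen by the user.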

Files

Original bundle

Name: TUNA-DISSERTATION-2015.pdf
Size: 9.24 MB
Format: Adobe Portable Document Format

License bundle

Name: LICENSE.txt
Size: 1.81 KB
Format: Plain Text