Automated Lecture Video Indexing With Text Analysis and Machine Learning
Abstract
Videos recorded during in-class teaching and made accessible online are a versatile resource on par with a textbook and the classroom itself. Nonetheless, the adoption of lecture videos has been limited, in large part due to the difficulty of quickly accessing the content of interest in a long video lecture. Video indexing, which divides a video into meaningful segments, can significantly improve accessibility. In this work, we present automatic text-based approaches and machine learning for indexing lecture videos to provide topic-based segmentation. Various text-based indexing algorithms were developed to identify topic transitions in a video. The indexing algorithms merge neighboring video segments with high text similarity to form topic segments, which are represented by index points. In general, it is not clear which feature in a video slide is important for detecting topic change. Therefore, we propose another video indexing approach using machine learning, which can use all possible features such as the number of words in a slide, n-grams, and title or text with large font size. Among the state-of-the-art machine learning algorithms, ensemble models such as Random Forest and Bagging were found efficient and practical to use. They also provide probability distributions, which enable the user to choose a desired number of index points. Evaluation was done on a set of twenty-five lecture videos from courses in Computer Science, Biology, and Earth and Atmospheric Science. The ground truth was established by asking the lecture instructor to manually identify topic transitions in the video. An information gain experiment with machine learning shows that the words with large font size, the words that appear in the video for the first time, and n-gram frequency differences between video slides are important features for identifying the topic transitions in a lecture video.
Experimental results show that text-based indexing provides significant improvement over a non-text-based approach and that indexing with machine learning provides approximately 80% indexing accuracy on average. An important observation was that there are significant differences when the topics are manually identified by multiple users who are very familiar with the content. Although further enhancements could improve the performance of video indexing, the performance gains are not expected to reach the ideal output because of the uncertain nature of the ground truth.
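
The text-based idea described in the abstract, merging neighboring video segments with high text similarity so that each remaining boundary becomes an index point, can be illustrated with a minimal sketch. This is not the implementation from the work itself: the bag-of-words representation, the cosine similarity measure, the threshold of 0.3, and the function names bag_of_words, cosine, and index_points are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts for the text of one video segment."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def index_points(segment_texts, threshold=0.3):
    """Return the positions of segments that start a new topic.

    Each segment is compared with the text accumulated for the current
    topic. If the similarity reaches the threshold (an assumed value),
    the segment is merged into that topic; otherwise it opens a new
    topic and its position is recorded as an index point.
    """
    if not segment_texts:
        return []
    points = [0]
    topic = bag_of_words(segment_texts[0])
    for i in range(1, len(segment_texts)):
        words = bag_of_words(segment_texts[i])
        if cosine(topic, words) >= threshold:
            topic += words  # merge into the current topic segment
        else:
            points.append(i)  # topic transition detected
            topic = words
    return points


# Hypothetical slide text, one string per video segment.
slides = [
    "sorting arrays with quicksort",
    "quicksort pivot and sorting cost",
    "cell membrane structure",
    "membrane proteins in the cell",
]
print(index_points(slides))
```

Raising the threshold yields more, shorter topic segments, and lowering it yields fewer, longer ones. The machine learning approach in the abstract plays a similar role by letting the user pick the desired number of index points from predicted probabilities.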