Evaluation of Speech and Text-Based Indexing for Classroom Lecture Videos
Lecture videos are valuable learning resources. At the University of Houston, they are widely used across departments within the College of Natural Sciences and Mathematics, such as Computer Science, Biology and Biochemistry, and Earth and Atmospheric Sciences. Because most lecture videos are long, it is difficult to jump directly to a desired topic within a video. The ICS (indexed, captioned, and searchable) videos project gives students direct access to topics within video lectures by providing index points that mark each topic. These index points are generated from text extracted from slide images using OCR (optical character recognition) technology, and they are assigned with the assistance of an indexing algorithm that detects topic changes based on text similarity. This thesis presents topic-based lecture video segmentation using speech text (captions): its purpose is to use the spoken text of a lecture video to assign index points with the underlying text-based indexing algorithm. To achieve this goal, a set of twenty-five lecture videos was collected from various departments at the University of Houston and from the Coursera website, and captions were produced with the YouTube Speech Recognition System. The performance and limitations of indexing based on OCR text, uncorrected (original) speech text, and corrected speech text were analyzed. The results indicate that slide text-based indexing yields 4% better results than spoken text-based indexing. Corrected speech text (captions) provides better indexing results (by 11%) in cases where OCR text fails to perform, and those results closely match the ground truth. Error analysis of the speech texts and slide texts shows that poor OCR text and caption quality are among the main issues that hamper indexing accuracy.
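The abstract describes an indexing algorithm that assigns index points where text similarity between adjacent segments drops. The sketch below illustrates that general idea only: it splits a transcript into windows, compares adjacent windows with cosine similarity over bag-of-words vectors, and flags a candidate index point when similarity falls below a threshold. The function names, windowing scheme, and threshold are illustrative assumptions, not the thesis's actual algorithm.

```python
# Illustrative sketch of similarity-based topic segmentation.
# Assumption: a simple bag-of-words cosine similarity and a fixed
# threshold stand in for the thesis's actual indexing algorithm.
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def find_index_points(windows, threshold=0.2):
    """Return window indices where the topic likely changes,
    i.e., where similarity to the previous window drops below
    the threshold."""
    bags = [Counter(w.lower().split()) for w in windows]
    return [i for i in range(1, len(bags))
            if cosine_similarity(bags[i - 1], bags[i]) < threshold]


# Toy transcript windows: the first two share vocabulary (trees),
# the third shifts to networking, so an index point is flagged there.
windows = [
    "binary search tree insert delete search tree node",
    "tree rotation balance avl tree height node",
    "tcp packet network socket port connection handshake",
]
print(find_index_points(windows))  # → [2]
```

In a real pipeline the windows would come from OCR'd slide text or time-aligned captions, and each flagged window index would be mapped back to a video timestamp to form the index point.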