Automatic Keyword Detection for Text Summarization
Abstract
Lecture videos are valuable learning companions for students. The ICS (Indexed, Captioned, and Searchable) video project gives students a flexible way to navigate lectures by automatically dividing each lecture into topical segments. Presenting keywords for every segment provides an overview of the content discussed in that segment and improves navigation. Identifying keywords manually is time-consuming for lecture videos, which are typically an hour or longer. This thesis proposes methods to automatically detect keywords that summarize the content of a video segment. The input to the keyword detection algorithm is text extracted from the video frames by OCR, which I enhance with auto-correction in a post-processing pass. Automatically detecting keywords is challenging because the importance of a word depends on a variety of factors, such as its frequency, its font size, and the length of time it is present on the screen. Other factors include the word's relative frequency in a video segment versus the rest of the video and its domain significance derived from external sources. This thesis explores how these factors contribute to the importance of a word and how they can be combined to identify good keywords. I evaluated the proposed methods by comparing the keywords generated by the algorithm with the tags chosen by experts on 121 segments of 11 videos from departments such as Computer Science, Biology, and Biochemistry. I assigned different combinations of weights to the features and computed metrics such as precision, recall, F1, BLEU score, and correlation scores. I also present an analysis of errors and identify areas that can be explored to generate higher-quality keywords.
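To make the idea of combining weighted factors concrete, here is a minimal Python sketch of one way such a scorer and its evaluation could look. It is not the method from the thesis: the three features used (frequency, font size, on-screen duration), the max-normalization, the linear weighted sum, the weight values, the function names, and the sample data are all assumptions chosen for illustration.

```python
from collections import Counter


def normalize(values):
    """Scale raw feature values to [0, 1] by dividing by the largest value."""
    peak = max(values.values(), default=0)
    return {w: (v / peak if peak else 0.0) for w, v in values.items()}


def score_words(observations, weights):
    """Score each word as a weighted sum of normalized features.

    observations: (word, font_size, seconds_on_screen) tuples, a hypothetical
    stand-in for OCR output from one video segment.
    """
    freq, font, duration = Counter(), {}, Counter()
    for word, font_size, seconds in observations:
        w = word.lower()
        freq[w] += 1                              # occurrences in the segment
        font[w] = max(font.get(w, 0), font_size)  # largest font size seen
        duration[w] += seconds                    # total time on screen
    features = {
        "frequency": normalize(freq),
        "font_size": normalize(font),
        "duration": normalize(duration),
    }
    return {
        w: sum(weights[name] * features[name][w] for name in features)
        for w in freq
    }


def top_keywords(scores, k):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


def precision_recall_f1(predicted, expert_tags):
    """Compare predicted keywords with expert tags using exact set matching."""
    predicted, expert_tags = set(predicted), set(expert_tags)
    hits = len(predicted & expert_tags)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(expert_tags) if expert_tags else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Illustrative data only; none of these values come from the thesis.
observations = [
    ("Recursion", 32, 40.0),
    ("recursion", 18, 25.0),
    ("stack", 18, 25.0),
    ("example", 12, 5.0),
    ("stack", 12, 10.0),
    ("the", 12, 60.0),
]
weights = {"frequency": 0.4, "font_size": 0.4, "duration": 0.2}

scores = score_words(observations, weights)
keywords = top_keywords(scores, k=2)
precision, recall, f1 = precision_recall_f1(
    keywords, ["recursion", "stack", "base case"]
)
print(keywords, precision, recall, f1)
```

In this toy segment the stop word "the" still receives a moderate score because it stays on screen for a long time. That is the kind of case where the other factors named in the abstract, relative frequency against the rest of the video and domain significance, would be expected to help. The sketch leaves both out.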