Automatic Keyword Detection for Text Summarization

dc.contributor.advisor: Subhlok, Jaspal
dc.contributor.committeeMember: Solorio, Thamar
dc.contributor.committeeMember: Barr, Christopher D.
dc.creator: Koka, Raga Shalini 1994-
dc.date.accessioned: 2020-01-04T03:23:09Z
dc.date.created: May 2019
dc.date.issued: 2019-05
dc.date.submitted: May 2019
dc.date.updated: 2020-01-04T03:23:09Z
dc.description.abstract: Lecture videos are valuable learning companions for students. The ICS (Indexed, Captioned, and Searchable) video project gives students a flexible way to navigate lectures by automatically dividing each lecture into topical segments. Presenting keywords for every segment can provide an overview of the content discussed in that segment and improve navigation. Identifying keywords manually is time-consuming for lecture videos, which are typically an hour or longer. This thesis proposes methods to automatically detect keywords that summarize the content of a video segment. The input to the keyword detection algorithm is text extracted from the video frames by OCR, which I enhance with auto-correction in a post-processing pass. Automatically detecting keywords is challenging because the importance of a word depends on a variety of factors, such as its frequency, its font size, and the length of time it is present on the screen. Other factors include its relative frequency in a video segment versus the rest of the video and its domain significance derived from external sources. This thesis explores how these factors contribute to the importance of a word and how they can be combined to identify good keywords. I evaluated the proposed methods by comparing the keywords generated by the algorithm with tags chosen by experts on 121 segments of 11 videos from departments such as Computer Science, Biology, and Biochemistry. I assigned different combinations of weights to the features and computed metrics such as precision, recall, F1, BLEU score, and correlation scores. I also present an analysis of errors and of areas that can be explored to generate higher-quality keywords.
dc.description.department: Computer Science, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10657/5768
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: Text Summarization
dc.subject: Lecture videos
dc.subject: Keyword Detection
dc.title: Automatic Keyword Detection for Text Summarization
dc.type.dcmi: Text
dc.type.genre: Thesis
local.embargo.lift: 2021-05-01
local.embargo.terms: 2021-05-01
thesis.degree.college: College of Natural Sciences and Mathematics
thesis.degree.department: Computer Science, Department of
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.level: Masters
thesis.degree.name: Master of Science
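
Illustrative sketch (not taken from the thesis): the abstract says that word-level factors such as frequency, font size, and on-screen duration are weighted and combined, and that the resulting keywords are compared with expert tags using precision, recall, and F1. One simple way this could look is shown below. The weighted sum, the max-normalization, the WordStats structure, the function names, and the toy numbers are all assumptions made for illustration; the record does not state how the thesis actually combines the factors.

```python
from dataclasses import dataclass


@dataclass
class WordStats:
    """Per-word features for one video segment (hypothetical structure)."""
    frequency: float   # occurrences in the segment's OCR text
    font_size: float   # largest font size the word appears in
    duration: float    # seconds the word stays on screen


def normalize(values):
    """Scale values to [0, 1] by dividing by the maximum (0 if all are 0)."""
    top = max(values, default=0.0)
    return [v / top if top > 0 else 0.0 for v in values]


def rank_keywords(stats, weights, k):
    """Return the k highest-scoring words under an ASSUMED weighted sum."""
    words = list(stats)
    freq = normalize([stats[w].frequency for w in words])
    size = normalize([stats[w].font_size for w in words])
    dur = normalize([stats[w].duration for w in words])
    scores = {
        w: weights["frequency"] * f
        + weights["font_size"] * s
        + weights["duration"] * d
        for w, f, s, d in zip(words, freq, size, dur)
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(words, key=lambda w: (-scores[w], w))[:k]


def precision_recall_f1(predicted, expert):
    """Compare predicted keywords with expert tags, treating both as sets."""
    predicted, expert = set(predicted), set(expert)
    hits = len(predicted & expert)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(expert) if expert else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy data, invented for this example only.
segment = {
    "recursion": WordStats(frequency=8, font_size=32, duration=120),
    "stack": WordStats(frequency=5, font_size=18, duration=90),
    "today": WordStats(frequency=6, font_size=12, duration=30),
    "example": WordStats(frequency=2, font_size=18, duration=20),
}
weights = {"frequency": 0.4, "font_size": 0.3, "duration": 0.3}

top = rank_keywords(segment, weights, k=2)
print(top)
print(precision_recall_f1(top, {"recursion", "base case"}))
```

Trying several weight dictionaries and comparing the resulting F1 values mirrors, in a very reduced form, the weight-combination experiments the abstract describes. The other factors it mentions (relative frequency across segments and domain significance) could be added as further terms in the same sum.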

Files

Original bundle

Name: KOKA-THESIS-2019.pdf
Size: 1.66 MB
Format: Adobe Portable Document Format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 4.42 KB
Format: Plain Text

Name: LICENSE.txt
Size: 1.81 KB
Format: Plain Text