Browsing by Author "Freeman, Keegan"


Augmented Intelligence Approach To Educational Data Mining: Student Drop Prediction
(2021-05) Freeman, Keegan
Educational Data Mining (EDM) and Augmented Intelligence (AUI) are two emerging fields in machine learning research. EDM refers to the application of machine learning in educational settings, typically by using educational data to better understand the learning process. Augmented Intelligence, on the other hand, is a niche of machine learning in which people take a much larger role than is typical in artificial intelligence projects. For example, a professional in a given field may provide better insight into which metrics should be weighted more heavily when making a given prediction. In this thesis, I review the feasibility of using Augmented Intelligence within Educational Data Mining to predict the likelihood of a student dropping a course based on demographic, study habit, and student perception information recorded through a survey. Additionally, I test three optimization algorithms to see which is most beneficial for this application. The ultimate goal of this research is to provide instructors with a machine learning model capable of highlighting at-risk students so that instructors can intervene in a more timely fashion.
Linguistic Analysis, Data Mining, and Clustering to Predict Document Age
(2019) Freeman, Keegan
The goal of this research is to interpret enough linguistic data to provide a basis on which clustering and time prediction can be performed. With knowledge of how linguistic structure changes over time, conclusions can be drawn about how communication has changed and will continue to change. The data collected from the 250 raw text sources is as follows: word stem tally, word stem percentage, part-of-speech tally, part-of-speech percentage, scored part-of-speech collocations (both bigrams and trigrams), and average sentence length. Suitable metrics were chosen from this linguistic data, totaling 66 dimensions for each source “point.” These 66 dimensions were then filtered to determine which of the chosen metrics are unique and therefore proper indicators of a source’s time period. The filtering used an algorithm that identified which metrics, over the given time period, have slopes within a 1.15 ratio of one another 90% of the time. Metrics with similar slopes are redundant, so their dimensions can be removed from the source points, as can those whose slopes are 0. Using the elbow method and clustering, each source point is assigned to a cluster, which should represent one of various time spans between 1900 and 2000. With this data, a confusion matrix can be displayed to indicate how successfully each source’s individual time span was identified. Due to the limited corpora used, prediction more precise than an indiscriminate projection has not yet been achieved.
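The slope-based filtering described in this abstract could be sketched roughly as follows. The abstract does not say how slopes are computed or compared, so this sketch makes several assumptions: slopes are taken between consecutive time points, two metrics count as similar when their slopes share a sign and differ by at most a factor of 1.15 on at least 90% of those segments, and the first metric of a similar pair is the one kept. All function names, metric names, and sample values are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List


def segment_slopes(times: List[float], values: List[float]) -> List[float]:
    """Slope of a metric between each pair of consecutive time points."""
    return [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]


def slopes_similar(a: List[float], b: List[float],
                   ratio: float = 1.15, share: float = 0.90) -> bool:
    """True if the slopes agree within `ratio` on at least `share` of segments.

    Assumption: slopes agree when both are zero, or when they have the
    same sign and the larger magnitude is at most `ratio` times the smaller.
    """
    hits = 0
    for sa, sb in zip(a, b):
        if sa == 0 and sb == 0:
            hits += 1
        elif sa * sb > 0:
            low, high = sorted((abs(sa), abs(sb)))
            if high / low <= ratio:
                hits += 1
    return hits >= share * len(a)


def filter_metrics(times: List[float], metrics: Dict[str, List[float]],
                   ratio: float = 1.15, share: float = 0.90) -> List[str]:
    """Drop flat metrics and metrics whose slopes duplicate a kept metric."""
    kept: Dict[str, List[float]] = {}
    for name, values in metrics.items():
        slopes = segment_slopes(times, values)
        if all(s == 0 for s in slopes):
            continue  # a slope of 0 carries no time signal
        if any(slopes_similar(slopes, other, ratio, share)
               for other in kept.values()):
            continue  # redundant with a metric that was already kept
        kept[name] = slopes
    return list(kept)


# Hypothetical per-period averages for four metrics (illustrative only).
times = [1900, 1950, 2000]
metrics = {
    "noun_pct": [10.0, 12.0, 14.0],
    "verb_pct": [20.0, 22.2, 24.4],          # slopes about 1.1x noun_pct
    "avg_sentence_len": [30.0, 25.0, 28.0],
    "flat_metric": [5.0, 5.0, 5.0],           # slope 0 everywhere
}
kept_names = filter_metrics(times, metrics)
print(kept_names)
```

In this toy input, `verb_pct` is dropped as redundant with `noun_pct` and `flat_metric` is dropped for having zero slope. The surviving dimensions would then be passed on to the clustering step.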