Association Rule Mining for Risk Assessment in Epidemiology

dc.contributor.advisor: Vilalta, Ricardo
dc.contributor.committeeMember: Lindner, Peggy
dc.contributor.committeeMember: Price, Daniel M.
dc.contributor.committeeMember: Tsekos, Nikolaos V.
dc.creator: Toti, Giulia 1986-
dc.date.accessioned: 2017-04-10T02:45:56Z
dc.date.available: 2017-04-10T02:45:56Z
dc.date.created: August 2016
dc.date.issued: 2016-08
dc.date.submitted: August 2016
dc.date.updated: 2017-04-10T02:45:56Z
dc.description.abstract: In epidemiology, a risk assessment measures the association between exposures and a health outcome. Risk characterization has traditionally been performed using statistical methods such as logistic regression, but such methods are not effective when working with highly correlated variables or when trying to assess synergistic actions between exposures. These limitations become evident in studies of asthma, a common chronic disease that affects 25 million people in the US. The prevalence of asthma is growing, and research is struggling to find the reason. Many factors have been associated with causing and triggering asthma, but their interactions, as well as which factor is most responsible for the growing prevalence of asthma, are still unclear. Outdoor air pollution is on the list of possible causes and triggers. Characterizing the connection between asthma and air pollution is not an easy task because of high collinearity between pollutant agents, possible synergistic actions, and the difficulty of controlling the exposure. The research community is currently encouraging the use of multi-pollutant models to yield better results. In this dissertation we propose: (i) a modified Apriori association rule mining method for identifying connections between exposures and risk variations, and (ii) a novel genetic algorithm (GA) designed to mine risk-based quantitative association rules. Both methods were tested on a group of synthetic datasets and on a real collection of data on pediatric asthma cases and pollution levels in Houston. The results on the synthetic datasets show the advantages of applying our methods to augment traditional logistic regression, and help determine the best metrics to include in the GA fitness function (odds ratio, length, repetition, and redundancy). Tests on clinical data suggest the existence of a correlation between asthma and outdoor air pollutants, both alone and as a mixture.
The genetic algorithm improves on the results of the Apriori-based method by recognizing what appear to be the most dangerous levels of exposure. Future work will help to improve aspects of the GA such as population initialization and rule selection. To date, the proposed methods represent a significant step in the direction of risk assessment based on association rule mining in epidemiological studies.
dc.description.department: Computer Science, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10657/1704
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: Association Rule Mining
dc.subject: Genetic algorithms
dc.subject: Epidemiology
dc.subject: Outdoor Air Pollution
dc.title: Association Rule Mining for Risk Assessment in Epidemiology
dc.type.dcmi: Text
dc.type.genre: Thesis
thesis.degree.college: College of Natural Sciences and Mathematics
thesis.degree.department: Computer Science, Department of
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle

Name: TOTI-DISSERTATION-2016.pdf
Size: 3.15 MB
Format: Adobe Portable Document Format

License bundle

Name: LICENSE.txt
Size: 1.81 KB
Format: Plain Text