Mass spectrometry data mining for cancer detection

dc.contributor.advisor: Azencott, Robert
dc.contributor.committeeMember: Josić, Krešimir
dc.contributor.committeeMember: Nicol, Matthew
dc.contributor.committeeMember: Hawke, David
dc.creator: Kong, Ao 1984-
dc.date.accessioned: 2016-02-20T23:16:11Z
dc.date.available: 2016-02-20T23:16:11Z
dc.date.created: December 2013
dc.date.issued: 2013-12
dc.date.updated: 2016-02-20T23:16:11Z
dc.description.abstract: Early detection of cancer is crucial for successful intervention strategies. Mass spectrometry-based high-throughput proteomics is recognized as a major breakthrough in cancer detection. Many machine learning methods have been used to construct classifiers based on mass spectrometry data for discriminating between cancer stages, yet the classifiers so constructed generally lack biological interpretability. To better assist clinical use, a key step is to discover "biomarker signature profiles", i.e., combinations of a small number of protein biomarkers that strongly discriminate between cancer states. This dissertation introduces two innovative algorithms that automatically search for a signature and construct a high-performance signature-based classifier for cancer discrimination tasks based on mass spectrometry data, such as data acquired by MALDI or SELDI techniques. Our first algorithm assumes that homogeneous groups of mass spectra can be modeled by (unknown) Gibbs distributions, and generates an optimal signature and an associated signature-based classifier by robust log-likelihood analysis. Our second algorithm uses a stochastic optimization algorithm to search for two lists of biomarkers, and then constructs a signature-based classifier. To support these two algorithms theoretically, this dissertation also studies the empirical probability distributions of mass spectrometry data and implements the actual fitting of Markov random fields to these high-dimensional distributions. We have validated our two signature discovery algorithms on several mass spectrometry datasets related to ovarian cancer and colorectal cancer patient groups. For these cancer discrimination tasks, our algorithms have yielded better classification performance than existing machine learning algorithms and, in addition, have generated more interpretable explicit signatures.
dc.description.department: Mathematics, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10657/1208
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: Mass spectrometry
dc.subject: Data mining
dc.subject: Machine learning
dc.subject: Cancer detection
dc.subject: Gibbs distribution
dc.title: Mass spectrometry data mining for cancer detection
dc.type.dcmi: Text
dc.type.genre: Thesis
thesis.degree.college: College of Natural Sciences and Mathematics
thesis.degree.department: Mathematics, Department of
thesis.degree.discipline: Mathematics
thesis.degree.grantor: University of Houston
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle

Name: KONG-DISSERTATION-2013.pdf
Size: 1.87 MB
Format: Adobe Portable Document Format
