New Approaches to Hierarchical Modeling — Frameworks, Algorithms, and Applications

dc.contributor.advisor: Eick, Christoph F.
dc.contributor.committeeMember: Vilalta, Ricardo
dc.contributor.committeeMember: Shi, Weidong
dc.contributor.committeeMember: Shah, Shishir Kirit
dc.contributor.committeeMember: Cooper, Timothy F.
dc.creator: Amalaman, Paul K. 1966-
dc.date.accessioned: 2019-09-19T02:19:03Z
dc.date.available: 2019-09-19T02:19:03Z
dc.date.created: December 2015
dc.date.issued: 2015-12
dc.date.submitted: December 2015
dc.date.updated: 2019-09-19T02:19:04Z
dc.description.abstract: Obtaining hierarchical organizations of knowledge is important in many domains. To create such hierarchies, improved techniques for subdividing entities hierarchically according to similarities and differences are needed. New techniques for organizing documents in hierarchies, for automatic document retrieval, and for hierarchical query clustering are being made available at a fast pace. In this work, we investigate new methods to induce hierarchical models with the goal of obtaining better predictive models, to facilitate the creation of background knowledge with respect to an underlying class distribution, to obtain hierarchical groupings of a set of objects based on background knowledge they share, to detect sub-classes within an existing class distribution, and to provide methods to evaluate hierarchical groupings. This effort has led to the development of (1) TPRTI, a new regression tree induction approach which uses turning points, candidate split points computed before the recursive process takes place, to recursively split the node datasets; (2) PATHFINDER, a new classification tree induction approach capable of inducing very short trees with high accuracies at the price of not classifying examples deemed difficult to classify; (3) AVALANCHE, a new hierarchical divisive clustering approach which takes as input a distance matrix and forms clusters maximizing inter-cluster distances; and (4) STAXAC, a new agglomerative clustering approach which creates supervised taxonomies and which, unlike traditional agglomerative clustering, which uses proximity as the single criterion for merging, uses both proximity and class label information to obtain hierarchical groupings of a set of objects.
We applied the techniques we developed (1) to molecular phylogenetic-based taxonomy generation, and found that this new approach and the obtained supervised taxonomies can help biologists better characterize organisms according to characteristics of interest such as diseases and growth rate; (2) to data editing, where we were able to enhance the accuracy of the k-nearest neighbor classifier by removing minority class examples from clusters that were extracted from a supervised taxonomy; and (3) to meta learning, where we developed new algorithms that operate on supervised taxonomies and compute both the distribution of the classes within a dataset and the difficulty of classifying examples belonging to a particular dataset.
dc.description.department: Computer Science, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.citation: Portions of this document appear in: Amalaman, Paul K., Christoph F. Eick, and Nouhad Rizk. "Using turning point detection to obtain better regression trees." In International Workshop on Machine Learning and Data Mining in Pattern Recognition, pp. 325-339. Springer, Berlin, Heidelberg, 2013. And in: Amalaman, Paul K., and Christoph F. Eick. "Avalanche: A hierarchical, divisive clustering algorithm." In International Workshop on Machine Learning and Data Mining in Pattern Recognition, pp. 296-310. Springer, Cham, 2015. And in: Amalaman, Paul K., and Christoph F. Eick. "HC-edit: A hierarchical clustering approach to data editing." In International Symposium on Methodologies for Intelligent Systems, pp. 160-170. Springer, Cham, 2015.
dc.identifier.uri: https://hdl.handle.net/10657/4888
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. UH Libraries has secured permission to reproduce any and all previously published materials contained in the work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: Decision trees
dc.subject: Regression tree
dc.subject: Classification tree
dc.subject: Supervised taxonomy
dc.subject: Hierarchical clustering
dc.title: New Approaches to Hierarchical Modeling — Frameworks, Algorithms, and Applications
dc.type.dcmi: Text
dc.type.genre: Thesis
thesis.degree.college: College of Natural Sciences and Mathematics
thesis.degree.department: Computer Science, Department of
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle

Name: AMALAMAN-DISSERTATION-2015.pdf
Size: 3.96 MB
Format: Adobe Portable Document Format

License bundle

Name: LICENSE.txt
Size: 1.81 KB
Format: Plain Text