New Approaches to Hierarchical Modeling — Frameworks, Algorithms, and Applications
Abstract
Obtaining hierarchical organizations of knowledge is important in many domains. Creating such hierarchies requires improved techniques for subdividing entities hierarchically according to their similarities and differences. New techniques for organizing documents in hierarchies, for automatic document retrieval, and for hierarchical query clustering are becoming available at a fast pace. In this work, we investigate new methods to induce hierarchical models with the goals of obtaining better predictive models, facilitating the creation of background knowledge with respect to an underlying class distribution, obtaining hierarchical groupings of a set of objects based on the background knowledge they share, detecting sub-classes within an existing class distribution, and providing methods to evaluate hierarchical groupings. This effort has led to the development of (1) TPRTI, a new regression tree induction approach that uses turning points, candidate split points computed before the recursive process takes place, to recursively split the node datasets; (2) PATHFINDER, a new classification tree induction approach capable of inducing very short trees with high accuracy at the cost of not classifying examples deemed difficult to classify; (3) AVALANCHE, a new hierarchical divisive clustering approach that takes a distance matrix as input and forms clusters that maximize inter-cluster distances; and (4) STAXAC, a new agglomerative clustering approach that creates supervised taxonomies. Unlike traditional agglomerative clustering, which uses proximity as the only criterion for merging, STAXAC uses both proximity and class label information to obtain hierarchical groupings of a set of objects.
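To make the contrast with traditional agglomerative clustering concrete, here is a minimal sketch of bottom-up clustering in which class labels take part in the merge decision. The merge rule used here (merge the closest pair of clusters under single linkage, but only when their union contains a single class) and the function names `supervised_agglomerative` and `cluster_distance` are hypothetical simplifications for illustration; they are not STAXAC's actual criterion, which the abstract does not specify.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length point tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_distance(c1, c2, points):
    """Single linkage: distance between the closest members of two clusters."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)


def supervised_agglomerative(points, labels):
    """Merge clusters bottom-up using both proximity and class labels.

    Hypothetical rule (not STAXAC's published criterion): at each step,
    merge the closest pair of clusters whose union contains only one
    class; stop when no such pair remains.
    Returns the merge history (the taxonomy) and the final clusters.
    """
    clusters = [[i] for i in range(len(points))]  # start with singletons
    merges = []
    while True:
        best = None  # (distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                union = clusters[a] + clusters[b]
                if len({labels[i] for i in union}) != 1:
                    continue  # class labels veto this merge
                d = cluster_distance(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best is None:
            break  # no label-consistent merge is left
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges, [sorted(c) for c in clusters]


points = [(0, 0), (0, 1), (5, 5), (5, 6), (0, 2)]
labels = ["a", "a", "b", "b", "b"]
merges, clusters = supervised_agglomerative(points, labels)
print(clusters)
```

In this toy input the last point, (0, 2), sits right next to the two `a` points, so proximity alone would group it with them; under the label-aware rule it instead joins the other `b` points, and the final clusters are {0, 1} and {2, 3, 4}.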
We applied the techniques we developed (1) to molecular phylogenetic-based taxonomy generation, and found that this new approach and the resulting supervised taxonomies can help biologists better characterize organisms according to characteristics of interest such as diseases, growth rate, etc.; (2) to data editing, where we enhanced the accuracy of the k-nearest neighbor classifier by removing minority class examples from clusters extracted from a supervised taxonomy; and (3) to meta learning, where we developed new algorithms that operate on supervised taxonomies and compute both the distribution of the classes within a dataset and the difficulty of classifying examples belonging to a particular dataset.
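The data-editing idea can be sketched as follows, assuming cluster memberships are already available (in the work described above they are extracted from a supervised taxonomy, which this sketch does not build). Each example whose class differs from the majority class of its cluster is dropped, and a plain k-nearest neighbor classifier then uses the edited training set. The function names `edit_by_cluster_majority` and `knn_predict`, the use of Euclidean distance, and simple majority voting are illustrative assumptions, not the authors' exact procedure.

```python
import math
from collections import Counter


def edit_by_cluster_majority(X, y, cluster_ids):
    """Drop examples whose class is not the majority class of their cluster.

    cluster_ids[i] is the (assumed, precomputed) cluster of example i.
    """
    majority = {}
    for cid in set(cluster_ids):
        members = [y[i] for i in range(len(y)) if cluster_ids[i] == cid]
        majority[cid] = Counter(members).most_common(1)[0][0]
    keep = [i for i in range(len(y)) if y[i] == majority[cluster_ids[i]]]
    return [X[i] for i in keep], [y[i] for i in keep]


def knn_predict(X_train, y_train, x, k=3):
    """Plain k-nearest neighbor majority vote with Euclidean distance."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    votes = Counter(y_train[i] for i in order[:k])
    return votes.most_common(1)[0][0]


X = [(0, 0), (0, 1), (1, 0), (0.5, 0.5), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b", "b"]
cluster_ids = [0, 0, 0, 0, 1, 1, 1]  # assumed cluster memberships

X_edit, y_edit = edit_by_cluster_majority(X, y, cluster_ids)
print(knn_predict(X, y, (0.6, 0.6), k=1))            # before editing
print(knn_predict(X_edit, y_edit, (0.6, 0.6), k=1))  # after editing
```

Here the single `b` example at (0.5, 0.5) is a minority inside cluster 0, so editing removes it; a query at (0.6, 0.6) with k = 1 is labeled `b` before editing and `a` afterwards. Whether editing helps on real data depends on how well the clusters reflect the class structure, which is what the supervised taxonomy is meant to provide.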