New Approaches to Hierarchical Modeling — Frameworks, Algorithms, and Applications

Date

2015-12

Abstract

Obtaining hierarchical organizations of knowledge is important in many domains. Creating such hierarchies requires improved techniques for subdividing entities hierarchically according to their similarities and differences. New techniques for organizing documents in hierarchies, for automatic document retrieval, and for hierarchical query clustering are becoming available at a fast pace. In this work, we investigate new methods for inducing hierarchical models, with the goals of obtaining better predictive models, facilitating the creation of background knowledge with respect to an underlying class distribution, obtaining hierarchical groupings of a set of objects based on the background knowledge they share, detecting sub-classes within an existing class distribution, and providing methods to evaluate hierarchical groupings. This effort has led to the development of (1) TPRTI, a new regression tree induction approach that uses turning points, candidate split points computed before the recursive process takes place, to recursively split the node datasets; (2) PATHFINDER, a new classification tree induction approach capable of inducing very short trees with high accuracy, at the price of not classifying examples deemed difficult to classify; (3) AVALANCHE, a new hierarchical divisive clustering approach that takes a distance matrix as input and forms clusters that maximize inter-cluster distances; and (4) STAXAC, a new agglomerative clustering approach that creates supervised taxonomies and that, unlike traditional agglomerative clustering, which uses proximity as the single criterion for merging, uses both proximity and class label information to obtain hierarchical groupings of a set of objects.
We applied the techniques we developed (1) to molecular phylogenetic-based taxonomy generation, and found that this new approach and the resulting supervised taxonomies can help biologists better characterize organisms according to characteristics of interest such as diseases and growth rate; (2) to data editing, where we enhanced the accuracy of the k-nearest neighbor classifier by removing minority class examples from clusters extracted from a supervised taxonomy; and (3) to meta learning, where we developed new algorithms that operate on supervised taxonomies and compute both the distribution of the classes within a dataset and the difficulty of classifying examples belonging to a particular dataset.
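
The data editing step mentioned above can be illustrated with a minimal sketch. It is not the authors' HC-edit implementation: it assumes the clusters have already been extracted from a supervised taxonomy by some other means and are given as one cluster id per example, and the function name `edit_by_cluster_majority` is hypothetical. Within each cluster, it keeps only the examples whose label matches the cluster's majority class.

```python
from collections import Counter, defaultdict


def edit_by_cluster_majority(labels, cluster_ids):
    """Return sorted indices of the examples to keep.

    Illustrative assumption: `cluster_ids[i]` is the cluster of example i,
    obtained beforehand (e.g. by cutting a supervised taxonomy).
    An example is kept only if its label equals the majority label of its
    cluster; ties are resolved in favor of the label seen first.
    """
    # Group example indices by cluster.
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    keep = []
    for idxs in members.values():
        counts = Counter(labels[i] for i in idxs)
        majority_label, _ = counts.most_common(1)[0]
        # Drop the minority class examples of this cluster.
        keep.extend(i for i in idxs if labels[i] == majority_label)
    return sorted(keep)


if __name__ == "__main__":
    labels = ["a", "a", "b", "b", "b", "a"]
    cluster_ids = [0, 0, 0, 1, 1, 1]
    print(edit_by_cluster_majority(labels, cluster_ids))
```

The retained indices would then define the edited training set handed to a k-nearest neighbor classifier; how clusters are chosen from the taxonomy is left open here.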

Keywords

Decision trees, Regression tree, Classification tree, Supervised taxonomy, Hierarchical clustering

Citation

Portions of this document appear in: Amalaman, Paul K., Christoph F. Eick, and Nouhad Rizk. "Using turning point detection to obtain better regression trees." In International Workshop on Machine Learning and Data Mining in Pattern Recognition, pp. 325-339. Springer, Berlin, Heidelberg, 2013. And in: Amalaman, Paul K., and Christoph F. Eick. "Avalanche: A hierarchical, divisive clustering algorithm." In International Workshop on Machine Learning and Data Mining in Pattern Recognition, pp. 296-310. Springer, Cham, 2015. And in: Amalaman, Paul K., and Christoph F. Eick. "HC-edit: A hierarchical clustering approach to data editing." In International Symposium on Methodologies for Intelligent Systems, pp. 160-170. Springer, Cham, 2015.