Cluster Validation of WHAN Galaxy Classification Using a Novel Approach To External Cluster Validation






The classification of galaxies is traditionally carried out by human visual inspection of morphology or by using information provided by large galaxy surveys. Clustering methods can reduce the effort of manual classification by automating this process. Of the properties available for galaxy classification, emission-line spectra are among the easiest to work with. Once a clustering has been obtained, it must be evaluated. Cluster validation computes statistics over the clustering structure to estimate how good the clustering is. Validation that uses only the clustered data points is called internal cluster validation; validation that compares the clustering result against an independent external classification scheme is called external cluster validation. One disadvantage of traditional cluster validation metrics is that they lack a probabilistic model. Another is that they report an average of the similarities between clusters and classes, so any distinctive relationship between an individual cluster and the classes can be lost in that average. Our method for external cluster validation computes the separation between each individual cluster and its estimated external class by projecting both onto a dimension that preserves the discriminatory information of the original feature space. The separation is calculated with a probabilistic approach. This gives a better understanding of how similar or dissimilar individual clusters are to their external classification.
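
To make the idea of a per-cluster, projection-based separation concrete, the sketch below shows one possible reading of it. It is an illustration under stated assumptions, not the method of this work: it assumes the discriminatory dimension is Fisher's linear discriminant direction, that the projected points are roughly Gaussian, and that separation is reported as one minus the Bhattacharyya coefficient of the two fitted Gaussians. The function names `fisher_direction` and `separation` are hypothetical.

```python
import numpy as np


def fisher_direction(a, b):
    """Unit vector along Fisher's linear discriminant for two point sets.

    a, b: arrays of shape (n_points, n_features).
    ASSUMPTION: Fisher's discriminant stands in for the 'dimension which
    preserves the discriminatory information'; the source does not name it.
    """
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    # Pooled within-group scatter matrix.
    scatter = (np.cov(a, rowvar=False) * (len(a) - 1)
               + np.cov(b, rowvar=False) * (len(b) - 1))
    # Small ridge term keeps the linear solve stable.
    scatter = np.atleast_2d(scatter) + 1e-6 * np.eye(a.shape[1])
    w = np.linalg.solve(scatter, mean_a - mean_b)
    return w / np.linalg.norm(w)


def separation(cluster, external_class):
    """Separation in [0, 1]: 0 = indistinguishable, 1 = fully separated.

    ASSUMPTION: 1 minus the Bhattacharyya coefficient of two 1-D Gaussians
    fitted to the projected points.
    """
    w = fisher_direction(cluster, external_class)
    proj_a, proj_b = cluster @ w, external_class @ w
    mu_a, mu_b = proj_a.mean(), proj_b.mean()
    var_a, var_b = proj_a.var(ddof=1), proj_b.var(ddof=1)
    var_sum = var_a + var_b
    coeff = (np.sqrt(2.0 * np.sqrt(var_a * var_b) / var_sum)
             * np.exp(-(mu_a - mu_b) ** 2 / (4.0 * var_sum)))
    return float(1.0 - coeff)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cluster = rng.normal(0.0, 1.0, size=(500, 3))
    same_class = rng.normal(0.0, 1.0, size=(500, 3))
    other_class = rng.normal(0.0, 1.0, size=(500, 3)) + np.array([10.0, 0.0, 0.0])
    print("vs. matching class:", separation(cluster, same_class))
    print("vs. distant class: ", separation(cluster, other_class))
```

Calling `separation` once per cluster-class pair yields a table of values rather than a single averaged score, which is the property the abstract contrasts with traditional metrics.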

The Sloan Digital Sky Survey (SDSS) dataset was used to evaluate our algorithm, with the WHAN classification system as the external classification scheme. We are able to derive clusters that are each similar to at least one of the external classes. Where clusters are similar to two external classes of galaxies, the similarity can be explained by domain knowledge: these are classes that can have overlapping properties. The structure derived by the clustering algorithm is supported by the numerical experiments.
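
For readers unfamiliar with the external scheme, the following sketch assigns WHAN labels from emission-line measurements. The thresholds are the ones commonly quoted from Cid Fernandes et al. (2011) and are an assumption here; this abstract does not state which boundaries or class names were actually used. The function name `whan_class` and its argument names are hypothetical.

```python
def whan_class(log_nii_ha, ew_ha, ew_nii):
    """Assign a WHAN label to one galaxy.

    log_nii_ha: log10 of the [N II]/H-alpha flux ratio.
    ew_ha, ew_nii: equivalent widths of H-alpha and [N II] in angstroms.
    ASSUMPTION: boundaries as commonly quoted from Cid Fernandes et al.
    (2011); they are not taken from the text above.
    """
    if ew_ha < 0.5 and ew_nii < 0.5:
        return "passive"   # both lines very weak
    if ew_ha < 3.0:
        return "retired"   # weak H-alpha
    if log_nii_ha < -0.4:
        return "SF"        # pure star-forming
    if ew_ha > 6.0:
        return "sAGN"      # strong AGN
    return "wAGN"          # weak AGN: H-alpha width between 3 and 6


if __name__ == "__main__":
    for args in [(-0.6, 10.0, 3.0), (0.0, 8.0, 8.0), (0.0, 4.0, 4.0)]:
        print(args, whan_class(*args))
```

The labels produced this way play the role of the external classes against which each cluster is compared.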



External cluster validation, Machine learning