Clustering Individual Entities Based on Common Features






Identifying clusters in spatial or other datasets is of interest in many fields, including epidemiology, medical image processing, landscape ecology, criminology, archeology, and astronomy. In the current work, we propose a general method for clustering individual entities on the basis of a common feature in both two- and three-dimensional spatial regions. Specifically, the method is demonstrated on a dataset obtained from resolved simulations of particles falling in an upward-directed fluid flow. These simulations were conducted in a computational domain in the form of a parallelepiped with a square cross-section and an aspect ratio of 3. Periodic boundary conditions were imposed on all six boundaries. The particle feature on which clustering is based is the vertical velocity. The identified clusters group particles whose velocity is larger (in modulus) than a specified multiple of the standard deviation of the vertical velocity of all the particles in the domain.
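The selection criterion just described can be illustrated with a minimal sketch. The function name `flag_fast_particles` and the parameter `k` (the multiple of the standard deviation) are hypothetical names introduced here. The sketch assumes that the modulus is taken of the vertical velocity itself, without subtracting the mean, and that the population standard deviation is intended; neither detail is specified above.

```python
import numpy as np

def flag_fast_particles(w, k=1.0):
    """Return a boolean array marking particles with |w| > k * std(w).

    w : vertical velocities of all particles in the domain.
    k : hypothetical multiple of the standard deviation (an assumption).
    """
    w = np.asarray(w, dtype=float)
    sigma = w.std()              # population standard deviation over all particles
    return np.abs(w) > k * sigma  # strict inequality, applied to the modulus
```

Only the flagged particles would then enter the clustering step; the value of `k` controls how selective that step is.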

The method starts by dividing the region of interest into cells. To capture clusters that extend over several cells, use is made of “masks”, each covering many cells. The locations and sizes of the masks are randomly generated, and their number is chosen so that each cell of the domain has an approximately equal probability of being covered by a mask. Masks are labeled as interesting if they contain a sufficient number of particles with large velocities. By counting the number of times that each cell has been covered by an interesting mask, each cell is assigned a value that is analogous to the intensity value of an image pixel. By using a global threshold, the region is binarized into high-intensity and low-intensity cells. The high-intensity cells are grouped into clusters by a method that integrates the region-growing and region-merging methods of digital image processing. The method is shown to work well, to account properly for the spatial periodicity of the data, and to be able to track the clusters in time.
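The pipeline described above can be sketched in two dimensions as follows. This is an illustration under stated assumptions, not the implementation used in this work: the function names, the rectangular mask shape, and the parameters `n_masks`, `max_size`, and `min_count` are all hypothetical, and a plain periodic flood fill stands in for the combined region-growing and region-merging step.

```python
import numpy as np
from collections import deque

def intensity_map(cell_idx, flagged, shape, n_masks=2000, max_size=4,
                  min_count=3, seed=0):
    """Count how often each cell is covered by an 'interesting' mask.

    cell_idx : (N, 2) integer cell indices of the particles.
    flagged  : (N,) boolean array marking large-velocity particles.
    Assumes max_size <= min(shape) so a mask never overlaps itself.
    """
    rng = np.random.default_rng(seed)
    cell_idx = np.asarray(cell_idx)
    flagged = np.asarray(flagged, dtype=bool)
    # Number of flagged particles in each cell.
    counts = np.zeros(shape, dtype=int)
    np.add.at(counts, (cell_idx[flagged, 0], cell_idx[flagged, 1]), 1)
    intensity = np.zeros(shape, dtype=int)
    for _ in range(n_masks):
        # Random corner and random size; indices wrap around periodically,
        # so every cell is equally likely to be covered.
        i0, j0 = rng.integers(shape[0]), rng.integers(shape[1])
        h, w = rng.integers(1, max_size + 1, size=2)
        rows = (i0 + np.arange(h)) % shape[0]
        cols = (j0 + np.arange(w)) % shape[1]
        block = np.ix_(rows, cols)
        if counts[block].sum() >= min_count:   # mask is "interesting"
            intensity[block] += 1
    return intensity

def label_periodic(binary):
    """Label 4-connected groups of True cells, with periodic wrap-around."""
    binary = np.asarray(binary, dtype=bool)
    n_rows, n_cols = binary.shape
    labels = np.zeros(binary.shape, dtype=int)
    n_clusters = 0
    for start in zip(*np.nonzero(binary)):
        if labels[start]:
            continue                            # already part of a cluster
        n_clusters += 1
        labels[start] = n_clusters
        queue = deque([start])
        while queue:                            # breadth-first flood fill
            i, j = queue.popleft()
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nb = ((i + di) % n_rows, (j + dj) % n_cols)
                if binary[nb] and not labels[nb]:
                    labels[nb] = n_clusters
                    queue.append(nb)
    return labels, n_clusters
```

A global threshold would then be applied between the two steps, for example `binary = intensity >= 0.5 * intensity.max()`, where the factor 0.5 is again only a placeholder. Because neighbor indices are taken modulo the grid size, a cluster that crosses a periodic boundary receives a single label.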



Clustering, Binarization, Region-growing