Aerosol Data Modeling & Similarity Assessment – a Probabilistic Approach
Abstract
Bio-threat detection is a problem of paramount importance to modern society. A major requirement of any system that detects such threats is a low false alarm rate. An important cause of false alarms is the system's failure to account for the ambient aerosol background at the location being monitored. Aerosol backgrounds typically differ across locations: an agent that is naturally present in one location may be unusual in another. Although earlier work has characterized the aerosol backgrounds of particular locations, no general algorithm is available for creating an aerosol background model at a given location. This work centers on the design and implementation of a Sensor Modeling Toolbox. The toolbox assumes that the sensor data come from a Gaussian mixture distribution and therefore uses Gaussian mixture models (GMMs) to model them. The Expectation-Maximization (EM) algorithm is used to estimate the parameters of a GMM, and the Bayesian Information Criterion (BIC) is used to select the best model by weighing the log-likelihood of the GMM against its complexity (the number of parameters to be estimated). The toolbox provides various functionalities, the major ones being creating a model for a set of sensor observations, generating a model of the aerosol background at a particular location, and assessing the similarity between two sets of sensor readings using two novel distance functions. These functionalities are evaluated on two real-world datasets obtained from aerosol sensors deployed on the campus of the University of Houston. Because labeled data were unavailable, this work assesses the quality of the results by comparing them with the characteristics of the raw input data and shows that they are in good agreement.
It is also observed that it is usually better to remove outliers from a dataset before creating a GMM. The experimental results demonstrate that the toolbox is useful for modeling and analyzing aerosol data and that it provides important capabilities for building a successful bio-threat detection system in the future.
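
As a rough illustration of the model-selection step described in the abstract (EM fitting followed by a BIC-based choice of the number of mixture components), the following is a minimal pure-Python sketch for one-dimensional data. It is not the toolbox's implementation: the function names (`fit_gmm_1d`, `bic`, `select_gmm_by_bic`), the quantile-based initialization, the variance floor, and the fixed iteration count are all assumptions made for this example.

```python
import math


def _log_normal_pdf(x, mean, var):
    """Log-density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def _e_step(data, weights, means, variances):
    """Return per-point responsibilities and the total log-likelihood."""
    resp, log_lik = [], 0.0
    for x in data:
        logs = [math.log(w) + _log_normal_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp trick to avoid underflow
        log_total = top + math.log(sum(math.exp(l - top) for l in logs))
        log_lik += log_total
        resp.append([math.exp(l - log_total) for l in logs])
    return resp, log_lik


def fit_gmm_1d(data, k, n_iter=100):
    """Fit a k-component 1-D GMM with EM (illustrative, fixed iteration count)."""
    n = len(data)
    ordered = sorted(data)
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    # Assumption: a variance floor keeps components from collapsing onto a point.
    var_floor = max(1e-3 * var_all, 1e-12)
    weights = [1.0 / k] * k
    # Assumption: initialize the means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [max(var_all, var_floor)] * k
    for _ in range(n_iter):
        resp, _ = _e_step(data, weights, means, variances)
        for j in range(k):  # M-step: re-estimate each component
            nk = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nk / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data))
            variances[j] = max(spread / nk, var_floor)
    _, log_lik = _e_step(data, weights, means, variances)
    return weights, means, variances, log_lik


def bic(log_lik, k, n):
    """BIC = p * ln(n) - 2 * ln(L); lower is better."""
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
    return n_params * math.log(n) - 2.0 * log_lik


def select_gmm_by_bic(data, max_k=4):
    """Fit GMMs with 1..max_k components and keep the one with the lowest BIC."""
    best, scores = None, {}
    for k in range(1, max_k + 1):
        weights, means, variances, log_lik = fit_gmm_1d(data, k)
        scores[k] = bic(log_lik, k, len(data))
        if best is None or scores[k] < scores[best[0]]:
            best = (k, weights, means, variances)
    return best, scores
```

Calling `select_gmm_by_bic(readings)` on a list of scalar sensor readings returns the selected component count with its parameters, together with the BIC score for each candidate. Real sensor data would typically be multivariate and would need full covariance matrices, which this sketch leaves out.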