Data-Driven, Label Consistent, Dictionary Learning Methods for Analysis of Biological Datasets






The goal of this thesis is to develop a data-driven, label-consistent, dictionary-learning-based framework that can be applied to a variety of signal analysis problems. Current methods based on analytical models do not adequately account for the variability within and across datasets when designing signal analysis algorithms. This variability can be incorporated as a morphological constraint to improve those algorithms. In particular, this work focuses on three applications: 1) we present a method for large-scale automated three-dimensional (3-D) reconstruction and profiling of microglia populations in extended regions of brain tissue for quantifying arbor morphology, sensing activation states, and analyzing the spatial distributions of cell activation patterns in tissue; this work provided an opportunity to profile the distribution of microglia in control and device-implanted brains; 2) we present a novel morphologically constrained spectral unmixing (MCSU) algorithm that combines the spectral and morphological cues in the multispectral image data cube to improve the unmixing quality; this work provided a means to identify new therapeutic opportunities for pancreatic ductal adenocarcinoma (PDAC) from images collected from humans; and finally, 3) we developed a framework to analyze neuronal responses in electroencephalography (EEG) datasets acquired from infants aged 6 to 24 months. We demonstrated that combining different frequency bands from different spatial locations yields better classification results than the traditional approach, in which only one or two frequency bands are used. Using an adaptation of Tibshirani’s Sparse Group LASSO algorithm, we uncovered different spatial markers and biomarkers for understanding a human infant’s brain.
These biomarkers can be used to study the developmental stages of infants, and further analysis is required to study the clinical aspects of infants’ social and cognitive development. This work establishes the fundamental mathematical basis for the next generation of algorithms that can leverage the morphological cues in biological datasets. The algorithm has been embedded into the open-source FARSIGHT toolkit with an intuitive graphical user interface.



Machine learning, Image analysis, EEG analysis, Image processing


Portions of this document appear in: Megjhani, Murad, Nicolas Rey-Villamizar, Amine Merouane, Yanbin Lu, Amit Mukherjee, Kristen Trett, Peter Chong, Carolyn Harris, William Shain, and Badrinath Roysam. "Population-scale three-dimensional reconstruction and quantitative profiling of microglia arbors." Bioinformatics 31, no. 13 (2015): 2190-2198.