Machine Learning and Network-driven Approaches for Risk Genes Discovery





There is strong evidence that many complex diseases, such as mental disorders, are linked to genetic variants. It is therefore crucial to identify the causal genes or variants that contribute to disease onset, both to advance the understanding of disease pathology and to inform better treatment. Due to the complexity of the human genome, only a few causal genes or variants have been identified. To improve the discovery of disease-associated risk genes in complex diseases, this thesis aims to develop network-based and machine learning methods that explicitly capture cell-type-specific gene interactions and integrate these interactions with existing gene-disease association evidence for risk gene prioritization. To achieve this goal, the thesis proposes several innovative techniques. Firstly, a multimodal deep learning model was developed to integrate multi-source and multi-structure data, including single-cell gene expression and global gene interactions, for predicting cell-type-specific gene networks. The effectiveness of the proposed method was demonstrated by comparing its prediction performance with that of baseline models and through downstream analysis for risk gene discovery. Secondly, a supervised machine learning approach was employed to integrate various genomic features with topological information from cell-type-specific gene networks to prioritize disease risk genes. The method was applied to autism, and the results demonstrated that the resulting gene ranking system provides a useful resource for prioritizing autism candidate genes. Thirdly, an unsupervised ensemble learning model was developed to combine multi-source, correlated disease-gene association scores for risk gene discovery. Results on artificial and real datasets demonstrated that the proposed method can efficiently integrate individual scores into an ensemble score without the need for ground-truth data.
The supervised methods proposed in this thesis can be applied to risk gene discovery in other complex human diseases and can effectively identify novel disease-associated risk genes when ground-truth information is available. Furthermore, some of the models developed in this thesis are unsupervised and are therefore applicable to problems where ground-truth information is very limited or unavailable.



Risk genes discovery, Network-based approaches, Machine learning


Portions of this document appear in: Afshar, Shiva, Patricia R. Braun, Shizhong Han, and Ying Lin. "A multimodal deep learning model to infer cell-type-specific functional gene networks." BMC Bioinformatics 24, no. 1 (2023): 47; and in: Lin, Ying, Shiva Afshar, Anjali M. Rajadhyaksha, James B. Potash, and Shizhong Han. "A machine learning approach to predicting autism risk genes: Validation of known genes and discovery of new candidates." Frontiers in Genetics 11 (2020): 500064.