Evaluating Machine Learning Approaches for Structural Genomics

Cheung, Margaret S.
Pickett, Jonathan
2019-01-03
2019-01-03
2018-10-18
http://hdl.handle.net/10657/3792

Modern molecular biology produces large amounts of data, from which it can be difficult to derive useful information. We are investigating correlations between genetic annotations of human DNA and chromosome structural features. Chromatin Immuno-Precipitation Sequencing (ChIP-Seq) data tracks, made available through the ENCODE project, characterize the biochemical nature of chromosomal loci. Chromatin can be categorized into two types, which we call type A and type B, and which we further classify into chromatin sub-types (A1, A2, B1, B2, and B3). It has previously been shown that these chromatin structural types are strongly related to the overall genome architecture of cells. Machine learning algorithms have proven especially adept at “learning” from correlations in very large data sets. We constructed a number of machine learning models and tested how accurately each identified chromatin sub-types. Our best approach so far is a recurrent neural network, which produced a total error of less than 28% when classifying chromatin sub-types.

en-US

The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).

Poster