Evaluating Machine Learning Approaches for Structural Genomics

Pickett, Jonathan

Files

Pickett_Jonathan_2018URD.pdf (6.83 MB)

Date

2018-10-18

Authors

Pickett, Jonathan

Abstract

Modern molecular biology produces large amounts of data from which it can be difficult to derive useful information. We are investigating correlations between genetic annotations of human DNA and chromosome structural features. Chromatin Immunoprecipitation Sequencing (ChIP-Seq) data tracks, made available through the ENCODE project, characterize the biochemical nature of chromosomal loci. Chromatin can be categorized into two types, which we call type A and type B, and which we further classify into chromatin sub-types (A1, A2, B1, B2, and B3). It has previously been shown that these chromatin structural types are strongly related to the overall genome architecture of cells. Machine learning algorithms have proven especially adept at “learning” from correlations in very large data sets. We constructed a number of machine learning models and tested how accurately each identified chromatin sub-types. Our best approach so far is a recurrent neural network, which produced a total error of less than 28% when classifying chromatin sub-types.
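
The abstract reports a "total error" for five-way sub-type classification without defining it. As a minimal sketch, assuming it means the overall misclassification rate across loci, the figure could be computed as below. The function names and the example labels are hypothetical and are not taken from the project.

```python
from collections import Counter

# The five chromatin sub-types named in the abstract.
SUBTYPES = ("A1", "A2", "B1", "B2", "B3")


def total_error(true_labels, predicted_labels):
    """Fraction of loci whose predicted sub-type differs from the true one.

    Assumption: "total error" is read as overall misclassification rate.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one locus")
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


def per_subtype_error(true_labels, predicted_labels):
    """Misclassification rate within each true sub-type that occurs."""
    totals = Counter(true_labels)
    wrong = Counter(t for t, p in zip(true_labels, predicted_labels) if t != p)
    return {s: wrong[s] / totals[s] for s in SUBTYPES if totals[s]}


# Made-up labels for ten loci, for illustration only (not project data).
demo_true = ["A1", "A1", "A2", "B1", "B1", "B2", "B3", "B3", "A2", "B2"]
demo_pred = ["A1", "A2", "A2", "B1", "B2", "B2", "B3", "B3", "A2", "B2"]

demo_total = total_error(demo_true, demo_pred)
demo_by_subtype = per_subtype_error(demo_true, demo_pred)
```

Under this reading, a total error below 28% would mean that more than 72% of loci were assigned the correct sub-type. The per-sub-type breakdown is included because an overall rate can hide classes that are classified poorly.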

URI

http://hdl.handle.net/10657/3792

Collections

Undergraduate Research Day Student Projects
