A Partial Statistical Model of the Green Fluorescent Protein
Since its discovery, the green fluorescent protein (GFP) and its variants have found many applications in biological research. Because the fluorophore is sensitive to its environment, many of the protein’s fluorescence properties (brightness, color, pH sensitivity, etc.) can be tuned by mutating the surrounding residues. The major obstacle to introducing point mutations into GFP is that the protein is highly sensitive to changes in its sequence. Here, we developed a statistical model to learn which mutations should be introduced together as pairs. To do so, we trained a Potts model on evolutionary data for the GFP family and then performed Direct Coupling Analysis (DCA) to identify co-evolved pairs of residues. This project was completed with contributions from Lena Simine of Rice University.
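To illustrate how co-evolved pairs can be read off a trained Potts model, the following is a minimal sketch of one common DCA scoring scheme: the Frobenius norm of each pairwise coupling block, followed by the average product correction (APC). The summary above does not state which scoring scheme the project used, so this choice is an assumption. The function names, the dictionary layout of the couplings `J`, and the toy values (4 positions, 2 states) are all hypothetical and are not taken from the GFP model.

```python
import math
from itertools import combinations


def frobenius_scores(J, L, q):
    """Frobenius norm of each q x q coupling block J[(i, j)], i < j.

    Assumes the couplings are already in the zero-sum gauge; gap states
    are not treated specially in this sketch.
    """
    F = [[0.0] * L for _ in range(L)]
    for i, j in combinations(range(L), 2):
        block = J[(i, j)]
        norm = math.sqrt(sum(block[a][b] ** 2
                             for a in range(q) for b in range(q)))
        F[i][j] = F[j][i] = norm
    return F


def apc_corrected(F):
    """Average product correction: F_ij - mean_i * mean_j / overall mean."""
    L = len(F)
    # Row means over off-diagonal entries only.
    row_mean = [sum(F[i][k] for k in range(L) if k != i) / (L - 1)
                for i in range(L)]
    total_mean = sum(row_mean) / L
    C = [[0.0] * L for _ in range(L)]
    for i, j in combinations(range(L), 2):
        C[i][j] = C[j][i] = F[i][j] - row_mean[i] * row_mean[j] / total_mean
    return C


def top_pairs(C, n):
    """Return the n highest-scoring position pairs (i, j) with i < j."""
    L = len(C)
    ranked = sorted(((C[i][j], i, j) for i, j in combinations(range(L), 2)),
                    reverse=True)
    return [(i, j) for _, i, j in ranked[:n]]


# Toy couplings (hypothetical values): positions 0 and 3 are strongly
# coupled, every other pair only weakly.
L, q = 4, 2
weak = [[0.1, -0.1], [-0.1, 0.1]]
strong = [[1.0, -1.0], [-1.0, 1.0]]
J = {(i, j): weak for i, j in combinations(range(L), 2)}
J[(0, 3)] = strong

scores = apc_corrected(frobenius_scores(J, L, q))
best = top_pairs(scores, 1)
print(best)
```

In this toy example the strongly coupled pair (0, 3) ranks first. For a real protein family, the highest-ranked pairs would be the candidates for mutations to introduce together.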