A statistical comparison of three methods of data preparation for normal probability plotting






The practicing engineer is increasingly recognizing the value of a statistical approach to the solution of many of his problems, and is making use of it. One simple technique that has found considerable acceptance is probability plotting. This technique is essentially a means of graphically fitting observed data to a distribution form by the use of specially constructed graph papers. The Normal frequency distribution has been found to be applicable to so many types of information that it is the first distribution tried in most cases and, for that reason, is used as the basis for this investigation. In preparing data for probability plotting, the real unknown is the location of the probability coordinates on the cumulative percentile scale. Several formulae have been proposed for determining the plotting position of the individual data points from their rank position within the data set. In the present study, three of these are examined empirically to determine whether any one of them is statistically more desirable than the others. Data for the study were obtained by drawing chips (with replacement) from an urn containing approximately one thousand chips forming a Normal distribution of mean zero and unit standard deviation. The numbers drawn were arranged to form fifty samples of ten each and twenty samples of twenty-five each. By the method of least squares, each plotting-position method was used with each data set to form a linear estimating equation. The slope of the equation is, in each case, an estimate of the universe standard deviation, while its intercept is an estimate of the universe mean. The standard error of estimate and the correlation coefficient for each equation were also examined. The final data examined were, in most cases, the absolute differences between the known values of the universe and the replicated estimates from the regression equations. These differences were tested statistically using non-parametric techniques to avoid distribution effects.
The relative ability of the three methods to predict extrapolated values was examined by computing, for each regression equation, the value of the variable at five standard deviations. Again, the true values are known, and the absolute differences were tested statistically. The details of the study form the body of the thesis, with the resulting conclusion that Method B be recommended for use in probability plotting.
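
The procedure described above can be illustrated with a minimal sketch. The abstract does not state which three plotting-position formulae were compared, so the three shown here (i/(n+1), (i-0.5)/n, and (i-3/8)/(n+1/4)) are assumptions chosen only because they are commonly proposed; they should not be read as the thesis's Methods A, B, and C. The function names, the pseudo-random generator standing in for the urn of chips, and the seed are likewise hypothetical.

```python
import random
from statistics import NormalDist

# ASSUMPTION: three commonly proposed plotting-position formulae.
# The abstract does not identify the formulae actually studied.
PLOTTING_POSITIONS = {
    "i/(n+1)": lambda i, n: i / (n + 1),
    "(i-0.5)/n": lambda i, n: (i - 0.5) / n,
    "(i-3/8)/(n+1/4)": lambda i, n: (i - 0.375) / (n + 0.25),
}


def fit_probability_plot(sample, position):
    """Least-squares line of ordered data on standard Normal quantiles.

    Returns (intercept, slope): the intercept estimates the universe mean
    and the slope estimates the universe standard deviation.
    """
    n = len(sample)
    xs = sorted(sample)
    # Standard Normal quantile for the plotting position of each rank.
    zs = [NormalDist().inv_cdf(position(i, n)) for i in range(1, n + 1)]
    z_mean = sum(zs) / n
    x_mean = sum(xs) / n
    s_zz = sum((z - z_mean) ** 2 for z in zs)
    s_zx = sum((z - z_mean) * (x - x_mean) for z, x in zip(zs, xs))
    slope = s_zx / s_zz
    intercept = x_mean - slope * z_mean
    return intercept, slope


def extrapolate(intercept, slope, z=5.0):
    """Value of the variable predicted at z standard deviations."""
    return intercept + slope * z


if __name__ == "__main__":
    # ASSUMPTION: a seeded pseudo-random draw stands in for the urn of chips.
    rng = random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in range(10)]
    for name, position in PLOTTING_POSITIONS.items():
        a, b = fit_probability_plot(sample, position)
        # Known universe: mean 0, standard deviation 1, so the true value
        # at five standard deviations is 5.
        print(name, abs(a - 0.0), abs(b - 1.0), abs(extrapolate(a, b) - 5.0))
```

Repeating the loop over many samples would give, for each formula, a set of absolute differences of the kind the study compares with non-parametric tests; the tests themselves are not sketched here.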