Saturday, June 8, 2019

Effect of Random Errors on the LS Diagnostic Results


  Are we quibbling over minutiae with the least squares diagnostic? The information may be useful for the design of experiments. For example, if one wanted to measure some chemical rate constant, one might try to fit a large number of data points on a line or combine the results of a number of independent researchers. The theory of errors is useful, but one also has to be concerned with systematic errors.

The observed error curves for the least squares fits in the last blog were replaced with formulas for the errors, but in practice they rely on expected values, which are themselves subject to statistical error. Here is an example of how each pair of points on the curves was determined. For the slopes and intercepts, the fits were repeated a large number of times and the values calculated. The convergence of the partial sums gives an estimate of the limiting value, but the individual values vary a lot. A comparison of the histograms shows that there is a difference between the two fits, but there is also a lot of overlap in the values.
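A minimal sketch of this kind of repeated-fit experiment follows. It is not the spreadsheet used for the post: it assumes an ordinary least squares fit of y on x with Gaussian noise added to y only, and the true slope, intercept, noise level, and function names (`fit_line`, `simulate_slopes`, `partial_means`) are hypothetical choices for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulate_slopes(n_fits, n_points=20, slope=2.0, intercept=1.0,
                    sigma=0.05, seed=1):
    """Repeat the fit n_fits times on freshly perturbed data.

    All parameter values here are assumed, not taken from the post.
    """
    rng = random.Random(seed)
    xs = [i / (n_points - 1) for i in range(n_points)]
    slopes = []
    for _ in range(n_fits):
        ys = [intercept + slope * x + rng.gauss(0.0, sigma) for x in xs]
        slopes.append(fit_line(xs, ys)[0])
    return slopes


def partial_means(values):
    """Running mean after 1, 2, ..., len(values) fits."""
    total = 0.0
    means = []
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means


slopes = simulate_slopes(2000)
running = partial_means(slopes)

# The running mean settles down much faster than the individual values do.
print("mean after 10, 100, 2000 fits:", running[9], running[99], running[-1])
print("spread of individual slopes:", statistics.stdev(slopes))
```

Plotting `running` against the fit count gives a convergence curve, and a histogram of `slopes` shows how wide the scatter of single fits is compared with any small shift in the mean.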





The shift in the peaks of the curves is small compared with the spread of the observed results. The histograms above were not precise enough to compute a mean value for the distributions or to compare the results. The convergence curves showed the discrepancy more clearly, although one can see that the two histograms are shifted by about Δs=0.006.

Supplemental (Jun 8): Note that the values for the slope and intercept indicated above are the batch averages for 20 fits. One would expect the spread for individual fits to be about 4.5 times greater and the spread for the set averages to be reduced by a factor of 5.5. So it appears that the discrepancy is negligible for random errors in the data of less than 5%. The diagnostic just allowed us to get the formulas for the expected values of the slope and intercept as a function of the random error in the data.

Supplemental (Jun 8): books on precision measurements and systematic errors

1897  Holman - Discussion of the Precision of Measurements
1969  NBS - Precision Measurements and Calibration: Statistical Concepts and Procedures

Supplemental (Jun 9): I used an index to tag the x,y data sets for the batches and ended up confusing the number of points per line (20) with the number of lines per batch (25) in this and some preceding posts. The spread of the individual line fits would therefore be greater than the spread in the histograms above by a factor of √25=5 instead of 4.5. I caught the error while running a check to verify that the fit formulas returned the original slope and intercept for exact data points and while looking for other errors in the spreadsheet formulas. The data below include the expected values and standard deviations, which would be hidden values if there were error present.
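Both checks mentioned here can be sketched in a few lines. The sketch assumes an ordinary least squares fit with Gaussian noise in y only; the true slope, intercept, noise level, number of batches, and helper name `fit_line` are hypothetical, and only the 20 points per line and 25 lines per batch come from the note above.

```python
import math
import random
import statistics

POINTS_PER_LINE = 20   # from the note above
LINES_PER_BATCH = 25   # from the note above
TRUE_SLOPE, TRUE_INTERCEPT, SIGMA = 2.0, 1.0, 0.05  # assumed values


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = [i / (POINTS_PER_LINE - 1) for i in range(POINTS_PER_LINE)]

# Check 1: with exact data the fit should return the original parameters.
exact_ys = [TRUE_INTERCEPT + TRUE_SLOPE * x for x in xs]
exact_slope, exact_intercept = fit_line(xs, exact_ys)

# Check 2: compare the spread of single-line slopes with the spread of
# batch averages; the ratio should be close to sqrt(LINES_PER_BATCH).
rng = random.Random(7)
individual_slopes = []
batch_means = []
for _ in range(400):  # number of batches, chosen arbitrarily
    batch = []
    for _ in range(LINES_PER_BATCH):
        ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + rng.gauss(0.0, SIGMA)
              for x in xs]
        batch.append(fit_line(xs, ys)[0])
    individual_slopes.extend(batch)
    batch_means.append(statistics.mean(batch))

ratio = statistics.stdev(individual_slopes) / statistics.stdev(batch_means)
print("exact-data fit:", exact_slope, exact_intercept)
print("spread ratio:", ratio, "expected about", math.sqrt(LINES_PER_BATCH))
```

The ratio depends on the number of lines averaged in each batch, not on the number of points per line, which is why mixing up the two counts changes the factor from about 4.5 to 5.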

