Tuesday, October 15, 2013
Confidence Intervals for the Fits and Checking Statistics
Confidence intervals are error bounds used in statistics to decide whether a given set of trials has passed or failed a test. The trials pass if the number of successes is within the chosen limits and fail otherwise. Suppose we want to check N months of anomaly data to see whether the number of outliers, that is, the number of months outside the 3σ limits, is acceptable for a given fit and distribution of the data. If p is the probability that a month will fall inside the interval, then q = 1-p is the probability that it will fall outside. One can find p by integrating the probability function for the data between the 3σ limits. We then have to assign probabilities to the range of possible counts, k. One can do this with the binomial distribution. If N is large and q is small, the binomial distribution is difficult to calculate because of the large number of multiplications needed, but the Poisson distribution is a good approximation in that case. Some formulas and probabilities are shown for the Poisson distribution below.
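As an illustration of the approximation described above, here is a minimal Python sketch that compares the binomial and Poisson probabilities for the outlier count k. It rests on assumptions that are not from the analysis itself: the data are taken to be normally distributed, so that q is the two-sided 3σ tail of a normal curve, and N = 1600 months is a hypothetical sample size. The function names are also only illustrative.

```python
import math

def normal_tail_probability(n_sigma):
    """q = 1 - p for a normal distribution (an assumption, not the combined
    distribution used for the fits): probability of falling outside +/- n_sigma."""
    return math.erfc(n_sigma / math.sqrt(2.0))

def binomial_pmf(k, n, q):
    """Probability of exactly k outliers in n months, computed through
    logarithms to avoid the long chain of multiplications."""
    log_prob = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(q) + (n - k) * math.log1p(-q))
    return math.exp(log_prob)

def poisson_pmf(k, lam):
    """Poisson probability of exactly k outliers when lam are expected."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

q = normal_tail_probability(3.0)   # about 0.0027 for a normal distribution
N = 1600                           # hypothetical number of months
lam = q * N                        # expected number of outliers

# Largest pointwise difference between the two distributions.
max_diff = max(abs(binomial_pmf(k, N, q) - poisson_pmf(k, lam)) for k in range(31))

for k in range(11):
    print(k, round(binomial_pmf(k, N, q), 5), round(poisson_pmf(k, lam), 5))
print("largest difference:", max_diff)
```

With N this large and q this small, the two columns should agree closely, which is the reason the Poisson form can stand in for the binomial.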
λ = qN is the average number of counts that one can expect to fall outside the 3σ bounds. We can call such an event a success since it is the quantity that we are interested in and we have succeeded in observing it. One can find the expected value of some function, x, of the counts, k, by multiplying each value of x by the probability of the corresponding k and adding everything together. The variance is thus the expected value of the squared deviation of the counts from their expected value, which for the Poisson distribution equals λ. The standard deviation is s, the square root of the variance, and its multiples are used to set the confidence intervals. Since we are working with integer counts, we can't arbitrarily choose the probability intervals we want to work with but have to calculate the probabilities for the confidence interval chosen. The probabilities above were determined for λ = 18.
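The quantities in this paragraph can be checked numerically for λ = 18. The sketch below is one possible way to do it, not necessarily how the probabilities quoted here were produced: it sums over the counts to get the expected value and variance, then adds up the Poisson probabilities of the integer counts that lie within 1s, 2s and 3s of the mean. The truncation point k_max and the helper names are assumptions made for the example.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of exactly k counts when lam are expected."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

lam = 18.0
k_max = 200  # assumed truncation point; the tail beyond it is negligible here

probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]

# Expected value: each count times its probability, summed.
mean = sum(k * p for k, p in enumerate(probs))
# Variance: expected squared deviation of the count from the mean.
variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
s = math.sqrt(variance)

def interval_probability(m):
    """Integer counts inside mean +/- m*s and their total probability."""
    lo = max(0, math.ceil(mean - m * s))
    hi = math.floor(mean + m * s)
    return lo, hi, sum(probs[lo:hi + 1])

print("mean", mean, "variance", variance, "s", s)
for m in (1, 2, 3):
    lo, hi, prob = interval_probability(m)
    print(f"{m}s interval: k = {lo}..{hi}, probability = {prob:.4f}")
```

Because the counts are integers, the interval edges snap to whole numbers, so the probability covered by each interval has to be computed rather than chosen in advance.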
Supplemental (Oct 16): The statistical checks on the fits for the global land anomaly indicate that the fits are consistent with the statistical model that we have chosen for them. The observed number of 3σ outliers of the combined probability distribution is near the 1s value for the Poisson distribution. We can apply the same test to the projections, but the smaller numbers may require the use of the binomial distribution. The projections pass if future observations fit the same model that we used for the fit. The requirement for acceptance of a projection is that the future observations are part of the same population as the known observations.
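For a short projection period the binomial probabilities can be computed directly, since the number of terms is small. The following is a rough sketch of such a check with entirely hypothetical numbers (120 future months, q = 0.01, and 3 observed outliers); none of these values come from the fits discussed here.

```python
import math

def binomial_pmf(k, n, q):
    """Exact binomial probability of k outliers in n months."""
    return math.comb(n, k) * q ** k * (1.0 - q) ** (n - k)

def binomial_upper_tail(k_obs, n, q):
    """Probability of seeing k_obs or more outliers in n months."""
    return sum(binomial_pmf(k, n, q) for k in range(k_obs, n + 1))

n_future = 120   # hypothetical number of projected months
q = 0.01         # hypothetical probability that a month is an outlier
k_obs = 3        # hypothetical number of outliers actually observed

tail = binomial_upper_tail(k_obs, n_future, q)
print("expected outliers:", n_future * q)
print("P(K >= k_obs):", tail)
```

A small tail probability would suggest that the future observations do not belong to the same population as the known ones; where to draw that line is a choice left to the analyst.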
Supplemental (Oct 16): In order to avoid confusion over which standard deviation is being referred to, one should be consistent in the use of symbols for them. I have been using σ for the measure of the deviation of the anomaly observations from the smoothed 20-year average and s for the standard deviation of the Poisson distribution of the count of the 3σ outliers, hence the correction to the comment above.