Saturday, December 13, 2014
The problem that we encountered in fitting the distribution for the standard deviation is a systematic error: the Theory of Errors does not work out exactly because the estimates of the moments are off. Bessel's correction is needed for the standard deviation, and we saw that a second correction was needed for the standard deviation's own distribution function. One can plot the residuals for the original plot, the corrected plot and the plot with the standard deviation calculated using its known mean.
The original plot shows evidence of a shift of the standard deviation distribution to the left, while the corrected plot has larger values towards the sides and lower values near the peak, as already mentioned. The plot using the known mean of the standard deviation shows a non-uniform distribution of residuals with greater fluctuations at the center of the plot. The probability distribution is responsible for reducing the counts on the sides.
Wednesday, December 10, 2014
The two extra degrees of freedom in the standard deviation discussed in the last blog appear to be due to the use of the average of the xk in the datasets for the computation of the observed values. If one uses the known value, μx, instead, the observed and calculated curves fit quite well with n as the number of degrees of freedom. The observed values shift to the left when the averages are used to estimate the distribution mean.
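The shift to the left can be illustrated with a small simulation. This is only a sketch under assumed settings: the values of m, n, μ, σ and the seed are hypothetical, and the helper names are not from the original post. For normal data the mean square deviation about the known mean should average close to σ², while the mean square deviation about each dataset's own average should come out near (n-1)/n of that.

```python
import math
import random

def sd_about(values, center):
    """Root mean square deviation of values about a given center (divisor n)."""
    return math.sqrt(sum((v - center) ** 2 for v in values) / len(values))

def mean_square_sd(m=20000, n=10, mu=0.0, sigma=1.0, seed=1):
    """Average squared sd over m datasets, about the known mean and about the average.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    known = 0.0
    sample = 0.0
    for _ in range(m):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        known += sd_about(xs, mu) ** 2     # deviations from the known mean
        sample += sd_about(xs, xbar) ** 2  # deviations from the dataset average
    return known / m, sample / m

known_ms, sample_ms = mean_square_sd()
print("about known mean:", known_ms)
print("about dataset average:", sample_ms)
```

The second number is always the smaller of the two, since the average is the center that minimizes the sum of squared deviations for each dataset.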
Monday, December 8, 2014
One can also look more closely at the error distribution for the standard deviation by generating m datasets of n random numbers for a known probability distribution and computing a standard deviation for each dataset.
Assuming that the standard deviation of the sum is reduced by the square root of n gives a calculated distribution that is slightly offset from the observed distribution.
Using Bessel's correction and assuming two additional degrees of freedom gives a better fit to the observed distribution. The observed distribution appears to have a slightly different shape, with a small reduction in the peak value and slightly larger values at the sides, indicating a broadening of the peak. The sum of the probabilities for histogram intervals in both cases is 1 as expected. One has to use a large number of datasets to notice the differences.
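The experiment can be sketched as follows. Note that this is not the fit described above: for comparison it uses the textbook result that, for normal data, the Bessel-corrected standard deviation follows a scaled chi distribution with n-1 degrees of freedom. The values of m, n, σ, the bin width and the seed are assumptions chosen for illustration.

```python
import math
import random
import statistics

def chi_pdf(x, k):
    """Density of the chi distribution with k degrees of freedom."""
    if x <= 0.0:
        return 0.0
    log_p = ((k - 1) * math.log(x) - 0.5 * x * x
             - (0.5 * k - 1.0) * math.log(2.0) - math.lgamma(0.5 * k))
    return math.exp(log_p)

def sd_pdf(s, n, sigma):
    """Density of the Bessel-corrected sd of n normal values (k = n - 1)."""
    scale = math.sqrt(n - 1) / sigma
    return scale * chi_pdf(scale * s, n - 1)

def compare_histogram(m=20000, n=10, sigma=1.0, width=0.05, bins=60, seed=2):
    """Histogram the sd of m datasets and compute the expected bin probabilities."""
    rng = random.Random(seed)
    counts = [0] * bins
    for _ in range(m):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        s = statistics.stdev(xs)           # divisor n - 1
        index = int(s / width)
        if index < bins:                   # ignore values beyond the last bin
            counts[index] += 1
    observed = [c / m for c in counts]
    # Midpoint approximation to the probability in each histogram interval.
    expected = [sd_pdf((i + 0.5) * width, n, sigma) * width for i in range(bins)]
    return observed, expected

observed, expected = compare_histogram()
worst = max(abs(o - e) for o, e in zip(observed, expected))
print("sum of observed probabilities:", sum(observed))
print("sum of expected probabilities:", sum(expected))
print("largest bin difference:", worst)
```

Plotting the list of differences between observed and expected gives a residual plot of the kind discussed in these posts.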
Student published a paper on The Probable Error of a Mean in 1908 in the journal Biometrika. In it he discusses the definition of mean and standard deviation and the distribution of their errors. It is based on Airy's Theory of Errors which contains a discussion of the error for the sum of two random numbers. So what has been presented in these blogs makes use of Airy's approach to the problem. The use of n-1 instead of n in the formula for the standard deviation is known as Bessel's correction.
It is necessary to make two assumptions in order to derive formulas for the mean, μ, and standard deviation, σ. Assuming that a set of random numbers has a mean value and a purely random component imposes a constraint on possible values for them. If we take the average of the data set, xk, it will be equal to μ plus σ times the average of the z-values. We have two unknowns and the average of the z-values, which is subject to some error. The same is true for the mean square of the xk. Both assumptions yield an equation with an unknown sum involving z-values, Eqns (2) and (3) below.
For a normal distribution of errors the expected values of the sum of the z-values and their squares are approximately 0 and n. Making these substitutions we get the usual formulas for the mean and standard deviations. Near the peak the sum of the squares is approximately n-1 instead of n and we get Bessel's correction for the standard deviation.
Using random numbers to check these two formulas, we see that the average for Bessel's correction is closer to the chosen value for the standard deviation but its variation is slightly larger.
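A check of this kind might look like the sketch below. The number of datasets, their size, the chosen μ and σ, and the seed are all assumptions made for illustration. `statistics.pstdev` divides by n and `statistics.stdev` divides by n-1.

```python
import random
import statistics

def compare_estimators(m=2000, n=20, mu=0.0, sigma=1.0, seed=5):
    """Return (average, spread) of the sd estimates with divisor n and n - 1."""
    rng = random.Random(seed)
    plain, bessel = [], []
    for _ in range(m):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        plain.append(statistics.pstdev(xs))   # divisor n
        bessel.append(statistics.stdev(xs))   # divisor n - 1 (Bessel's correction)
    return {
        "plain": (statistics.fmean(plain), statistics.pstdev(plain)),
        "bessel": (statistics.fmean(bessel), statistics.pstdev(bessel)),
    }

result = compare_estimators()
for name, (average, spread) in result.items():
    print(name, "average:", average, "spread:", spread)
```

For each dataset the two estimates differ only by the factor √(n/(n-1)), so the corrected estimate has both a larger average and a larger spread by that same factor.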
Friday, December 5, 2014
Determining the mean and standard deviation of a set of measurements is very difficult to do exactly. One has to make an additional assumption in order to get an estimate of the mean, but that adds a little more error to the measurement errors. The assumption used in least squares is that the variance V, the sum of the squares of the errors, is a minimum. This gives the average as the best estimate of μ. Once we have μ we can then estimate the errors, which in turn give the variance and standard deviation, σ.
I generated 2000 sets of 20 random numbers to test the formulas used in statistics and the procedure gives the mean, standard deviation and z-values. We also find that the root mean square of the z-values for each set of numbers is exactly 1. Using n-1 in the denominator of the standard deviation formula causes this to deviate slightly. Setting the rms z value equal to 1 is another starting point for the determination of μ and σ.
Each set of numbers will have a mean and standard deviation, and there is a little variation among the results for the 2000 data sets, but the overall average is close to the chosen values for μ and σ. The rms variation in the mean is approximately equal to the standard deviation of x divided by the square root of n, the number of values in each data set.
Using the theory of errors from the last couple of blogs we can calculate the first two moments and standard deviation for the data sets. The σ standard deviation is very close to the μ standard deviation divided by √2 as predicted.
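The test with 2000 sets of 20 numbers can be sketched along these lines. The chosen μ and σ and the seed are hypothetical, since the values used for the post are not stated. The sketch checks three things: the rms of the z-values in each set, the spread of the means against σ/√n, and the ratio of the spread of the standard deviations to the spread of the means.

```python
import math
import random
import statistics

def simulate(m=2000, n=20, mu=5.0, sigma=2.0, seed=4):
    """Return per-dataset means, sds (divisor n) and rms z-values."""
    rng = random.Random(seed)
    means, sds, rms_z = [], [], []
    for _ in range(m):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(xs)
        s = statistics.pstdev(xs)                 # divisor n
        zs = [(x - xbar) / s for x in xs]         # z-values from the estimates
        rms_z.append(math.sqrt(sum(z * z for z in zs) / n))
        means.append(xbar)
        sds.append(s)
    return means, sds, rms_z

n, sigma = 20, 2.0
means, sds, rms_z = simulate(n=n, sigma=sigma)
sd_of_mean = statistics.pstdev(means)
sd_of_s = statistics.pstdev(sds)
print("largest deviation of rms z from 1:", max(abs(r - 1.0) for r in rms_z))
print("spread of means:", sd_of_mean, "sigma/sqrt(n):", sigma / math.sqrt(n))
print("ratio of spreads:", sd_of_s / sd_of_mean, "1/sqrt(2):", 1 / math.sqrt(2))
```

If the divisor n-1 is used for s instead, the rms of the z-values in each set becomes √((n-1)/n) rather than 1.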
Here is a comparison of the 2000 σ standard deviations with the two estimates of σ.
Wednesday, December 3, 2014
In the last blog we found two values for the estimate of σ, the expected value, μ1, and the rms error, √μ2 = √n, and we have to ask which is the best choice. The expected value is the mean that one would get for an estimate of the error but it has some uncertainty, sd, associated with it. If we want to combine these two uncertainties we have to add their squares and the result is the second moment, μ2 = n.
If the sum involves a short string of numbers then we are most likely to get random values for z near the peak of the distribution where
zpeak = √(n-1).
The rms error, s, for the sum is then equal to σ√(n-1). We can turn things around to get an estimate of σ for the probability distribution from that of s and we conclude that σ = s/√(n-1).
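The position of the peak can be worked out explicitly. The lines below are a sketch that assumes the radial distribution described in the December 1 post, in which the surface of an n-dimensional hypersphere contributes a factor of r to the power n-1. The symbols r and p(r) are notation chosen here, not taken from the original figures.

```latex
\begin{align}
p(r)\,dr &\propto r^{\,n-1} e^{-r^2/2}\,dr \\
\frac{d}{dr}\ln p(r) &= \frac{n-1}{r} - r = 0
\quad\Rightarrow\quad z_{\text{peak}} = \sqrt{n-1} \\
s = \sigma\, z_{\text{peak}} = \sigma\sqrt{n-1}
&\quad\Rightarrow\quad \sigma = \frac{s}{\sqrt{n-1}}
\end{align}
```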
Monday, December 1, 2014
A scientist, like any other witness, can be impeached for using poor practices. So it is important for him to maintain his reputation for accuracy. When one publishes the results of an experiment it is customary to include the error bounds for numerical estimates. Averaging a number of trials gives a better estimate of an observation and involves the sum of a number of individual observations, xk. What error would one expect for this sum? The formula that one usually uses is that the error for the sum is equal to σ√n, where σ is the standard deviation for xk. The following proof is based on the expected values involved. The xk are assumed to be equal to the mean value, μ, plus a random error which can be expressed in terms of z-values.
The square of the difference between the sum and its mean value, nμ, involves a sum of the products of the individual z-values which can be split up into parts where the indices j and k are equal and unequal. The term that is crossed out makes a small contribution to the sum and on average is zero since the two z-values are uncorrelated.
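The argument can be written out as follows. This is a reconstruction under the stated assumptions, not the original displayed equations: S is a symbol introduced here for the sum of the xk, and the z-values are taken to have zero mean, unit variance and no correlation between different indices.

```latex
\begin{align}
x_k &= \mu + \sigma z_k \\
S - n\mu &= \sum_{k=1}^{n} (x_k - \mu) = \sigma \sum_{k=1}^{n} z_k \\
(S - n\mu)^2 &= \sigma^2 \Big( \sum_{k} z_k^2 + \sum_{j \ne k} z_j z_k \Big) \\
\langle (S - n\mu)^2 \rangle &= \sigma^2 \,( n + 0 ) = n\sigma^2
\quad\Rightarrow\quad \text{rms error} = \sigma\sqrt{n}
\end{align}
```

The sum over unequal indices is the term that averages to zero, and each of the n squared terms has an expected value of 1.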
A more sophisticated analysis makes better use of probability theory. We start by determining the probability distribution for the sum of two and three trials assuming normal distributions for each of the random variables and we need only consider the sum of the z-values, x + y and x + y + z. The square of the difference between the sum and its mean involves a sum of squares which is also found in the exponential of the joint probability distribution and so we can just consider the distribution for the radial length of a vector whose components are the z-values involved in the sum.
Notice that the area and volume contribute powers of r to the joint probability distribution and for n terms in the sum the contribution of the
To test these formulas we can generate two sets of random numbers, create a set of sums and observe the resulting distributions. The formulas give a very good fit for the generated data. The mean value for r (or z below) is slightly less than √n.
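One way to run such a test is sketched below, taking r to be the radial length of a vector of n standard normal z-values as in the discussion above. The sample size and seed are assumptions for illustration.

```python
import math
import random

def radial_lengths(n, m=20000, seed=3):
    """Lengths of m random vectors whose n components are standard normal."""
    rng = random.Random(seed)
    return [math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)))
            for _ in range(m)]

r2 = radial_lengths(2)
mean_r = sum(r2) / len(r2)
mean_sq = sum(r * r for r in r2) / len(r2)
print("mean of r:", mean_r, "sqrt(n):", math.sqrt(2))
print("mean of r squared:", mean_sq)
```

For n = 2 the radial length follows the Rayleigh distribution, whose mean is √(π/2), which is a little below √2, while the mean of r squared stays close to 2.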
For a sum of more numbers we can combine a new set of random numbers with the previous sum. For the sum of three numbers we get:
Continuing the process we get for the sum of 10 numbers:
We got n for the sum of the squares vindicating the previous result. Note that the joint probability functions are similar in shape to the Poisson distribution and may be considered continuous analogs. The standard deviation of the mean values is small and levels off after the sum of about 20 numbers and becomes approximately equal to √2/2.
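The levelling off can also be checked without random numbers. The sketch below assumes that r follows the chi distribution with n degrees of freedom, whose mean is √2 Γ((n+1)/2)/Γ(n/2) and whose second moment is n. The function names are hypothetical.

```python
import math

def chi_mean(n):
    """Mean of the chi distribution with n degrees of freedom."""
    return math.sqrt(2.0) * math.exp(math.lgamma(0.5 * (n + 1)) - math.lgamma(0.5 * n))

def chi_sd(n):
    """Standard deviation, using the second moment <r^2> = n."""
    return math.sqrt(max(n - chi_mean(n) ** 2, 0.0))

for n in (2, 3, 10, 20, 100):
    print(n, round(chi_mean(n), 4), round(math.sqrt(n), 4), round(chi_sd(n), 4))
```

The mean stays just below √n for every n, and the standard deviation approaches √2/2 as n grows.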
Supplemental (Dec 2): The definite integral above is evaluated in Reif, Fundamentals of Statistical and Thermal Physics (p. 608). It is also found in Boltzmann, Lectures on Gas Theory (p. 64).
Edit (Dec 3): The differential elements are surface lamina of nD hyperspheres.