Monday, December 1, 2014

Expected Error For the Sum of a Number of Trials


  A scientist, like any other witness, can be impeached for using poor practices, so it is important for him to maintain a reputation for accuracy. When one publishes the results of an experiment it is customary to include error bounds for the numerical estimates. Averaging a number of trials gives a better estimate of an observation and involves the sum of a number of individual observations, x_k. What error would one expect for this sum? The formula one usually uses is that the error for the sum is equal to σ√n, where σ is the standard deviation of the x_k. The following proof is based on the expected values involved. The x_k are assumed to be equal to the mean value, μ, plus a random error which can be expressed in terms of z-values.
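In symbols (writing S for the sum; the z_k are the standard z-values of the text), the assumption reads:

```latex
x_k = \mu + \sigma z_k, \qquad S = \sum_{k=1}^{n} x_k .
```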


The square of the difference between the sum and its mean value, nμ, involves a sum of products of the individual z-values, which can be split into parts where the indices j and k are equal and unequal. The cross term (j ≠ k) makes only a small contribution to the sum and is zero on average, since the two z-values are uncorrelated.
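Sketched out with the notation above, the split is

```latex
(S - n\mu)^2 = \sigma^2 \Big( \sum_{k} z_k \Big)^2
             = \sigma^2 \Big( \sum_{k} z_k^2 \;+\; \sum_{j \neq k} z_j z_k \Big),
```

and taking expected values, with ⟨z_k²⟩ = 1 and ⟨z_j z_k⟩ = 0 for j ≠ k, gives ⟨(S − nμ)²⟩ = nσ², i.e. an rms error of σ√n for the sum.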

  A more sophisticated analysis makes better use of probability theory. We start by determining the probability distribution for the sum of two and then three trials, assuming normal distributions for each of the random variables; we need only consider the sums of the z-values, x + y and x + y + z. The square of the difference between the sum and its mean involves a sum of squares which is also found in the exponential of the joint probability distribution, so we can just consider the distribution for the radial length of a vector whose components are the z-values involved in the sum.
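For two and three terms the joint densities depend on the z-values only through the radial length, which is the point of the reduction:

```latex
p(x,y) = \frac{1}{2\pi}\, e^{-(x^2+y^2)/2}, \qquad
p(x,y,z) = \frac{1}{(2\pi)^{3/2}}\, e^{-(x^2+y^2+z^2)/2},
```

so with r² = x² + y² (or x² + y² + z²) the density is constant on circles or spherical shells of radius r, and the distribution of r follows by integrating over thin shells.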




Notice that the area and volume contribute powers of r to the joint probability distribution; for n terms in the sum, the contribution of the space's differential element will be some constant times r^(n−1), which can be determined from the condition that the integral of the probability density is 1. Having found the probability function we can determine formulas for the 1st and 2nd moments and the standard deviation of r.
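In closed form this is what is now called the chi distribution; a sketch of the resulting formulas, with C_n the normalization constant:

```latex
p_n(r) = C_n\, r^{\,n-1} e^{-r^2/2}, \qquad
C_n = \frac{1}{2^{\,n/2-1}\,\Gamma(n/2)}, \\[6pt]
\langle r \rangle = \sqrt{2}\,\frac{\Gamma\!\big(\tfrac{n+1}{2}\big)}{\Gamma\!\big(\tfrac{n}{2}\big)}, \qquad
\langle r^2 \rangle = n, \qquad
\sigma_r = \sqrt{\,n - \langle r \rangle^2\,}.
```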


To test these formulas we can generate two sets of random numbers, create a set of sums and observe the resulting distributions. The formulas give a very good fit for the generated data. The mean value for r is slightly less than √n.
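A quick numerical check along these lines (a sketch; the sample size and seed are arbitrary choices, not the post's):

```python
import numpy as np
from math import gamma, sqrt

rng = np.random.default_rng(0)

def chi_mean(n):
    """Expected radial length: sqrt(2) * Gamma((n+1)/2) / Gamma(n/2)."""
    return sqrt(2) * gamma((n + 1) / 2) / gamma(n / 2)

# Two sets of standard-normal z-values; r is the radial length of (z1, z2).
z = rng.standard_normal((100_000, 2))
r = np.sqrt((z ** 2).sum(axis=1))

print(r.mean())         # ~1.2533, slightly less than sqrt(2) ~ 1.4142
print(chi_mean(2))      # sqrt(pi/2) ~ 1.2533
print((r ** 2).mean())  # ~2, i.e. <r^2> = n
```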



For a sum of more numbers we can combine a new set of random numbers with the previous sum. For the sum of three numbers we get the corresponding distribution with n = 3.



Continuing the process, for the sum of 10 numbers we get the distribution with n = 10; a sketch of the iteration follows.
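A sketch of that iteration (sample size and seed again arbitrary; the chi-distribution mean formula from above serves as the check at each step):

```python
import numpy as np
from math import gamma, sqrt

rng = np.random.default_rng(0)
N = 100_000

# Start from one set of squared z-values and fold a fresh set in per step.
r_sq = rng.standard_normal(N) ** 2
for n in range(2, 11):
    r_sq += rng.standard_normal(N) ** 2
    r = np.sqrt(r_sq)
    pred = sqrt(2) * gamma((n + 1) / 2) / gamma(n / 2)  # chi-distribution mean
    print(n, round(r.mean(), 4), round(pred, 4), round(r_sq.mean(), 3))
# <r^2> comes out ~n at every step, and <r> stays slightly below sqrt(n).
```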



We get ⟨r²⟩ = n for the expected sum of the squares, vindicating the previous result. Note that the joint probability functions are similar in shape to the Poisson distribution and may be considered continuous analogues. The standard deviation of r about its mean value is small; it levels off after the sum of about 20 numbers, becoming approximately equal to √2/2.
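The leveling off can be checked against the asymptotics of the gamma-function ratio: for large n,

```latex
\langle r \rangle \approx \sqrt{n}\left(1 - \frac{1}{4n}\right), \qquad
\sigma_r^2 = n - \langle r \rangle^2 \approx \frac{1}{2} - \frac{1}{16n} \;\longrightarrow\; \frac{1}{2},
```

so σ_r tends to 1/√2 = √2/2 ≈ 0.707, consistent with the observed plateau.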

Supplemental (Dec 2): The definite integral above is evaluated in Reif, Fundamentals of Statistical and Thermal Physics (p. 608). It is also found in Boltzmann, Lectures on Gas Theory (p. 64).
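The integral in question is presumably the normalization integral for p_n(r); its closed form,

```latex
\int_0^{\infty} r^{\,n-1} e^{-r^2/2}\, dr \;=\; 2^{\,n/2-1}\,\Gamma\!\big(\tfrac{n}{2}\big),
```

follows from the substitution u = r²/2, which turns it into the Euler integral for Γ(n/2).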

Edit (Dec 3): The differential elements are surface laminae of n-dimensional hyperspheres.
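For reference, the surface area of such a lamina at radius r in n dimensions is

```latex
A_n(r) = \frac{2\,\pi^{n/2}}{\Gamma(n/2)}\, r^{\,n-1},
```

which is the source of the factor r^(n−1) in the differential element.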
