Saturday, July 6, 2019

The t-test for Averages


One can use the Excel t-test functions to check whether a sample of data is representative of a given population with known mean and standard deviation. Let's say we have a sample drawn from a population with a normal distribution of errors, with mean μ=0 and standard deviation σ=20, and we want to check the sample average x̄ using a significance level SL=0.05.


For the data above we get x̄=-5.7149 and an estimated standard deviation s=18.2921. The estimate, s, is just the rms error for the sample. The t-value for the sample computed using σ is t̃(σ)=√n|x̄-μ|/σ, where n is the sample size. If σ is unknown we can use its estimate s to get t̃(s)=√n|x̄-μ|/s.
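As a quick check of the two formulas, here is a minimal Python sketch that plugs in the numbers quoted above. It assumes the factor in front of |x̄-μ| is √n, and it assumes n=12, which is not stated in the post but is the sample size for which T.INV.2T(0.05,n-1) gives 2.2010. The variable names are mine.

```python
import math

# Values quoted in the post
mu, sigma = 0.0, 20.0          # known population mean and standard deviation
x_bar, s = -5.7149, 18.2921    # sample average and estimated standard deviation

# Assumption: n = 12, inferred from T.INV.2T(0.05, n-1) = 2.2010 (11 degrees of freedom)
n = 12

# t-value using the known sigma, and t-value using the estimate s
t_sigma = math.sqrt(n) * abs(x_bar - mu) / sigma
t_s = math.sqrt(n) * abs(x_bar - mu) / s

print(f"t(sigma) = {t_sigma:.4f}")   # about 0.99
print(f"t(s)     = {t_s:.4f}")       # about 1.08
```

Under these assumptions both t-values come out close to 1, with t̃(s) slightly larger because s is smaller than σ for this sample.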


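For SL=0.05 we find a two-tailed limiting value of t̃(SL)=T.INV.2T(SL,n-1)=2.2010. To check x̄ we test the "null hypothesis" that the sample is consistent with the population mean μ, using the condition t̃(s)<t̃(SL). If the condition is true we accept the sample, and if not we reject it. Using t̃(σ)<t̃(SL) instead gives a more accurate test, since it uses the known σ rather than its estimate.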
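For readers without Excel, the limiting value can be reproduced with a short Python sketch that uses only the standard library. It is not how Excel computes T.INV.2T. It integrates the Student t density with Simpson's rule and finds the limit by bisection. The function names are hypothetical, and 11 degrees of freedom (n=12) is my assumption, chosen because it reproduces the 2.2010 quoted above.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_tail_prob(t, df, steps=2000):
    """P(|T| > t), from Simpson's rule applied to the density on [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3        # P(-t < T < t), using symmetry
    return 1 - central

def t_inv_2t(sl, df):
    """Bisection for the t whose two-tailed probability equals sl."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_tail_prob(mid, df) > sl:
            lo = mid                   # tails still too heavy, move outward
        else:
            hi = mid
    return (lo + hi) / 2

t_limit = t_inv_2t(0.05, 11)           # assumed df = n - 1 = 11
print(f"t(SL) = {t_limit:.4f}")        # about 2.2010

# Acceptance check for the t(s) value implied by the post's numbers (n = 12 assumed)
t_s = math.sqrt(12) * abs(-5.7149 - 0.0) / 18.2921
print("accept" if t_s < t_limit else "reject")
```

With the sample statistics quoted earlier, t̃(s) is well below the limit, so this sample would be accepted.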

We can repeat the process by again using an update macro with shortcut Ctrl-Shift-U to transfer the values of a set of random numbers to the x column and evaluate the resulting t-values. There are four possible outcomes. Either both t-values are accepted, or both are rejected, or we get a Type I error when the t̃(s) value rejects an acceptable t̃(σ) value, or we get a Type II error when the t̃(s) value accepts an unacceptable t̃(σ) value. We can record a set of macros to keep a tally of the outcomes for a set of trials. The trick for the tally macros is to compute the incremented count in the cell below the count, paste the incremented count into the original cell and then clear the contents of the second cell. We can then use the statistics collected to estimate the rates at which each event occurs.
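The same tally can be sketched outside Excel. The Python below is a stand-in for the macros, not a copy of them: a seeded random generator replaces the random-number column, and a dictionary replaces the tally cells. The function name, the seed, and n=12 are my assumptions. The limit 2.2010 and the definitions of the two error types are taken from the text above.

```python
import math
import random

def run_trials(trials, n=12, mu=0.0, sigma=20.0, t_limit=2.2010, seed=1):
    """Tally the four outcomes over repeated samples and return their rates."""
    rng = random.Random(seed)          # seeded so that a run can be repeated
    tally = {"both accepted": 0, "both rejected": 0, "Type I": 0, "Type II": 0}
    for _ in range(trials):
        # Draw a fresh sample from the normal population
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(x) / n
        # Sample standard deviation with n - 1 in the denominator
        s = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
        t_sigma = math.sqrt(n) * abs(x_bar - mu) / sigma
        t_s = math.sqrt(n) * abs(x_bar - mu) / s
        ok_sigma = t_sigma < t_limit
        ok_s = t_s < t_limit
        if ok_sigma and ok_s:
            tally["both accepted"] += 1
        elif not ok_sigma and not ok_s:
            tally["both rejected"] += 1
        elif ok_sigma:
            tally["Type I"] += 1       # t(s) rejects an acceptable t(sigma)
        else:
            tally["Type II"] += 1      # t(s) accepts an unacceptable t(sigma)
    return {name: count / trials for name, count in tally.items()}

rates = run_trials(10000)
for name, rate in rates.items():
    print(f"{name:14s} {rate:.4f}")
```

In this sketch the rate at which t̃(s) rejects a sample is the sum of the "both rejected" and "Type I" rates, which can be compared with SL.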


Note that the sum of the rejected rates is approximately equal to the significance level SL, so the test doesn't take the mistakes into account.

Supplemental (Jul 6): I rewrote the update macro to increment the tallies in addition to transferring the random values to the x column. Now one just has to hold down Ctrl-Shift-U to increase the number of trials, so there's less chance of human error, but it can be rather tedious. With 10,000 trials one would expect the error in the rates to be about 1 percent. Here are the results for two runs.
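The 1 percent figure matches the rule of thumb 1/√N for N trials. As a rough cross-check, the sketch below also evaluates the binomial standard error √(p(1-p)/N) for a rate near SL. That formula is my addition and is not used in the post, and the function name is hypothetical.

```python
import math

def rate_standard_error(p, trials):
    """Binomial standard error of an observed rate p over a number of trials."""
    return math.sqrt(p * (1 - p) / trials)

trials = 10000
rule_of_thumb = 1 / math.sqrt(trials)            # 0.01, i.e. about 1 percent
se_at_sl = rate_standard_error(0.05, trials)     # for a rate near SL = 0.05

print(f"1/sqrt(N)           = {rule_of_thumb:.4f}")
print(f"binomial SE, p=0.05 = {se_at_sl:.4f}")
```

For a rate near 0.05 the binomial estimate is smaller than 1/√N, so the 1 percent figure can be read as a conservative bound.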



Edit (Jul 7): Changed the variable α used to represent the significance level to SL to avoid possible confusion with other uses.
