Monday, July 8, 2019

A More Detailed Look at the t-test for Averages


  I redid the analysis from the last blog more carefully and got significantly improved statistics for the relative error of the test results. I also cleaned up the notation somewhat. A script "l," ℓ, is now used for the level of significance of 0.05. Subscripts are used to distinguish the various t-values. The spread of t̃σ depends only on the spread of x̄, the average value of x, since σ is a constant, and that spread can be computed from the t-values shown in the following plot.


The t-value t̃s also depends on the spread of the estimated standard deviation s, which makes its distribution broader, so more values are rejected. The two hypotheses tested were H1: |t̃σ|≤t and H2: |t̃s|≤t, each of which is either accepted (A) or rejected (R). The level of significance determines the value of t used in the hypotheses.
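
For reference, the two statistics and hypotheses can be written out as follows. This is only a sketch assuming the usual textbook definitions; the symbols μ (the true mean) and n (the number of x values per trial) are assumed labels and are not taken from the post's figures.

```latex
\tilde{t}_\sigma = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}, \qquad
\tilde{t}_s = \frac{\bar{x}-\mu}{s/\sqrt{n}}, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2

H_1:\ |\tilde{t}_\sigma| \le t, \qquad H_2:\ |\tilde{t}_s| \le t
```

The numerators are identical, so any difference between the two decisions comes from s varying from trial to trial while σ stays fixed.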


A FOR loop was added to the update macro to collect data for a fixed number of trials, so one did not have to hold down the keys for the macro shortcut. On each pass it transferred the values of the set of random numbers to the x column and updated the decision tallies. Each pair of binary decisions for the tests can be represented by a pair of letters (AA, AR, RA or RR): the letter designating the H1 decision followed by the letter designating the H2 decision.
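
The loop described above can be sketched in Python rather than as a spreadsheet macro. This is not the original macro: the function name `run_trials`, the normally distributed x values, the sample size n = 10, μ = 0, σ = 1 and the seed are all assumptions made for illustration. The critical value 2.262 is the two-sided 5% point of Student's t for 9 degrees of freedom, which matches the assumed n = 10.

```python
import math
import random
import statistics


def run_trials(trials, n=10, mu=0.0, sigma=1.0, t_crit=2.262, seed=1):
    """Tally the H1/H2 decision pairs over a fixed number of trials.

    All defaults are illustrative assumptions, not values from the post.
    """
    rng = random.Random(seed)
    tallies = {"AA": 0, "AR": 0, "RA": 0, "RR": 0}
    for _ in range(trials):
        # One trial: a fresh set of random numbers for the x column.
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(x)
        s = statistics.stdev(x)  # estimated standard deviation, divides by n - 1
        t_sigma = (xbar - mu) / (sigma / math.sqrt(n))
        t_s = (xbar - mu) / (s / math.sqrt(n))
        # First letter is the H1 decision, second letter the H2 decision.
        h1 = "A" if abs(t_sigma) <= t_crit else "R"
        h2 = "A" if abs(t_s) <= t_crit else "R"
        tallies[h1 + h2] += 1
    return tallies


trials = 20_000
tallies = run_trials(trials)
h1_reject_rate = (tallies["RA"] + tallies["RR"]) / trials
h2_reject_rate = (tallies["AR"] + tallies["RR"]) / trials
print(tallies, h1_reject_rate, h2_reject_rate)
```

Under these assumptions the H2 rejection rate should come out near 0.05, while the H1 rate is lower because t̃σ is compared against the same t but has the narrower distribution.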


The formulas used are contained in the following figure. Using n-1 for the estimated standard deviation s gives better results, since dividing by n, as for the rms error, slightly underestimates the spread.
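
The effect of the divisor can be checked with a small simulation. This is a sketch under assumed conditions (normally distributed x with σ = 1 and a deliberately small n = 5 so the difference is easy to see); none of these values come from the post. It averages the variance estimate over many samples using both divisors.

```python
import random
import statistics

rng = random.Random(2)   # fixed seed, an arbitrary choice
n, trials = 5, 20_000    # assumed sample size and number of samples

sum_var_n = 0.0
sum_var_n1 = 0.0
for _ in range(trials):
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    sum_var_n += statistics.pvariance(x)   # divides by n
    sum_var_n1 += statistics.variance(x)   # divides by n - 1

mean_var_n = sum_var_n / trials
mean_var_n1 = sum_var_n1 / trials
print(mean_var_n, mean_var_n1)
```

With a true variance of 1, the n - 1 version should average close to 1, while the n version should average close to (n - 1)/n = 0.8.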


With the FOR loop I was able to increase the number of trials to 250,000. The calculation took about 3½ hours. The changes gave surprisingly good results for the rates. The rejection rate for hypothesis H2, which involves the estimated standard deviation s, agreed very well with the 5% level of significance.
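
As a rough guide to how closely a simulated rate can be expected to match 5%, the statistical uncertainty for this many trials can be estimated. This is a sketch assuming the trials are independent, so the number of rejections follows a binomial distribution; it is not a calculation taken from the post.

```python
import math

trials = 250_000   # number of trials reported above
p = 0.05           # level of significance

# Binomial standard error of an observed rate, and its size relative to p.
std_err = math.sqrt(p * (1 - p) / trials)
rel_err = std_err / p
print(f"standard error = {std_err:.5f}, relative error = {rel_err:.2%}")
# prints: standard error = 0.00044, relative error = 0.87%
```

On that assumption, an observed H2 rejection rate within about 0.001 of 0.05 would be consistent with the nominal level.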

