Friday, April 28, 2017
I've got a better handle on the error bounds for the fit of a trapazoidal distribution to the deviations of the time of the Equinox. Here's a review and a corrected plot with error bounds showing the expected deviation from expected value for δt.
The error bounds used the values of the estimate of the deviation, δf*, for the probability densities, obs_f*, for the intervals in the table above. One can get a better understanding of what the error bounds mean by looking at the expected relative frequency, fi=ni/n, for the intervals chosen. The x values indicate the center of the interval. Using the values of a and b for the fitted trapazoidal distribution we can compare the observed counts with the expected counts, k=nf and their expected rms deviation of the counts, δk=√[nf(1-f)]. The expected variation in the relative frequency will then be δf=√[f(1-f)/n]. But what does all this tell us about the observations themselves? One can look at the terms of the binomial distribution with p=f and determine the probability of observing exactly k counts in each interval. Then we can add up the probabilities for those values of k which are within a distance of δk from the expected value for k. The last column on the right shows the probability of this occurring for each interval. A calculation shows the odds aren't uniform for the intervals but equal to 0.6252 ± 0.0375. The probabilities associated with the error bounds are less than those for a normal distribution and fluctuate a little because we are dealing with a discrete probability distribution and taking sum of those values of k between <k>-δk and <k>+δk. One can show that probabilities associated with bounds for a given number of standard deviations in a normal distribution is equal to P(k)=erf(k/√2). So we would expect slightly more observations to be outside the error bounds for the binomial distribution than would be the case for a normal distribution
The binomial distributions for each of the intervals can be plotted together for comparison.
Monday, April 24, 2017
I got a little bogged down with some technical details associated with copying formulas from one Excel worksheet to another. It's a little annoying when Excel crashes, restarts and you have to redo the stuff you haven't saved. It may have been how an error crept into one of my previous pages. You literally lose track of what you are doing. One needs to constantly check one's formulas and trace dependencies when transferring material from one page to another.
I used the expected value formulas to do the fit for the deviations of the Equinox times. The results were similar. The frequencies that I've been using were relative frequencies defined in as the ratio of counts for an interval to the total count. One can also define the function f as a probability density or probability per unit interval. I had to use this definition to get the fit to work properly for the Equinox times. The value of f here is the previous value divided by the width of the interval dx.
The error bounds are nominal in the sense that they are typical of the observed variations for a trapazoidal distribution. The fit values for the trapazoidal distribution are a=4.470 min and b=15.089.
Supplemental (Apr 24): The trapazoidal distribution has an interesting series for formulas for its expected values. The pattern holds for higher powers of x. Technically this might be called a folded trapazoidal distribution since the probabilities for the positive and negative values of x are combined.
Supplemental (Apr 25): The variations in the relative frequencies are scaled down versions of the expected variation in the counts for an interval as this derivation shows.
In evaluating f and δf in the table above I used the observed values for the interval's density, obs_f (=ni/n/dx), as an approximation. It was intended as a check of the trapazoidal density formula whose maximum value is 2/(a+b)=0.1023. Using the same letter for the relative frequencies of the counts and the probability density formula may have be a little too confusing. So the error bounds in the plot are a little too large. Using f* for the density the correct formula for the expected rms error in the density would be as follows with Δx=dx.
Friday, April 21, 2017
From the definition of the trapazoidal distribution and its integral one can obtain formulas for the expected values ⟨x⟩ and ⟨x2⟩.
Then given a set of random numbers from a trapazoidal generator one can analyze the set by counting the number of values that fall within chosen intervals and then estimate the distribution for the intervals and the expected values ⟨x⟩ and ⟨x2⟩.
We now have two equations which can be solved for a and b which can be used to fit the observations and compare the results with the original values of a and b. In the example below the original values were a=0.5 and b=1.0 and the fit values were a=0.538 and b=0.993.
This was a lot easier to do since the equation for ⟨x⟩ can be transformed into a quadratic function of ρ=a/b whose solution can be found if a set of values are assumed for b. The set of values for ρ can then be used to find a zero of the expression for ⟨x2⟩.
Wednesday, April 19, 2017
The least squares fit failed primarily because it favored the majority at the expense of a minority. The histogram cells closer to the mean time had the highest probabilities while those beyond the value of b had zero probability. The result was that the data for the last cell on the right could be ignored when computing the rms error of a trapazoidal distribution if the b value was too small.
The additional constraint for the minimum maximum magnitude of the z-scores for the probability distribution assured that an unlikely situation would not occur. The probability of a histogram interval was found by using the difference of the integral of the trapazoidal distribution of its upper and lower bounds.
When setting bounds for curve fits one has to make certain that significant data is not ignored.
The trapazoidal distribution fit for MICA's Spring Equinox time deviations proved to be a little difficult. The b values were difficult to fit since they favored lower values at the expense of large z-scores for the last interval of the histogram. I tried minimizing the maximum absolute value of the z-scores while minimizing the rms error and got what appears to be a better fit.
Here are some statistics for the fit.
Judging by the z-scores its a marginal trapazoidal distribution at best.
Tuesday, April 18, 2017
Just got through doing a rough trapazoidal fit of the deviations in the time of the Spring Equinox.
This fit uses the MICA times of the 251 Spring Equinoxes from 1800 through 2050. We probably shouldn't take the trapazoidal distribution too seriously but it may be wise to keep an open mind about the actual shape of the distribution. The minimum error was for a=2.05 min and b=15.15 min. The statistics hint that a slight deviation from the mean time is the most probable situation.
Supplemental (Apr 19): This fit ended up somewhat off the mark due to an error in evaluating the probability for the histogram intervals. I found the error in data used last night and corrected it in the next blog. Here too I had problems in choosing a good value for b.
Monday, April 17, 2017
I did a video to show the trapazoid fit process in action. I was able to compensate for Google's processing somewhat but not entirely. Excel recalculates the worksheet if a cell's content is changed to moving a cell about is an easy way to recalculate the worksheet. Each time the selected box is moved the worksheet computes 1000 random trapazoidal numbers, does the data analysis and computes a fit for the data.
You can pause the video to study a particular fit. It helps to zoom in a little too.