httprover's 2nd blog: April 2017

Friday, April 28, 2017

Interpreting Error Bounds for the Trapazoidal Distribution Fit

I've got a better handle on the error bounds for the fit of a trapazoidal distribution to the deviations of the time of the Equinox. Here's a review and a corrected plot with error bounds showing the expected deviation from expected value for δt.

The error bounds used the values of the estimate of the deviation, δf*, for the probability densities, obs_f*, for the intervals in the table above. One can get a better understanding of what the error bounds mean by looking at the expected relative frequency, f_i=n_i/n, for the intervals chosen. The x values indicate the center of the interval. Using the values of a and b for the fitted trapazoidal distribution we can compare the observed counts with the expected counts, k=nf and their expected rms deviation of the counts, δk=√[nf(1-f)]. The expected variation in the relative frequency will then be δf=√[f(1-f)/n]. But what does all this tell us about the observations themselves? One can look at the terms of the binomial distribution with p=f and determine the probability of observing exactly k counts in each interval. Then we can add up the probabilities for those values of k which are within a distance of δk from the expected value for k. The last column on the right shows the probability of this occurring for each interval. A calculation shows the odds aren't uniform for the intervals but equal to 0.6252 ± 0.0375. The probabilities associated with the error bounds are less than those for a normal distribution and fluctuate a little because we are dealing with a discrete probability distribution and taking sum of those values of k between <k>-δk and <k>+δk. One can show that probabilities associated with bounds for a given number of standard deviations in a normal distribution is equal to P(k)=erf(k/√2). So we would expect slightly more observations to be outside the error bounds for the binomial distribution than would be the case for a normal distribution

The binomial distributions for each of the intervals can be plotted together for comparison.

Monday, April 24, 2017

A Fit of the Equinox Deviations Using Expected Values

I got a little bogged down with some technical details associated with copying formulas from one Excel worksheet to another. It's a little annoying when Excel crashes, restarts and you have to redo the stuff you haven't saved. It may have been how an error crept into one of my previous pages. You literally lose track of what you are doing. One needs to constantly check one's formulas and trace dependencies when transferring material from one page to another.

I used the expected value formulas to do the fit for the deviations of the Equinox times. The results were similar. The frequencies that I've been using were relative frequencies defined in as the ratio of counts for an interval to the total count. One can also define the function f as a probability density or probability per unit interval. I had to use this definition to get the fit to work properly for the Equinox times. The value of f here is the previous value divided by the width of the interval dx.

The error bounds are nominal in the sense that they are typical of the observed variations for a trapazoidal distribution. The fit values for the trapazoidal distribution are a=4.470 min and b=15.089.

Supplemental (Apr 24): The trapazoidal distribution has an interesting series for formulas for its expected values. The pattern holds for higher powers of x. Technically this might be called a folded trapazoidal distribution since the probabilities for the positive and negative values of x are combined.

Supplemental (Apr 25): The variations in the relative frequencies are scaled down versions of the expected variation in the counts for an interval as this derivation shows.

In evaluating f and δf in the table above I used the observed values for the interval's density, obs_f (=n_i/n/dx), as an approximation. It was intended as a check of the trapazoidal density formula whose maximum value is 2/(a+b)=0.1023. Using the same letter for the relative frequencies of the counts and the probability density formula may have be a little too confusing. So the error bounds in the plot are a little too large. Using f* for the density the correct formula for the expected rms error in the density would be as follows with Δx=dx.

Friday, April 21, 2017

Using Estimated Expected Values to Fit a Trapazoidal Distribution

From the definition of the trapazoidal distribution and its integral one can obtain formulas for the expected values ⟨x⟩ and ⟨x²⟩.

Then given a set of random numbers from a trapazoidal generator one can analyze the set by counting the number of values that fall within chosen intervals and then estimate the distribution for the intervals and the expected values ⟨x⟩ and ⟨x²⟩.

We now have two equations which can be solved for a and b which can be used to fit the observations and compare the results with the original values of a and b. In the example below the original values were a=0.5 and b=1.0 and the fit values were a=0.538 and b=0.993.

This was a lot easier to do since the equation for ⟨x⟩ can be transformed into a quadratic function of ρ=a/b whose solution can be found if a set of values are assumed for b. The set of values for ρ can then be used to find a zero of the expression for ⟨x²⟩.

Wednesday, April 19, 2017

Why the Least Squares Fit Failed

The least squares fit failed primarily because it favored the majority at the expense of a minority. The histogram cells closer to the mean time had the highest probabilities while those beyond the value of b had zero probability. The result was that the data for the last cell on the right could be ignored when computing the rms error of a trapazoidal distribution if the b value was too small.

The additional constraint for the minimum maximum magnitude of the z-scores for the probability distribution assured that an unlikely situation would not occur. The probability of a histogram interval was found by using the difference of the integral of the trapazoidal distribution of its upper and lower bounds.

When setting bounds for curve fits one has to make certain that significant data is not ignored.

A Better Trapazoid Fit For the Equinox Time Deviations

The trapazoidal distribution fit for MICA's Spring Equinox time deviations proved to be a little difficult. The b values were difficult to fit since they favored lower values at the expense of large z-scores for the last interval of the histogram. I tried minimizing the maximum absolute value of the z-scores while minimizing the rms error and got what appears to be a better fit.

Here are some statistics for the fit.

Judging by the z-scores its a marginal trapazoidal distribution at best.

Tuesday, April 18, 2017

Trapazoid Fit For the Equinox Time Deviations

Just got through doing a rough trapazoidal fit of the deviations in the time of the Spring Equinox.

This fit uses the MICA times of the 251 Spring Equinoxes from 1800 through 2050. We probably shouldn't take the trapazoidal distribution too seriously but it may be wise to keep an open mind about the actual shape of the distribution. The minimum error was for a=2.05 min and b=15.15 min. The statistics hint that a slight deviation from the mean time is the most probable situation.

Supplemental (Apr 19): This fit ended up somewhat off the mark due to an error in evaluating the probability for the histogram intervals. I found the error in data used last night and corrected it in the next blog. Here too I had problems in choosing a good value for b.

Monday, April 17, 2017

Trapazoid Fit Video

I did a video to show the trapazoid fit process in action. I was able to compensate for Google's processing somewhat but not entirely. Excel recalculates the worksheet if a cell's content is changed to moving a cell about is an easy way to recalculate the worksheet. Each time the selected box is moved the worksheet computes 1000 random trapazoidal numbers, does the data analysis and computes a fit for the data.

You can pause the video to study a particular fit. It helps to zoom in a little too.

The Trapazoidal Distribution

Over the weekend I've been studying the Trapazoidal Distribution. One only needs two numbers a and b to define this distribution. Its height, h, can be found since the area under the curve equals 1.

I wrote an Excel user function to generate 1000 random numbers that fit this distribution with a=0.5 and b=1 then determined the observed frequencies for the following set of intervals.

To check the trapazoidal random number generator it was necessary to determine the distribution which best fit the data. To do this a search was necessary to find the values for a and b which minimized the root mean square error for the fit. Since the trapazoidal distribution is symmetric it can be folded horizontally to reduce the number of intervals needed in the calculation.

The fit turned out to be fairly good.

Above the blue dots represent the observed frequencies and the blue line is the fit.

Thursday, April 13, 2017

Getting Control of Excel's Histogram Intervals

I've solved the problem I've been having with setting the intervals on Excel's histograms. I tried adjusting the bounds by setting the Overflow and Underflow bins to ±18.0 but they were ignored. Apparently, these bounds need to be less than the maximum and minimum for the data series used for the histogram. It's debatable whether or not one needs empty columns at the ends of a histogram. These entries set up the histogram in an acceptable manner.

Again, the observed counts from the histogram can be used for comparison with the statistically expected counts for a normal distribution.

The distribution above looks similar to a normal distribution and the z-scores for the intervals can't be rejected on statistical grounds but one cannot always get a clear impression of the actual distribution for deviations from a mean value with a single dataset. In another histogram the distribution appears to be more like a trapazoid.

We would need more data to come to a conclusion about what the actual distribution for the Equinox times is.

Monday, April 10, 2017

Using the Error Function to Evaluate Expected Values

Excel's error function, ERF(x), can be used to evaluate the probabilities for the intervals in a histogram and their expected values. The probability integral, p, below is simplified by the use of a z-score eliminating the standard deviation found in a normal probability distribution. The error function is simplified even further since the factor of 1/2 is missing from the exponential so its arguments require the z-scores a and b of the interval to be divided by the square root of 2. This can be done for the deviations for the times of the Spring Equinoxes. Formulas for binomial distribution can be use to compute the expected value, k=np, for an interval once the probability is known and also the deviation, δk, for the expected value. δk²=np(1-p). The z-scores for the observed counts is found in the last column and gives us a measure of how good the counts for the intervals are. One would expect nearly all the observations to be within 3z for the interval as is the case.

So even though the Earth's orbit deviates in a complicated spiral motion from the Keplerian ellipse, the observed deviations time of the Equinoxes appears to be random.

Supplemental (Apr 10): Another way of looking at the analysis above is that we compared the distribution for the deviations in time with a normal distribution. The z-scores in the last column indicate the central peak is somewhat flattened and the "outliers" at the ends are under represented in our sample. I had a part-time job in the Physics Shop while attending university and one of my teachers used a bean machine to study the statistics. He said that the distributions he was getting were not what was expected. We considered the possibility that the balls might be gaining a little momentum as they fell and were more likely to bounce off the pins and move outwards or the pins may get bent after prolonged use. But rare events are less likely to be represented in a sample and the peak may be more likely to lose its balls than gain them if the total number of balls is small.

Friday, April 7, 2017

Another Look at the Spring Equinox Statistics

I redid the statistics for the deviations from fit formula and got something a little more normal using fewer intervals

Microsoft's histograms are a little difficult to do. For this one I set the underflow and overflow values at -16 and 16 respectively and used 6.4 for the interval width. One can compare the observed values with the expected values and their standard deviations.

One would expect most of the observed values to be within 3 standard deviations of the expected values. Under the circumstances one cannot rule out a normal distribution for the deviations.

Supplemental (Apr 8): I used numerical integration to evaluate the probability integrals for the expected values and their deviations. A more accurate calculation shows that for the 16< interval the expected value is 1.6.

Thursday, April 6, 2017

The Mean Time of the Spring Equinox

I converted the MICA times for the Spring Equinox between 1800 and 2050 to Julian Dates and got a linear equation for the fit. A plot of the deviations appears to be fairly random with a maximum deviation of about 16 minutes.

The "resonance" seems to have disappeared.

Supplemental (Apr 7): The probability distribution for the deviations from the linear fit doesn't appear to be normal. Its peak appears to be somewhat flattened.

Wednesday, April 5, 2017

Keplerian Ellipse Apsides Needed For Calculating Positions

I tried using the apsides from MICA's 2017 positions for the Earth to calculate positions on the Keplerian ellipse but got a 0.02 radian difference between the curves for MICA's radii and the ellipse radii. Using apsides based on the fit brought both curves together.

These angular separation between these two apsides was very close to π as it should be.

Deviations in the Orbital Plane

One can plot the deviations of MICA's Earth positions from the best fitting Keplerian ellipse positions. The result is rather curious. The motion appears to be that of compounded spirals. The smaller cycles are due to the Moon's motion.

The blue dot at the end point marks the last data point.

Supplemental (Apr 6): A 0.0002 AU distance deviation along the ecliptic path of the Sun is enough to account for the observed 16 min deviation in the time of the Spring Equinox.

Tuesday, April 4, 2017

More on the Shifted Time of Spring Equinox

The formula to the shift in time of the Spring Equinox could be clarified somewhat. The Sun has a mean position and mean path. The Moon and other bodies can shift the Earth's position a little and this will affect the apparent position of the Sun which travels parallel to and at the same speed as the mean Sun. So for some arbitrary deviation, δ, relative to the mean position we can calculate the extra distance the Sun needs to travel in order to cross the Celestial Equator and the maximum value.

Note that the error is magnified by small angles of inclination for the path. As stated previously the maximum error is for θ = 0 or a deviation perpendicular to the plane of the Celestial Equator. Of course when the deviated position is above the Celestial Equator the Sun will cross before the Mean Sun does.

Sunday, April 2, 2017

Moon's Effect on the Earth's Positions & the Times of the Equinox

I used MICA to compute the Earth's heliocentric positions for 2017 and then used Excel to fit a Keplerian orbit to the resulting data.

The angle θ is in radians. The resulting orbital elements were,

The epoch is Jan 1, 2017 00:00 UT. If you look at the difference between the two curves you get a small periodic error that fits very well to 12.50 per year which corresponds to the Draconic month.

The deviations account for most but not all of the observed variation in the time of the Spring Equinox.

The speed of the Earth was approximated by dividing the circumference of the Earth's orbit, 2π AU, by the length of the year in minutes. One needs to take the tilt of the Earth's orbit into account to estimate when the Earth will cross the celestial equator.

Edit (Apr 2): Redid the last figure. Here the horizontal line is the Celestial Equator. The sloped line is the Ecliptic along which the Sun travels. The length x is proportional to the distance of a point on the circle from the equator so it is maximum for the right triangle. The numerical results remain unchanged. This calculation would be conclusive if based on the actual angular deviations. As is, the calculation shows how a vertical deviation can affect the time of the equinox.

Supplemental (Apr 2): The times of perihelion and aphelion give a little more information about the Earth's orbit. The values obtained from least squares cubic fits of r are in good agreement with published values.

Edit (Apr 2): Found an error in the linear interpolation for the aphelion. It may have been due to back stepping a fraction of a step.