httprover's 2nd blog: November 2013

Thursday, November 21, 2013

The State Diagram & The Missing Permutations

You may have noticed that there are no lines linking states 0 and 4 or states 0 and 5 on the state diagram in the last blog. These permutations require a combination of two successive elementary permutations, which can be defined as Y = GR and M = BR. If we allow Y to operate repeatedly on state S₀ = (0,1,2) it produces the states S₄ = (1,2,0), S₅ = (2,0,1) and again S₀ = (0,1,2). Y operating repeatedly on S₁ = (1,0,2) produces S₂ = (0,2,1), S₃ = (2,1,0) and again S₁ = (1,0,2). If you study the changes in the states you will see that Y is the left-shift operator, which moves the last two objects one space to the left and tacks the first object on the end. M is the right-shift operator and the inverse of Y, so MY = I. There are 6 permutation operations altogether: the identity operation I, the 3 elementary pairwise exchanges and the 2 shift operations. The products of these operations form the group seen in the complete product table below, where an operation in the upper row is followed by one in the left column to give the entry in the corresponding row and column of the table. Checking, we see that R in the upper row followed by G in the left column is GR = Y and similarly BR = M.
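The products described above can be checked with a short Python sketch. The function names and the helper name_of are my own choices for illustration; the convention that GR means "R first, then G" is taken from the text.

```python
# Each operation rearranges the positions of a triple (a, b, c).
def I(s): return s                      # identity
def R(s): return (s[1], s[0], s[2])     # exchange the first two objects
def G(s): return (s[0], s[2], s[1])     # exchange the second two objects
def B(s): return (s[2], s[1], s[0])     # exchange the first and last objects
def Y(s): return G(R(s))                # Y = GR: R first, then G (left shift)
def M(s): return B(R(s))                # M = BR: R first, then B (right shift)

ops = {"I": I, "R": R, "G": G, "B": B, "Y": Y, "M": M}
names = list(ops)
S0 = (0, 1, 2)

def name_of(f):
    # The six operations send S0 to six different states, so the image
    # of S0 is enough to identify which operation f is.
    image = f(S0)
    for name, op in ops.items():
        if op(S0) == image:
            return name

# table[(row, col)]: the column operation is applied first, then the row one.
table = {(row, col): name_of(lambda s, r=row, c=col: ops[r](ops[c](s)))
         for row in names for col in names}

print("    " + "  ".join(names))
for row in names:
    print(row + " | " + "  ".join(table[(row, col)] for col in names))
```

Every entry of the table is again one of the six operations, which is the closure property that makes the set a group.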

Two sets of triangles in the state diagram are formed by the lines corresponding to the repeated action of Y and M, which could be represented by yellow and magenta lines. They also show that the lines in state diagrams are directional, so we would have yellow lines going one way and magenta lines going the other.

This has nothing to do with global warming but shows how complicated state diagrams can get. The permutations have an interesting state diagram and form a good introduction to what is called group theory.

Supplemental (Nov 21): The complete state diagram (arrows indicate the action of a permutation):

The state diagram gives a better picture of the changes produced by the permutation operations than the product table does since it has the labeled states on it. Group theory helped to generate the state diagram for the permutations.

Tuesday, November 19, 2013

State Diagrams

For the global temperature anomaly the correlation between one month and those following it disappears rapidly and, aside from a slight annual effect, is essentially nonexistent after a couple of months. The drift function, the changes from the mean value, appears to be random and memoryless, so it can be represented by a Markov process. To simplify the analysis we had to divide up the range of the anomaly change into a number of smaller sections and use frequencies instead of probabilities. No change in the anomaly is the most probable outcome, but finite positive and negative changes are also possible, with the distribution being symmetric about the mean. So to understand changes in the anomaly we need to know what to expect for the drift, the change in the mean, due to this Markov process in order to distinguish it from a steady and persistent change that we have to associate with long-term global warming.

A Markov process can be represented by a transition matrix giving the probability of the various changes from one state to another, which in turn can be represented by a state diagram. A simple example of a state space is the permutation of three objects, which can be represented by the triple (a,b,c), which in turn can be represented by the Euclidean coordinates of a point such as (0,1,2). For three objects we can define three elementary permutations consisting of the exchange of just two of the objects. Let R represent the exchange of the first two objects, G that of the second two, and B that of the first and last. If we start with (0,1,2) and apply R we get (1,0,2), G gives (0,2,1) and B (2,1,0). Applying R, G and B again to these new states we find that each elementary permutation is its own inverse, taking us back to (0,1,2). Applying G and B to (1,0,2) gives (1,2,0) and (2,0,1) respectively. So the 3 elementary exchanges give a total of 6 = 1+3+2 permutations. We can label the states by the order in which we found them, calling them 0 through 5. A state diagram makes it easier to keep track of all the changes that can take place when we apply R, G, and B to these states.
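The enumeration of the six states can be sketched in a few lines of Python. The function names follow the letters used above; everything else is an illustrative choice.

```python
def R(s):
    a, b, c = s
    return (b, a, c)   # exchange the first two objects

def G(s):
    a, b, c = s
    return (a, c, b)   # exchange the second two objects

def B(s):
    a, b, c = s
    return (c, b, a)   # exchange the first and last objects

start = (0, 1, 2)
states = [start]
# Visit the states in the order the text finds them: R, G, B applied to
# the starting state, then G and B applied to R(start).
for s in (R(start), G(start), B(start), G(R(start)), B(R(start))):
    if s not in states:
        states.append(s)

for label, s in enumerate(states):
    print(label, s)

# Each exchange undoes itself when applied twice.
print(all(op(op(s)) == s for op in (R, G, B) for s in states))
```

The labels printed here are the state numbers 0 through 5 used in the diagram.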

In the state diagram above the elementary exchanges R, G, and B are represented by the colors red, green, and blue.

One can trick Mathcad 11's 3D plotter into drawing the figure above using sets of matrices containing the data for the points and lines. It took ten separate plots altogether and the labeling numbers had to be added later with MS Paint. A "next generation" plotter would only need 4 plots, one for the points and one for each set of lines.

Moving along the axes by the required steps, one can see that the point at the top of the figure is (0,1,2). The set of points that represent the states turns out to lie in a single plane.
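One quick way to see why the points are coplanar: every state is a rearrangement of 0, 1 and 2, so its coordinates always add up to 3 and the point satisfies x + y + z = 3. A minimal check in Python:

```python
from itertools import permutations

# All six arrangements of the three objects, taken as points (x, y, z).
points = list(permutations((0, 1, 2)))

# Collect the distinct coordinate sums; a single value means that
# every point lies on the same plane x + y + z = constant.
sums = {x + y + z for x, y, z in points}
print(len(points), sums)
```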

Tuesday, November 12, 2013

Comparison of the Anomaly Distribution with Other Probability Distributions

One can compare the estimated probability distribution for the global land anomaly with the normal and Cauchy distributions. It is intermediate between the two, with an r.m.s. deviation of 0.0867 for the best-fitting Cauchy distribution and 0.0904 for a normal distribution with the same standard deviation.
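For readers who want to try a comparison of this kind, here is a minimal Python sketch of the two reference densities and an r.m.s. deviation between two curves sampled on the same grid. The exact definition of the r.m.s. deviation used for the figures above is not given, so rms_deviation is only one plausible reading, and the widths 0.5 and 0.3 are placeholders rather than fitted values.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def cauchy_pdf(x, x0=0.0, gamma=1.0):
    z = (x - x0) / gamma
    return 1.0 / (math.pi * gamma * (1.0 + z * z))

def rms_deviation(p_values, q_values):
    # Root mean square of the pointwise differences between two curves.
    n = len(p_values)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(p_values, q_values)) / n)

# Sample both curves on the anomaly scale from -2 to 2 (placeholder widths).
grid = [-2.0 + 4.0 * k / 300 for k in range(301)]
normal_curve = [normal_pdf(x, 0.0, 0.5) for x in grid]
cauchy_curve = [cauchy_pdf(x, 0.0, 0.3) for x in grid]

print(rms_deviation(normal_curve, cauchy_curve))
# Far from the center the Cauchy density is much larger than the normal one.
print(cauchy_pdf(2.0, 0.0, 0.3) > normal_pdf(2.0, 0.0, 0.5))
```

In an actual comparison one of the two curves would be replaced by the estimated anomaly distribution evaluated on the same grid.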

The observed deviations of the anomalies from the annual average have a sharper peak and more distant outliers than a normal distribution, but the distribution is not as extreme as the Cauchy distribution: it has a well-defined mean and standard deviation. An example of a process to which the Cauchy distribution applies is the Lorentzian function for the distribution of radiation in a spectral line of an atom.

Supplemental (Nov 13): Lorentz discusses the absorption distribution in The Theory of Electrons (1916) in §133-136 and Notes 60-62.

Sunday, November 10, 2013

Seasonal Anomaly Drift & Improved Diffusion Function

I've been working on separating the diffusion function from the drift function for the monthly global land anomaly for the years 1880 through 2012 and found a small steady seasonal change in the monthly anomalies that amounts to about 0.2 °C per 100 years. The change during the year is nearly sinusoidal producing relatively warmer winter months and cooler summer months.

β'=a·φ(q)

To get a better estimate of the diffusion function for the anomaly I determined the yearly averages for use as the drift function and subtracted the anomaly values from these. Then I subtracted the seasonal change above. The differences were sorted by month to get the monthly means and the standard deviations for normal distributions in order to produce an equivalent probability function that was the weighted sum of the monthly normal distributions. The estimated probability distribution, p, gave a good fit for the diffusion data. The scale for the anomaly, x and ξ, between -2 and 2 was divided into 300 parts to obtain the frequencies for the diffusion histogram Φ. The calculated frequencies, F, were found by integrating the probability distribution, p(x), over the sub-intervals and multiplying this by the number of diffusion data points, N=1596.
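The last step, integrating p(x) over each sub-interval and multiplying by N, can be sketched in Python using the normal cumulative distribution. The monthly means and standard deviations below are placeholders, not the values found from the data, and giving each month an equal weight of 133 points (1596/12) is my assumption about how the weighted sum was formed.

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mixture_cdf(x, means, sigmas, weights):
    # Weighted sum of the monthly normal distributions, normalized to 1.
    total = sum(weights)
    return sum(w * normal_cdf(x, m, s)
               for m, s, w in zip(means, sigmas, weights)) / total

def expected_frequencies(means, sigmas, weights, n_points,
                         lo=-2.0, hi=2.0, n_bins=300):
    # The integral of p(x) over a sub-interval is the difference of the
    # cumulative distribution at its two ends.
    width = (hi - lo) / n_bins
    edges = [lo + k * width for k in range(n_bins + 1)]
    cdf = [mixture_cdf(e, means, sigmas, weights) for e in edges]
    return [n_points * (cdf[k + 1] - cdf[k]) for k in range(n_bins)]

# Placeholder monthly parameters (12 months), for illustration only.
means = [0.0] * 12
sigmas = [0.40 + 0.01 * m for m in range(12)]
weights = [133] * 12

F = expected_frequencies(means, sigmas, weights, n_points=1596)
print(len(F), round(sum(F), 2))
```

The sum of the calculated frequencies falls slightly short of N because a small part of the distribution lies outside the interval from -2 to 2.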

The procedure above removed some of the "contaminants" present and consequently reduced the width of the peak of the diffusion function slightly. The result is a better measure of the short term random fluctuations in the anomaly.

Supplemental (Nov 11): The diffusion function by definition has a mean of zero and the seasonal change was measured relative to this average. The seasonal change tells us that the difference between summer and winter temperatures has been slowly decreasing over time but says nothing about their changes relative to the reference temperatures of the anomaly. Changes in the annual mean are associated with the drift function.

Thursday, November 7, 2013

Using Data Prior To 1980 To Predict The Shape Of The Entire Drift Function

The accurate expected values from the first 100 years of anomaly data can be used to predict the shape of the entire drift function. I extracted the anomalies prior to 1980 from the monthly global land anomaly data and determined a separate set of points for the drift function. Then a fit was found using the most accurately known points for the curve. The prediction of what the entire drift function would look like is consistent with the curve found by using all the data. The expected values were slightly different for the earlier data since there was less data to work with and the frequencies were reduced somewhat. The earlier data and fit are shown in blue in the plot below.

Wednesday, November 6, 2013

Need To Use Weighted Fits For The Drift Function

The frequency table indicates that not all the data points for the expected next value of the anomaly have the same number of points contributing to the estimate. The points near the center of the plot are more accurate than those at the ends. Weighted least squares polynomial fits give a better fit for the central portion of the graph. A derivation of the procedure that I use for function fits is shown below. For polynomial fits the functions are just the powers of x. The data points are (x_k,y_k), which have weights w_k associated with them.

The function φ2a uses this method to find the coefficients of the polynomial which gives the fit with the least variance. The weight used for each x value is the sum of the frequencies in its column of the frequency table. For ordinary least squares polynomial fits the weights are w_k = 1 for each data point, so W is just the identity matrix and consequently is not needed to find the coefficients.
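A weighted polynomial fit of this kind can be sketched in plain Python. This is the standard weighted least squares procedure, not a transcription of φ2a: minimizing the sum of w_k·(y_k − Σ_j c_j·x_k^j)² leads to the normal equations Σ_j [Σ_k w_k·x_k^(i+j)]·c_j = Σ_k w_k·x_k^i·y_k, which are solved for the coefficients c_j. The names weighted_polyfit and solve_linear are hypothetical, and the data in the example are synthetic.

```python
def solve_linear(a, b):
    # Gaussian elimination with partial pivoting on the augmented matrix.
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * c[k] for k in range(r + 1, n))
        c[r] = (m[r][n] - s) / m[r][r]
    return c

def weighted_polyfit(xs, ys, ws, degree):
    # Build and solve the normal equations for the powers 1, x, x^2, ...
    n = degree + 1
    a = [[sum(w * x ** (i + j) for x, w in zip(xs, ws)) for j in range(n)]
         for i in range(n)]
    b = [sum(w * y * x ** i for x, y, w in zip(xs, ys, ws)) for i in range(n)]
    return solve_linear(a, b)   # coefficients c_0, c_1, ..., c_degree

# Synthetic example: points on an exact cubic, heavier weights in the center.
xs = [-1.9 + 0.2 * k for k in range(20)]
ys = [0.1 - 0.5 * x - 0.3 * x ** 3 for x in xs]
ws = [1 + k * (19 - k) for k in range(20)]
print([round(c, 6) for c in weighted_polyfit(xs, ys, ws, 3)])
```

Setting every weight to 1 reduces this to an ordinary least squares fit, in line with the remark about W being the identity matrix.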

In the plot above most of the data to the left of x = 0 is prior to the beginning of 1980 while that to the right is after that. The data points between x = -1 and x = 1 have greater weights and less uncertainty in the estimated expected values. The estimates are better to the left since they are based on about 100 years of data while those to the right are based on about 33 years of data. It appears to be the same curve on both sides of the center and more data would help confirm this. The fit does not appear to be symmetrical about the stable point but there is a relatively flat section just to the right of it. The curvature is present in the most accurately known portion of the drift function just to the left of center.

Tuesday, November 5, 2013

Another Look At The Drift Function For Galton's Stature Data

The linear fit for Galton's stature data seems to indicate a deviation of the mean height from the stable point and this would be a violation of the regression to the mean. A cubic fit gives better agreement with the law of the regression to the mean. The scales below have been adjusted to show the difference in inches from the stable height.

There are more data points near the origin of the plot and consequently one would expect greater certainty in their position than at the ends. Still the data indicates a weakening of the tendency to regress to the mean. Perhaps this shows more a preference for taller spouses by taller people.

People are taller now than they were thousands of years ago. This suggests that the stable point has shifted to the right over time and so the drift function must be a function of time. We can allow for this change by replacing the constant coefficients in the drift function by functions of time. Changes in the drift function would result in changes in the observed heights of the population. Some sources of change might be changes in the environment, the selection of stronger, taller men through combat or some bias in the preferences in the population. The drift function may be what's controlling evolution.

The drift function for the temperature anomaly may be affected by environmental factors and consequently be responsible for some change in the mean, but with a cubic drift function the tendency to return to the stable point is reduced at points nearby. Random walks would have more of an influence on the observed anomalies, making it difficult to tell if there actually is some global warming occurring. This points out a need to be clear about what we mean by global warming. The semantics may be politically correct but are they scientifically correct?

Monday, November 4, 2013

The Drift Function For Galton's Stature Data

One can find a drift function for the data in Table I of Galton's Regression towards Mediocrity in Hereditary Stature (1886). The columns designate the average height of the parents, x, and the rows those of the children, x'. The table is reproduced here with nominal heights assigned where they are missing in Galton's table.

One has to use the row heights to compute the expected value of x' and one can find a simple linear fit for the drift function.

The slope is close to -2/3. For the drift function we can define ΔX = x - x₀ where x₀ is the average height of the children.

There is a stable point at approximately x = 68.7 in. and parents whose average height is above this tend to have shorter children and those below tend to have taller children. Note the similarity of the drift function to the difference between the lines for the parents and the children in Plate IX in Galton's paper.

Sunday, November 3, 2013

Calculating Expected Values From A Frequency Matrix

A frequency table contains nearly the same information as a transition matrix for a stochastic process and can also be used to estimate the expected value of a succeeding state given an initial state. For the monthly global land anomalies the values ranged between x_i = -2 and x_f = 2 and this interval was subdivided into 20 parts as follows. The center of each sub-interval is given by x_j.

In the following table the entry in a row shows the number of times a value, x', of the row followed the value, x, of its column. You may recognize the similarity of this table to those that Galton used for the inheritance of characteristics of children from those of their parents.

The frequencies can be used to compute the expected value of x' by summing the product of x' and the probability or relative frequency for a column using the formula below.

The differences of the expected values of x' from the initial x values were used to estimate the drift function.
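The calculation of the expected values and the drift can be sketched in Python as follows. The layout freq[i][j] (row i for x', column j for x) follows the description of the table; the function names and the tiny 3 × 3 table in the example are made up for illustration.

```python
def bin_centers(lo=-2.0, hi=2.0, n_bins=20):
    # Center of each sub-interval of the anomaly scale.
    width = (hi - lo) / n_bins
    return [lo + (j + 0.5) * width for j in range(n_bins)]

def expected_next(freq, centers):
    # freq[i][j]: number of times a value in row bin i (x') followed a
    # value in column bin j (x). The expected x' for a column is the sum
    # of x' times its relative frequency in that column.
    n = len(centers)
    result = []
    for j in range(n):
        column_total = sum(freq[i][j] for i in range(n))
        if column_total == 0:
            result.append(None)   # no data for this starting value
        else:
            weighted = sum(centers[i] * freq[i][j] for i in range(n))
            result.append(weighted / column_total)
    return result

def drift(freq, centers):
    # Difference between the expected next value and the starting value.
    return [None if e is None else e - x
            for e, x in zip(expected_next(freq, centers), centers)]

# Made-up table with three bins centered at -1, 0 and 1.
toy_centers = [-1.0, 0.0, 1.0]
toy_freq = [[2, 1, 0],
            [1, 2, 1],
            [0, 1, 2]]
print(drift(toy_freq, toy_centers))
```

In the made-up table the drift is positive for the lowest bin, zero in the middle and negative for the highest bin, which is the pattern of a return toward a stable point.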

Saturday, November 2, 2013

The Drift Function Could Be Exactly Cubic

The anomaly values used for fitting the drift function in the last blog were those of the lower ends of the sub-intervals. Using the centered x values for the sub-intervals and shifting the center of the anomaly to x = 0.0313 one finds that the coefficients are within 2% of a purely cubic equation.

The Drift Function For The Monthly Global Land Anomaly Appears To Be Stable

To determine the drift function for the monthly global land anomaly the interval from -2 degrees to 2 degrees was divided up into 20 parts, then the anomaly values were scanned to determine the intervals they fell in and the interval of the following anomaly. The result of the scan was a 20 × 20 matrix in which the columns represented the sub-interval, x, that the first anomaly fell into and the rows, x', that of the next anomaly. Next an expected value was computed for each column to determine the most likely value of x' given that of x. Finally the expected change, Δx = x' - x, was computed for each value of x. The results are plotted below along with a least squares polynomial fit.
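The scan that builds the 20 × 20 matrix can be sketched in Python. The function names are hypothetical, the short series in the example is made up, and clamping values at or beyond the ends of the scale into the end bins is my assumption.

```python
def bin_index(value, lo=-2.0, hi=2.0, n_bins=20):
    # Index of the sub-interval that a value falls into.
    width = (hi - lo) / n_bins
    k = int((value - lo) // width)
    return min(max(k, 0), n_bins - 1)   # assumption: clamp into the end bins

def frequency_matrix(series, lo=-2.0, hi=2.0, n_bins=20):
    # freq[row][col]: row is the bin of the following anomaly (x'),
    # col is the bin of the anomaly that came before it (x).
    freq = [[0] * n_bins for _ in range(n_bins)]
    for x, x_next in zip(series, series[1:]):
        col = bin_index(x, lo, hi, n_bins)
        row = bin_index(x_next, lo, hi, n_bins)
        freq[row][col] += 1
    return freq

# Made-up monthly values, for illustration only.
series = [0.05, 0.15, -0.30, 0.05]
freq = frequency_matrix(series)
print(sum(sum(row) for row in freq))   # one count per consecutive pair
```

A series of n values gives n - 1 consecutive pairs, so the entries of the matrix always add up to one less than the number of anomaly values scanned.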

The anomaly data indicates that at higher anomaly values the anomaly is likely to decrease and at lower values it is likely to increase while remaining relatively stationary near the center. This indicates that the anomaly fluctuates about a stable point.