Tuesday, December 31, 2013
The last post gave the rules for a down-sized version of the St Petersburg Lottery with a maximum of n=4 tosses of a coin. Instead of stopping on "tails," one can flip the coin 4 times each time the game is played, ignoring the tosses after the first tail. With 4 tosses there are 16 possible outcomes, which we can number 0 through 15, and we can use the digits of the corresponding binary numbers to represent the flips, with "0" being a "tail" and "1" a "head." The order of the tosses can be read from right to left; the winnings can then be determined and are shown in the following image.
The relative frequencies of an initial run of 0, 1, 2, 3 and 4 heads are seen to be 8, 4, 2, 1 and 1 respectively, so the probabilities, 1/2, 1/4, 1/8, 1/16 and 1/16, come out right for the game. N=2^20 (about 1M) trials were run and the results shown below are typical.
The mean value of the wins, μ, is about 3 and is approximately equal to the wager, w=3, as expected for a net gain of zero. The statistics for wins and losses show that the odds against winning anything are 3:1, as predicted. This allows money to be collected to cover the winnings.
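A minimal simulation along these lines (a sketch assuming the payout of 2^h coins for an initial run of h heads, with the bonus 2^4 = 16 for four heads, per the rules in the earlier posts):

```python
import random

def play(n=4, rng=random):
    """One play: count the initial run of heads in n tosses.
    A tail on toss h+1 pays 2**h coins; n heads in a row pays the bonus 2**n."""
    h = 0
    while h < n and rng.random() < 0.5:   # 1 = head, 0 = tail
        h += 1
    return 2 ** h                          # h == n gives the bonus 2**n

random.seed(1)
N = 2 ** 20                                # about 1M trials, as in the run above
wins = [play() for _ in range(N)]
mu = sum(wins) / N                         # should be near the wager w = 3
frac_ahead = sum(x > 3 for x in wins) / N  # net gain > 0; expect about 1/4
print(mu, frac_ahead)
```

A winning fraction near 1/4 corresponds to the 3:1 odds against winning anything.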
So instead of tossing coins we can spin a wheel with the numbers 0-15 on it once and pay out the winnings in the table each time but we cannot watch them snowball as in la boule de neige in Roulette. A French song, La Boule, is about the boule de neige system for trying to beat the odds in Roulette.
Saturday, December 28, 2013
It is interesting to look at the win-loss ratio for the n-toss St Petersburg game. We start by determining the value of k for which the pot equals the wager, i.e., when 2^(k-1) = w, and then define κ as this value of k rounded down to the nearest integer. The determination of the win-loss ratio is as follows.
The key formulas, with 2^κ approximately equal to n, are,
So as the number of tosses becomes indefinitely large, as in the St Petersburg paradox, the win-loss ratio approaches zero. The contradiction is that one can expect to win an infinitely large amount while at the same time having essentially no chance of winning. The win-loss ratio needs to be considered in addition to the expected value in order to determine a game's fairness. Games with higher win-loss ratios are more alluring to prospective players.
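As a sketch (assuming the wager w = n/2 + 1 and bonus 2^n from the fair-game setup in the earlier post), the exact win-loss ratio can be computed for any n; for n=4 it comes out to 1/3, matching the 3:1 odds against winning:

```python
def win_loss_ratio(n):
    """Exact win-loss ratio for the n-toss game with wager w = n/2 + 1
    and bonus 2**n. A trial 'wins' when the payout exceeds the wager."""
    w = n / 2 + 1
    p_win = 0.0
    for k in range(1, n + 1):        # first tail appears on toss k
        if 2 ** (k - 1) > w:
            p_win += 2 ** -k
    p_win += 2 ** -n                 # n heads in a row pays the bonus 2**n > w
    return p_win / (1 - p_win)

for n in (4, 8, 16, 64, 256):
    print(n, win_loss_ratio(n))      # ratio shrinks toward zero as n grows
```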
Friday, December 27, 2013
The St. Petersburg paradox is presented in Daniel Bernoulli, Exposition of a New Theory on the Measurement of Risk (1738), found in Econometrica, Jan 1954 (see §17 on p. 31). The problem was not first formulated by him and is similar to the wheat and chessboard problem. A coin is repeatedly tossed until a tail first appears. If this occurs on the first toss the player gets 1 coin, on the second he gets 2 coins, on the 3rd 4 coins, etc., with the number of coins increasing exponentially. What is the expected number of coins that he will get for playing this game? Surprisingly, the number is infinite.
Instead of allowing the number of tosses to go on indefinitely, suppose that it is agreed before the start that the player stops on the nth toss. The expected value for the game is then,
Additionally, one can offer a bonus, b, for getting n heads in a row. If each player has to make a wager, w, to play, then the net gain when the first tail occurs on the kth toss is 2^(k-1) - w, and for n heads in a row it is b - w. The expected value then becomes,
For a fair game the expected value is zero, and the expression for the wager is simplified if the bonus is 2^n, so,
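The fair wager can be checked directly: weighting the net gains by their probabilities gives E = n/2 + 1 - w when the bonus is 2^n, so the fair wager is w = n/2 + 1. A quick check with exact rational arithmetic:

```python
from fractions import Fraction

def expected_net(n, w, b):
    """Expected net gain: first tail on toss k pays 2**(k-1) minus the wager w;
    n heads in a row pays the bonus b minus the wager."""
    E = sum(Fraction(1, 2 ** k) * (2 ** (k - 1) - w) for k in range(1, n + 1))
    E += Fraction(1, 2 ** n) * (b - w)
    return E

# With bonus b = 2**n the fair wager works out to w = n/2 + 1
for n in (4, 8, 10):
    w = Fraction(n, 2) + 1
    print(n, w, expected_net(n, w, 2 ** n))  # expected net gain 0
```

For n=4 this gives w = 3, agreeing with the wager used in the simulation above.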
If one sets the limit at 8 tosses of the coin, one does not come out ahead unless the first tail occurs on the fourth toss or later. The maximum that one can win is 251 coins, for 8 heads in a row.
The paradox is that the expected winnings are infinite for an unlimited number of tosses. Experience will show that the usual outcome of playing is quite low. When the limit on the number of tosses is not too large the wager is affordable, but with larger wagers one can expect to lose more often. The game is one of survival.
Suppose we consider a game of losses instead of winnings. Experience might show that something is profitable based on a limited amount of experience but there are cases where the risks can snowball.
Monday, December 23, 2013
By a stationary distribution for a random process with transition matrix P we mean a system state vector π which remains unchanged after operating on it with P, i.e., Pπ=π. Given P, how can we determine π? A simple method is to compare the definition of the stationary state with the definition of an eigenvector and deduce a relation connecting the state vector with the eigenvector.
Setting the eigenvalue μ=1 makes the two equations nearly identical. So if we can find an eigenvector of P which has an eigenvalue equal to 1 then we can find the direction of π. To find the magnitude in the given direction we use the fact that the sum of the components of π is one to determine the multiplier for e. We can illustrate how this works by determining the stationary state for a simple random walk problem. In the state diagram below q=1-p:
Looking at the eigenvectors for the transition matrix we see that there are two with eigenvalues equal to 1.
Here each column of π is a stationary state corresponding to final states 1 and 2.
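The recipe above can be sketched with NumPy; the 2×2 matrix here is a hypothetical column-stochastic example, not the random-walk matrix from the diagram:

```python
import numpy as np

def stationary_states(P, tol=1e-9):
    """Stationary vectors of a column-stochastic transition matrix P:
    eigenvectors with eigenvalue 1, scaled so their components sum to 1."""
    vals, vecs = np.linalg.eig(P)
    states = []
    for i in range(len(vals)):
        if abs(vals[i] - 1) < tol:
            v = np.real(vecs[:, i])
            states.append(v / v.sum())   # fix the magnitude: components sum to 1
    return np.column_stack(states)       # one column per stationary state

# Hypothetical 2-state chain; columns of P sum to 1
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])
pi = stationary_states(P)
print(pi.ravel())   # expect [2/3, 1/3] since P @ pi = pi
```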
Thursday, December 19, 2013
Last month I posted some information on the rate of change for seasonal anomalies and thought I'd give a little more detail about the calculations involved. One starts with the NOAA Global Land Anomaly and computes the annual averages for the years for which the data is available (1880 to 2012).
One then computes the differences from the monthly averages and sorts the results by month. One can then do a linear fit for each month to get the coefficients for the lines of best fit.
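The month-by-month fitting step might look like the following sketch, with a synthetic stand-in array in place of the NOAA monthly differences:

```python
import numpy as np

# Years 1880-2012, 12 monthly differences per year; the data here is a
# random stand-in for the sorted NOAA anomaly differences.
rng = np.random.default_rng(0)
years = np.arange(1880, 2013)
diffs = rng.normal(0.0, 0.1, size=(len(years), 12))

# One linear fit per month: slope (deg C per year) and intercept
coeffs = np.array([np.polyfit(years, diffs[:, m], 1) for m in range(12)])
print(coeffs.shape)   # (12, 2): a line of best fit for each month
```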
If one looks at the fit for Aug one sees that there is a steady rate of decrease for the 133 years that were used.
As shown in the previous post the rates for the year roughly follow a sinusoidal curve. The maximum rate of change appears to be 0.24 °C per century.
The peak rate of increase in Feb and the rate of decrease in Aug are of the same magnitude for the sinusoidal curve, and the average for the curve is 0. The sums of α and β also average to zero since the yearly average is by definition equal to zero. The steady rates of change are evidence of a long-term decrease in the difference between the Aug and Feb anomalies.
Thursday, December 12, 2013
It's fairly easy to show that in general the distribution for a function, f, of a normal variate, x, is not a normal distribution. We only get a normal distribution if the derivative of f is a constant.
We can also find a function that has some of the characteristics of the distribution that we found for the global land anomaly.
g(x) = f'(x)=df/dx
Note that the chosen function changes the rate of change in x for a given change in y at the center of the plot. This results in a compression of the distribution near y=0.
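The effect is easy to demonstrate numerically. Here f(x) = x^3 is a hypothetical choice (not the function used in the post): the density of y = f(x) is p(x)/|f'(x)|, and since f'(0) = 0 the transformed distribution is strongly peaked near y = 0 and clearly non-normal:

```python
import random

random.seed(2)
xs = [random.gauss(0, 1) for _ in range(100_000)]
ys = [x ** 3 for x in xs]   # a normal variate pushed through a nonlinear f

def excess_kurtosis(v):
    """Excess kurtosis: 0 for a normal distribution, large for sharp
    peaks with heavy tails."""
    n = len(v)
    m = sum(v) / n
    s2 = sum((t - m) ** 2 for t in v) / n
    return sum((t - m) ** 4 for t in v) / (n * s2 ** 2) - 3

print(excess_kurtosis(xs), excess_kurtosis(ys))
```

The first value is near zero; the second is far from it, confirming that the transformed variate is not normal.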
Tuesday, December 3, 2013
The normal sum theorem also works for a linear combination z = ax + by of normal variates x and y. Multiplication of a normal variate by a positive constant produces another normal variate with proportional changes to the mean and standard deviation, so a change of scale reduces the problem to the two cases z = x ± y, and the minus sign doesn't affect the joint probability density.
Consequently the probability density for z in both cases above will be the same after the integration over w, regardless of the sign of y, and the variance of z will remain the sum of the two individual variances. For the linear combination, var(z) = a^2 var(x) + b^2 var(y).
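A quick simulation check of the variance rule, with arbitrarily chosen coefficients and standard deviations:

```python
import random

# Verify var(z) = a**2 var(x) + b**2 var(y) for independent normals
random.seed(3)
a, b = 2.0, -3.0
sx, sy = 1.5, 0.5
N = 200_000
zs = [a * random.gauss(0, sx) + b * random.gauss(0, sy) for _ in range(N)]
mean = sum(zs) / N
var_z = sum((z - mean) ** 2 for z in zs) / N
print(var_z)   # theory: a**2 * sx**2 + b**2 * sy**2 = 11.25
```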
Monday, December 2, 2013
The reason that I haven't been working on posts lately is that I've been doing random walk simulations in order to determine their relative contribution to the global temperature anomaly. A good introduction to the subject of random walks is Don S. Lemons, An Introduction to Stochastic Processes in Physics. In the book Lemons discusses probability distributions and their properties and gives the normal sum theorem, which states that a variable z = x + y which is the sum of two variables with normal distributions also has a normal distribution. His justification for the theorem is that the mean and variance for z are the sums of the means and variances of the two distributions. This lacks some rigor since he doesn't prove that the resulting distribution is actually a normal one. It is not very difficult to do so.
To simplify the proof we set the means of the two distributions to zero and let x and y vary independently over their possible values. The density for the joint distribution, P(x,y), is defined for an element of area with sides dx and dy and is the product of the two individual normal probability densities. Instead of using x and y it is easier to use z and w = x - y as our variables and express P(x,y)dxdy in terms of them. The directions of the z and w axes are perpendicular to each other but at angles to x and y, so the element of area acquires a factor of 1/2.
We next replace x and y in the exponential with their formulas in terms of z and w and simplify the result, defining the new constants λ, μ and α.
To get the probability distribution for z we integrate the joint distribution over all values of w. We can simplify the integral by factoring out the z^2 term in the exponential, leaving an integral, I(λ,α,z), which can be further simplified by completing the square and evaluated using the standard formula for the Gaussian integral.
It is fairly easy to show that φ(z) is a normal distribution with variance σ_eff^2 = σ^2 + s^2.
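Reconstructed from the definitions above (x and y with standard deviations σ and s, so that x = (z+w)/2, y = (z-w)/2, with the factor 1/2 from the change of area element, and bypassing the intermediate constants λ, μ, α), the integration over w gives:

```latex
\varphi(z)=\frac{1}{2}\int_{-\infty}^{\infty}
  \frac{1}{2\pi\sigma s}
  \exp\!\left[-\frac{(z+w)^2}{8\sigma^2}-\frac{(z-w)^2}{8 s^2}\right]dw
 =\frac{1}{\sqrt{2\pi\left(\sigma^2+s^2\right)}}
  \exp\!\left[-\frac{z^2}{2\left(\sigma^2+s^2\right)}\right]
```

which is the normal density with variance σ^2 + s^2.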
Thursday, November 21, 2013
You may have noticed that there are no lines linking the states 0 and 4 or 0 and 5 on the state diagram in the last blog. These permutations require a combination of two successive elementary permutations that can be defined as Y = GR and M = BR. If we allow Y to operate repeatedly on state S0 = (0,1,2) it produces the states S4 = (1,2,0), S5 = (2,0,1) and again S0 = (0,1,2). Y operating repeatedly on S1 = (1,0,2) produces S2 = (0,2,1), S3 = (2,1,0) and again S1 = (1,0,2). If you study the changes in the states you will see that Y is the left-shift operator which moves the second two objects one space to the left and tacks the first object on the end. M is the right-shift operator and the inverse of Y so MY = I. There are a total of 6 permutation operations altogether which consist of the identity operation I, the 3 elementary pairs of exchanges and the 2 shift operations. The products of these operations form the group seen in the complete product table below where the operations in the upper row are followed by those in the left column giving the entry in the corresponding row and column of the table. Checking we see that R in the upper row followed by G in the left column is GR = Y and similarly BR = M.
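The product table can be checked mechanically. A small sketch with permutations as functions on triples, using the post's convention that the upper-row operation is applied first:

```python
# Elementary exchanges on a triple (a, b, c)
I = lambda s: s
R = lambda s: (s[1], s[0], s[2])   # exchange first two
G = lambda s: (s[0], s[2], s[1])   # exchange second two
B = lambda s: (s[2], s[1], s[0])   # exchange first and last

def compose(first, then):
    """'first followed by then', matching the product-table convention."""
    return lambda s: then(first(s))

Y = compose(R, G)   # GR: the left-shift operator
M = compose(R, B)   # BR: the right-shift operator

S0 = (0, 1, 2)
print(Y(S0), Y(Y(S0)), Y(Y(Y(S0))))   # (1, 2, 0) (2, 0, 1) (0, 1, 2)
print(compose(Y, M)(S0))              # MY = I: (0, 1, 2)
```

Repeated application of Y cycles S0 through S4 and S5 and back, exactly as described.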
Two sets of triangles in the state diagram are formed by the lines corresponding to the repeated action of Y and M which could be represented by yellow and magenta lines. They also point out that the lines in state diagrams are directional so we would have yellow lines going one way and magenta lines going the other.
This has nothing to do with global warming but shows how complicated state diagrams can get. The permutations have an interesting state diagram and form a good introduction to what is called group theory.
Supplemental (Nov 21): The complete state diagram (arrows indicate the action of a permutation):
The state diagram gives a better picture of the changes produced by the permutation operations than the product table does since it has the labeled states on it. Group theory helped to generate the state diagram for the permutations.
Tuesday, November 19, 2013
For the global temperature anomaly the correlation between one month and those following it disappears rapidly and, aside from a slight annual effect, is essentially nonexistent after a couple of months. The drift function, the changes from the mean value, appears to be random and memoryless, so it can be represented by a Markov process. To simplify the analysis we had to divide the range of the anomaly change into a number of smaller sections and use frequencies instead of probabilities. No change in the anomaly is the most probable outcome, but finite positive and negative changes are also possible, with the distribution being symmetric about the mean. So to understand changes in the anomaly we need to know what to expect for the drift, the change in the mean, due to this Markov process in order to distinguish it from a steady and persistent change that we have to associate with long-term global warming.
A Markov process can be represented by a transition matrix giving the probability of the various changes from one state to another which in turn can be represented by a state diagram. A simple example of a state space is the permutation of three objects which can be represented by the triple (a,b,c) which in turn can be represented by the Euclidean coordinates of a point such as (0,1,2). For three objects we can define three elementary permutations consisting of the exchange of just two of the objects. Let R represent the exchange of the first two objects, G that of the second two, and B that of the first and last. If we start with (0,1,2) and apply R we get (1,0,2), G gives (0,2,1) and B (2,1,0). Applying R, G and B again to these new states we find that each elementary permutation is its inverse, taking us back to (0,1,2) and applying G and B to (1,0,2) gives (1,2,0) and (2,0,1) respectively. So the 3 elementary exchanges give a total of 6=1+3+2 permutations. We can label the states by the order in which we found them calling them 0 through 5. A state diagram makes it easier to keep track of all the changes that can take place when we apply R, G, and B to these states.
In the state diagram above the elementary exchanges R, G, and B are represented by the colors red, green, and blue.
One can trick Mathcad 11's 3D plotter into drawing the figure above using sets of matrices containing the data for the points and lines. It took ten separate plots altogether and the labeling numbers had to be added later with MS Paint. A "next generation" plotter would only need 4 plots, one for the points and one for each set of lines.
Moving the required number of steps along the axes, one can see that the point at the top of the figure is (0,1,2). The points that represent the states turn out to lie in the same plane.
Tuesday, November 12, 2013
One can compare the estimated probability distribution for the global land anomaly with the normal and Cauchy distributions. It is intermediate between the two, with an r.m.s. deviation of 0.0867 for the best-fitting Cauchy distribution and 0.0904 for a normal distribution with the same standard deviation.
The observed deviations of the anomalies from the annual average have a sharper peak and more distant outliers than a normal distribution, but the distribution is not as extreme as the Cauchy, since it has a well-defined mean and standard deviation. An example of a process for which the Cauchy distribution applies is the Lorentzian function for the distribution of radiation in a spectral line of an atom.
Supplemental (Nov 13): Lorentz discusses the absorption distribution in The Theory of Electrons (1916) in §133-136 and Notes 60-62.
Sunday, November 10, 2013
I've been working on separating the diffusion function from the drift function for the monthly global land anomaly for the years 1880 through 2012 and found a small steady seasonal change in the monthly anomalies that amounts to about 0.2 °C per 100 years. The change during the year is nearly sinusoidal producing relatively warmer winter months and cooler summer months.
To get a better estimate of the diffusion function for the anomaly I determined the yearly averages for use as the drift function and subtracted the anomaly values from these. Then I subtracted the seasonal change above. The differences were sorted by month to get the monthly means and the standard deviations for normal distributions in order to produce an equivalent probability function that was the weighted sum of the monthly normal distributions. The estimated probability distribution, p, gave a good fit for the diffusion data. The scale for the anomaly, x and ξ, between -2 and 2 was divided into 300 parts to obtain the frequencies for the diffusion histogram Φ. The calculated frequencies, F, were found by integrating the probability distribution, p(x), over the sub-intervals and multiplying this by the number of diffusion data points, N=1596.
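The integration step might be sketched as follows; the monthly means and standard deviations here are hypothetical placeholders for the fitted values:

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def expected_counts(means, sigmas, N, lo=-2.0, hi=2.0, parts=300):
    """Expected frequencies F on 'parts' sub-intervals of [lo, hi] for an
    equally weighted mixture of 12 monthly normals, scaled by N points."""
    edges = [lo + (hi - lo) * i / parts for i in range(parts + 1)]
    F = []
    for a, b in zip(edges[:-1], edges[1:]):
        p = sum(normal_cdf(b, m, s) - normal_cdf(a, m, s)
                for m, s in zip(means, sigmas)) / len(means)
        F.append(N * p)   # integral of p(x) over the sub-interval, times N
    return F

# Placeholder monthly parameters (not the fitted ones from the post)
means = [0.0] * 12
sigmas = [0.15 + 0.01 * m for m in range(12)]
F = expected_counts(means, sigmas, N=1596)
print(len(F), sum(F))   # 300 sub-intervals; nearly all N = 1596 points covered
```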
The procedure above removed some of the "contaminants" present and consequently reduced the width of the peak of the diffusion function slightly. The result is a better measure of the short term random fluctuations in the anomaly.
Supplemental (Nov 11): The diffusion function by definition has a mean of zero and the seasonal change was measured relative to this average. The seasonal change tells us that the difference between summer and winter temperatures has been slowly decreasing over time but says nothing about their changes relative to the reference temperatures of the anomaly. Changes in the annual mean are associated with the drift function.
Thursday, November 7, 2013
The accurate expected values from the first 100 years of anomaly data can be used to predict the shape of the entire drift function. I extracted the anomalies prior to 1980 from the monthly global land anomaly data and determined a separate set of points for the drift function. Then a fit was found using the most accurately known points for the curve. The prediction of what the entire drift function would look like is consistent with the curve found by using all the data. The expected values were slightly different for the earlier data since there was less data to work with and the frequencies were reduced somewhat. The earlier data and fit are shown in blue in the plot below.
Wednesday, November 6, 2013
The frequency table indicates that not all the data points for the expected next value of the anomaly have the same number of points contributing to the estimate. The points near the center of the plot are more accurate than those at the ends. Weighted least squares polynomial fits give a better fit for the central portion of the graph. A derivation of the procedure that I use for function fits is shown below. For polynomial fits the functions are just the powers of x. The data points are (xk,yk), which have weights wk associated with them.
The function φ2a uses this method to find the coefficients of the polynomial which gives the fit with the least variance. The weight used for each x value is the sum of the frequencies in its column of the frequency table. For ordinary least squares polynomial fits the weights are wk = 1 for each data point, so W is just the identity matrix and is not needed to find the coefficients.
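A sketch of the normal-equations method described above (the function name wls_polyfit is mine, not the post's φ2a):

```python
import numpy as np

def wls_polyfit(x, y, w, deg):
    """Weighted least-squares polynomial fit via the normal equations
    (A^T W A) c = A^T W y, where the columns of A are powers of x."""
    A = np.vander(x, deg + 1, increasing=True)
    W = np.diag(w)
    c = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return c   # coefficients of c[0] + c[1] x + ... + c[deg] x**deg

# Exact polynomial data is recovered exactly, whatever positive weights are used
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 1.0 + 2.0 * x - 0.5 * x ** 2
w = np.array([4.0, 1.0, 9.0, 1.0, 4.0])
print(wls_polyfit(x, y, w, 2))   # approximately [1.0, 2.0, -0.5]
```

With w set to all ones this reduces to the ordinary least squares fit, as noted above.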
In the plot above most of the data to the left of x = 0 is prior to the beginning of 1980 while that to the right is after that. The data points between x = -1 and x = 1 have greater weights and less uncertainty in the estimated expected values. The estimates are better to the left since they are based on about 100 years of data while those to the right are based on about 33 years of data. It appears to be the same curve on both sides of the center and more data would help confirm this. The fit does not appear to be symmetrical about the stable point but there is a relatively flat section just to the right of it. The curvature is present in the most accurately known portion of the drift function just to the left of center.
Tuesday, November 5, 2013
The linear fit for Galton's stature data seems to indicate a deviation of the mean height from the stable point and this would be a violation of the regression to the mean. A cubic fit gives better agreement with the law of the regression to the mean. The scales below have been adjusted to show the difference in inches from the stable height.
There are more data points near the origin of the plot and consequently one would expect greater certainty in their position than at the ends. Still the data indicates a weakening of the tendency to regress to the mean. Perhaps this shows more a preference for taller spouses by taller people.
People are taller now than they were thousands of years ago. This suggests that the stable point has shifted to the right over time and so the drift function must be a function of time. We can allow for this change by replacing the constant coefficients in the drift function by functions of time. Changes in the drift function would result in changes in the observed heights of the population. Some sources of change might be changes in the environment, the selection of stronger, taller men through combat or some bias in the preferences in the population. The drift function may be what's controlling evolution.
The drift function for the temperature anomaly may be affected by environmental factors and consequently be responsible for some change in the mean, but with a cubic drift function the tendency to return to the stable point is reduced at nearby points. Random walks would then have more of an influence on the observed anomalies, making it difficult to tell if some global warming is actually occurring. This points out a need to be clear about what we mean by global warming. The semantics may be politically correct but are they scientifically correct?
Monday, November 4, 2013
One can find a drift function for the data in Table I of Galton's Regression towards Mediocrity in Hereditary Stature (1886). The columns designate the average height of the parents, x, and the rows those of the children, x'. The table is reproduced here with nominal heights assigned where they are missing in Galton's table.
One has to use the row heights to compute the expected value of x' and one can find a simple linear fit for the drift function.
The slope is close to -2/3. For the drift function we can define ΔX = x - x0 where x0 is the average height of the children.
There is a stable point at approximately x = 68.7 in. and parents whose average height is above this tend to have shorter children and those below tend to have taller children. Note the similarity of the drift function to the difference between the lines for the parents and the children in Plate IX in Galton's paper.
Sunday, November 3, 2013
A frequency table contains nearly the same information as a transition matrix for a stochastic process and can also be used to estimate the expected value of a succeeding state given an initial state. For the monthly global land anomalies the values ranged between xi = -2 and xf = 2 and this interval was subdivided into 20 parts as follows. The center of each sub-interval is given by xj.
In the following table the entry in a row shows the number of times a value, x', of the row followed the value, x, of its column. You may recognize the similarity of this table to those that Galton used for the inheritance of characteristics of children from those of their parents.
The frequencies can be used to compute the expected value of x' by summing the product of x' and the probability or relative frequency for a column using the formula below.
The differences of the expected values of x' from the initial x values were used to estimate the drift function.
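The whole procedure can be sketched on a small made-up table (not the actual 20-column anomaly table); each column is an initial value x, each row a following value x':

```python
import numpy as np

xj = np.array([-0.5, 0.0, 0.5])   # centers of the sub-intervals
F = np.array([[5, 2, 1],          # counts of x' = -0.5 after each x
              [4, 6, 4],          # counts of x' =  0.0 after each x
              [1, 2, 5]])         # counts of x' =  0.5 after each x

col_totals = F.sum(axis=0)             # points contributing to each column
expected_next = (xj @ F) / col_totals  # E[x' | x]: frequency-weighted average
drift = expected_next - xj             # estimated drift function
print(expected_next, drift)            # [-0.2 0. 0.2] [0.3 0. -0.3]
```

The made-up counts were chosen symmetric about the center, so the drift pulls back toward the middle value, the regression-to-the-mean behavior discussed in the posts above.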