Wednesday, October 30, 2013

Stochastic Drift And The Central Limit Theorem


  I've been going through some books on stochastic processes such as Gillespie's Markov Processes and Kittel's Elementary Statistical Physics. The two characterizing functions of a continuous Markov process are the drift function and the diffusion function. These seem to be the analogs of what I have called "signal" and "noise" in the preceding blogs. It may be that the Earth's equilibrium temperatures are subject to drift over time which would be difficult to distinguish from global warming. If a drift is persistent it will have consequences over time but would not be as serious as a run-away greenhouse effect.

  Global warming does not appear to be a well studied as a stochastic process. Maxwell, Boltzmann and Gibbs introduced the concepts of statistical physics to the study of gases and Einstein's explanation of Brownian motion extended this to liquids. The Earth can be considered as a physical system in contact with an external heat source and it is normal for there to be fluctuations in temperature but I doubt that we need a Lyapunov control system to clamp the Earth's mean temperature to some arbitrarily chosen fixed value.

  The weighted sum of normal distribution functions that was found to be a good fit for the variation in the monthly global land anomaly may have a variation of the Lyapunov central limit theorem associated with it but it's definitely not a normal distribution.

Wednesday, October 23, 2013

NCDC September Monthly Climate Reports Out Today


  NOAA's National Climate Data Center regularly releases monthly climate reports for the US and the entire globe and the September 2013 reports are now available online.

Monday, October 21, 2013

Standard Deviation For A Sum Of Normal Distributions


  It can be shown using the mathematical tools found in textbooks on Statistical Mechanics that the standard deviation for a sum of m normal distributions whose individual standard deviations are σj is,


So, we don't need to calculate the expected value integral E(x2) with the weighted sum of the individual normal distributions if the individual stds are already known. One can also show that the mean for the summed distribution is the weighted sum of the means for the normal distributions.

Saturday, October 19, 2013

Coherence in Statistical and Physical Processes


  We all know how variable the weather from time to time and from place to place and the global means reflect this to a smaller extent. Thermodynamics and Statistical Mechanics tell us that a large physical system can experience fluctuations from its equilibrium state. The Earth's weather also has seasonal variations due to the inclination of the Earth relative to its orbit and annual changes in its distance from the Sun. Cloudy weather can block the amount of sunlight that a location can receive throughout the year. The Earth's rotation produces a daily cycle of heating and cooling. Consequentially the Earth's weather is in a constant state of flux. There may be some coherence between one location and another nearby but there is no guarantee that this is so for large separations of time and space. There are statistical and physical processes for which the mean and variation of a distribution vary with time and there is no long term pattern to observations. We can check to see if there is a "coherence time" for the global anomalies and the test that we designed for predictions might be useful for this. I have put together a list of readings that might help to give another view on global warming along with two references on statistics.

Cyclostationary process

Stationary process

Thermal fluctuations

Gaussian noise

Ornstein-Uhlenbeck process

An Introduction to Scientific Research - Wilson

The Statistical Analysis of Experimental Data - Mandel

Friday, October 18, 2013

Using Statistics to Reject a Projection


  Instead of using an integer multiple of the standard deviation, s, for the limits of a test one can use an arbitrary multiple known as a z-value defined by z = [a-E(a)]/s where a is a limiting value for the difference of the anomaly from the smoothed anomaly or its fit. Using z = 2.0 for the monthly global land anomaly with the three harmonic fit we find that the probability for observing an outlier is q = 5.848% so out of the 1604 months we would expect about 94 outliers. The observed number for the anomaly data of 92 is close to the expected number which is reasonable since we chose the probability function to fit the data. For z =2.0 we would expect the observed number of outliers to range from 75 to 113.


  How can we decide whether or not it is safe to reject a projection when we have collected a number of observations after the prediction? In ten years one would have 120 months of data to work with and suppose for arguments sake we observe kobs outliers. Using the binomial distribution we can compute the probability, Pk, of observing k outliers and also the probability, 1- Pk, of not observing that number. If the level of certainty is set at 95% and the probability of not observing kobs outliers is greater than this value we can confidently reject the projection. We would have to pass the projection if kobs was between 4 and 10 inclusively.


Wednesday, October 16, 2013

Confidence Intervals for Projections and Their Statistics


  With the projections and new data we are likely to have fewer observations to work with. Ten years after a projection we will have 120 months of anomaly data and we can use the binomial distribution to check the counts. For better statistics we can use the 1σ outliers and for the 3 harmonic fit of the global land anomaly evaluating the probability integral indicates q = 0.259. For the binomial distribution we get the following results.


For the 2σ q = 0.0585 and for 3σ q = 0.0112 so one will get fewer outliers.

Tuesday, October 15, 2013

Confidence Intervals for the Fits and Checking Statistics


  Confidence intervals are error bounds used in statistics to decide whether a given set of trials has passed or failed a test. They are deemed to pass if the number of successes are with the chosen limits and fail otherwise. Suppose we want to check N months of anomaly data to see if the number of outliers, the number of months outside the 3σ limits, is acceptable for a given fit and distribution for the data. If p is the probability that a month will be inside the interval then q = 1-p is the probability that it will fall outside. One can find p by integrating the probability function for the data between the 3σ limits. We have to be able to assign probabilities to the range of counts, k, that are possible. One can do this by using the binomial distribution. If N is large and q small the binomial distribution is difficult to calculate because of the large number of multiplications needed but the Poisson distribution is a good approximation and works quite well. Some formulas and probabilities are shown for the Poisson distribution below.


λ = qN is the average number of counts that one can expect to fall outside the 3σ bounds. We can call this event a success since it is the quantity that we are interested in and we have succeeded in observing it. One can find the expected value of some function, x, of the counts, k, by multiplying each value by its probability and adding everything together. The variance is thus the sum of the expected deviation of the counts from the expected value squared. The standard deviation is s, the square root of the variance, and its multiples are used to set the confidence intervals. Since we are working with integer counts we can't arbitrarily choose a the probability intervals we want to work with but have to calculate the probabilities for the confidence interval chosen. The probabilities above were determined for λ = 18.

Supplemental (Oct 16): The statistical checks on the fits for the global land anomaly indicate that they are consistent with the statistical model that we have chosen for them. The observed number of 3σ outliers of the combined probability distribution is near the 1s value for the Poisson distribution. We can apply the same test to the projections but the smaller numbers may require the use of the binomial distribution. The projections pass if future observations fit the same model that we used for the fit. The requirement for acceptance of a projection is that the future observations are part of the same population as the known observations.

Supplemental (Oct 16): In order to avoid confusion over which standard deviation is being referred to one should be consistent in the use of symbols for them. I have been using σ for the measure of the deviation of the anomaly observations from the smoothed 20-year average and s for the Poisson distribution of the count of the 3σ outliers hence the correction to the comment above.

Monday, October 14, 2013

Averaging Distorts The Original "Signal" In The Anomaly Data


  Just like an integrator or low pass filter in Electronics the averaging mechanism distorts the signal present in the data. Averaging will produce both a reduction in amplitude and a phase shift for a sinusoidal signal. To determine the dampening factor for 20-year averaging I generated a number of sine waves of unit amplitude for a number of frequencies and subjected them to 20-year averaging. The resulting curve obtained by taking half the peak-to-peak amplitude is similar to what one would get for a low-pass filter. Lower frequencies corresponds to longer periods. T is the period in years.


The period for the three harmonic fit was T = 123.054 Julian years. The damping factors, D, were 0.957, 0.835 and 0.652 for the 1st, 2nd and 3rd harmonics respectively. Increasing the amplitude of the harmonics by dividing the corresponding damping factors produced small increases to the deviation from a straight line in the direction of the center of the data.

  It may be difficult to find higher order harmonics in the data since they are more strongly damped and may be masked by the "noise."

Sunday, October 13, 2013

Comparison Of Higher Order Fits For The Monthly Global Land Anomaly


  The "residuals" for the second harmonic fit of the global land anomaly showed what look like a third harmonic so I did two higher order fits. The first assumed there was a third harmonic present in the data and the second was just a polynomial fit. The resulting residuals showed an improvement along the smoothed anomaly for the first fit but the second still showed what looked like the presence of a third harmonic.








The higher order fits seem to favor periodic terms over non-periodic terms. Both projections are roughly in agreement for a period of about 10 years. Each fit had 24 months outside the 3σ limits for a total of 1604 months which is 1.50%. The expected value for the fitted probability distribution is 1.15%.

Supplemental (Oct 14): The statistics above were based on deviations of the anomaly from the unsmoothed 20-year average. I checked the statistics for deviations from the smoothed 20-year average and the results were slightly different. The standard deviation was σ = 0.315. The fit for the 3 harmonics had 19/1604 months (1.185%) outside the 3σ limits. The polynomial fit had 23/1604 months (1.434%) outside the 3σ limits. For a normal distribution with σ = 0.315 one would expect the number of outliers to vary from 13.5 to 21.9 with the expected number being 17.7 (1.103%). The polynomial fit may already be encountering some problems. The terms with the higher powers have more influence away from the center and affect the long term projections more.

A Time Scale For Fits Based On The Julian Calendar


  One problem with the Gregorian calendar is that the years are not all the same length and this results in a wee bit of nonlinearity in the time scale. One can avoid this by using the Julian year as the unit of time.The figure below shows the formula used to determine the median time for the months and compares the angular frequency and function coefficients with the subscript G12 indicating the results using the Gregorian year and twelfths for the months and the subscript J indicating the results for the Julian year with decimal fractions. JD is a function that converts the Gregorian year, month, day and UT hour to the Julian Date which is used for astronomy.


The new time scale has a more noticeable effect on the coefficients for the functional form with the second harmonic and fitting the global monthly land temperature anomaly.

A More Precise Time Scale Has Little Effect On Anomaly Fits


  I tried a more precise time scale using the day of the year for the center of the months divided by the number of days in the year for the position of a month in fractions of a year instead of just dividing the number of the month by twelve and got a barely noticeable change in the coefficients of the terms for the global land anomaly fit. One reason was that the difference in years from the center of the range of the 20-year average for the anomaly data was used in the formula to be fitted so a 15 day shift in time wouldn't have much of an effect. This justifies the use of the simpler approximation.


The day-of-year function above for the Gregorian Calendar was a variation of the one used in an earlier post and uses the year, month and day as variables and first decides if the year was a leap year before using the older doy function. One line of NOAA's anomaly data gave the year, the month and the mean anomaly for the month.

Supplemental (Oct 13): For the middle of the month one should use the average of the first and last days of the months which requires the day before the second doy and so one has to subtract one from the numerator in the second line above. (Sorry, it was a typo made in the wee hours of the night.)

Friday, October 11, 2013

Confidence Limits For The Global Land Anomaly


  Since the yearly land probability function is different from a normal probability function the confidence limits of σ, 2σ and 3σ also have different probabilities associated with them than those used for the normal distribution.


The probabilities on the left and the right are for the normal distribution and the yearly distribution respectively. The land distribution is more likely to have monthly mean anomalies outside its 3s limits than one would expect for the normal distribution's 3σ limits.

Thursday, October 10, 2013

Nonlinear Least Squares Fits


  The angular frequency, ω, is one of the unknowns for the fit functions used and complicates finding the best least squares fit. However if ω is known we can do an ordinary linear least squares fit since the functions used are linear functions of the coefficients. One can compare the variance for different values of ω and look for the one with the lowest variance. I was doing the fits using trial and error and got an optimal fit for function with the second harmonic that had problems at both of the smoothed curve for the data. I tried again and got one at a lower value for ω. This spurred a rewrite of the program that I was using to allow a scan of the ω values. This is the result that the scan produced.


It may be possible that the same functional form has more than one good fit and this can contribute to the uncertainty about what is happening w.r.t. global warming. Another possibility is that the smoothed curve may be slightly off since the signal may not be absolutely perfect and that could throw off the fit off as well.

Global Monthly Land Fits And Deviation


  To complete this little study of the global temperature anomalies we need to include NOAA's monthly land anomaly for both hemispheres. Again the fit is for the smoothed 20-year average plus the two unsmoothed 10-year end segments. The same functional forms were used again and the best least squares fits were found for these.






I've included a calculation of the standard deviation for the yearly probability distribution. The probabilities found in the "wings" (red curve) have larger squares and consequentially make more of a contribution to the integral.



The land anomaly does not appear to have as much future cooling as the ocean anomaly. The temperature difference between the two might affect the strength of storms somewhat in the near future.

Wednesday, October 9, 2013

Some Dichotomies To Consider W.R.T. The Anomalies And Their Fits


  There are hidden assumptions that were made when we chose procedures to analyze the temperature anomaly data. The first is what is "noise" and what is "signal?" Another question that we might ask is whether the anomalies are analytic in the sense that we can use statistics and calculus to determine fits for the smoothed anomalies of are they capricious in nature and there is no real pattern to what can happen? We also need to ask if we are trying to seek what the anomaly data can tell us about climate change or are we trying to impose some preconceived notion on what is there? We need to keep an open mind. What we call Science is knowledge of the world that consists of facts supported by evidence. I think the jury is still out on global warming.

  I must admit I have been a little driven lately first by concerns about the upcoming IPCC report and by some personal problems that may limit my ability to do the things that I would like to.

Ocean Anomaly Projections And Variation


  Again one can use the same functional forms to make predictions for NOAA's ocean anomaly including both hemispheres. The predictions appear to be similar for the near future. The variation of the random portion for the oceans is the closest to a normal distribution of the three anomalies considered so far: land northern hemisphere, land ocean both hemispheres and ocean both hemispheres.









Monday, October 7, 2013

Comparison Of TheCubic Plus Sinusoidal Fit With The Anomaly Data


  The fit for the land ocean temperature anomaly assuming a cubic term plus a simple sinusoidal term has a very small value for the cubic coefficient.



The projection was made with the smoothed 20-year average plus its unsmoothed 10-year end segments. The data is still withing the 3σ error bounds of the projection so it can't be rejected at this time but this is likely to change since the coefficients of the polynomial indicate that the projection will start to rise and then later decrease over time. One could also rate projections on how well they follow the data in order to determine the best method.

Distribution of the Variation in the Land-Ocean Anomaly


  The probability distribution for the variation of the land-ocean anomaly is more uniform and much closer to a normal distribution.



This distribution can be used to decide whether the observed temperatures for the land-ocean anomaly are within expectations for the fit or if the prediction has to be rejected. The 3σ bound would be 0.390 degrees which is close to 1/2 the width of the variation.

Monthly Land-Ocean Anomaly Projection


  One can do a similar fit for NOAA's monthly land-ocean temperature anomaly for both hemispheres. I included some of the unsmoothed 20-year average anomaly in the fit to increase the number of years of data included and so the resulting rms error was a little higher for the best fit. The form for the function chosen was a linear term plus the sinusoidal term and a second harmonic.



It definitely indicates a decrease in the anomaly and a repeat of what happened between 1880 and 1920. Again the damped random walk and the oscillation are inconsequential to long term global warming. The linear term indicates only a 0.52 degree rise per century. It looks like we have reached a turning point in the temperature anomaly.

Saturday, October 5, 2013

Comparison Of The Fit Projections With The Anomaly Data


  Combining both anomaly fit projections from the 20-year average curve with the original land northern hemisphere anomaly data allows us to compare what's currently happening with the projections. Deviation of the projection from the data too far from the center of the data would exclude the projection from consideration.


  The data may already be favoring the fit with the two sinusoidal terms since the fit with the cubic polynomial is starting to deviate from the center of the data. The 5-year average indicated a slight decrease at towards the end. We should have a better idea of what is happening in a decade or so.

  Not everything is of consequence when it comes to long term global warming. Damped random fluctuations would not be of any consequence nor would the sinusoidal oscillations. A linear or cubic term would be of consequence. The linear term appears to be about a 1 degree rise per century.

Supplemental (Oct 6): The two fits may represent the extremes in what can happening with the first being rapid global warming and the second an historical pattern. If the anomaly levels off we may see a repeat of what happened between 1880 and 1920. The anomaly may take a middle path between the two projections indicated if there is a fair amount of "unnatural" global warming taking place. The two fits represent simple solutions but the actual case may prove to be more complicated.

Problems With Fits And Making Anomaly Projections


  A fit works for the time interval for which we have data to work with but fits of comparable accuracy can have quite different projections. I used the previous fit with a functional form consisting of a cubic polynomial plus a simple sinusoidal function and made a projection of the land northern hemisphere anomaly based on it.



When we compare this projection with one with a functional form consisting of a linear term plus the simple sinusoidal function and its second harmonic we get a quite different projection.



The outcome is not at all certain and we are faced with a problem like that of describing what we see in a set of ink blots. The global warming people may be trying to err on the side of caution but are we in control of the situation if we do not actually know what is happening with respect to global warming?

Friday, October 4, 2013

A Surprisingly Good Fit For The Anomaly Using A 20 Year Average


  To reduce the effect of a damped random walk on the smoothed land northern hemisphere anomaly I increased the period for the average to p = 20 years and separated "signal" from "noise." When I smoothed the difference between the anomaly and the 20-year average with another 20-year average and compared it with smoothed anomaly it appeared to be present in the smoothed anomaly. Subtracting the smoothed difference from the smoothed anomaly improved the separation of signal and noise and resulted in a very smooth curve for this adjusted averaged anomaly.


  Assuming a cubic curve plus a sinusoidal curve for the form of the function to be fitted produced a very accurate fit for the adjusted averaged anomaly. The value for t used in the fit was the difference between the time in years and the midpoint of the 20-year average data which was 1946.7917.


 The values for the monthly standard deviations were similar to those found previously.


Using the weighted sum of the monthly probabilities for the yearly distribution of the variable part of the anomaly gave essentially the same distribution as before.


Thursday, October 3, 2013

Autocorrelation of the Anomaly


  A study of the autocorrelation of the difference between the land northern hemisphere anomaly and its 5-year average shows some correlation between one month and the next amounting to about 0.291. This may indicate a damped random walk in the anomaly data. There is also an annual correlation which may be due to the variation in the standard deviation for the months.


Tuesday, October 1, 2013

Anomaly Probability Distributions for the Months and the Year


  We can fit histograms to the monthly data to estimate the standard deviation of a normal distribution for each month and combine these to get a distribution for the entire year. The combined distribution does not appear to be a normal distribution.



  This is why there was a problem fitting a normal distribution to the data for the difference between the land northern hemisphere temperature anomaly and the 5-year average.

Supplemental (Oct 2): For a more accurate distribution for year one can replace the equal weight of 1/12 by multiplying each monthly probability distribution in the sum by a statistical weight equal to the number of days in the month divided by the number of days in the year. Since there is a leap day in Feb roughly every four years one could use 28.25 days for Feb and 365.25 for the number of days in the year. The resulting changes to the yearly distribution are very small.

Variation in the Anomaly by Month


  One can define ΔAnomaly = Anomaly - 5-year Average for those months in which the 5-year Average of the Land Northern Hemisphere Anomaly is known. We can then sort the data by month and year to get a matrix which shows the variation over time. Here are some different views of this matrix.




  The variation changes throughout the year and is least for the month of June and greatest from Dec through Mar. This shows that the weather is most unpredictable during the winter months.