Sunday, January 31, 2010


I should probably point out that I am not writing as an expert on earthquakes in this blog but rather as a "student" of them. The expert speaks for a known science; the student is in search of knowledge, while for the expert the facts have already been proven.

It is not at all certain that the probability distribution associated with earthquakes is a Poisson distribution. It seems to fit the observations with a similar peak and trailing edge. The same could be said about a binomial distribution. Even if it is a Poisson distribution there is no guarantee that the total number of earthquakes is constant over time. One point in favor of a Poisson distribution is that whole numbers are involved and the number of events is never negative.

And, I am approaching the limit of what I know or can surmise about earthquakes. I did not know about the peak in magnitude when I started. So I am still learning. The Poisson distribution involves multiplicities but the probability involved may be that something will not happen.

So to avoid babbling I will probably have to take a little more time to think about what I am doing before making new posts.

Probabilities & Rates for Different Time Periods

The probability of a single event over different time intervals was shown not to be linear. One can estimate probabilities by counting the number of events in a given time, but there is a problem with measuring small probabilities: the events are not likely to be counted. One can get a better estimate of a small probability by observing over a long period of time and then using the formula to convert to a probability over a smaller period. It is easier to do the conversion with the probabilities that the events will not occur.

The number of events does not scale in the same way. Suppose we expect on average n=RΔt events in time Δt. The same rate applies to each Δt, so for ΔT=mΔt the number expected is N=mn=mRΔt=RΔT. The rate, R, is independent of the size of the period of time involved.
The number of earthquakes of different magnitude which occur in a given period of time can be treated as an urn problem.
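The distinction above can be checked numerically. In this sketch the rate R and the probability p are made-up illustration values, not estimates from any catalogue: the expected count scales linearly with the observation window, while the probability of at least one event does not.

```python
# Expected counts scale linearly with the window; probabilities do not.
# R, dt, m and p below are assumed values for illustration only.
R = 0.2          # events per day (assumed)
dt = 1.0         # one-day window
m = 30           # number of sub-windows in the longer period

n = R * dt               # expected events in one day
N = m * n                # expected events in 30 days: scales linearly
p = 0.01                 # probability of one event in a single day (assumed)
P = 1 - (1 - p) ** m     # probability of at least one event in 30 days

print(N)                 # 6.0 -- linear in the window size
print(round(P, 4))       # 0.2603 -- NOT 30 * 0.01 = 0.30

# Converting back: the "will not occur" probabilities make this easy.
p_back = 1 - (1 - P) ** (1 / m)
print(round(p_back, 6))  # 0.01 -- recovers the short-interval probability
```

The last step is the conversion described in the text: a probability measured over a long period is turned into one for a shorter period via the complement (1-P)^(1/m).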

Thursday, January 28, 2010

Earthquake Probabilities vs Probability Distributions

There is a potential source of confusion over what we mean by the probability of an earthquake. One can ask what the probability is that an earthquake will fall within a given area, interval of time and magnitude interval. But this probability can be broken down into two parts. The first is the probability that an earthquake of any magnitude will occur and then the probability that it will then fall within a given magnitude range. The second part is associated with the distribution of earthquakes. It is a conditional probability since it assumes that the earthquakes have occurred. It is also a relative measure of the number of earthquakes that will occur in the possible magnitude ranges.

If we assume a Poisson distribution then the expected number of earthquakes for the intervals involved would be some factor N0 = R0 ΔT times the distribution value p(k), where k=M/ΔM. More generally, one can assign intervals for the numbers themselves.

Whatever the intervals involved, one can consider a particular partition of the intervals and ask for the probability or rate at which earthquakes will fall within the partition. One can also consider partitions of different sizes and compare rates and probabilities. For example, if the probability that an earthquake will occur in an interval of time of width Δt is p = rΔt, one can ask for the probability that it will occur in some larger interval ΔT = mΔt. The probability that it will occur in the first Δt subinterval of ΔT is just p. For the second Δt the probability is the probability that it will not occur in the first interval, 1-p, times the probability that it will occur in the second, also p, or (1-p)p. For the third interval we have (1-p)(1-p)p and so on for all of ΔT. This is a partial sum of a geometric series and the total, P = 1-(1-p)^m, is the difference between two infinite sums.

The subintervals of time do not combine linearly; the total probability, P, is not a simple sum. And, since 1-p is less than one, for large values of m the total probability will be very near 1 and the event is almost certain to happen. So we have verified a form of Murphy's Law: if something can possibly go wrong, it will.
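A quick numeric sketch of this behaviour (the per-subinterval probability p is an arbitrary assumption): as the number of subintervals m grows, P = 1 - (1-p)^m climbs toward 1.

```python
# Murphy's Law numerically: the probability of at least one event,
# P = 1 - (1 - p)**m, approaches 1 as the number of subintervals m grows.
# The value of p is an assumption for illustration.
p = 0.001
for m in (10, 100, 1000, 10000):
    P = 1 - (1 - p) ** m
    print(m, round(P, 4))
```

Even with a per-subinterval probability of only 0.001, by m = 10000 the event is essentially certain.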

Wednesday, January 27, 2010

Peak Magnitude for Recent CA-NV Earthquakes (corrected)

I rechecked my calculations today after noticing that an M 3.4 earthquake appeared to be missing and made some changes to the program that optimizes the fit. I got a slightly better result and found that the best fit was for a peak magnitude of M 1.08, which is surprisingly close to the earlier prediction based on global earthquakes. The M 3.4 error was a result of using the wrong magnitudes for the data in the plot.

At first glance it seems that there are many more points of the histogram below the fitted curve than above it, but it should be remembered that the number of earthquakes associated with each point is N_k = 10^(log N_k), so the points near the top of the curve represent far more events.

(Apparently, I am not at my best at 2 a.m.)

Peak Magnitude for Recent CA & NV Earthquakes

I have just completed a Poisson fit for the recent California and Nevada earthquakes and the peak Mw is 0.90, which confirms an earlier estimate based on global earthquakes. The fit includes the earthquakes which occurred in both states during the last week. The range in magnitudes was from about M 0.0 to M 3.5 and the sample included 533 earthquakes.

(edit: The rms error indicated was based on the deviations for ln(N) so the rms error for log(N) is 0.163.)

Monday, January 25, 2010

Earthquake Probability Distributions

There is a linear fit for the logarithm of the number of earthquakes vs magnitude in Modern Global Seismology by Lay and Wallace. Their "b value" is related to, but not the same as, the "base" in the empirical formula for my fit.

If the number of earthquakes in an interval is proportional to b^(-M), as in the empirical linear fit, the probability distribution will be proportional to (B-1)B^(-k), where B=b^ΔM (my b is equal to 10^s, where s is the absolute value of the slope of log(N) vs M) and k is the number of the histogram interval starting with k=1. Note that k=(M-M_0)/ΔM. This distribution is based on the relative number of events in each interval and is vaguely similar to a Poisson distribution, which led me to attempt a fit for that too.
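As a sketch, one can verify numerically that the interval distribution (B-1)B^(-k) is properly normalized. The slope s and interval width ΔM below are assumed values for illustration, not the fitted ones.

```python
# Check that p_k = (B - 1) * B**(-k), k = 1, 2, ..., sums to 1.
# s and dM are assumptions for illustration, not fitted values.
s, dM = 1.0, 0.1          # |slope| of log10(N) vs M, and interval width
b = 10 ** s
B = b ** dM
p = [(B - 1) * B ** (-k) for k in range(1, 2001)]
total = sum(p)
print(round(total, 6))     # the truncated geometric series sums to ~1
```

The factor (B-1) is exactly the normalization constant of the geometric series, so the p_k can be read as conditional probabilities of landing in interval k given that an earthquake occurred.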

The Poisson distribution is often used in situations where the occurrence of the events is statistically independent, each event has the same probability of occurrence, and one needs the probability of a number of events occurring simultaneously. One can also ask what the probability is for a number of sections of a fault failing at the same time. If there is a key section then its failure could be responsible for the failure of a number of others, with the actual number involved depending on the circumstances. The factor k! in the denominator is the number of ways in which the same multiple event can occur. There is only one multiple event for all combinations of the individual events. But how do we explain that a unit step in Mw corresponds to a factor of about 31.6 (10^1.5) in energy? Mw seems more closely related to the probability of failure, while M0 is a better measure of the energy released. So we have to ask if the probability of a fault failing is proportional to the area involved. The answer depends on the strength of materials.

One can reject a probability distribution if the data falls too far outside the 3σ bounds where σ is the standard deviation of the expected value. If earthquakes with magnitudes less than M 4.5 were included in the global earthquake data one could more easily tell which distribution was the better fit.
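A minimal sketch of such a 3σ test, assuming Poisson statistics so that the standard deviation of a bin's expected count is its square root. The counts below are invented for illustration, not real earthquake data.

```python
import math

# Flag a bin if |observed - expected| > 3 * sqrt(expected), the 3-sigma
# bound for a Poisson-distributed count.  Illustration values only.
expected = [120.0, 80.0, 40.0, 20.0]
observed = [115, 95, 60, 22]

flags = [abs(o - e) > 3 * math.sqrt(e) for o, e in zip(observed, expected)]
print(flags)   # only the third bin falls outside its 3-sigma bound
```

In this made-up example only one bin would count against the candidate distribution; with more low-magnitude bins, as noted above, such a test would discriminate between the fits much more sharply.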

(note on notation: _ is often used to indicate a subscript and ^ a superscript. The square of a can be written as a^2.)

A Poisson Fit to 2009 Earthquake Data

The histogram of the number of earthquakes from the beginning of 2009 to mid-January 2010 falls primarily within the 3σ error bounds of a Poisson distribution, as seen in the plot below. The red dots are the number of earthquakes for each M 0.1 interval. The magnitude assigned to an interval was the magnitude of its lowest bound. Logarithms were used to facilitate the fit, which was complicated by the fact that k_0 was difficult to determine and the peak magnitude was also needed. The method of least squares was used to determine the values of M_peak, N_0, λ, and k_0. Each data point was weighted by the number of earthquakes in the interval.

The equation for N(k) is the Poisson distribution. Stirling's formula was used to simplify the computations. The deviations of the data points from the solid blue line appear to be statistical in nature. The largest earthquakes were omitted because of their low probabilities. The fit indicates that the peak magnitude is at approximately M 1.175.

(edit: In the plot what was referred to as the variance (Var) was not the average but the weighted sum of the square of the deviations of the natural logarithms of the number of earthquakes in each interval. It is the sum of the square of the errors for all the events which was the function that was optimized in the fit.)
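The use of Stirling's formula mentioned above can be sketched quickly: it approximates ln(k!), the awkward term when evaluating the logarithm of the Poisson distribution N(k) = N0 λ^k e^(-λ)/k!. A comparison with the exact value (via math.lgamma) shows the approximation is already good for modest k.

```python
import math

# Stirling's approximation to ln(k!), used to simplify evaluating the
# log of the Poisson distribution during the least-squares fit.
def ln_factorial_stirling(k):
    return k * math.log(k) - k + 0.5 * math.log(2 * math.pi * k)

for k in (5, 10, 20):
    exact = math.lgamma(k + 1)      # exact ln(k!)
    approx = ln_factorial_stirling(k)
    print(k, round(exact - approx, 5))
```

The error shrinks roughly like 1/(12k), so for the interval numbers k involved in a magnitude histogram the approximation costs essentially nothing.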

Thursday, January 21, 2010

M+ Function and Consistency

The "hist" function that I used for the interval numbers included the lower bound of an interval in the count. So M 6.0 would be included in the interval M 6.0 - M 6.5 but M 6.5 would not be. One has to be consistent in defining intervals for fits and histograms, but the differential rate, R0, is independent of how the intervals are specified.

One can find the number of events greater than or equal to a given magnitude by setting ΔM equal to infinity in the first set of formulas in the last post, which would yield an "M+" function. The number in each interval would then be the difference of the function values at its bounds. The differential rate is more useful since it allows one to use different time periods.
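The "M+" construction can be sketched as follows. R0 is the value quoted in the next section, while the slope s (and the function name Mplus) are assumptions for illustration, not fitted quantities.

```python
# Sketch of the "M+" function: the expected number of events per year with
# magnitude >= M, taken here as Mplus(M) = R0 * b**(-M).
# R0 is the value quoted in the accompanying post; s is assumed.
R0 = 2.033e8   # events per year
s = 1.0        # assumed |slope| of log10(N) vs M
b = 10 ** s

def Mplus(M):
    return R0 * b ** (-M)

# The count in an interval is the difference of the function values
# at its bounds, e.g. for M 6.0 - M 6.5:
n_interval = Mplus(6.0) - Mplus(6.5)
print(round(n_interval, 2))
```

Under these assumed values the interval M 6.0 - M 6.5 would see roughly 139 events per year; the point of the construction is that interval counts for any binning follow from one cumulative function.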

An Extension of the Empirical Formula

The formula for the expected number of earthquakes in the histogram intervals can be extended to other cases. The simple rate law assumed below can be used to derive a general expression for the proportionality factor, N0, if the time period, ΔT, and the interval width, ΔM, are known. The value of R0 can be found if the other values are known for one histogram.

R0 = 2.033 x 10^8 per year, assuming the interval is from M to M+ΔM.

If instead the interval is from M-ΔM/2 to M+ΔM/2, i.e., M is a central value, the following formulas should be used and again R0 = 2.033 x 10^8 per year.

The fit in the last post assumed that M was a central value, so the second formula is required to find R0.
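The two interval conventions can be sketched numerically. This assumes a cumulative rate law R(≥M) = R0 b^(-M) (an illustration of the "simple rate law" mentioned above, not a transcription of the original formulas); with the matching formula for each convention, both recover the same R0 from a histogram count.

```python
# Two interval conventions, one differential rate.  All values assumed.
R0_true = 2.033e8   # per year, the value quoted above
s = 1.0             # assumed |slope| of log10(N) vs M
b = 10 ** s
dM = 0.25
M = 6.0

# Count for the interval [M, M + dM]:
n_lower = R0_true * b ** (-M) * (1 - b ** (-dM))
# Count for the centred interval [M - dM/2, M + dM/2]:
n_centre = R0_true * b ** (-(M - dM / 2)) * (1 - b ** (-dM))

# Recovering R0 requires the formula matching the convention used:
R0_from_lower = n_lower / (b ** (-M) * (1 - b ** (-dM)))
R0_from_centre = n_centre / (b ** (-M) * (b ** (dM / 2) - b ** (-dM / 2)))
print(round(R0_from_lower / R0_true, 6), round(R0_from_centre / R0_true, 6))
```

Mixing the conventions, by contrast, shifts the recovered R0 by the factor b^(ΔM/2), which is why the fit in the last post needs the second formula.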

Wednesday, January 20, 2010

There are two populations in the data

The data in the earthquake plots does indeed contain two populations: M2.5+ for the US and M4.5+ for the rest of the world. There seems to have been a lull, a period with fewer earthquakes, at the end of 2009 and the beginning of 2010. I pieced together data from different sources, so the plot looks a little strange. The earthquakes below M 4.0 occurred in the US while those above this magnitude are mainly from outside the US.

I did a fit to the histogram based on the data above for the period from the beginning of 2009 to the present, and when logs of the counts are used the curve is fairly linear. The fit yields an empirical equation for the annual number of earthquakes in M 0.25 intervals.
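The linear fit of log counts vs magnitude can be sketched with ordinary least squares. The counts below are synthetic, generated to mimic a slope near -1; they are not the 2009 data.

```python
import math

# Least-squares line through log10(N) vs M for synthetic interval counts.
# The counts are invented to fall off roughly as 10**(-M), for illustration.
mags = [4.5, 4.75, 5.0, 5.25, 5.5, 5.75, 6.0]
counts = [5000, 2800, 1600, 900, 500, 280, 160]   # hypothetical
logN = [math.log10(n) for n in counts]

m = len(mags)
sx, sy = sum(mags), sum(logN)
sxx = sum(x * x for x in mags)
sxy = sum(x * y for x, y in zip(mags, logN))
slope = (m * sxy - sx * sy) / (m * sxx - sx * sx)
intercept = (sy - slope * sx) / m
print(round(slope, 3), round(intercept, 3))
```

The absolute value of the slope is the s used in the later posts (b = 10^s), and the intercept fixes the proportionality factor for the chosen interval width.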

The numbers can be seen in the table below. The first two columns specify the width of the interval, the third column is the expected number of earthquakes in the interval per year and the last column is the expected number of days between earthquakes in the given interval. There are very few M 7+ earthquakes in any given year.

Tuesday, January 19, 2010

Missing Magnitude 4 Earthquakes

The statistics for earthquakes for the whole world in the last week give a rough impression of what is happening. There is a relative absence of earthquakes at about magnitude 4.0. It seems indicative of two populations at the present time.

Another way of looking at the data is with a histogram, which plots counts for equal intervals of magnitude. More data might give improved estimates of the probabilities of an earthquake of a given magnitude. That's one reason for studying all the earthquakes at this time. The accuracy of the results depends on the number of events in a particular interval, so one can't really trust the curve at higher magnitudes. There is no doubt that there are fewer 4.0s present.
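Binning magnitudes into such a histogram can be sketched with the standard library; the magnitudes below are invented illustration values, and each event is assigned to the lower bound of its interval, matching the convention used elsewhere in these posts.

```python
from collections import Counter

# Count events in equal magnitude intervals; magnitudes are invented.
mags = [3.9, 4.0, 4.1, 4.1, 4.3, 4.3, 4.3, 4.6, 4.8, 5.2]
width = 0.5

# Each magnitude maps to the lower bound of its interval.
bins = Counter(int(m / width) * width for m in mags)
for lower in sorted(bins):
    print(f"{lower:.1f} - {lower + width:.1f}: {bins[lower]}")
```

With so few events per bin the counts fluctuate strongly, which is exactly why the higher-magnitude end of a real histogram cannot be trusted without more data.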

Monday, January 18, 2010

Recent Global Seismic Activity

Everyone has been saddened by the recent earthquake in Haiti and the subsequent loss of life. Earthquakes are extremely unpredictable and there is not much in the way of a warning, as the data for the last week indicates. If one adds up the energy released by all the earthquakes worldwide during an interval of time and converts this sum into a magnitude, one gets a plot similar to the one below. Time is given in days from 00:00 hours on 1 Jan 2010 UTC. The highest dot indicates the time at which the Haiti earthquake occurred. The horizontal gap in the data at about 2.0 seems to be due to a gap in the magnitude of earthquakes at about M 4.0. Seismic activity seems to switch off and on. It may be that the 4s are blocked, with any stress just being absorbed, or they are promoted by a rupture to a higher magnitude. It acts as if stress accumulates and the faults suddenly break down. The crackling of an electric discharge caused by the breakdown of air seems to be just as volatile. Compare, for example, the energy released by bolts of lightning during a thunderstorm.
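The "summed energy converted to a magnitude" step can be sketched using the standard moment-magnitude energy scaling E ∝ 10^(1.5 M); the list of magnitudes is invented for illustration, and the helper name is my own.

```python
import math

# Convert a set of magnitudes to one combined magnitude: sum the (relative)
# energies E ~ 10**(1.5 * M) and take the magnitude of the sum.
# The magnitudes are invented illustration values.
def combined_magnitude(mags):
    total = sum(10 ** (1.5 * m) for m in mags)
    return (2 / 3) * math.log10(total)

day = [4.1, 4.3, 3.9, 5.0, 4.5]
print(round(combined_magnitude(day), 2))
```

Because of the 10^1.5 scaling, the sum is dominated by the largest event, which is why single major earthquakes like Haiti's show up as isolated high dots in such a plot.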

One can smooth the data by averaging over time to give a plot similar to the following. The boxy nature of the plot is due to major events contributing throughout the averaging window of 1/5th of a day. The blue dots are the major earthquakes that occurred in the last week and give an impression of what the peak magnitudes are. The average itself seems to be unstable.

(The files are mislabeled: the values are the magnitudes of the sums, i.e., magΣE(M).)

Friday, January 8, 2010

Eddington on the Effect of Gravity on Light

In a popular book discussing relativity from the early 1920s, Eddington points out that the bending of light by the Sun is similar to that by a refracting medium. One wonders if any attempts were made to combine gravity and wave theory at that time.

"The wave-motion in a ray of light can be compared to a succession of long straight waves rolling onward in the sea. If the motion of the waves is slower at one end than the other, the whole wave-front must gradually slew round, and the direction in which it is rolling must change. In the sea this happens when one end of the wave reaches shallow water before the other, because the speed in shallow water is slower. It is well known that this causes waves proceeding diagonally across a bay to slew round and come in parallel to the shore; the advanced end is delayed in the shallow water and waits for the other. In the same way when the light waves pass near the sun, the end nearest the sun has the smaller velocity and the wave-front slews round; thus the course of the waves is bent.

"Light moves more slowly in a material medium than in vacuum, the velocity being inversely proportional to the refractive index of the medium. The phenomenon of refraction is in fact caused by a slewing of the wave-front in passing into a region of smaller velocity. We can thus imitate the gravitational effect on light precisely, if we imagine the space round the sun filled with a refracting medium which gives the appropriate velocity of light."

-Sir Arthur Stanley Eddington, Space, Time and Gravitation, 1921