Thursday, February 25, 2010

Poseidon, Sisyphus & Existentialism

It might help to take a little pause and look at seismology from a classical perspective. The Greek god of earthquakes was Poseidon, who caused earthquakes by striking the ground with his trident. He was also responsible for the creation of tsunamis, which caused floods. He was brother to Hades, who ruled the underworld, and Zeus, the Sky-Father, who held power over the weather and perhaps glory too. All three shared dominion over the Earth and were often extremely volatile. They did not tolerate deceit well, as can be seen from the fate of Sisyphus, who for his hubris was required to perform the ceaseless task of pushing a rock up a hill in Hades only to have it slide back down on him.

The Existentialist view is that life can often be absurd, a view exemplified by Camus' The Myth of Sisyphus. Camus' conclusion is that although we face trials, we must "see Sisyphus happy." The lesson is that we must learn to march to the beat of a different drummer.

Wednesday, February 24, 2010

Earthquake Thermodynamics?

The fits obtained for global earthquakes and for those of Southern California both indicated peaks at about M 1.1. In thermodynamics, if two bodies had the same peak in their radiation spectrum we would consider them to be at the same temperature, and if they were in contact with each other we would conclude that they were in thermal equilibrium. So we need to consider the possibility that the earthquake faults are in some kind of thermal equilibrium. This would imply that the faults can exchange energy in some manner.

With respect to the unexpected deviations from the distributions for Southern California, it may also be that there is some coupling among the faults, or the statistics may be affected to some extent by absorption by the medium. Seismologists make assumptions about the nature of the earthquake's location, the location of the seismograph, and the path connecting them. In optics the medium can absorb some of the radiation, and we think of it as a filter. The same may be true here, with the Earth acting as a filter. It is less likely that a discrete spectrum is present in the earthquake data.

Tuesday, February 23, 2010

Are the Data and the Fit Holographic?

One has to ask if the data is holographic, that is, does it fit a single curve and follow a single rule? This would allow us not only to extrapolate the fit beyond the existing data but also to make predictions about the future. As with optical holograms, a part of the hologram contains information about the entire image. As a result, the fit may be useful outside the range of magnitudes for which there is data. We may be able to see "over the horizon" but the curve for the data would have to be continuous with no jumps in it.

One of the problems with the fit that was obtained is that the deviations of the histogram points from the fit curve are greater than expected. Near the peak they are well beyond the 3σ bounds. This indicates that the fluctuations are probably not statistical in nature. One has to consider the possibility that there is some correlation among events in the occurrence of earthquakes. We can no longer assume that they are statistically independent events. Among the possibilities are a common cause, aftershocks following a major earthquake, or an earthquake at one location stimulating earthquakes elsewhere. Is there an earthquake equivalent of a laser? If so, we should probably be more concerned about our neighbors.
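As a rough sketch of the kind of check involved (the counts below are placeholders, and the assumption is pure counting statistics, where a bin with expectation n fluctuates by about √n):

```python
import numpy as np

# hypothetical inputs: observed counts per magnitude bin and fitted expectations
observed = np.array([120., 890., 2300., 3100., 2800., 1900., 1100.])
expected = np.array([130., 900., 2250., 3050., 2900., 1850., 1050.])

# for pure counting (Poisson) statistics the standard deviation of a bin
# with expectation n is sqrt(n), so the 3-sigma band is expected +/- 3*sqrt(n)
sigma = np.sqrt(expected)
z = (observed - expected) / sigma

# bins whose deviation exceeds 3 sigma point to non-statistical fluctuations,
# e.g. correlated events such as aftershock sequences
print(np.abs(z) > 3.0)
```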

One item that I forgot to mention is that the peak for the fit is at M 1.086. I also need to add a correction or maybe just a clarification concerning the failure of the "equilibrium distribution." The assumption that the distribution we were looking for was connected with the equilibrium distribution of the mechanism failed. Each mechanism and its transition matrix has an equilibrium solution. We can't exclude mechanisms just because they have equilibrium solutions. It is more likely that the simple feedback mechanism considered failed to give a distribution of the correct shape with a peak near M 1.1*.

*edit: It was possible to find a g function which produced a peak for an earthquake distribution, but it was not where g = 1, which contradicted the initial assumptions.

Monday, February 22, 2010

A Simplified Fit for the Earthquake Distribution

I wrote a new routine to improve on a fit for the gain function and adopted a form for g that would guarantee a linear slope for the log of the number of earthquakes at the larger magnitudes. The rms error for the fit is 0.105546 which was only a slight improvement on the fit done by eye assuming a polynomial form for the log of g. The new empirical formula for g was suggested by the previous fit. One would also expect a continuation of the linear slope for larger magnitudes from what we know about the number of global earthquakes above M 5.0.
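The reason such a form guarantees a linear tail is worth one line: if the ratio g of consecutive numbers settles to a constant at large magnitudes, the log of the counts must fall on a straight line there.

```latex
g_k = \frac{N_{k+1}}{N_k} = c \;\; \text{for } k \ge K
\quad\Rightarrow\quad
N_k = N_K\,c^{\,k-K}
\quad\Rightarrow\quad
\log N_k = \log N_K + (k-K)\log c .
% log N is linear in k, with slope log c per histogram interval.
```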



Sunday, February 21, 2010

Assumptions for the Markov Process

I started working on fitting a Markov process for an earthquake mechanism about two weeks ago and derived the formulas relating the number of states, N, the number of earthquakes, E, the gain function, G, and the fractional rates, r and f, as shown in the image below. The first line includes an assumption that the distribution of states is unaffected by the action of the transition matrix. It is an assumption that there is a stable distribution for the states, which is more likely to hold as a long-term average.


The most recent fit shows that the assumption of a g function provides a useful method for doing a curve fit when no formula is known for the distribution. The failure of an equilibrium distribution makes the assumptions in the formulas for r and f less certain and casts doubt on the assumed Markov process giving a complete picture of what is actually happening. If a distribution function exists, this needs to be verified over time. If it doesn't exist, then the implication would be that conditions are rather changeable and current conditions would not be a good indicator of future events. Whatever the mechanism behind earthquakes, it is a hidden mechanism, and the occupation numbers of the states are less certain than the actual number of earthquakes.

Saturday, February 20, 2010

Some Hindsight on the Assumption of an Equilibrium Distribution

Although the assumption of an equilibrium distribution for the transition matrix led to a good fit to the data, it ultimately failed because its gain function did not agree with the assumed gain function. I am willing to admit that it was a blunder, but it also points out the need to be skeptical about what one is doing. Sleep deprivation can make someone more error-prone and takes its toll on results.

Curve fits can be difficult since there is often more than one minimum, and a number of them close together can interfere with each other and prevent a computer program from converging to a solution even if one has an approximate solution. Even if one can find a local minimum, that does not guarantee that the minimum is a global minimum, that is, the lowest of all the minima. So there may be competing mechanisms.
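One common workaround, sketched here with a stand-in error surface (the function and the grid of starting points are mine, purely for illustration), is to restart a local minimizer from several initial guesses and keep the lowest minimum found:

```python
import numpy as np
from scipy.optimize import minimize

def rms_error(coeffs):
    # placeholder for the rms error between the fit and the histogram;
    # a two-parameter surface with several nearby minima stands in here
    x, y = coeffs
    return (x**2 - 1)**2 + 0.1 * (y - x)**2 + 0.05 * np.sin(5 * x)

# restart the local minimizer from a grid of initial guesses and keep the
# lowest minimum found; this guards against settling into a nearby local
# minimum but still cannot guarantee the global one
starts = [np.array([x0, y0]) for x0 in (-2, -1, 0, 1, 2) for y0 in (-1, 1)]
results = [minimize(rms_error, s, method="Nelder-Mead") for s in starts]
best = min(results, key=lambda res: res.fun)
print(best.x, best.fun)
```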

Conclusion: Equilibrium Distribution = FAIL

A New Fit for the Southern California Earthquake Data

I was able to find an approximate fit for the Southern California earthquake data but had to fit by eye again. This time I assumed that log(g) was a polynomial and used the values of g computed from the polynomial to compute the distribution for the expected number of earthquakes, to avoid repeating my mistake. The program that I wrote to improve on the fit didn't work this time. The feedback mechanism appears to work.




The gain function is what one would expect, being greater than one to the left of the peak and less than one on the right where the distribution steadily decreases. Where the slope of the plot of logN is approximately constant, the ratio of consecutive numbers of earthquakes would be the same, and one would expect g to be a flat line for these magnitudes. In the plot below the solid line represents the g values computed from the polynomial and the points are the computed ratios for the consecutive numbers of earthquakes, which serve as a check on the results.


The fraction of states, r, for which there is an earthquake is 1/2 at the peak, as one would expect since g = 1 at the peak and r(1+g) = 1. I borrowed the term gain for the consecutive ratios from the analogy with the ratio of the input and output of a feedback amplifier. An amplifier is stable if the gain is not -1, which cannot happen for earthquakes since the counts are whole numbers and their ratios are never negative. So the earthquake mechanism is probably stable too. One would expect any fluctuations to be damped out.
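Spelling out the relation r(1+g) = 1 used above:

```latex
r(1+g) = 1
\quad\Rightarrow\quad
r = \frac{1}{1+g}, \qquad f = 1 - r = \frac{g}{1+g}.
% At the peak g = 1, giving r = f = 1/2 as stated.
```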

Friday, February 19, 2010

A Review of the Earthquake Mechanism's Fit to the Observed Earthquake Numbers

I should probably attempt to clarify the interrelations between the quantities involved in the earthquake mechanism. Of primary importance is the number of earthquakes that occur in a given time. There is also the number occupying each state, which needs to be determined. For each set of numbers there is a distribution and therefore a set of ratios of consecutive terms, g, associated with it, which was referred to as the gain or g function. The g function searched for in the fit was for the expected number of earthquakes* and so was unknown. The values of g do not contain enough information to determine the actual numbers observed but are more closely related to the relative probabilities of the earthquakes*. To determine the expected values for each interval of the histogram one needs to know the total number of earthquakes as well as the g values. From the g values one can determine the fraction of earthquakes, r, that will happen for each histogram interval and the fraction, f, that will not happen. From r and f one gets the transition matrix, which in turn gives the relative probability of an earthquake being in a particular interval; this is assumed to be the distribution that is not changed under multiplication by the transition matrix. Multiplying the relative probabilities of the states by the r values and the total number of earthquakes gives the expected number of earthquakes for each interval. The expected number of earthquakes is then compared with the observed distribution, and the search for the coefficients of the log-log g polynomial is performed to minimize the root mean square error between the two distributions. The result is the fit for the observed number of earthquakes.
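A condensed sketch of that chain of steps, assuming the relation r(1+g) = 1 noted on Feb 20 and the reset-to-the-first-state transition rule of the Feb 4 post (the function names and the final normalization are mine, not the original routine):

```python
import numpy as np

def expected_counts(g, total):
    # g[k] is the assumed gain (ratio of consecutive terms) for interval k;
    # r and f follow from r(1 + g) = 1, so r + f = 1 for each state
    n = len(g)
    r = 1.0 / (1.0 + g)            # fraction producing an earthquake
    f = 1.0 - r                    # fraction advancing to the next state

    # transition matrix, columns as source states: an earthquake resets a
    # state to the first one, the remainder advances one state, and the
    # last state returns entirely to the first
    T = np.zeros((n, n))
    T[0, :] = r
    for k in range(n - 1):
        T[k + 1, k] = f[k]
    T[0, -1] = 1.0

    # the distribution unchanged by T, found here by power iteration
    p = np.full(n, 1.0 / n)
    for _ in range(5000):
        p = T @ p

    # expected earthquakes per interval: relative probability times r,
    # scaled here so the expected counts sum to the observed total
    e = p * r
    return total * e / e.sum()

def rms_log_error(g, observed):
    # root mean square error between the logs of the two distributions
    e = expected_counts(g, observed.sum())
    return np.sqrt(np.mean((np.log10(e) - np.log10(observed)) ** 2))
```

A search over the coefficients of the log-log g polynomial would then minimize rms_log_error to produce the fit.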

For different intervals of time one can estimate the total number of earthquakes in a particular interval as N = R ΔT, where R is the rate of earthquakes. This number is then multiplied by the relative probabilities to estimate the number of earthquakes for each histogram interval.

*edit: The g function was assumed to be for the number of earthquakes. There appears to be a discrepancy between the fit g function and the ratio of consecutive numbers of expected earthquakes. They should agree and I don't know what went wrong. The fit may have compensated for an error, so one cannot assume that the earthquake mechanism works properly.

Wednesday, February 17, 2010

Improved Fit for Southern California Earthquake Data (2000-2009)

I was successful in automating the fit process and obtained an improved fit for the SCEDC earthquakes of the last decade. The fit involved a search for the coefficients of log-log g, and this time I used the magnitudes of the histogram intervals as the variable in the polynomial for the fit to avoid the large numbers that result from using k. The function minimized was the rms error of log-log g, and the histogram points were unweighted. The fit shows some curvature above M 4.5, and thinking that this was an end effect I extrapolated g from M 4.0, but there was no noticeable change in the results.


The new fit shows higher values for the earthquake rates above the peak in r.


edit: From the occupancy-of-states point of view, the model indicates that more M5 earthquakes occurring means that there are fewer dormant M6+ earthquakes waiting to happen. So higher rates may bode better for the future.

Tuesday, February 16, 2010

Southern California Earthquake Data Fit for 2000 - 2009

During the last couple of days I was able to download some earthquake data for the last decade from the Southern California Earthquake Data Center and obtained a better fit. With more data the errors tend to cancel out more and are relatively smaller. It was found that the gain function could be treated as a simple polynomial with g on a log-log scale. I did the fit by trial and error and later could not get my computer to find a better fit for some reason. A solution was computed for a set of coefficients for the polynomial and fitted by eye to minimize the deviation from the data. The mechanism works fairly well for Southern California.



The earthquake rates for the Markov process of the assumed mechanism indicate a central peak and are relatively smaller at both large and small magnitudes.

The data was collected by the Southern California Seismic Network, which is a collaborative effort by Caltech and the USGS.

Friday, February 12, 2010

WISE NEO and filtering images

The WISE infrared telescope was launched on Dec 14, 2009 into Earth orbit and its mission is to find relatively cool objects, which include asteroids and comets. WISE recently announced the discovery of a NEO and a comet. It is sometimes difficult to see these objects among a field of stars as in the WISE images, but it is not difficult to design a filter to select the objects of interest. A typical color image consists of separate red, green and blue images. The image below was filtered by selecting pixels with some red for which, in addition, 10 times the amount of blue was less than the red. If one looks closely one can see a large number of faint red objects. The bright object in the center of the image is the NEO.
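A minimal sketch of such a filter, assuming the frame is available as an ordinary RGB file (the filename and the reading of "some red" as a simple nonzero threshold are my assumptions):

```python
import numpy as np
from PIL import Image

# load the color image; each pixel has separate red, green and blue values
rgb = np.asarray(Image.open("wise_field.png").convert("RGB"), dtype=float)
red, green, blue = rgb[..., 0], rgb[..., 1], rgb[..., 2]

# keep pixels with some red where 10 times the blue is still less than the red
mask = (red > 0) & (10.0 * blue < red)

# black out everything else so only the redder (cooler) objects remain
filtered = np.where(mask[..., None], rgb, 0.0).astype(np.uint8)
Image.fromarray(filtered).save("wise_field_filtered.png")
```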


Tuesday, February 9, 2010

A Mechanism Fitting Earthquake Activity with a Peak

It is possible to design a crude feedback mechanism that gives a good fit to the earthquake data. The "Poisson fit" to the 2009 earthquake data was used to smooth the data and to deduce an approximate "gain" function, g_k, for the mechanism. The gain function is the ratio of consecutive occupation numbers for the intervals of the histogram. Knowing the gain function one can compute the fraction of numbers in the states, r, which will result in an earthquake, or the fraction, f, which will pass on to the next state.


The values of r and f are all that is needed to describe the mechanism and are the components of the transition matrix. The transition matrix can be used to find the equilibrium values for the occupancy of the states. These values are unchanged when multiplied by the transition matrix. To get the number of earthquakes in each magnitude interval one multiplies the equilibrium values by r_k. In the plot below logN is the logarithm of the number of earthquakes observed and logN_fit is the logarithm of the number computed from the equilibrium values. Note that this mechanism does have a peak unlike the linear fit found previously. The Poisson distribution does not appear to be an equilibrium distribution. k is the value of k used in the Poisson fit and is approximately 140 + M/ΔM with ΔM = 0.1. The peak is at M 1.1 (k = 151) and there are 10 intervals per M 1.0 step.

The gain function equals 1 at the peak and is a steadily decreasing function of k. The values of r and f indicate that the fraction of states experiencing an earthquake decreases with magnitude.

This model is crude but gives a good fit to the data. A more complicated model would allow for transitions to intermediate states instead of completely resetting when an earthquake occurs.

Saturday, February 6, 2010

What Went Wrong with the Poisson Fit?

The Poisson fit resulted in too large a value for λ. How can we explain this result? Let's look at what happens if we try to fit just two points, say, a peak where the slope (s) is 0 at M 1.1 and a slope of -1 at M 5.0. One can derive an approximate formula for the slope. The unknowns are λ and k_0, which can be determined by two points on the curve. The peak tells us that λ approximately equals k_pk. The second point gives us another fix, and the intersection of the two lines is the solution we seek. There is only one solution, and we can surmise that the problem is that the shape of the distribution is not consistent with the interpretation of λ as a probability.
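One way to recover the approximate slope formula (the original derivation is in the figure, not reproduced here) is Stirling's approximation, up to the conversion between k and M and between natural and common logarithms:

```latex
\ln P_k = k\ln\lambda - \ln k!
        \approx k\ln\lambda - k\ln k + k
\quad\Rightarrow\quad
s = \frac{d\,\ln P_k}{dk} \approx \ln\lambda - \ln k
  = \ln\!\left(\frac{\lambda}{k}\right).
% s = 0 at k = \lambda, so the peak fixes \lambda \approx k_{pk};
% the second point, with its known slope, then fixes k_0.
```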


The derivation of the Poisson distribution indicated that P_k = P^k/k!. If we compare the ratio of the next P_k to P_k we get λ/(k+1). The ratio of succeeding numbers in the fit is the same. This ratio is not constant but steadily decreases with k (or M). At the peak the ratio is 1, and prior to that it is greater than 1, which explains the growth in numbers. So the Poisson fit suggests a mechanism in which there are changes in what happens for each state of the Markov process. In the simple mechanism considered, the ratio between succeeding states was 1-r. If r is a fraction then this ratio does not exceed 1, and so there is no peak. Any mechanism would also have to explain the magnitudes as well as how the number of earthquakes depends on the occupancy of the states. Perhaps a delay in advancement or some blockage in the advancement would give a more realistic fit.
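A quick numerical check of that ratio, using the λ quoted earlier for the global fit:

```python
import numpy as np

lam = 151.227                      # lambda from the global earthquake fit
k = np.arange(145, 156)
ratio = lam / (k + 1)              # ratio of consecutive Poisson terms

# the ratio crosses 1 near k = lambda - 1, which is where the peak sits;
# before the peak it exceeds 1 (growth), after it the terms decay
for kk, rr in zip(k, ratio):
    print(kk, round(rr, 4))
```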
The computed distribution gave equilibrium values. Blockage at some magnitude would prevent advancement from one state to the next, and one would expect the occupancy of the states below the point of blockage to increase relative to the equilibrium values and those above to decrease somewhat. One needs to consider processes which might compete with the occurrence of earthquakes. The goal is a better explanation of the process behind earthquakes based on evidence. It is a search for knowledge.

Thursday, February 4, 2010

A Simple Markov Process for Earthquakes

There is a simple Markov process which agrees very well with the linear fit for the 2009 earthquake data. The states in the transition diagram below are labeled 1 through 3. In a given interval of time a fraction of the states, r, will produce earthquakes while the remainder will undergo a transition to the next higher state. The exceptions are the initial and final states. The first state will return to itself, while the last state will return entirely to the original state. The transition matrix for the process is T. If one knows the numbers for each state one can then calculate what the changes will be.
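Written out for the three states just described (my reconstruction of the matrix, with columns giving the fate of each source state and an earthquake taken to reset a state to the first one):

```latex
T =
\begin{pmatrix}
 r   & r   & 1 \\
 1-r & 0   & 0 \\
 0   & 1-r & 0
\end{pmatrix}
% Column k: a fraction r produces an earthquake and returns to state 1,
% a fraction 1-r advances to state k+1; the last state returns entirely
% to state 1. Each column sums to 1.
```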




The length of the chain can be extended to any number of states. If one chooses 15 states to correspond to the histogram intervals in the table computed for the fit to the 2009 earthquakes then the number of earthquakes for the given interval of time will be r times the number occupying each state for all except the last. The numbers agree surprisingly well for such a simple model. The model is not sophisticated enough to explain a peak in the earthquake distribution.

A Problem with the Poisson Distribution.

The λ in the Poisson distribution is the probability that a single event will be observed in a given time interval. Its values range from 0 to 1. In the earthquake fits the values of λ were 151.227 for the global earthquakes and 43.412 for the California earthquakes which fall outside the 0 to 1 range. One can't simply rescale the magnitudes to reduce the λs. A normal distribution is similar in form but more scalable. Replacing λ with r Δt or r ΔM would give a larger value but that runs contrary to the assumption that the probability is a small value so the results are still doubtful.

Conclusion: Poisson = FAIL

Tuesday, February 2, 2010

Multiple Events from a Single Source

Since an event can occur anywhere within the observing interval it is possible that multiple events will be seen from the same source. Let P be the probability of a single event being observed within the interval and P_k be the probability of observing k events. The P_k are easily calculated in nD, a space with an arbitrary number of dimensions. For P_k imagine a space of k dimensions in which the time of observation of each of the k events is represented by an axis. The first observation occurs at t_1, the second at t_2 > t_1, the third at t_3 > t_2, etc. Using the continuous distribution given in terms of the probability for a single observation (P), the P_k can be represented by a multiple integration. Each time, the probability that the event happens at time t_k is the probability that it doesn't happen before it times the probability that it happens in a small interval of time Δt. We end up integrating higher powers of the same function over and over again and can deduce a general expression for P_k.


It is easily seen that the probability distribution for multiple observations is the Poisson distribution.



The P_k can be used to determine the expected number of observations from a single source. The number will range from 0 to e, 2.71828... .


The expected number of events is an increasing function of P. It is not a linear function of the probability. So it is possible that more than one earthquake will happen on the same fault in a given interval if they occur at random. Like lightning, don't count on it not striking the same place twice.
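A small numerical check of that range (the closed form P e^P follows from summing k·P^k/k! over k, and equals e at P = 1):

```python
import math

def expected_events(P, kmax=60):
    # expected number of observations: sum over k of k * P**k / k!
    return sum(k * P**k / math.factorial(k) for k in range(kmax))

# the sum has the closed form P * e**P, which runs from 0 at P = 0
# up to e = 2.71828... at P = 1, and grows faster than linearly
for P in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(P, round(expected_events(P), 5), round(P * math.exp(P), 5))
```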


edit (3 Feb): The regions of integration for P_k are "triangular" sections of kD polytopes which are extensions of the square and cube. The P_k are the extensions of the area of a triangle (1/2 x altitude x base) and the volume of the triangular pyramid (1/3 x altitude x base), i.e., 1/k x altitude x base in kD. Each base is the previous P_k. The origin of the axes and the points on the axes at P(ΔT) mark the boundaries of an equivalent section, and all the points on the lines joining these extreme points are within the equivalent region of integration.

Continuous Probability for a Single Event

Time is a continuous variable so in general one needs a continuous probability distribution to determine the probability that some event will occur prior to a given time if an infinitesimal rate, r, for the probability is known. It can also be shown that the probability that a single event will occur in a time interval, dt, at time t is dP=Q r dt where Q=1-P is the probability that it will not occur.
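Writing the stated relation as a differential equation and solving it fills in the step between the rate and the curve plotted below:

```latex
dP = (1-P)\,r\,dt
\quad\Rightarrow\quad
\frac{dP}{1-P} = r\,dt
\quad\Rightarrow\quad
P(t) = 1 - e^{-rt}.
% The slope at t = 0 is r, and the mean waiting time is
% \int_0^\infty t\,r e^{-rt}\,dt = 1/r.
```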


A plot helps visualize the continuous distribution (solid red curve) and its relation to the infinitesimal rate. In the plot it was assumed that r = 0.5 (per unit of time). The infinitesimal rate is just the slope of the continuous probability distribution at t = 0 (dotted blue line). This is the justification for using Δp = r Δt as an estimate of the probability for a small interval Δt. The probability (dP) that an event will occur after a long period of time decreases because it probably already has happened. The distribution is determined by the infinitesimal rate, and the average time for an occurrence is 1/r. It should be noted that this is the probability that just one event will occur before time t (or in a time interval of width t).