Wednesday, July 31, 2019

Update on Recent Ridgecrest Area Earthquake Activity


  The pattern of aftershocks of the M7.1 Ridgecrest earthquake appears to be continuing. If one uses the standard deviation of the activity A to estimate error bounds on the daily number of earthquakes, one gets a good fit to the data except for the time of the major earthquake.
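
As a sketch of the kind of bounds involved, the following computes 2σ error bounds on the daily count from the spread of the activity A (the base 10 log of the daily count, as defined in the July 20 post below). The daily counts used here are hypothetical, not the actual catalog tallies.

```python
import math
import statistics

# Hypothetical daily counts of aftershocks (illustrative values only,
# not the actual catalog tallies used in the post).
daily_counts = [34, 28, 22, 19, 15, 13, 11, 9, 8, 7, 6, 5]

# Activity as defined in the July 20 post: A = log10(daily count).
activity = [math.log10(n) for n in daily_counts]
a_mean = statistics.mean(activity)
a_sd = statistics.stdev(activity)

# 2-sigma bounds on the activity, back-transformed to daily counts.
lower = 10 ** (a_mean - 2 * a_sd)
upper = 10 ** (a_mean + 2 * a_sd)
```

Working on the log scale makes the bounds multiplicative in the counts, which is why a single σ can cover both the early high-count days and the later quiet ones.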


The daily peak magnitudes continue to steadily decrease.


If the trend continues one would expect the number of earthquakes above M2.5 to range from about 3 to 11 and the peak magnitudes to range from about M2 to M4 over the next few days. But we are approaching stress index (SI) values similar to those at the time of the M7.1 earthquake, and the same is true over the next few days for a supermoon index (SMI) marking the coincidence of lunar perigee and the alignment of the Moon with the Sun.



Anything above M4 would be unlikely based on the statistics from the series of aftershocks.

Friday, July 26, 2019

Update on the Latest Earthquakes in the Ridgecrest Area


  There was an M4.7 earthquake at the beginning of UTC doy 207 that one might consider unusual, since the count was about 3 standard deviations from the most likely number of earthquakes and the count for the day is not yet complete. It is difficult to characterize this since there may be some source of deviation in the magnitudes other than random error.


The peak magnitude of this earthquake also appears to be unusual since it is greater than 2 standard deviations from the most likely magnitudes.


It is also within 2 standard deviations of the most likely peak magnitudes of the stress index correlation curve.


It looks like the faults in the area are having a little trouble deciding which way they want to go. It should be noted that one might be able to improve on the stress index to get better agreement with observations. There still seem to be problems with it and its interpretation.

note: I seem to have gotten the doy wrong. What was previously posted was the number of elapsed days since the beginning of 2019. I've now corrected it to the usual doy.

Tuesday, July 23, 2019

Why We Still Need to Monitor the Ridgecrest Earthquakes


  An error in one of the averages cropped up as a result of adding data to existing tables. The result is that the correlation for the activity and the daily tallies of earthquakes agrees slightly better with the observed values.


The formulas for the correlation of activity with time suggest some sort of relaxation phenomenon taking place.


The stress index had to be eliminated as a good indicator of daily earthquake activity but it still might play a role in indicating peak earthquake magnitudes. There is a lot of variation in the daily peak magnitudes so it is difficult to rule out the possibility. This morning's M4.13 earthquake in the Ridgecrest area points out that the peak magnitudes need to be watched.


The M7.1 earthquake appears to be somewhat exceptional for the area since its deviation from the SI peak magnitude is greater than two standard deviations.

Edit (Jul 23): Miscalculated the 2σ error bounds. Used the std dev of pmagobs when I should have used that for the difference of the two pmags. Replaced the pmag plot with a corrected version. The M7.1 earthquake appears to be rather unusual.

Supplemental (Jul 24): An alternative hypothesis is that the peak magnitudes are part of the population of aftershocks and are declining exponentially. If so, one would conclude that yesterday's M4.13 earthquake was within expectations.
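
A minimal sketch of the exponentially declining peak-magnitude hypothesis: fit ln(Mpeak) against elapsed time by least squares. The (doy, Mpeak) pairs below are invented for illustration, not the catalog values.

```python
import math

# Hypothetical daily peak magnitudes after the M7.1 (doy, Mpeak);
# illustrative values only, not the catalog data.
data = [(187, 7.1), (188, 5.5), (189, 4.9), (190, 4.6),
        (193, 4.3), (196, 4.0), (200, 3.8), (204, 4.1)]

# Least-squares fit of ln(Mpeak) vs elapsed time, i.e. Mpeak ~ M0*exp(-k*t).
t = [d - data[0][0] for d, _ in data]
y = [math.log(m) for _, m in data]
n = len(t)
tbar = sum(t) / n
ybar = sum(y) / n
slope = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y)) \
        / sum((ti - tbar) ** 2 for ti in t)
k = -slope                       # decay rate per day
M0 = math.exp(ybar - slope * tbar)

def predicted_peak(days_after):
    """Expected peak magnitude this many days after the mainshock."""
    return M0 * math.exp(-k * days_after)
```

Under this hypothesis an isolated M4 a few weeks in sits within the scatter about the declining trend rather than signaling new activity.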


Monday, July 22, 2019

Rejection of the SI Hypothesis and Provisional Acceptance of the Aftershock Hypothesis


  On UTC doy 202 there were just 8 earthquakes in the Ridgecrest area data from the USGS earthquake catalog. This appears to be a rare event under the stress index hypothesis, i.e., that the SI is an accurate indicator of general earthquake activity.


The observed counts still appear to be following the aftershock hypothesis so it is provisionally accepted. Note that initially there were significant deviations from this hypothesis.

Sunday, July 21, 2019

An Alternative Hypothesis for Recent Earthquake Activity in the Ridgecrest Area


  Let's say our first hypothesis is that the stress index SI is an accurate indicator of earthquake activity, A, and an alternative hypothesis is that the activity is determined by the elapsed time from that of the M7.1 earthquake. We find that there is a negative correlation, although it is quite good since its magnitude is close to unity. The correlations for the primitive hypotheses are approximately 0.9 for the same interval of time, which excludes the M7.1 earthquake.


But the correlation is related to the slope of a straight line drawn through the data, from which we can estimate the activity and the number of earthquakes, since the means from which the correlation was determined are known. When we compare the alternative hypothesis with the first hypothesis we find that the observed number of earthquakes is within about two standard deviations of both curves, so we don't have a valid reason for rejecting either hypothesis. We are still in suspense about their validity.


In a few days the two hypotheses will become mutually exclusive so it's likely one will have to be rejected. The alternative hypothesis can be interpreted as all the earthquakes since the M7.1 are aftershocks.

Supplemental (Jul 21): Note that the correlation is the slope in standardized coordinates. The black diagonal lines are Astd=±tstd.
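
To make the standardized-coordinates remark concrete, here is a minimal sketch (with made-up (t, A) values) showing that the least-squares slope of Astd vs tstd is exactly the correlation coefficient:

```python
import math

# Hypothetical (t, A) pairs: activity declining with elapsed time
# (illustrative values only).
t = [1, 2, 3, 4, 5, 6, 7, 8]
A = [1.53, 1.41, 1.38, 1.26, 1.20, 1.15, 1.04, 1.00]
n = len(t)

def mean(xs): return sum(xs) / len(xs)
def sdev(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

# Pearson correlation of A with t.
r = (sum((x - mean(t)) * (y - mean(A)) for x, y in zip(t, A))
     / ((n - 1) * sdev(t) * sdev(A)))

# Standardize both variables; the least-squares slope through the
# origin in these coordinates equals r, i.e. Astd = r * tstd.
tstd = [(x - mean(t)) / sdev(t) for x in t]
Astd = [(y - mean(A)) / sdev(A) for y in A]
slope = sum(x * y for x, y in zip(tstd, Astd)) / sum(x * x for x in tstd)
```

The diagonal lines Astd=±tstd then correspond to perfect correlation r=±1, which is why the data hugging one of them indicates a strong trend.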


Supplemental (Jul 21): Today's search of the USGS earthquake catalog of the Ridgecrest area gave 14 earthquakes for UTC doy 201. This is slightly outside the 2σ bounds, but the σ used is that of a normal distribution, taken equal to √n̄. The bounds are likely to be slightly larger since for smaller counts the spread is larger than that of a normal distribution. The number of earthquakes is probably inside the 3σ bounds, so at the risk of being overcautious we are still reluctant to reject the first hypothesis.
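
The point about the spread of small counts can be checked directly. Assuming an expected daily count n̄ (the value below is hypothetical), the exact Poisson tail beyond an observed count of 14 is somewhat fatter than the normal 2σ bound suggests:

```python
import math

# Hypothetical expected daily count from the aftershock fit.
n_bar = 8.0
observed = 14

# Normal approximation: sigma = sqrt(n_bar), 2-sigma upper bound.
sigma = math.sqrt(n_bar)
upper_2sigma = n_bar + 2 * sigma

def poisson_sf(k, lam):
    """Exact Poisson tail P(N >= k) = 1 - sum_{i<k} e^-lam lam^i / i!."""
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i)
                     for i in range(k))

p_tail = poisson_sf(observed, n_bar)
```

With these numbers the observed count falls just past the normal 2σ bound, yet the exact tail probability is a few percent, consistent with the reluctance to reject.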

Saturday, July 20, 2019

A Factor Analysis of Earthquake Activity in the Ridgecrest Area


  The combined factors or stress index used in the last blog didn't correlate well with the earthquake activity defined as the base 10 log of the number of earthquakes in a day.


Modifying the angle factors by adding one and dividing the result by two worked better.


One can use the correlation coefficient to estimate the number of earthquakes per day for a given stress index, SI. When we compare this estimate with the earthquake histogram we get a fairly good agreement for the background activity.
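
As a sketch of how the correlation converts an SI value into an expected daily count: regress the activity A = log10(count) on SI using the known means and standard deviations, then back-transform. The paired values below are invented for illustration.

```python
import math

# Hypothetical paired samples of stress index SI and activity
# A = log10(daily earthquake count); illustrative values only.
SI = [0.55, 0.60, 0.68, 0.75, 0.80, 0.85, 0.90]
A  = [0.70, 0.78, 0.85, 0.95, 1.00, 1.08, 1.15]

def mean(xs): return sum(xs) / len(xs)
def sd(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

# Correlation coefficient of A with SI.
r = (sum((x - mean(SI)) * (y - mean(A)) for x, y in zip(SI, A))
     / ((len(SI) - 1) * sd(SI) * sd(A)))

def estimated_count(si):
    """Regression estimate of activity for a given SI, as a daily count."""
    a_hat = mean(A) + r * sd(A) * (si - mean(SI)) / sd(SI)
    return 10 ** a_hat
```

At the mean SI the estimate reduces to 10 raised to the mean activity, so the curve passes through the center of the histogram by construction.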


There is a deficit of earthquakes before the M7.1 earthquake and a surplus afterwards that may be associated with aftershocks. The new stress index lacks the peaks for when the Moon's declination is negative.


The definition of the modified factors is as follows.


The stress index suggests that we might start to see a noticeable increase in earthquake activity in the next few days. The correlation coefficients for the individual factors with the earthquake activity have values similar to that of the SI.

Thursday, July 18, 2019

Forming Hypotheses About the Chances of an Earthquake


  One can study the USGS Ridgecrest area earthquake data and try to come up with some hypotheses about factors indicating the likelihood of the occurrence of an earthquake. The horizontal axis is the day of the year (doy).




  When we compare this data with JPL's Horizons Lunar data we encounter some factors that seem to correlate well with the occurrence of the M6.4 and M7.1 earthquakes, giving a number of coincidental events such as the relative force acting on the Earth's surface, the declination of the Moon and the Moon-Earth-Sun angle (∠M-E-S). The ratio of the average range of the Moon to its value at some time can be represented by the dimensionless factor φ, and φ² is a measure of the relative force acting on a portion of the Earth's surface. The relative force was near a maximum at the time of the two earthquakes.



The sine of the Moon's declination and the cosine of the M-E-S angle were also near their maximums.



The product of the first and third factors might be used to indicate the lunar perigee and syzygy alignment, which tells us when there will be a Supermoon. The product of the first and second is a measure of the relative torque acting on the Earth's equatorial bulge. If we combine all three factors we also get good agreement of the maximum with the time of the two earthquakes.
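
A minimal sketch of the three factors and their products, using invented ephemeris values for a single epoch (the post uses JPL Horizons data) and incorporating the modified angle factors and μ normalization described in the later entries above:

```python
import math

# Hypothetical ephemeris values for one epoch (illustrative only).
r_mean = 385000.0   # mean Earth-Moon distance, km
r_now  = 362000.0   # distance at the epoch (near perigee, hypothetical)
dec_deg = 22.0      # Moon's declination (hypothetical)
mes_deg = 12.0      # Moon-Earth-Sun angle (near syzygy, hypothetical)
incl_deg = 23.473   # obliquity value used in the post

phi = r_mean / r_now          # dimensionless range factor
f1 = phi ** 2                 # relative tidal force factor
mu = math.sin(math.radians(dec_deg)) / math.sin(math.radians(incl_deg))
f2 = (1 + mu) / 2             # modified declination factor
f3 = (1 + math.cos(math.radians(mes_deg))) / 2  # modified syzygy factor

SMI = f1 * f3                 # perigee + syzygy: supermoon indicator
SI  = f1 * f2 * f3            # combined stress index
```

Near perigee f1 exceeds 1, and the two modified angle factors each lie between 0 and 1, so the combined index peaks only when all three alignments occur together.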


If we look at this indicator over the entire year we see that it was near a global maximum at the times of the two earthquakes.


It will be interesting to see what the plots of the earthquake data look like over the next couple of months or so.

Supplemental (Jul 19): Replaced the histogram above with one that included some missing days at the end. One can modify the definitions of the indices to let them better represent relative quantities, such as replacing sin(dec) with μ=sin(dec)/sin(ι) where ι=23.473 deg is the obliquity of the ecliptic. This makes the combined indicator a little more meaningful.


The doy of the indicator peaks can be converted into calendar dates.


It should be noted that one cannot accurately evaluate the indices based on a small sample of data over a short period of time. The peaks of the indices are likely to diverge as the time increases since there are different periods associated with each index. We may be able to determine which of the indices best match the pattern of earthquake activity over time.

Wednesday, July 17, 2019

Using Macros to Assist Plotting USGS Earthquake Data


  The macros that one uses in worksheets can be quite sophisticated. One can use them to record one's actions to simplify a complicated repetitive task. For example, one can search the USGS earthquake catalog for the earthquakes that occurred in a specified region in a given time span and save the results in a .csv file which Excel can open. The macros need to be saved in an otherwise blank worksheet to which the .csv data can be copied and pasted. The first macro was designed for eliminating unneeded fields and converting the date and time of an earthquake to decimal UTC day of year (doy). The next macro prepares the data needed for the histogram. The blank worksheet also contains a macro to plot earthquake magnitudes vs doy and one to plot the histogram data.
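
The same workflow can be sketched in Python against the USGS FDSN event service (format=csv). The bounding box below is a rough guess at a Ridgecrest search region, not the one used for the post's searches.

```python
import datetime
import urllib.parse

# Query parameters for the USGS FDSN event API; the bounding box and
# magnitude cutoff are hypothetical, not the ones used in the post.
params = {
    "format": "csv",
    "starttime": "2019-07-01",
    "endtime": "2019-07-18",
    "minlatitude": 35.4, "maxlatitude": 36.0,
    "minlongitude": -118.0, "maxlongitude": -117.2,
    "minmagnitude": 2.5,
}
url = ("https://earthquake.usgs.gov/fdsnws/event/1/query?"
       + urllib.parse.urlencode(params))

def decimal_doy(iso_time):
    """Convert a catalog UTC time string to decimal day of year."""
    t = datetime.datetime.fromisoformat(iso_time.replace("Z", "+00:00"))
    day_frac = (t.hour + t.minute / 60 + t.second / 3600) / 24
    return t.timetuple().tm_yday + day_frac
```

Opening the resulting .csv in Excel and applying decimal_doy to the time column reproduces the horizontal axis used in the plots.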


Wednesday, July 10, 2019

Recording and Modifying Macros in Excel


  One can record macros in Excel to perform repetitive tasks. If you've never done this before you may need to look up "record macro" in Help. To record a macro, activate the Developer tab and go to the Code section. When one clicks on "Record Macro" one is asked to supply a name and to complete a shortcut by adding a letter that will activate it. Here is the VBA code for a recorded macro that copies a range of numbers and pastes the values into another column.


The " _" at the end of the 4th line indicates that the line is continued on the next line. The update macro, used to repeat the copy and paste values operation and increment the tallies a given number of times, uses VBA code to modify a similarly recorded macro that was given the name "update."


A macro that will zero the tallies is also useful.


If you've done everything correctly, this is what one can do.


Monday, July 8, 2019

A More Detailed Look at the t-test for Averages


  I did the last blog more carefully and got significantly improved statistics for the relative error of the test results. I also cleaned up the notation somewhat. A script "l," ℓ, is now used for the level of significance of 0.05. Subscripts are used to distinguish the various t-values. The spread of t̃σ depends only on the spread of x̄, the average value of x, since σ is a constant, and can be computed from the t-values shown in the following plot.


The t-value t̃s also depends on the spread of the estimated standard deviation s, which makes it broader so more values are rejected. The two hypotheses tested were H1: |t̃σ|≤t and H2: |t̃s|≤t, which are either accepted (A) or rejected (R). The level of significance determines the t which is used in the hypotheses.


A FOR loop was added to the update macro to permit the collection of data for a fixed number of trials so one did not have to hold down the keys for the macro shortcut. It transferred the values of the set of random numbers to the x column and did the updates for the decision tallies. Each pair of binary decisions for the tests can be represented by a pair of letters consisting of the letter designating the H1 decision followed by that designating the H2 decision.


The formulas used are contained in the following figure. Using n-1 for the estimated standard deviation s gives better results since dividing by n for the rms error is not quite correct.


With the FOR loop I was able to increase the number of trials to 250,000. The calculation took about 3½ hrs. The changes gave surprisingly good results for the rates. The rejection rate for hypothesis H2 involving the estimated standard deviation s agreed very well with the 5% level of significance.
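
The same simulation can be sketched outside Excel. This Python version uses n=12 and the two-tailed critical value t=2.2010 (11 degrees of freedom, level 0.05) from the earlier post, with the trial count reduced for speed; the four decision pairs are tallied as above.

```python
import math
import random

# Setup follows the earlier post: n=12 samples from N(mu, sigma),
# critical value t=2.2010 for 11 dof at the 0.05 level.
random.seed(1)
n, mu, sigma, t_crit = 12, 0.0, 20.0, 2.2010
trials = 20000          # reduced from 250,000 for speed
tally = {"AA": 0, "AR": 0, "RA": 0, "RR": 0}

for _ in range(trials):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    t_sigma = math.sqrt(n) * abs(xbar - mu) / sigma   # known sigma
    t_s = math.sqrt(n) * abs(xbar - mu) / s           # estimated s
    d1 = "A" if t_sigma <= t_crit else "R"            # H1 decision
    d2 = "A" if t_s <= t_crit else "R"                # H2 decision
    tally[d1 + d2] += 1

h2_reject_rate = (tally["AR"] + tally["RR"]) / trials
```

The H2 rejection rate comes out close to the 5% level of significance, matching the result reported for the 250,000-trial run.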


Saturday, July 6, 2019

The t-test for Averages


  One can use the Excel t-test functions to check a sample of data to see if it is representative of a given population with known mean and standard deviation. Let's say we have a sample drawn from a population with a normal distribution of errors and with mean μ=0 and standard deviation σ=20 for which we want to check the average x̄ using a significance level SL=0.05.


For the data above we get x̄=-5.7149 and an estimated standard deviation s=18.2921. The estimate, s, is just the rms error for the sample. The t-value for the sample computed using σ is t̃(σ)=√n|x̄-μ|/σ, and if σ is unknown we can use its estimate s to get t̃(s)=√n|x̄-μ|/s.


For SL=0.05 we find a 2-tails limiting value of t̃(SL)=T.INV.2T(SL,n-1)=2.2010. To check x̄ we use the "null hypothesis" t̃(s)<t̃(SL). If this is true we accept the sample and if not we reject it. Using t̃(σ)<t̃(SL) we get a more accurate test.

We can repeat the process by again using an update macro with shortcut ctrl-shft-U to transfer the values of a set of random numbers to the x column and evaluate the resulting t-values. Either both t-values are accepted or rejected, or we get a Type I error when the t̃(s) value rejects an acceptable t̃(σ) value, or we get a Type II error when a t̃(s) value accepts an unacceptable t̃(σ) value. We can record a set of macros to keep a tally of the outcomes for a set of trials. The trick for the tally macros is to augment a count in the cell below it, paste the augmented count in the original cell and then clear the contents of the second cell. We can then use the statistics collected to estimate the rates at which each event occurs.
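
A worked version of the two t-values, using a small hypothetical sample (not the one shown in the post) drawn from μ=0, σ=20:

```python
import math

# Hypothetical sample of n=12 values drawn from N(0, 20);
# these are not the values shown in the post.
mu, sigma = 0.0, 20.0
x = [-3.1, 12.4, -25.0, 7.7, -14.2, 30.1,
     -8.9, 2.3, -19.6, 5.0, -1.2, 9.8]
n = len(x)
xbar = sum(x) / n
# n-1 divisor for s, per the correction in the Jul 8 post.
s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))

t_sigma = math.sqrt(n) * abs(xbar - mu) / sigma   # known sigma
t_s = math.sqrt(n) * abs(xbar - mu) / s           # estimated s
t_SL = 2.2010     # T.INV.2T(0.05, 11) from the post

accept = t_s < t_SL   # the "null hypothesis" test
```

Here both t-values fall well below the limiting value, so the sample would be accepted by either test.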


Note that the sum of the rejected rates is approximately equal to the significance level SL so the test doesn't take the mistakes into account.

Supplemental (Jul 6): I rewrote the update macro to increment the tallies in addition to transferring the random values to the x column. Now one just has to hold down ctrl-shft-U to increase the number of trials so there's less chance of human error but it can be rather tedious. With 10,000 trials one would expect the error in the rates to be about 1 percent. Here are the results for two runs.



Edit (Jul 7): Changed the variable α used to represent the significance level to SL to avoid possible confusion with other uses.

Thursday, July 4, 2019

Additional Student t Distribution Functions in Excel


  Excel has two additional functions related to the cumulative Student t distribution for use in hypothesis testing, T.DIST.RT and T.DIST.2T. The values in the table below are calculated two ways, one using an Excel function and the second using the cumulative t distribution. The function T.DIST.RT computes values of the single-tailed probability, p(t̃<t), that is, the probability that Student's t will be greater than t̃, which is seen to be equal to the probability that t is not less than t̃ and which can be computed using the cumulative distribution. The two-tailed function T.DIST.2T gives the probability p(not -t̃<t<t̃), or the probability that t is not in the interval bounded by -t̃ and t̃. Note that this function is only defined for 0<t̃. Note also that if B=not A then p(B)=1-p(A) since p(A)+p(B)=1.
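
Since the tail probabilities are just areas under the t density, the two Excel functions can be sketched in Python by integrating the density numerically (Simpson's rule; the upper cutoff is adequate for the moderate degrees of freedom used here):

```python
import math

def t_pdf(x, df):
    """Student t probability density with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi)
                                    * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_dist_rt(x, df, upper=60.0, steps=6000):
    """T.DIST.RT analogue: one-tailed P(t > x) by Simpson integration.
    The tail beyond `upper` is negligible for moderate df."""
    h = (upper - x) / steps
    total = t_pdf(x, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(x + i * h, df)
    return total * h / 3

def t_dist_2t(x, df):
    """T.DIST.2T analogue: two-tailed P(|t| > x), defined for x > 0."""
    if x <= 0:
        raise ValueError("T.DIST.2T requires x > 0")
    return 2 * t_dist_rt(x, df)
```

As a check, t_dist_2t recovers the level of significance at the critical value quoted earlier: with 11 degrees of freedom the two-tailed probability at t̃=2.2010 is approximately 0.05.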


As t̃ increases the probabilities decrease since the area under the curve for the t probability density function decreases.



The following plots help to visualize the probabilities associated with one tail and two tails.