Thursday, June 1, 2017

Newton's Temperature Scale

  The thermoscope, a bulb containing air with a long tube that was immersed in water, was developed by Galileo and others to measure temperature during the first half of the 17th Century. Boyle studied similar "weather-glasses" and introduced the hermetically sealed thermometer in England by 1665. In 1701 Newton anonymously published an article, Scala graduum caloris, which described a temperature scale ranging from the freezing point of water to that of a fire hot enough to make iron glow. An English translation of Newton's article can be found in Magie, A Source Book in Physics, p. 225.

Newton's temperature scale has a geometric series and an arithmetic series associated with it. The geometric series corresponds to the temperatures and the arithmetic series is associated with cooling times.

 "This table was constructed by the help of a thermometer and of heated iron. With the thermometer I found the measure of all the heats up to that at which lead melts and by the hot iron I found the measure of the other heats. For the heat which the hot iron communicates in a given time to cold bodies which are near it, that is, the heat which the iron loses in a given time, is proportional to the whole heat of the iron. And so, if the times of cooling are taken equal, the heats will be in a geometrical progression and consequently can easily be found with a table of logarithms."

After finding a number of temperatures with the aid of a thermometer, Newton describes how the hot iron was used.

"...I heated a large enough block of iron until it was glowing and taking it from the fire with a forceps while it was glowing I placed it at once in a cold place where the wind was constantly blowing; and placing on it little pieces of various metals and other liquefiable bodies, I noted the times of cooling until all these bodies lost their fluidity and hardened, and until the heat of the iron became equal to the heat of the human body. Then by assuming that the excess of the heat of the iron and of the hardening bodies above the heat of the atmosphere, found by the thermometer, were in geometrical progression when the times were in arithmetical progression, all heats were determined."
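
Newton's assumption can be illustrated with a short numerical sketch (the initial excess, cooling constant, and time step below are arbitrary, chosen only for illustration): exponential decay of the excess temperature sampled at equal time steps yields a geometric progression.

```python
import math

# Hypothetical parameters: initial excess over ambient, cooling constant, time step.
theta0, k, dt = 500.0, 0.05, 10.0

# Excess temperatures sampled at equal cooling times t = 0, dt, 2*dt, ...
excess = [theta0 * math.exp(-k * n * dt) for n in range(5)]

# Successive ratios are constant (exp(-k*dt)), i.e. a geometric progression,
# so log(excess) is linear in n and the heats "can easily be found with a
# table of logarithms", as Newton says.
ratios = [excess[n + 1] / excess[n] for n in range(4)]
```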

Newton's temperature scale can be constructed mathematically as follows where I've noted some corresponding temperatures on the Fahrenheit temperature scale for comparison.

The temperature point between the melting point of wax and the boiling point of water is an average. I used the geometric mean, which works best here. One can put together a table as follows to compare the Fahrenheit temperatures with the index number, k, above.

A graphical comparison shows that the logs are fairly linear. Using 66°F for the temperature difference gave the best fit for human body temperature at the lower left of the plot.

The slope of the fitted line can be used to convert Fahrenheit temperatures to points on Newton's scale.

Newton's law of cooling can be expressed as the difference between the temperature of an object at some time and the ambient temperature being proportional to an exponential term involving time. This can be shown to be equivalent to the differential form of the law.
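
In modern notation (a standard reconstruction, with T_a the ambient temperature and k the cooling constant) the two forms are:

```latex
T(t) - T_a = (T_0 - T_a)\,e^{-kt}
\qquad\Longleftrightarrow\qquad
\frac{dT}{dt} = -k\,(T - T_a)
```

Differentiating the exponential form immediately gives the differential form, and integrating the differential form recovers the exponential.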

Supplemental (Jun 1): Leurechon Thermometer (1627)

Supplemental (Jun 2): 65°F gives a better fit for body temperature. Was this the ambient temperature at which the experiments were done? It's doubtful there was a standard temperature yet in Newton's time. For more on the history of early thermometers see Bolton, Evolution of the Thermometer, 1592-1743.

Supplemental (Jun 2): The average of the freezing point of water and body temperature is (32+98.6)/2= 65.3. Did this originate with Accademia del Cimento?

Supplemental (Jun 4): Corrected conversion formula for k.

Monday, May 22, 2017

Fermat's Problem in Three Dimensions

  Verified that Newton's method works in three dimensions. I chose the four vertices of a tetrahedron as the given points. The Fermat point, which makes the sum of the distances from the given points a minimum, turned out to be the mean of the given points.
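
The result can be reproduced with a Weiszfeld-style iteration (a sketch, not necessarily the spreadsheet setup used here; the vertices below form a regular tetrahedron, for which symmetry forces the Fermat point to coincide with the mean):

```python
import math

# Vertices of a regular tetrahedron; by symmetry its Fermat point is the centroid.
pts = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
x = [0.3, 0.2, 0.1]  # arbitrary starting guess

# Weiszfeld iteration: replace x by the inverse-distance-weighted average of the points.
for _ in range(200):
    wsum, num = 0.0, [0.0, 0.0, 0.0]
    for p in pts:
        d = math.dist(x, p)
        wsum += 1.0 / d
        for i in range(3):
            num[i] += p[i] / d
    x = [num[i] / wsum for i in range(3)]

# Mean (centroid) of the given points for comparison; x converges to it here.
centroid = [sum(p[i] for p in pts) / 4 for i in range(3)]
```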

I used Excel to create an anaglyph. You will need red-cyan glasses to view it properly.

Saturday, May 20, 2017

Is the Minimum for the Four Point Fermat Problem Where We Thought?

  I've been trying to convince myself that the minimum in the four point Fermat problem of the last blog is not slightly displaced from the point c. The plot below shows changes in the sum L=Σℓi of the lengths of the links from the known points to the unknown point x for changes along two lines, u and v, through the point (0.700,0.700) in the plane of x.

Both lines decrease below 0.700 and increase above it. The slopes are fairly linear on each side. But it's difficult to be certain from the data alone that point c is the actual minimum because of the discontinuity in the slope. Notice that the angle from horizontal is not the same for both lines. It may be possible for the slope on the right to decrease as well, but at a lower rate. Under the circumstances, though, it does look like c is the actual point of intersection of the two line segments.

Supplemental (May 21): Obviously we can't use an extension of Newton's method to solve this type of minimum problem since the gradients are not zero at the minimum. Fermat's theory of maxima and minima is not a general theory. When doing searches for curve fits one often encounters local minima that appear to be line segments. This might happen if the minimum is paraboloidal in shape and the contour lines are elliptical.

Friday, May 19, 2017

An Insoluble Fermat Problem for the Method

 There's a four point Fermat problem that can't be solved by linearizing the function for the sum of the distances from the unknown point.

If one tries, one ends up with division by zero. The gradient of L at point c is not continuous.

The distance function near a point is cone shaped.

The individual gradients are not well behaved near a given point as this plot shows.

For a problem like this one can compute the gradient at two points displaced from the minimum and find where the two lines through those points, in the directions of the gradients, intersect to get a better estimate of the minimum.

Thursday, May 18, 2017

An Oversight on the Fermat Point Solution

  I just noticed an error in my solution for the Fermat point in the last blog, but it didn't affect the results. One can solve the correction equations for f directly for dx, and the normal equations are not needed.

One can read f|x≠0 as "f evaluated at x is not equal to zero." The normal equations are useful when one has more equations than unknowns which often occurs when one is doing least squares fits. This method might be considered the equivalent of Newton's method for finding the zero of an equation in higher dimensions.

Wednesday, May 17, 2017

Finding the Fermat Point Given Three Arbitrary Points

  At the end of a letter to Mersenne in about 1640 concerned with finding maxima and minima Fermat proposed this problem:

  "Datis tribus punctis, quartum reperire, a quo si ducantur tres rectæ ad data puncta, summa trium harum rectarum sit minima quantitas."

  "Given three points, the fourth to be found, from which you draw three lines to the given points, the sum of these three lines is to be a minimum quantity."

So, given three arbitrary points, and using the method in the previous blogs, we can find the Fermat point as follows. The distances are the ℓi whose sum is to be minimized. Taking the derivative we find for an assumed value of x that dL=fTdx where f is the sum of three unit vectors pointing to x. For the position of x for the minimum value of L the change dL has to be zero for arbitrary changes in position, dx, and the only way that this can happen is if f is equal to zero too. But the value of f at the assumed point is not necessarily zero so we look at changes in f with position and find the value of dx for which f+df=f+Mdx=0. These are the correction equations for f. The matrix M is found by extracting the derivative of the vector function f(x). Using the method of least squares one can show that dx is a solution of the normal equations.
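
The iteration can be sketched as follows (hypothetical coordinates for the three given points; the derivative of each unit vector u works out to (I − uuᵀ)/ℓ, which is what gets summed into M):

```python
import math

# Three given points (hypothetical) and a starting guess at their centroid.
pts = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]
x = [sum(p[0] for p in pts) / 3, sum(p[1] for p in pts) / 3]

for _ in range(50):
    f = [0.0, 0.0]                      # sum of unit vectors pointing to x
    M = [[0.0, 0.0], [0.0, 0.0]]        # its derivative with respect to x
    for p in pts:
        dx0, dx1 = x[0] - p[0], x[1] - p[1]
        d = math.hypot(dx0, dx1)
        u = (dx0 / d, dx1 / d)
        f[0] += u[0]; f[1] += u[1]
        # derivative of a unit vector: (I - u u^T) / d
        M[0][0] += (1 - u[0] * u[0]) / d
        M[0][1] += (-u[0] * u[1]) / d
        M[1][0] += (-u[1] * u[0]) / d
        M[1][1] += (1 - u[1] * u[1]) / d
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    # solve the correction equations M dx = -f and apply the correction
    x[0] += (-f[0] * M[1][1] + f[1] * M[0][1]) / det
    x[1] += (f[0] * M[1][0] - f[1] * M[0][0]) / det

# at the converged x the three links meet at 120 degrees
```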

Using the above equations in Excel and repeatedly correcting the value for x we arrive at the Fermat point after just a few iterations.

Checking the angles between the lines from x to the given points we find they are all 120° which was deduced from the minimum condition.

The correction equations for Fermat's problem are simpler than the reflection problem since we do not have a constraint on the change for dx.

Tuesday, May 16, 2017

Reflection as an Example of the Shortest Path for Light

  There's a simpler version of the Steiner Tree Problem and that is Hero's problem of finding the shortest path for a reflected ray of light. Again, for the general problem, we have the "gradient" equal to the sum of two unit vectors pointing to the unknown point, x. An additional complication is the constraint of the motion of x along a line so that dx=îdx'.

A solution for the reduced normal equations verifies that the angle of incidence equals the angle of reflection.
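
The result can also be checked with the classic reflection construction (hypothetical endpoints; the mirror is taken as the line y = 0):

```python
A, B = (0.0, 1.0), (4.0, 2.0)      # hypothetical endpoints above the mirror y = 0

# Hero's construction: reflect B across the mirror; the straight segment from A
# to the reflected point crosses the mirror at the minimizing point x.
B_ref = (B[0], -B[1])
t = A[1] / (A[1] - B_ref[1])       # fraction of the way from A to B_ref where y = 0
x = A[0] + t * (B_ref[0] - A[0])

# Tangents of the angles measured from the normal to the mirror:
tan_in = abs(x - A[0]) / A[1]      # incident ray A -> x
tan_out = abs(B[0] - x) / B[1]     # reflected ray x -> B
# tan_in equals tan_out: angle of incidence = angle of reflection
```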

Reflecting the second point above the line illustrates Euclid's Prop. XX in his Elements Bk 1, asserting that the sum of any two sides of a triangle is greater than the third or, equivalently, that a straight line is the shortest distance between two points.

One can see that the triangles in the two problems are similar and the math works out the same.

Saturday, May 13, 2017

Solving a Steiner Tree Problem in Excel

  Solving a Steiner tree problem can be challenging but I managed to get Excel to do this using an iterative process for correcting the positions of the unknown points. The problem seeks to find a set of links between a number of points that has a minimal sum for the lengths. One can derive the minimum conditions as follows. The required condition is that the sum of a set of unit vectors toward or away from the unknown points x and y is equal to zero.

Note that the conditions imply that these unit vectors can be arranged to form the sides of an equilateral triangle making the angles at the points x and y equal to 120°.

The conditions for minimum are not linear making them difficult to solve for x and y but we can linearize them by assuming values for x and y and seeking corrections dx and dy for which the above functions are zero.

The values for x and y allow one to find the corrections dx and dy which produce the more accurate solutions x' and y'. One nice thing about Excel is that one can use a macro to replace the original values of x and y with the new ones, x' and y', and rapidly recompute them using a shortcut key such as Ctrl-Shift-R.
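
For a concrete example (the four corners of a unit square, with alternating Weiszfeld-style updates standing in for the spreadsheet macro), the iteration might look like:

```python
import math

# The four given points: corners of a unit square.
corners = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
x, y = [0.3, 0.5], [0.7, 0.5]   # starting guesses for the two Steiner points

def update(p, neighbors):
    # One Weiszfeld step: the inverse-distance-weighted average of the neighbors,
    # whose fixed point makes the sum of unit vectors at p vanish.
    wsum, num = 0.0, [0.0, 0.0]
    for q in neighbors:
        d = math.dist(p, q)
        wsum += 1.0 / d
        num[0] += q[0] / d
        num[1] += q[1] / d
    return [num[0] / wsum, num[1] / wsum]

for _ in range(2000):
    x = update(x, [corners[0], corners[1], y])
    y = update(y, [corners[2], corners[3], x])

# x and y settle near (sqrt(3)/6, 1/2) and (1 - sqrt(3)/6, 1/2), where the
# three links at each Steiner point meet at 120 degrees.
```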

Tuesday, May 9, 2017

Binomial Distribution Fit Curvature

  If one takes the natural log of the probabilities for the binomial distribution and the fit in the last blog one gets the curves below. The 2nd differences which are a measure of the curvature of the curves are also given. The 2nd differences for the fit are constant as expected.

The relatively large differences at the ends are less critical since they correspond to relatively small values for the probabilities.

Supplemental (May 9): The 2nd differences for the normal distribution function are also uniform and equal to -0.04, or -1/λ, exactly.
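
This is easy to check numerically (assuming n = 100 and p = 1/2, so λ = np(1−p) = 25):

```python
import math

n, p = 100, 0.5   # assumed binomial parameters, lambda = n*p*(1-p) = 25

def ln_pmf(k):
    # natural log of the binomial pmf via log-gamma, avoiding huge factorials
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

mu = int(n * p)
# second difference of ln f at the peak; close to -1/lambda = -0.04
d2 = ln_pmf(mu - 1) - 2 * ln_pmf(mu) + ln_pmf(mu + 1)
```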

Supplemental (May 10): Technically, curvature depends on changes in the tangent of a curve with path length but I think it's fair to say that deviation from a straight line is a form of curvature even if it is not constant. For a parabolic arc the rate of change of the slope with a change in the "horizontal" distance is constant. I got Excel to find a center for the circular arc of the lower curve of the second plot above and the radius of curvature turned out to be a little over 25,000. We don't have to worry about units here since both axes are just real numbers.

Monday, May 8, 2017

The Normal Dist. as an Empirical Fit to the Binomial Dist.

  The binomial distribution is a rather complicated function and the factorials are difficult to deal with so one might be tempted to seek a simpler function that approximates it. The binomial distribution is fairly symmetric about the mean value, μ, and a logarithmic plot reveals an approximate quadratic function so we might try to fit a function of the form,

This is a discrete probability function and its sum over all values of k is equal to 1 so A is actually a function of β too. We could try a search for β that minimizes the mean square error or find an approximate solution and try to improve on it. The second method gave the following fit.

The first plot compares the fit with the binomial distribution which is quite good in this example. The red points in the second plot show the deviation of the fit from the binomial distribution and the blue points give the same for the normal distribution formula using the expected values ⟨k⟩ for μ and ⟨k²⟩ for λ. For most values of p the two error "curves" are nearly equal but for p near 1/2 the least squares fit has lower bounds on the error.

Supplemental (May 8): Recomputed the normal distribution error evaluating the erf function at k±0.5 to get:

The curvature of the binomial distribution near the peak and wings may be responsible for the deviations of the two approximation functions.

Friday, April 28, 2017

Interpreting Error Bounds for the Trapezoidal Distribution Fit

  I've got a better handle on the error bounds for the fit of a trapezoidal distribution to the deviations of the time of the Equinox. Here's a review and a corrected plot with error bounds showing the expected deviation from the expected value for δt.

The error bounds used the values of the estimate of the deviation, δf*, for the probability densities, obs_f*, for the intervals in the table above. One can get a better understanding of what the error bounds mean by looking at the expected relative frequency, fi=ni/n, for the intervals chosen. The x values indicate the center of the interval. Using the values of a and b for the fitted trapezoidal distribution we can compare the observed counts with the expected counts, k=nf, and their expected rms deviation, δk=√[nf(1-f)]. The expected variation in the relative frequency will then be δf=√[f(1-f)/n].

But what does all this tell us about the observations themselves? One can look at the terms of the binomial distribution with p=f and determine the probability of observing exactly k counts in each interval. Then we can add up the probabilities for those values of k which are within a distance of δk from the expected value for k. The last column on the right shows the probability of this occurring for each interval. A calculation shows the odds aren't uniform for the intervals but equal to 0.6252 ± 0.0375. The probabilities associated with the error bounds are less than those for a normal distribution and fluctuate a little because we are dealing with a discrete probability distribution and taking the sum of those values of k between ⟨k⟩-δk and ⟨k⟩+δk. One can show that the probability associated with bounds of a given number of standard deviations, k, in a normal distribution is equal to P(k)=erf(k/√2). So we would expect slightly more observations to be outside the error bounds for the binomial distribution than would be the case for a normal distribution.
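
A sketch of this check for a single interval (the values of n and f below are hypothetical stand-ins for one interval's expected relative frequency):

```python
import math

# Hypothetical interval: n trials with expected relative frequency f.
n, f = 251, 0.15
mean = n * f
dk = math.sqrt(n * f * (1 - f))   # expected rms deviation of the counts

def pmf(k):
    # binomial probability of exactly k counts
    return math.comb(n, k) * f ** k * (1 - f) ** (n - k)

# Total probability of observing a count within one rms deviation of <k> = nf.
prob = sum(pmf(k) for k in range(math.ceil(mean - dk), math.floor(mean + dk) + 1))
# prob fluctuates around the normal-distribution value erf(1/sqrt(2)) ~ 0.6827,
# since only a discrete set of k values falls inside the bounds.
```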

The binomial distributions for each of the intervals can be plotted together for comparison.

Monday, April 24, 2017

A Fit of the Equinox Deviations Using Expected Values

  I got a little bogged down with some technical details associated with copying formulas from one Excel worksheet to another. It's a little annoying when Excel crashes, restarts and you have to redo the stuff you haven't saved. It may have been how an error crept into one of my previous pages. You literally lose track of what you are doing. One needs to constantly check one's formulas and trace dependencies when transferring material from one page to another.

I used the expected value formulas to do the fit for the deviations of the Equinox times. The results were similar. The frequencies that I've been using were relative frequencies, defined as the ratio of counts for an interval to the total count. One can also define the function f as a probability density or probability per unit interval. I had to use this definition to get the fit to work properly for the Equinox times. The value of f here is the previous value divided by the width of the interval dx.

The error bounds are nominal in the sense that they are typical of the observed variations for a trapezoidal distribution. The fit values for the trapezoidal distribution are a=4.470 min and b=15.089.

Supplemental (Apr 24): The trapezoidal distribution has an interesting series of formulas for its expected values. The pattern holds for higher powers of x. Technically this might be called a folded trapezoidal distribution since the probabilities for the positive and negative values of x are combined.
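
The pattern can be checked numerically. Assuming the folded density used here (flat at 2/(a+b) on [0,a], falling linearly to zero at b), the moments work out to ⟨xⁿ⟩ = 2(b^(n+2) − a^(n+2)) / [(n+1)(n+2)(b² − a²)]:

```python
# Folded trapezoidal density: flat at h = 2/(a+b) on [0, a], linear to zero at b.
a, b = 0.5, 1.0

def density(x):
    h = 2.0 / (a + b)
    return h if x <= a else h * (b - x) / (b - a)

def moment(n, steps=200000):
    # midpoint-rule estimate of <x^n> over [0, b]
    dx = b / steps
    return sum(((i + 0.5) * dx) ** n * density((i + 0.5) * dx) * dx
               for i in range(steps))

def formula(n):
    # the closed-form pattern for the expected values
    return 2 * (b ** (n + 2) - a ** (n + 2)) / ((n + 1) * (n + 2) * (b * b - a * a))

# moment(1) matches formula(1) = (a^2 + a*b + b^2)/(3*(a + b)), and
# moment(2) matches formula(2) = (a^2 + b^2)/6
```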

Supplemental (Apr 25): The variations in the relative frequencies are scaled down versions of the expected variation in the counts for an interval as this derivation shows.

In evaluating f and δf in the table above I used the observed values for the interval's density, obs_f (=ni/n/dx), as an approximation. It was intended as a check of the trapezoidal density formula whose maximum value is 2/(a+b)=0.1023. Using the same letter for the relative frequencies of the counts and the probability density formula may have been a little too confusing. So the error bounds in the plot are a little too large. Using f* for the density, the correct formula for the expected rms error in the density would be as follows with Δx=dx.

Friday, April 21, 2017

Using Estimated Expected Values to Fit a Trapezoidal Distribution

 From the definition of the trapezoidal distribution and its integral one can obtain formulas for the expected values ⟨x⟩ and ⟨x²⟩.

Then given a set of random numbers from a trapezoidal generator one can analyze the set by counting the number of values that fall within chosen intervals and then estimate the distribution for the intervals and the expected values ⟨x⟩ and ⟨x²⟩.

We now have two equations that can be solved for a and b, and the solution can be used to fit the observations and compare the results with the original values of a and b. In the example below the original values were a=0.5 and b=1.0 and the fit values were a=0.538 and b=0.993.

This was a lot easier to do since the equation for ⟨x⟩ can be transformed into a quadratic function of ρ=a/b whose solution can be found if a set of values is assumed for b. The set of values for ρ can then be used to find a zero of the expression for ⟨x²⟩.
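
That two-step solution can be sketched as follows (using the moment formulas ⟨x⟩ = (a²+ab+b²)/(3(a+b)) and ⟨x²⟩ = (a²+b²)/6 that follow from the folded density assumed here, with exact target moments for a=0.5, b=1.0 standing in for the estimates):

```python
import math

# Target moments; these are the exact <x> and <x^2> for a = 0.5, b = 1.0.
m1, m2 = 7.0 / 18.0, 1.25 / 6.0

def rho_from_b(b):
    # Quadratic in rho = a/b from <x> = b*(rho^2 + rho + 1)/(3*(rho + 1)).
    c = 3.0 * m1 / b
    disc = (1.0 - c) ** 2 - 4.0 * (1.0 - c)
    if disc < 0.0:
        return None
    r = ((c - 1.0) + math.sqrt(disc)) / 2.0
    return r if 0.0 <= r <= 1.0 else None

def g(b):
    # Zero when a = rho*b and b also reproduce <x^2> = (a^2 + b^2)/6.
    r = rho_from_b(b)
    if r is None:
        return None
    a = r * b
    return (a * a + b * b) / 6.0 - m2

# Scan a set of b values for a sign change in g, then refine by bisection.
bs = [0.8 + 0.001 * i for i in range(400)]
lo = hi = None
for b0, b1 in zip(bs, bs[1:]):
    g0, g1 = g(b0), g(b1)
    if g0 is not None and g1 is not None and g0 * g1 <= 0.0:
        lo, hi = b0, b1
        break
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if g(lo) * g(mid) <= 0.0:
        hi = mid
    else:
        lo = mid

b_fit = 0.5 * (lo + hi)
a_fit = rho_from_b(b_fit) * b_fit   # recovers a ~ 0.5 and b ~ 1.0
```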

Wednesday, April 19, 2017

Why the Least Squares Fit Failed

 The least squares fit failed primarily because it favored the majority at the expense of a minority. The histogram cells closer to the mean time had the highest probabilities while those beyond the value of b had zero probability. The result was that the data for the last cell on the right could be ignored when computing the rms error of a trapezoidal distribution if the b value was too small.

The additional constraint of minimizing the maximum magnitude of the z-scores for the probability distribution assured that an unlikely situation would not occur. The probability of a histogram interval was found by taking the difference of the integral of the trapezoidal distribution at its upper and lower bounds.

When setting bounds for curve fits one has to make certain that significant data is not ignored.

A Better Trapezoid Fit For the Equinox Time Deviations

  The trapezoidal distribution fit for MICA's Spring Equinox time deviations proved to be a little difficult. The b values were difficult to fit since lower values were favored at the expense of large z-scores for the last interval of the histogram. I tried minimizing the maximum absolute value of the z-scores while minimizing the rms error and got what appears to be a better fit.

Here are some statistics for the fit.

Judging by the z-scores it's a marginal trapezoidal distribution at best.

Tuesday, April 18, 2017

Trapezoid Fit For the Equinox Time Deviations

  Just got through doing a rough trapezoidal fit of the deviations in the time of the Spring Equinox.

This fit uses the MICA times of the 251 Spring Equinoxes from 1800 through 2050. We probably shouldn't take the trapezoidal distribution too seriously but it may be wise to keep an open mind about the actual shape of the distribution. The minimum error was for a=2.05 min and b=15.15 min. The statistics hint that a slight deviation from the mean time is the most probable situation.

Supplemental (Apr 19): This fit ended up somewhat off the mark due to an error in evaluating the probability for the histogram intervals. I found the error in the data used last night and corrected it in the next blog. Here too I had problems choosing a good value for b.

Monday, April 17, 2017

Trapezoid Fit Video

  I did a video to show the trapezoid fit process in action. I was able to compensate for Google's processing somewhat but not entirely. Excel recalculates the worksheet if a cell's content is changed, so moving a cell about is an easy way to force a recalculation. Each time the selected box is moved the worksheet computes 1000 random trapezoidal numbers, does the data analysis and computes a fit for the data.


You can pause the video to study a particular fit. It helps to zoom in a little too.

The Trapezoidal Distribution

  Over the weekend I've been studying the trapezoidal distribution. One only needs two numbers a and b to define this distribution. Its height, h, can be found since the area under the curve equals 1.
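
Assuming the symmetric form (flat on [−a, a], falling linearly to zero at ±b), the unit-area condition gives:

```latex
2\left[\,h\,a + \tfrac{1}{2}\,h\,(b-a)\right] = h\,(a+b) = 1
\qquad\Longrightarrow\qquad
h = \frac{1}{a+b}
```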

I wrote an Excel user function to generate 1000 random numbers that fit this distribution with a=0.5 and b=1 then determined the observed frequencies for the following set of intervals.
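
One convenient way to build such a generator (an alternative sketch, not necessarily the user function used here) exploits the fact that the sum of two centered uniform variables of widths a+b and b−a has exactly this trapezoidal density:

```python
import random

random.seed(17)   # fixed seed for reproducibility

def trap_rand(a, b):
    # Sum of two centered uniforms of widths (a+b) and (b-a): the convolution
    # of their densities is flat on [-a, a] and falls linearly to zero at +/-b.
    return (random.uniform(-(a + b) / 2, (a + b) / 2)
            + random.uniform(-(b - a) / 2, (b - a) / 2))

samples = [trap_rand(0.5, 1.0) for _ in range(100000)]
# every sample lies in [-b, b]; the mean of |x| approaches
# <x> = (a^2 + a*b + b^2) / (3*(a + b)) = 7/18 for a = 0.5, b = 1.0
```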

To check the trapezoidal random number generator it was necessary to determine the distribution which best fit the data. To do this a search was necessary to find the values for a and b which minimized the root mean square error for the fit. Since the trapezoidal distribution is symmetric it can be folded horizontally to reduce the number of intervals needed in the calculation.

The fit turned out to be fairly good.

Above the blue dots represent the observed frequencies and the blue line is the fit.

Thursday, April 13, 2017

Getting Control of Excel's Histogram Intervals

  I've solved the problem I've been having with setting the intervals on Excel's histograms. I tried adjusting the bounds by setting the Overflow and Underflow bins to ±18.0 but they were ignored. Apparently these bounds need to lie within the range of the data series used for the histogram. It's debatable whether or not one needs empty columns at the ends of a histogram. These entries set up the histogram in an acceptable manner.

Again, the observed counts from the histogram can be used for comparison with the statistically expected counts for a normal distribution.

The distribution above looks similar to a normal distribution and the z-scores for the intervals can't be rejected on statistical grounds, but one cannot always get a clear impression of the actual distribution for deviations from a mean value with a single dataset. In another histogram the distribution appears to be more like a trapezoid.

We would need more data to come to a conclusion about what the actual distribution for the Equinox times is.