Saturday, May 22, 2021

LS Fit Using Orthogonal Polynomials over the Data Interval

   One can make the vectors or functions used to compute the fit coefficients more symmetrical by defining a set of orthogonal polynomials on the interval covered by the data. The data for the 9 sets of measurements covered the interval [0,2]. We start by setting p₀ = 1 and letting p₁ = ax + b. Integrating the product of the two polynomials over the interval [0,2] and setting the result equal to zero leaves one undetermined constant, so to be more definite we can choose the coefficients so they are small integers. We then get p₁ = x − 1. Setting p₂ = ax² + bx + c and setting the integrals of the products of p₂ with the other two polynomials equal to zero again leaves one undetermined coefficient, which we can again choose so that the coefficients are small integers. And we get

p₂ = 3x² − 6x + 2.
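The orthogonality of these three polynomials on [0, 2] is easy to check numerically; a minimal sketch using NumPy's poly1d:

```python
import numpy as np

# The three polynomials on [0, 2] (coefficients, highest power first)
p0 = np.poly1d([1.0])               # p0 = 1
p1 = np.poly1d([1.0, -1.0])         # p1 = x - 1
p2 = np.poly1d([3.0, -6.0, 2.0])    # p2 = 3x^2 - 6x + 2

def inner(p, q, a=0.0, b=2.0):
    """Integral of p(x)*q(x) over the data interval [a, b]."""
    antideriv = np.polyint(p * q)
    return antideriv(b) - antideriv(a)

# Every cross product integrates to zero, so the set is orthogonal on [0, 2]
for pa, pb in [(p0, p1), (p0, p2), (p1, p2)]:
    print(inner(pa, pb))   # each is 0 up to rounding
```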

  Using the same formula for λ we get the following results.



The λ functions look more symmetrical but the accuracy of the fit doesn't seem to be improved.


Tuesday, May 18, 2021

Validity of the Formula for the Uncertainty in the Fit Coefficients

   My derivation of the uncertainty in the fit coefficients assumed that the expected value <δyjδyk>=0 for k≠j.



  It is not obvious that this is so but we can demonstrate it in the following manner. One starts by creating an array of random numbers with mean μ=0 and standard deviation σ=0.3. In this case the array contained 30 random numbers. 



  Next we multiply pairs of numbers from the array with each other. Since there are 30 choices for each factor we end up with a 30×30 grid of products.
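This step can be sketched as follows, with the same μ = 0 and σ = 0.3 (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
e = rng.normal(loc=0.0, scale=0.3, size=30)   # 30 errors, mean 0, sigma 0.3

# 30x30 grid of all pairwise products e_j * e_k
grid = np.outer(e, e)

sum_all      = grid.sum()      # sum over every product in the grid
sum_diagonal = np.trace(grid)  # the diagonal terms, i.e. the squares e_k^2

print(sum_all, sum_diagonal)
```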



  We can now compare the sum of all the products with the sum of the squares, the diagonal terms of the 30×30 grid. This process was repeated 100 times to get the mean values and the variations.



  The two averages for the 100 trials are approximately equal, which suggests that the expected value of the product of two uncorrelated random numbers is negligible. Only the sum of the squared terms appears to contribute to the variations of the ai, and we conclude that the two formulas are approximately equivalent. The lower variation of the simpler sum suggests that it is the better estimate. One would expect these results to hold for a large number of trials.
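The repeated comparison can be sketched as follows (assumed seed, same μ and σ as above):

```python
import numpy as np

rng = np.random.default_rng(1)
sums_all, sums_sq = [], []
for _ in range(100):                       # 100 repeated trials
    e = rng.normal(0.0, 0.3, size=30)
    grid = np.outer(e, e)
    sums_all.append(grid.sum())            # all 900 products
    sums_sq.append(np.sum(e**2))           # the 30 diagonal squares

# The two means agree: the off-diagonal products average out to zero,
# while the diagonal-only sum fluctuates less from trial to trial.
print(np.mean(sums_all), np.std(sums_all))
print(np.mean(sums_sq),  np.std(sums_sq))
```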


Supplemental (May 18): Perhaps a simpler way of demonstrating that the expected value of the product of two normally distributed random numbers is negligible is to compute a large number of products and take their average.



  The bottom row shows the averages of 200 products, repeated 50 times, along with the mean and standard deviation of the 50 averages.
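This simpler calculation can be sketched as follows (50 repetitions of 200 products, assumed seed and σ = 0.3):

```python
import numpy as np

rng = np.random.default_rng(2)
averages = []
for _ in range(50):                     # 50 repetitions
    a = rng.normal(0.0, 0.3, size=200)  # two independent sets of errors
    b = rng.normal(0.0, 0.3, size=200)
    averages.append(np.mean(a * b))     # average of 200 products

# The grand mean is close to zero with a small spread, consistent with
# the expected value of a product of independent errors being negligible.
print(np.mean(averages), np.std(averages))
```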


Supplemental (May 19): One can show that the expected value for the product of two independent normally distributed random errors is,
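For two independent errors the expectation of a product factors into the product of the expectations, so with mean μ = 0 the result is (a sketch of the standard identity, in place of the missing figure):

```latex
\langle \delta y_j \, \delta y_k \rangle
  = \langle \delta y_j \rangle \, \langle \delta y_k \rangle
  = \mu^2 = 0 , \qquad j \neq k .
```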




Thursday, May 13, 2021

Determining the Uncertainty in the Coefficients

   One might wonder what the uncertainty in the values of the coefficients in the last post might have been. Repeated "experiments" show there is some variation present in the results. Can we get a better measure of it?

  The ith coefficient is equal to the scalar product of a vector, λi, and the vector of y values. For the polynomial fit,
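In matrix form the least-squares coefficients are a = (AᵀA)⁻¹Aᵀy, so the λi are the rows of (AᵀA)⁻¹Aᵀ. A sketch, assuming a 9-point grid on [0, 2] and the fit functions 1, x, x²:

```python
import numpy as np

# Hypothetical x grid over the data interval [0, 2]
x = np.linspace(0.0, 2.0, 9)
A = np.column_stack([np.ones_like(x), x, x**2])   # fit functions 1, x, x^2

# a = (A^T A)^{-1} A^T y, so the rows of L are the lambda_i vectors:
# a_i = lambda_i . y
L = np.linalg.inv(A.T @ A) @ A.T
lam0, lam1, lam2 = L

# Sanity check: the lambda_i recover exact coefficients for noise-free data
y = 1.0 + 2.0 * x - 0.5 * x**2
print(L @ y)   # close to [1.0, 2.0, -0.5]
```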







  Since only the squared terms of (λiTδy)2 contribute to σai2, the average of each δyk being approximately zero and each δyk2 approximately σ2, one can show,
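Expanding the square and keeping only the diagonal terms gives:

```latex
\sigma_{a_i}^2
  = \bigl\langle (\lambda_i^{T} \delta y)^2 \bigr\rangle
  = \sum_k \lambda_{ik}^2 \, \langle \delta y_k^2 \rangle
  = |\lambda_i|^2 \sigma^2 ,
\qquad
\sigma_{a_i} = |\lambda_i| \, \sigma .
```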


  So for the 9 experiments we then get,




  Note that the errors of the coefficients are bounded by about 3σa. With this information we can properly design an experiment.


Supplemental (May 14): I failed to mention that the σy used for the y averages was σ/√9, with σ being the value for a single measurement of y, so the values in the table above are σa = |λ|σ/3.

Tuesday, May 11, 2021

Combining Multiple Experiments

   If one repeats the same experiment a number of times the errors become more uniformly distributed.



A plot of the data shows a more uniform spread.



We would expect the error of the average value of y for each x value to be lower. So, instead of fitting the y values of the individual experiments we can fit the average value of y.



  The normal equation table is computed as before from which one can obtain the fit coefficients.



It is seen that the result is an improved fit. If σ is the standard deviation for an individual experiment one would expect the standard deviation for the average of n experiments to be about σ/√n.
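The averaging step can be sketched as follows, assuming a hypothetical parabola, a 9-point x grid, σ = 0.3 for a single measurement, and n = 9 repeated experiments:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 2.0, 9)              # assumed measurement points
y_true = 1.0 + 2.0 * x - 0.5 * x**2       # assumed "true" parabola
sigma, n_exp = 0.3, 9

# n_exp repeated experiments, each with fresh normal errors
experiments = y_true + rng.normal(0.0, sigma, size=(n_exp, x.size))
y_avg = experiments.mean(axis=0)          # average y at each x value

# The spread of the averaged values is roughly sigma / sqrt(n_exp)
print(np.std(y_avg - y_true), sigma / np.sqrt(n_exp))
```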



Doing a Simulated Experiment to Verify a Fit Procedure

  Suppose you want to do an experiment but you're not sure if the results will be as accurate as needed. One can do a simulation first to evaluate the results. The first thing we need is some simulated experimental data. To be more specific, let's assume the data will approximate the equation of a parabolic arc with random measurement errors having a normal distribution with mean μ and standard deviation σ.

  We can generate a set of random errors and add them to the computed y values of a given parabola as follows.
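A sketch of this data-generation step, with an assumed parabola y = 1 + 2x − 0.5x² and assumed μ = 0, σ = 0.3 standing in for the spreadsheet's values:

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed parabola coefficients and noise level for the simulation
a0, a1, a2 = 1.0, 2.0, -0.5
mu, sigma = 0.0, 0.3

x = np.linspace(0.0, 2.0, 9)
errors = rng.normal(mu, sigma, size=x.size)     # normal measurement errors
y_sim = a0 + a1 * x + a2 * x**2 + errors        # simulated measured y values

print(np.column_stack([x, y_sim]))
```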



  The second column of random numbers contains pasted values of the first column and is needed to preserve the data in the spreadsheet, since the random numbers are recalculated along with the rest of the worksheet every time the value of a cell is changed.



  We can use Least Squares to find the coefficients of the parabola that gives the best fit to the data. To do so we create a table containing the simulated y values along with the powers of the x values, the fit functions. Putting y on the left makes it easier to do a higher order polynomial fit later on if so desired.



  In order to use the Least Squares formula for the coefficients we evaluate a table corresponding to the normal equations, which are linear and easily solved. The calculation of the table can be done in one step.
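The one-step calculation corresponds to forming AᵀA and Aᵀy and solving the resulting linear system; a sketch with the same assumed parabola and noise level as above:

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0.0, 2.0, 9)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0.0, 0.3, size=x.size)

A = np.column_stack([np.ones_like(x), x, x**2])  # fit functions 1, x, x^2

# Normal equations: (A^T A) a = A^T y -- the whole table in one step
AtA = A.T @ A
Aty = A.T @ y
a = np.linalg.solve(AtA, Aty)

print(a)   # close to the assumed coefficients [1.0, 2.0, -0.5]
```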



  A plot shows that the fit is a good one for the data but the coefficients are slightly off. The random errors can bias the fit if they are not uniformly distributed, and consequently the coefficients have some error associated with them.


Copying and pasting the values of a new set of random numbers allows one to "repeat" the experiment and observe the variations in the coefficients.