Sunday, April 28, 2013
Effect of the Number of Fit Power Series Terms on the Predictions
The quality of a fit depends more on the number of terms used than on the number of data points used. The plots of the following projections are nearly identical whether 101 or 100001 data points are fitted between x = 0 and x = 1. As the maximum power of x increases from 7 to 9 the curve more closely approaches the original function. How far out one can go with the prediction depends on the limit that one wants to place on the errors.
As one increases the number of terms, a point is reached where the process breaks down, as seen when one adds the term corresponding to x^10 to the fit with 100001 data points.
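A minimal numpy sketch of this kind of experiment, assuming, as in the neighboring posts, that the function being fitted is the exponential: it solves the normal equations for increasing maximum powers and reports the worst error a short distance beyond the fit interval. The test range [0, 2] is an illustrative choice, and whether the higher-order fits actually degrade will depend on the solver and the floating point details of the machine.

```python
import numpy as np

def power_fit(x, y, max_power):
    """Least squares power series fit obtained by solving the normal equations."""
    M = np.vander(x, max_power + 1, increasing=True)   # M[k, p] = x_k**p
    X = M.T @ M                                        # correlations of the power vectors
    Y = M.T @ y                                        # correlations with the data
    return np.linalg.solve(X, Y)                       # coefficients, lowest power first

x = np.linspace(0.0, 1.0, 100001)                      # 100001 data points on [0, 1]
y = np.exp(x)                                          # assumed target function

x_test = np.linspace(0.0, 2.0, 21)                     # look a little beyond the fit interval
for max_power in (7, 8, 9, 10):
    a = power_fit(x, y, max_power)
    err = np.max(np.abs(np.polyval(a[::-1], x_test) - np.exp(x_test)))
    print(f"max power {max_power:2d}: worst error on [0, 2] = {err:.3e}")
```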
Extrapolating a Power Series Fit to Make Future Predictions
One can use the power series fit for the exponential function over the interval x = 0 to x = 1 to predict values of the function outside this range, but as one gets farther out from the endpoints of the fit the errors become noticeable even if we have perfect data to work with.
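A small sketch of this, using numpy's polyfit as a stand-in for the least squares procedure: an 8-term fit of the exponential on [0, 1] is evaluated at points outside the interval, and the error δ grows with distance from the endpoints. The evaluation points are illustrative choices.

```python
import numpy as np

# Fit exp(x) on [0, 1] with an 8-term power series (maximum power 7), then
# evaluate the fit outside the interval to watch the error grow.
x = np.linspace(0.0, 1.0, 101)
y = np.exp(x)
a = np.polyfit(x, y, 7)                      # least squares fit, highest power first

for x_out in (0.5, 1.0, 1.5, 2.0, 3.0, 5.0):
    delta = np.polyval(a, x_out) - np.exp(x_out)
    print(f"x = {x_out:3.1f}   delta = {delta:+.3e}")
```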
Friday, April 26, 2013
Best Procedure for Polynomial Regression
There are subtle differences in the shifted Legendre polynomials that prevent them from being used directly to find a set of vectors that are orthogonal to one another. So one has to find a specific set of functions, represented by the matrix A, that works for the problem to be fitted. Finding the matrix A is just as difficult as finding the inverse of the correlation matrix, so using the matrix of the powers of x_k appears to be the simplest procedure.
One needs to check whether computational errors affect the results. Usually one can do this by plotting the difference between the fitted function and the calculated or observed function. The error should look something like the next-order orthogonal polynomial.
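A rough illustration of this check, assuming a 6-term fit of the exponential on [0, 1]: the residual of the least squares fit is compared with the shape of the next-order shifted Legendre polynomial. The degree and the number of points are illustrative choices.

```python
import numpy as np
from numpy.polynomial import legendre

# Fit exp(x) on [0, 1] with a 6-term power series (maximum power 5) and compare
# the shape of the residual with the next-order shifted Legendre polynomial.
x = np.linspace(0.0, 1.0, 1001)
y = np.exp(x)
delta = y - np.polyval(np.polyfit(x, y, 5), x)        # fit residual

# P6(2x - 1), the next-order shifted Legendre polynomial
P6 = legendre.legval(2.0 * x - 1.0, [0, 0, 0, 0, 0, 0, 1.0])

# If the computation is behaving, the residual should track this shape and the
# correlation should be close to +/-1; a ragged residual signals roundoff trouble.
print(np.corrcoef(delta, P6)[0, 1])
```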
Thursday, April 25, 2013
Orthogonal Polynomials for Polynomial Regression
If one can find a set of orthogonal polynomials for which the function correlation matrix is diagonalized, it is easier to solve for the coefficients of a fit. We can represent the coefficients of the unknown set of polynomials by a set of column vectors A<k> and generate the polynomials one at a time. When each new polynomial is correlated with the previous set, the result is required to be zero. But the number of equations is one less than the number of unknowns. We can avoid this by arbitrarily setting the constant term equal to one and solving for the remainder of the coefficients. The procedure is similar to the Gram-Schmidt process for constructing an orthogonal set of vectors. The function X2A below does this. The V<p> are the set of orthogonal polynomials.
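The original X2A is a Mathcad function; the following is only a sketch of what such a function might look like in numpy, using the sum over the sampled x_k as the inner product and following the convention above of setting each constant term equal to one.

```python
import numpy as np

def X2A(x, n_terms):
    """Sketch of X2A: build coefficient columns A[:, p] of polynomials that are
    orthogonal over the sampled points x, each scaled so its constant term is one."""
    M = np.vander(x, n_terms, increasing=True)    # M[k, p] = x_k**p
    A = np.zeros((n_terms, n_terms))
    for p in range(n_terms):
        a = np.zeros(n_terms)
        a[p] = 1.0                                # start from the monomial x**p
        for q in range(p):                        # subtract the projections onto
            Vq = M @ A[:, q]                      # the earlier polynomials V<q>
            a -= (Vq @ (M @ a)) / (Vq @ Vq) * A[:, q]
        A[:, p] = a / a[0]                        # set the constant term to one
    return A

x = np.linspace(0.0, 1.0, 1001)
np.set_printoptions(precision=2, suppress=True)
print(X2A(x, 5))     # columns approach (up to sign) the shifted Legendre polynomials
```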
If one examines the set of coefficients that results one can see that they approach something that is closely related to the shifted Legendre polynomials. Note how the new polynomials correlate with one another. The diagonal terms are approximately 1/(2p+1).
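A quick numerical check of the 1/(2p+1) behavior, using the shifted Legendre polynomials themselves in place of the constructed set:

```python
import numpy as np
from numpy.polynomial import legendre

# Diagonal of the normalized correlation matrix of the shifted Legendre
# polynomials compared with 1/(2p + 1).
x = np.linspace(0.0, 1.0, 100001)
n_terms = 6
V = legendre.legvander(2.0 * x - 1.0, n_terms - 1)    # column p holds P_p(2x - 1)
diag = np.sum(V * V, axis=0) / len(x)                 # diagonal of V.T V / N
print(np.column_stack([diag, 1.0 / (2.0 * np.arange(n_terms) + 1.0)]))
```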
The coefficient needed to fit a function Y by each polynomial of the orthogonal set is just c = VᵀY/(VᵀV). One can then use the matrix of polynomial coefficients to transform the coefficients of the fit back into power series coefficients, as can be seen using the coefficients found for the fit of the exponential function.
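A sketch of the coefficient formula and of the transformation back to power series form, using numpy's Legendre class on the shifted argument as a stand-in for the constructed polynomials; the exponential function and the eight terms follow the post, the rest is illustrative.

```python
import numpy as np
from math import factorial
from numpy.polynomial import Legendre, Polynomial

# Fit exp(x) on [0, 1] against the shifted Legendre polynomials, then transform
# the result back into power series coefficients.
x = np.linspace(0.0, 1.0, 10001)
y = np.exp(x)
n_terms = 8

c = np.zeros(n_terms)
for p in range(n_terms):
    e = np.zeros(n_terms)
    e[p] = 1.0
    V = Legendre(e, domain=[0.0, 1.0])(x)    # V<p> evaluated at the data points
    c[p] = (V @ y) / (V @ V)                 # c = V'Y / (V'V) for each polynomial

# Transform the orthogonal-set coefficients into power series coefficients and
# compare with the Taylor coefficients 1/k!.
a = Legendre(c, domain=[0.0, 1.0]).convert(kind=Polynomial).coef
taylor = np.array([1.0 / factorial(k) for k in range(n_terms)])
print(np.column_stack([a, taylor]))
```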
Polynomial Regression Problems
One can run into problems with polynomial regression, as illustrated in the following example. An exponential function is fitted between x = 0 and x = 1. A large number of data points is used so that the sums will be proportional to integrals over the interval. The coefficients for the power series are calculated using the least squares fit procedure.
The coefficients for the 8 terms that result are approximately equal to the coefficients of the series expansion.
The difference between the fit and the function, δ, is quite small since the series converges rapidly.
The difficulty is that the powers of x are strongly correlated with one another and a large number of terms is needed to eliminate the errors that this causes. Another problem is that, since squares are involved in the correlations, the precision of the calculation is about half the maximum precision of 15 decimal places. After 8 terms the errors in the coefficients start to become noticeable.
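A sketch of the conditioning issue, again assuming the exponential fit: the normal equations are solved directly, and the condition number of the correlation matrix is printed alongside the size of the residual. The exact numbers will vary with the machine and the solver.

```python
import numpy as np

# Fit exp(x) on [0, 1] by solving the normal equations directly, and print the
# condition number of the correlation matrix alongside the size of the residual.
x = np.linspace(0.0, 1.0, 10001)
y = np.exp(x)

for n_terms in (4, 6, 8, 10, 12):
    M = np.vander(x, n_terms, increasing=True)     # vectors of the powers of x_k
    X = M.T @ M                                    # correlation matrix
    a = np.linalg.solve(X, M.T @ y)
    delta = y - M @ a
    print(f"{n_terms:2d} terms: cond(X) = {np.linalg.cond(X):.2e}, "
          f"max |delta| = {np.max(np.abs(delta)):.2e}")
```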
So we need a large number of terms to get accurate values for the fit but are limited somewhat in the precision that we can use. We can't get as good a fit as we can with a Fourier series. One wonders if problems like this affect studies of global warming; they might complicate the problem of making predictions of future temperatures.
Sunday, April 21, 2013
A "Fourier Regression" Example
To see the "functional regression" formula in action we can solve for the Fourier coefficients of a simple square wave.
The Variance appears large but this is due to the large number of terms. The fit is shown below and the ringing is the Gibbs phenomenon due to the limitation in the number of terms used.
The set of functions used is orthogonal, so the function correlation matrix is diagonal. The first few "fit" coefficients compare well with the computed Fourier coefficients.
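A minimal version of the square wave example, assuming a sine-only basis over one period and 15 terms (the post's actual number of terms and sampling are not specified here); the fitted coefficients are compared with the analytic Fourier coefficients, 4/(πn) for odd n and zero for even n.

```python
import numpy as np

# "Fourier regression": least squares fit of a square wave with sine basis functions.
x = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
y = np.sign(np.sin(x))                       # square wave, +1 then -1

n_terms = 15
F = np.column_stack([np.sin(n * x) for n in range(1, n_terms + 1)])
c = np.linalg.solve(F.T @ F, F.T @ y)        # least squares "fit" coefficients

fourier = [4.0 / (np.pi * n) if n % 2 else 0.0 for n in range(1, n_terms + 1)]
for n, (cf, ff) in enumerate(zip(c, fourier), start=1):
    print(f"n = {n:2d}: fit = {cf:+.4f}, Fourier = {ff:+.4f}")
```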
Coefficients for a Series of Known Functions
One can generalize the least squares fit for a Taylor series by replacing the functions x^p with an arbitrary set of functions f_p(x). The set of functions could be the sines and cosines of a Fourier series, for example. The derivation is much the same as before but one can see the advantage of using the dot products for the correlations. The components of the f_p vectors are f_p(x_k) for each x_k in the dataset.
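A sketch of the generalized formula: the columns of F hold the f_p(x_k), and the coefficients solve (FᵀF)c = FᵀY just as in the polynomial case. The helper fit_functions, the basis, and the target below are purely illustrative.

```python
import numpy as np

def fit_functions(x, y, funcs):
    """Least squares fit of y against an arbitrary set of basis functions."""
    F = np.column_stack([f(x) for f in funcs])   # F[k, p] = f_p(x_k)
    return np.linalg.solve(F.T @ F, F.T @ y)     # solve (F'F) c = F'Y

# A made-up mixed basis and a target built from it, for illustration.
funcs = [lambda x: np.ones_like(x), np.sin, np.cos, lambda x: x]
x = np.linspace(0.0, 2.0 * np.pi, 500)
y = 1.0 + 2.0 * np.sin(x) - 0.5 * x
print(np.round(fit_functions(x, y, funcs), 6))   # expect about [1, 2, 0, -0.5]
```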
Saturday, April 20, 2013
The Least Squares Polynomial Regression Formula
My first encounter with linear fits was the simple formula for a line fit found in the Probability and Statistics section of the CRC Standard Mathematical Tables, 16th Edition, 1968, p. 532, where two equations for the coefficients involving sums and products of the data are given without proof. There was a more general formula for fitting a polynomial function, which is the same formula that is used in polynomial regression. One often wonders where these formulas come from, and it can be shown that they are least squares fits for the coefficients of the polynomial. One can simplify the equations using vectors containing the data and reduce everything to a simple matrix equation. Knowledge of a derivation often helps in understanding what one is doing when following some procedure. Here x and y are the data points, δ is the difference between the observed value of y and its calculated value, and V is the variance, which is minimized. The procedure is known as the Method of Least Squares.
The matrix X contains information on how the vectors of the powers of x_k correlate with one another, and Y contains information on how they correlate with the vector of the y_k values.
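As a small sketch, here is the matrix form for a straight line fit checked against the textbook sum formulas; the data values are made up for illustration.

```python
import numpy as np

# Matrix form of the least squares line fit: X holds the correlations of the
# powers of x_k with one another, Y their correlations with the y_k values,
# and solving X a = Y gives the intercept and slope.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])      # made-up data
N = len(x)

X = np.array([[N,       x.sum()],
              [x.sum(), (x * x).sum()]])
Y = np.array([y.sum(), (x * y).sum()])
a0, a1 = np.linalg.solve(X, Y)               # intercept, slope

# Textbook sum formulas for comparison.
slope = (N * (x * y).sum() - x.sum() * y.sum()) / (N * (x * x).sum() - x.sum() ** 2)
intercept = (y.sum() - slope * x.sum()) / N
print(a0, intercept)
print(a1, slope)
```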