Wednesday, January 19, 2011

Uncertainties

It was shown that the standard deviation of the angle of the fitted line is proportional to the standard deviation of the distance of the data points from the line. A small angle approximation was used to derive this result, so the relation also holds for the tangent of the angle, which is the slope of the line. If we take Δξ to be the maximum distance along the line from the center of the data then we can rewrite the relation as shown in the image below. We get a relation between the uncertainty in the slope of the line and the spread of the data along the line. This relation can be used to show how the uncertainty in a measurement of velocity is related to the interval of time over which the measurements are taken.

One might notice the similarity of this "uncertainty relation" to the Heisenberg Uncertainty Principle of physics. The momentum, p = mv, is defined as mass times velocity, so an uncertainty in the velocity carries over to the momentum. The connection between the two relations is that if there is an intrinsic source of error in the motion of a particle then there will be a minimum value of the product of the two intervals. This minimum occurs because not all the deviations involved are a result of the process of measurement; some are related to the motion of the particle itself.

Monday, January 17, 2011

Errata

I found some minor errors in the second part of the last entry. The corrections are given below.

The errors didn't affect the main result. The uncertainty in σ^2 is σ^2/n and there was an error in the sign of δθ which would account for some lack of confidence in what I had done. Notice that all the components of δCov' now have the units of length squared and that δθ is unitless.

Angular Deviation for the Covariance Fit

I was able to estimate the deviation in the angle of the line for the method using the largest eigenvector as the direction of the line in two dimensions. One starts with the quadratic equation for the tangent of the angle of the fitted line and using small angle approximations arrives at a formula in terms of the covariance matrix and its estimated deviations.

One then estimates the components of the covariance matrix and their deviations. ξ_max is the maximum distance of the data points from their center along the line.

ξ is assumed to be the distance of a data point along the fitted line and η is its distance perpendicular to the line. In this coordinate system the covariance matrix is diagonalized. One gets a result which is smaller than that obtained for the direction of the center of the inverted distribution of data points and approximately what was expected.

Note: The components of the covariance matrix are squared "deviations" and one would expect their values to vary by plus or minus the components of δCov' squared for a set of data fits. The sum of squares suggests that the deviations will add vectorially.

Supplemental (Jan 17): In the estimates deviations were substituted for derivatives. Was this procedure valid? I don't seem to have a good handle on this. In the first fundamental form for lengths we relate squares of differentials. We define ds^2 in terms of dx^2, dy^2, etc. What we really want is ds or ds in terms of dx or dx, etc. Also, for the sets of fits we would expect components of the covariance matrix to vary about some mean value. The negative differences would suggest imaginary quantities for some of the deviations. One needs to be careful about confusing deviations and differentials as one would when dealing with vectors and magnitudes. If one is not precise in one's thinking the results are in doubt.

Saturday, January 15, 2011

Error Intervals and Formulas

A discussion of linear fits would not be complete without actual values for the error intervals. One would expect the center of the data to be within σ/√n of the true line. One can also estimate the expected deviation of the angle of the fitted line from the true line, Δθ. If ctr' is the center of the inverted distribution then cos(Δθ) = |ctr'| and sin(Δθ) = σ/√n, approximately.

The values shown are the expected deviations for the original linear fit. One would expect most datasets to have a value for the center within 3σ/√n of the line and the direction to be within 3Δθ of the direction of the true line. The direction assumed was that of the center of the inverted distribution of data. The correction using the covariance matrix may be a little better but its formula is much more complicated.

Supplemental (Jan 15): The two centers have the same uncertainty, σ/√n, which would double the uncertainty in the direction of the line to 2Δθ. The correction was about equal to Δθ so the error in the angle of the eigenvector may be approximately Δθ. One could check this numerically. The components of the covariance matrix are estimates which contribute to the uncertainty in the eigenvector. Since the covariance matrix is diagonalized when the horizontal axis is rotated to the line the smallest eigenvalue contributes most to the error. Its uncertainty is σ^2/n so the standard deviation is σ/√n. For small angles the base of a right triangle is approximately equal to its hypotenuse.

Wednesday, January 12, 2011

Estimating the Angle of the Fitted Line for the Original Dataset

The moment function is fairly linear near its zero so one can use Newton's Method to estimate the angle of the fitted line. If we let θ represent the angle from the horizontal axis to the line, ε'(θ) the direction of the line and ε(θ) the normal direction for increasing θ then we can compute the moment function and its derivative to find improved estimates of the angle of the line. We started by finding the direction to a point farthest from the center of the distribution of datapoints, then computed the projections of the Δx onto this direction and inverted through the center those with negative projections. This resulted in a new distribution, Δx', whose direction from the original center was e_0. The angle for this direction is the first approximation of the angle of the fitted line, θ_0.

Using Newton's Method gives improved estimates of the angle which rapidly approach the angle of the eigenvector obtained using Mathcad 11's eigenvector function. Since we divide the moment, M, by its derivative, M', we can ignore any multiplicative constants. If we define the moment as the derivative of the variance it will have an additional factor of 2. Using expected values adds a factor of 1/n to both the variance and the moment.
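The iteration can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six data points are made up, the moment is taken to be M(θ) = eᵀCe' with C the unaveraged covariance matrix, and the closed-form angle ½·atan2(2C_xy, C_xx − C_yy) stands in for the Mathcad eigenvector function as the reference value.

```python
import math

def first_estimate(dx):
    """Angle theta_0 of the center of the inverted distribution."""
    # Direction to the point farthest from the center of the data.
    fx, fy = max(dx, key=lambda p: p[0] ** 2 + p[1] ** 2)
    # Invert through the center every point whose projection is negative.
    flipped = [(-x, -y) if x * fx + y * fy < 0 else (x, y) for x, y in dx]
    cx = sum(x for x, _ in flipped) / len(flipped)
    cy = sum(y for _, y in flipped) / len(flipped)
    return math.atan2(cy, cx)

def newton_angle(points, steps=6):
    """Refine theta_0 with Newton's Method applied to the moment M(theta)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    dx = [(x - mx, y - my) for x, y in points]
    cxx = sum(x * x for x, _ in dx)
    cxy = sum(x * y for x, y in dx)
    cyy = sum(y * y for _, y in dx)
    theta = first_estimate(dx)
    for _ in range(steps):
        s2, c2 = math.sin(2 * theta), math.cos(2 * theta)
        m = cxy * c2 - 0.5 * (cxx - cyy) * s2   # M(theta) = e^T C e'
        dm = -2 * cxy * s2 - (cxx - cyy) * c2   # M'(theta)
        theta -= m / dm                         # constant factors cancel here
    return theta, (cxx, cxy, cyy)

# Hypothetical data, not taken from the original worksheet.
points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1), (6.0, 5.9)]
theta, (cxx, cxy, cyy) = newton_angle(points)
closed_form = 0.5 * math.atan2(2 * cxy, cxx - cyy)
print(theta, closed_form)
```

Because only the ratio M/M' enters the update, scaling the moment by 2 or by 1/n leaves the iterates unchanged.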

Cramér's Orthogonal Mean Square Regression

Harald Cramér in Mathematical Methods of Statistics refers to the fit for deviations normal to the line as the orthogonal mean square regression line in both two and higher dimensions (see p. 309). His definitions are in terms of probability distributions and he includes higher order moments.

Monday, January 10, 2011

We Need to Discount the Fit Somewhat

Our data is only an indication of the line and subject to error so we need to discount the fit accordingly. Statistically one can expect the fitted line to deviate from the "true" line but be within a few standard deviations of it. The standard practice is to associate confidence intervals with the estimated values.

One may also ask how to find the eigenvectors for dimensions greater than two. Given a distribution of points one can easily determine its center and the covariance matrix. One can get a first estimate of the direction of the line by determining the direction of a point farthest from the center and then inverting through the center those points whose projection onto the estimated direction is negative. One can compute the direction of the center of the reflected distribution to improve the estimate for the direction of the line. One can then estimate the zeros of the moments for a best estimate of the direction of the line.

Supplemental: Finding the centers of the points with positive and negative projections separately would probably work just as well for estimating the direction of the line. All that is needed is an estimate of two points on the line.
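Both ways of getting a first estimate of the direction can be sketched for any number of dimensions. The function names and the seven 3D points below are hypothetical; the points were chosen to lie close to the direction (1, 2, 2)/3 so the result is easy to judge.

```python
import math

def center(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    norm = math.sqrt(dot(v, v))
    return [c / norm for c in v]

def deviations(points):
    ctr = center(points)
    return [[c - m for c, m in zip(p, ctr)] for p in points]

def inverted_center_direction(points):
    """Invert points with negative projection, then take the new center."""
    dx = deviations(points)
    far = max(dx, key=lambda d: dot(d, d))   # point farthest from the center
    flipped = [[-c for c in d] if dot(d, far) < 0 else d for d in dx]
    return unit(center(flipped))

def two_center_direction(points):
    """Join the centers of the negative and positive projection groups."""
    dx = deviations(points)
    far = max(dx, key=lambda d: dot(d, d))
    pos = [d for d in dx if dot(d, far) >= 0]
    neg = [d for d in dx if dot(d, far) < 0]
    return unit([a - b for a, b in zip(center(pos), center(neg))])

# Hypothetical 3D data scattered about the direction (1, 2, 2)/3.
points = [
    (-3.1, -5.9, -6.0), (-1.9, -4.1, -4.0), (-1.0, -2.1, -1.9),
    (0.1, 0.0, -0.1), (0.9, 2.0, 2.1), (2.0, 3.9, 4.1), (3.1, 6.0, 5.9),
]
true_dir = [1 / 3, 2 / 3, 2 / 3]
e1 = inverted_center_direction(points)
e2 = two_center_direction(points)
print(dot(e1, true_dir), dot(e2, true_dir))
```

Either estimate is only a starting direction; the refinement through the zeros of the moments is a separate step.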

Sunday, January 9, 2011

Why The Moment For The Alternative Fit Is Zero

The following is a short proof showing that the moment for the balanced least squares fit is zero. We start with the equation for the line and let δ' be the distance to a point on the line and then determine the value of t for the minimum distance. It is equal to the projection of Δx onto the line.

δ is the same as before. e and e' are eigenvectors of the covariance matrix and the product of it with each of them results in a vector in the same direction whose magnitude is its eigenvalue. Since the eigenvectors are perpendicular to each other the product, M, is zero.
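Written out, the argument might run as follows. This is a reconstruction from the description, not the original image; it assumes Δx is measured from the center of the data, the line passes through that center, e' is the unit direction of the line with eigenvalue λ', and e is the unit normal.

```latex
\begin{align*}
\mathbf{x}(t) &= \mathbf{x}_{ctr} + t\,\mathbf{e}' \\
\delta'^2 &= \left|\Delta\mathbf{x} - t\,\mathbf{e}'\right|^2,
\qquad \frac{d\,\delta'^2}{dt} = 0
\;\Rightarrow\; t = \mathbf{e}'^{T}\Delta\mathbf{x} \\
\delta &= \mathbf{e}^{T}\Delta\mathbf{x} \\
M &= \sum_k \delta_k\, t_k
   = \sum_k \left(\mathbf{e}^{T}\Delta\mathbf{x}_k\right)\left(\Delta\mathbf{x}_k^{T}\mathbf{e}'\right)
   = \mathbf{e}^{T}\left(\Delta\mathbf{x}\,\Delta\mathbf{x}^{T}\right)\mathbf{e}'
   = \lambda'\,\mathbf{e}^{T}\mathbf{e}' = 0
\end{align*}
```

The last step uses only the eigenvector property of e' and the orthogonality of e and e'.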

Supplemental: The moments are proportional to the change in the variance. One might define them as dV/dθ where θ represents the angle through which the direction of the line and one of the other eigenvectors are rotated through their common plane. One could use the product of the variance and the direction of the line, Ve', to define a surface about the center of the distribution. Since d(Ve') = (dV)e' + V(de') = (dV)e' + Vedθ both the direction and magnitude of Ve' will change with the rotation. Assuming all the eigenvalues are different, the change is zero for all rotations if and only if the direction of the line, e', is one of the eigenvectors of the covariance matrix.

Saturday, January 8, 2011

OLS Is Still Useful

The alternative least squares fit was intended for experimental situations where there are errors in the measurements of both dimensions of a two dimensional fit. There are situations where ordinary least squares gives a better fit, namely when the errors in one dimension dominate those in the other. If we treat y as a function of x then y is the dependent variable and x the independent variable. If the errors are predominantly in the dependent variable then OLS is probably the method of choice. This can occur when the independent variable is precisely known. But one cannot rightfully assume that this is true when there are also measurement errors present in its values. If one uses two data points equally spaced above and below some line at a number of fixed points along the line then OLS will give a better estimate of the slope of the line. The newer method will try to get a better fit by assigning some of the error to the horizontal direction; the variance that results is lower but the fit is poorer when compared with the original line. So one needs to exercise some judgement in deciding which method to use.
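The paired-point case is easy to try out. The sketch below uses made-up values (a true slope of 0.5 and pairs offset by ±1 at five x positions) and takes the balanced fit to be the principal axis of the covariance matrix.

```python
import math

true_slope, offset = 0.5, 1.0   # hypothetical values chosen for illustration
points = []
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    points.append((x, true_slope * x + offset))   # point above the line
    points.append((x, true_slope * x - offset))   # point below the line

n = len(points)
mx = sum(x for x, _ in points) / n
my = sum(y for _, y in points) / n
sxx = sum((x - mx) ** 2 for x, _ in points)
sxy = sum((x - mx) * (y - my) for x, y in points)
syy = sum((y - my) ** 2 for _, y in points)

# OLS: minimize the vertical deviations only.
ols_slope = sxy / sxx

# Balanced fit: direction of the eigenvector with the larger eigenvalue.
theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
balanced_slope = math.tan(theta)

def normal_sum_of_squares(slope):
    """Sum of squared deviations normal to a line of the given slope."""
    th = math.atan(slope)
    s, c = math.sin(th), math.cos(th)
    return sxx * s * s - 2 * sxy * s * c + syy * c * c

print("OLS slope:     ", ols_slope)
print("balanced slope:", balanced_slope)
print("normal variance (OLS, balanced):",
      normal_sum_of_squares(ols_slope), normal_sum_of_squares(balanced_slope))
```

With these numbers OLS recovers the original slope while the balanced fit comes out steeper, even though its sum of squared normal deviations is the smaller of the two.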

Friday, January 7, 2011

Covariance Matrix Example

Before getting off the topic of data fits I thought I'd give a simple example to illustrate how a set of points is formed into a matrix and how its covariance matrix is calculated. If the data points are represented by a set of column vectors then one places them side by side in a data matrix, x. One then averages these points to find their center, x_ctr, and subtracting it from each of the points gives Δx which in this case is a 2×6 matrix. Its transpose is 6×2.

Multiplying Δx by its transpose gives the covariance matrix. Some people define the covariance matrix as an expected value which involves averaging but doing this is not necessary in order to find the eigenvector and so the calculation is simpler.

To multiply two matrices together one multiplies a row of the first by a column of the second for all possible combinations and in this case gets a 2×2 matrix for the covariance matrix. Notice that the covariance matrix is equal to its transpose which makes it symmetric.
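The same steps can be followed in plain Python. The six points below are hypothetical stand-ins for the values in the original worksheet, and matmul is a small helper written for this sketch.

```python
# Six hypothetical 2D data points stored as the columns of a 2x6 matrix.
x = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],   # first coordinate of each point
    [1.2, 1.9, 3.2, 3.8, 5.1, 5.9],   # second coordinate of each point
]
n = len(x[0])

# Center of the data: the average of the columns.
x_ctr = [sum(row) / n for row in x]

# Deviations from the center (2x6) and their transpose (6x2).
dx = [[v - c for v in row] for row, c in zip(x, x_ctr)]
dx_t = [list(col) for col in zip(*dx)]

def matmul(a, b):
    """Row-by-column product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Covariance matrix without averaging: a symmetric 2x2 matrix.
cov = matmul(dx, dx_t)
for row in cov:
    print(row)
```

Dividing every entry of cov by n would give the expected-value form; the eigenvectors are the same either way.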

Data Analysis as a Form of Perception

The derivations for the linear fit made use of planes of projection and a normal line through the data. The line was the intersection of a number of planes. A dual approach is involved in perspective graphics and stereoscopic vision where images are formed on planes normal to a line of sight. In the fit the data was reduced to the covariance matrix. One may ask if there is a similar but more elaborate process involved in forming a mental picture of the real world. There seems to be something very fundamental here and it may form the basis of some general theory of perception.

Ancient Egypt was familiar with both measuring rods and scales for use in weighing. They also seem to have been fascinated with obelisks which are pillars erected normal to the plane of the surface. The imagery is interesting and there even seems to be a connection there with perception in the use of Eye of Horus fractions. The Egyptian term for slope was seked which was used for the slope of the pyramids and that for specifying approximate proportions was the pesu which was used for the ratio of ingredients in baking and brewing. The history of science is long and fragmented.

Thursday, January 6, 2011

What's Missing?

You may have noticed that there is no term for the deviation along the direction of the line. The reason is that any such error is included in the position along the line. There are also no error estimates for the position of the center of the distribution or the angle of the eigenvector. The error estimate for determining the position of a point is a function of the standard deviations and the number of data points. One would also expect the error in the direction of the line to be of the order of the angle of a right triangle whose height is the standard deviation normal to the line and whose base is the maximum length of the fitted line from the center. The results would depend on the error distribution and could be checked by numerical computations. By assuming a center and direction for the line and an error model and then comparing the estimates of the fit with the initial assumptions one gets feedback on the method used to fit the data.
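A numerical check of this kind might look like the sketch below. Everything in it is an assumption made for illustration: a true line through the origin at 0.3 rad, 21 equally spaced points, independent normal errors of σ = 0.5 in both coordinates, and a fixed random seed. The spread of the fitted angle is compared with the right-triangle angle and with the usual first-order estimate σ/√(Σξ²), which is not taken from the posts.

```python
import math
import random

def fit(points):
    """Center and principal-axis angle of a 2D data set."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    cxx = sum((x - mx) ** 2 for x, _ in points)
    cxy = sum((x - mx) * (y - my) for x, y in points)
    cyy = sum((y - my) ** 2 for _, y in points)
    return (mx, my), 0.5 * math.atan2(2 * cxy, cxx - cyy)

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

random.seed(1)
theta_true, sigma = 0.3, 0.5                 # assumed line angle and error size
xi = [float(k) for k in range(-10, 11)]      # true positions along the line
c, s = math.cos(theta_true), math.sin(theta_true)

d_theta, d_ctr = [], []
for _ in range(2000):
    pts = [(t * c + random.gauss(0, sigma), t * s + random.gauss(0, sigma))
           for t in xi]
    (mx, my), theta = fit(pts)
    d_theta.append(theta - theta_true)       # error in the fitted angle
    d_ctr.append(-mx * s + my * c)           # normal distance of center from true line

sd_theta, sd_ctr = rms(d_theta), rms(d_ctr)
pred_ctr = sigma / math.sqrt(len(xi))                   # sigma / sqrt(n)
pred_theta = sigma / math.sqrt(sum(t * t for t in xi))  # first-order estimate
triangle = math.atan2(sigma, max(xi))                   # height sigma, base xi_max

print("center:", sd_ctr, "expected about", pred_ctr)
print("angle: ", sd_theta, "first-order", pred_theta, "triangle", triangle)
```

Changing the error model or the spacing of the points in xi shows how sensitive the estimates are to those assumptions.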

Wednesday, January 5, 2011

Vectors, Moments and Forms

Variance is the principal fundamental form in statistics. The model for the first fundamental form is that of the length of a vector, which is defined in terms of an inner product. A moment is basically the idea behind the balance which involves the weighted sum of distances. The moments used in Physics and Engineering are associated with exterior products. It also seems natural to associate moments with the deviations of statistics.

The condition for a minimum is associated with a nonchanging value of the variance in least squares fits. The numerical computations seem to suggest that there may be more general truths associated with variances and moments. It's something to ponder on.

Supplemental: Consider for example the process of normalization in statistics.

Tuesday, January 4, 2011

Derivation for Linear Fit in nD

In nD a line is an intersection of n-1 planes so we can let the total variance be the sum of variances for the deviations from each plane. The distance of closest approach to the origin for each plane is then a projection of the average of Δx and we get a number of terms for the variance similar to that in two dimensions.

For the changes of the e's to be independent they would have to be perpendicular to each other. Each e is then an eigenvector of the covariance matrix.

Monday, January 3, 2011

Equation for the Eigenvalues in 2D

In two dimensions one can derive a simple formula for the angle of the eigenvector of a matrix if it is symmetric as in the case of the covariance matrix.

Since the equation is quadratic there are two solutions and one can substitute these into the equation for V to find the eigenvalues.
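As a sketch of how this works out, assume the eigenvector is written as (cos θ, sin θ) and the symmetric matrix has entries C_xx, C_xy, C_yy. Eliminating the eigenvalue from the two component equations gives the quadratic C_xy·t² + (C_xx − C_yy)·t − C_xy = 0 in t = tan θ, and V is taken here to be e'ᵀCe'. The original image may have used a different but equivalent form.

```python
import math

def eigen_2d(cxx, cxy, cyy):
    """Roots t = tan(theta) of cxy*t^2 + (cxx - cyy)*t - cxy = 0,
    each paired with its eigenvalue V = e'^T C e'."""
    if cxy == 0:
        raise ValueError("matrix is already diagonal; the axes are the eigenvectors")
    disc = math.sqrt((cxx - cyy) ** 2 + 4 * cxy ** 2)
    results = []
    for sign in (+1, -1):
        t = ((cyy - cxx) + sign * disc) / (2 * cxy)
        cos_sq = 1 / (1 + t * t)                        # cos^2(theta)
        v = cos_sq * (cxx + 2 * cxy * t + cyy * t * t)  # V along (cos, sin)
        results.append((t, v))
    return results

# A small symmetric matrix chosen so the answer is easy to verify by hand.
(t1, v1), (t2, v2) = eigen_2d(2.0, 1.0, 2.0)
print(t1, v1)
print(t2, v2)
```

The product of the two roots is −1, which is the statement that the two eigenvectors are perpendicular.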

In three dimensions the initial equation in the derivation is that of a plane whose parametric equation is the composition of two independent unit vectors and corresponding parameters. One can use the eigenvectors corresponding to the two largest eigenvalues. The deviations from the plane are in the direction normal to it which corresponds to the smallest eigenvalue. For a line the parametric equation is the same as in two dimensions but there are deviations in the two directions normal to that of the line.

Connection with the Covariance Matrix

The matrix used to find the eigenvector for the "Balanced Least Squares" fit is the covariance matrix without averaging. One gets the expected value of a quantity by taking the average of its sum. The proof essentially states that the direction of the line giving the best fit is an eigenvector of the covariance matrix of the data.

I forgot to mention that the reason for using the difference between a point on the line and a data point for the deviation is that under the circumstances the fitted line is a function of the data. The justification is that the data maps onto the line. The point of view is that the derived rule is a product of experience. This is less hypothetical than assuming a rule, deriving its consequences and correlating the results with experience in order to justify the rule.

Supplemental note: Data reduction is a special case of e pluribus unum.

Sunday, January 2, 2011

Derivation for the "Balanced Least Squares" Fit

The following is a short proof of the condition for the direction of the "Balanced Least Squares" fit line in 2D. One starts with a general equation for a line. The deviations are defined as the relative differences between a line through a data point parallel to the fitted line and the origin. x is a 2 by n matrix consisting of the data points. Like the deviations that I used for the previous fits these are from the data point to a point on the line which is opposite to the standard definition which is usually used for residuals.

Since V and its terms are scalars one can transpose them without changing the result. The formulas for the deviations and the variance turn out to be quite simple.

Since e is a unit vector and its changes are perpendicular to it the conclusion to be drawn is that the direction to the closest point of the fitted line to the origin is an eigenvector of the indicated matrix.

The eigenvector corresponding to the smallest eigenvalue is chosen for the direction to the point of closest approach to the origin. The one for the larger eigenvalue corresponds to the direction of the fitted line. So one can find parametric equations for the line with the best fit.
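A minimal sketch of the resulting recipe in 2D follows. It assumes the matrix is formed from deviations about the center of the data and uses the closed-form angle of the principal axis; balanced_fit and the test data on the line y = 2x + 1 are hypothetical.

```python
import math

def balanced_fit(points):
    """Return the point of closest approach to the origin and the unit
    direction of the fitted line, so that x(t) = closest + t * line_dir."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    line_dir = (math.cos(theta), math.sin(theta))    # larger eigenvalue
    normal = (-math.sin(theta), math.cos(theta))     # smaller eigenvalue
    # Signed distance of the line from the origin, measured along the normal.
    p = cx * normal[0] + cy * normal[1]
    closest = (p * normal[0], p * normal[1])
    return closest, line_dir

# Noise-free hypothetical data on y = 2x + 1.
points = [(x, 2 * x + 1) for x in (0.0, 1.0, 2.0, 3.0)]
closest, line_dir = balanced_fit(points)
print(closest, line_dir)
```

Taking the component of the center along the normal is the same as removing the direction of the fitted line from the direction to the center.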

For a linear fit in nD the smallest eigenvalue will not necessarily be along a line through the origin. If the data is spread out well enough there will be a largest eigenvalue and the direction of the point of closest approach to the origin can be found by eliminating the direction of the fitted line from the direction to the center of the distribution of data points.

An Alternative Least Squares Data Fit

When fitting a straight line to a set of data using Ordinary Least Squares (OLS) one often finds that the resulting fit deviates slightly from the expected "best fit". An example of a "misfit" can be seen in the image below. The data for the example was found by randomly selecting a line, points on it and then applying a random normal error to the data points. The fit appears more skewed because the scales are not equal.

An OLS fit seeks to minimize the squares of the vertical deviations from the line. A plot of these deviations for the fit shows that they are larger near the center of the distribution of errors and smaller at larger distances away from the center along the x-axis. Apparently OLS favors minimizing the deviations of points with extreme x distances from the center of the error distribution. If one computes the moment which corresponds to the sum of the "weights" δy at distance Δx from the center of the distribution one finds that it is zero. Similarly, if one exchanges the directions x and y one finds that this moment is quite large since the δxs were ignored. One also finds that the moment about the origin is zero and that the moment of normal deviations of points along the fitted line is quite large.

If one does a least squares fit using simultaneously both the x and y deviations from the line one gets a more balanced fit. This is probably closer to what one would get by drawing a straight line through the data by hand using a straight edge.

This "Balanced Least Squares" method has a moment which is zero for normal deviations along the line using distances from the center of the data. The moments for deviations normal to the x and y axes tend to cancel themselves out resulting in small values which appear to be the same.
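The moments can be computed directly for a small data set. The sketch below uses six made-up points, takes the OLS moment to be Σ Δx·δy and the normal moment to be Σ ξ·η with ξ along the line and η normal to it; these definitions are assumptions based on the description above.

```python
import math

# Hypothetical data, not the set used for the original plots.
points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1), (6.0, 5.9)]
n = len(points)
cx = sum(x for x, _ in points) / n
cy = sum(y for _, y in points) / n
d = [(x - cx, y - cy) for x, y in points]
sxx = sum(u * u for u, _ in d)
sxy = sum(u * v for u, v in d)
syy = sum(v * v for _, v in d)

def normal_moment(theta):
    """Sum of (distance along the line) * (deviation normal to the line)."""
    c, s = math.cos(theta), math.sin(theta)
    return sum((u * c + v * s) * (-u * s + v * c) for u, v in d)

# OLS: vertical deviations weighted by horizontal distance from the center.
b = sxy / sxx
ols_vertical_moment = sum(u * (v - b * u) for u, v in d)
ols_normal_moment = normal_moment(math.atan(b))

# Balanced fit: normal deviations weighted by distance along the line.
theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
balanced_normal_moment = normal_moment(theta)

print("OLS vertical moment:    ", ols_vertical_moment)
print("OLS normal moment:      ", ols_normal_moment)
print("balanced normal moment: ", balanced_normal_moment)
```

The OLS line balances its own vertical moment but not the normal one, while the balanced fit zeroes the normal moment.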

It can be shown that the direction of the fitted line is an eigenvector of a matrix which is the product of a matrix of deviations Δr of the data points from the center of the distribution and its transpose and that the fitted line passes through the center of the data points. An eigenvector, e, of matrix, M, has the property that M e = m e, that is, the vector produced by multiplying a matrix by one of its eigenvectors has the same direction as the original eigenvector but the magnitude may be subject to change.