Sunday, January 2, 2011

Derivation for the "Balanced Least Squares" Fit

The following is a short proof of the condition for the direction of the "Balanced Least Squares" fit line in 2D. One starts with a general equation for a line. The deviations are defined as the relative differences, measured from the origin, between the fitted line and a line through a data point parallel to it. Here x is a 2 by n matrix whose columns are the data points. Like the deviations that I used for the previous fits, these run from the data point to a point on the line, which is opposite to the standard definition usually used for residuals.



Since the variance V and each of its terms are scalars, one can transpose them without changing the result. The formulas for the deviations and the variance turn out to be quite simple.
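One way to write the deviations and the variance is sketched below. It assumes the line is written as e^T r = p with e a unit vector and p the distance of the line from the origin, and it treats the deviations as unnormalized signed offsets along e. The symbols p, r, the column of ones \mathbf{1}, the mean \bar{x}, and the matrix C are introduced here for illustration and may differ from the notation of the original equations.

```latex
\begin{aligned}
e^{T} r &= p, \qquad e^{T} e = 1
  && \text{(assumed form of the line)} \\
\delta &= p\,\mathbf{1}^{T} - e^{T} x
  && \text{(1 by n row, data point to line)} \\
V &= \tfrac{1}{n}\,\delta\,\delta^{T}
   = p^{2} - 2p\,e^{T}\bar{x} + \tfrac{1}{n}\,e^{T} x\,x^{T} e,
  && \bar{x} = \tfrac{1}{n}\,x\,\mathbf{1} \\
\frac{\partial V}{\partial p} &= 0
  \;\Rightarrow\; p = e^{T}\bar{x} \\
V &= e^{T} C\, e,
  && C = \tfrac{1}{n}\,x\,x^{T} - \bar{x}\,\bar{x}^{T}
\end{aligned}
```

The two cross terms in the product \delta\,\delta^{T} are transposes of each other, which is where the transposition of scalars is used to combine them.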



Since e is a unit vector, any change in it is perpendicular to it. The conclusion to be drawn is that the direction from the origin to the closest point of the fitted line is an eigenvector of the indicated matrix.
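The step can be sketched as follows, assuming the variance has the quadratic form V = e^T C e with C a symmetric matrix built from the data (taken here to be the covariance matrix of the points, which is an assumption about the matrix meant above). The multiplier \lambda is a symbol introduced for this sketch.

```latex
\begin{aligned}
e^{T} e = 1 \;&\Rightarrow\; e^{T}\,\delta e = 0 \\
\delta V = 2\,\delta e^{T} C\, e = 0
  \ \text{for all } \delta e \perp e
  \;&\Rightarrow\; C\,e = \lambda\, e \\
V = e^{T} C\, e &= \lambda
\end{aligned}
```

If C e had any component perpendicular to e, choosing \delta e along that component would give a nonzero \delta V, so C e must be parallel to e. The value of V at such a stationary point is the eigenvalue itself.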



The eigenvector corresponding to the smallest eigenvalue is chosen for the direction to the point of closest approach to the origin, since that choice gives the smallest variance. The eigenvector for the larger eigenvalue gives the direction of the fitted line. Together these give parametric equations for the line with the best fit.
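A minimal numerical sketch of this recipe, assuming NumPy and assuming that the matrix in question is the covariance matrix of the data points. The function name `fit_line_2d` and its return values are hypothetical and only serve the illustration.

```python
import numpy as np

def fit_line_2d(x):
    """Fit a line to a 2 by n array of points (one point per column).

    Assumes the matrix of the eigenvalue problem is the covariance
    matrix of the points. Returns (x0, u, e, p): the point of the line
    closest to the origin, the unit direction of the line, the unit
    vector toward x0, and the distance p of the line from the origin.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[1]
    mean = x.mean(axis=1)
    centered = x - mean[:, None]
    C = centered @ centered.T / n          # 2 by 2 covariance matrix

    # eigh returns eigenvalues in ascending order for symmetric matrices
    evals, evecs = np.linalg.eigh(C)
    e = evecs[:, 0]                        # smallest eigenvalue: normal
    u = evecs[:, 1]                        # larger eigenvalue: line direction

    p = e @ mean                           # distance of the line from the origin
    if p < 0:                              # make e point from the origin to the line
        e, p = -e, -p
    x0 = p * e                             # closest point of the line to the origin
    return x0, u, e, p

# Example: points lying exactly on y = 2x + 1
pts = np.array([[0.0, 1.0, 2.0, 3.0],
                [1.0, 3.0, 5.0, 7.0]])
x0, u, e, p = fit_line_2d(pts)
# Parametric form of the fitted line: r(t) = x0 + t * u
```

The sign of an eigenvector is arbitrary, so the sketch flips e when needed to keep p non-negative.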



For a linear fit in nD, the eigenvector for the smallest eigenvalue will not necessarily point from the origin toward the point of closest approach of the fitted line. If the data is spread out well enough, there will be a distinct largest eigenvalue, and its eigenvector gives the direction of the fitted line. The direction to the point of closest approach to the origin can then be found by eliminating the direction of the fitted line from the direction to the center of the distribution of data points.
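The nD procedure can be sketched in the same way, again assuming NumPy and a covariance matrix of the data points. "Eliminating the direction of the fitted line" is read here as subtracting from the mean its projection onto the line direction. The function name `fit_line_nd` is hypothetical.

```python
import numpy as np

def fit_line_nd(x):
    """Fit a line to a d by n array of points (one point per column).

    Returns (x0, u): the point of the fitted line closest to the
    origin and the unit direction of the line.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[1]
    mean = x.mean(axis=1)                  # center of the data distribution
    centered = x - mean[:, None]
    C = centered @ centered.T / n          # d by d covariance matrix

    evals, evecs = np.linalg.eigh(C)       # ascending eigenvalues
    u = evecs[:, -1]                       # largest eigenvalue: line direction

    # Remove the component of the mean along the line direction;
    # what remains points from the origin to the closest point.
    x0 = mean - (u @ mean) * u
    return x0, u

# Example: points on the 3D line r(t) = (1, 0, -1) + t * (1, 1, 1)
t = np.arange(4.0)
pts = np.array([1.0, 0.0, -1.0])[:, None] + np.outer([1.0, 1.0, 1.0], t)
x0, u = fit_line_nd(pts)
```

If the two largest eigenvalues are nearly equal, the line direction is poorly determined, which is the case the "spread out well enough" condition is meant to exclude.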
