Monday, August 26, 2013

Using Normal Errors for Data Fits


  We found that there was a simple formula for the distance of a point from a plane when its distances from the plane along the directions of the axes are known. For a linear curve fit, one usually minimizes the errors along just one axis if the unit of measurement is not the same in all dimensions. But if the measurements in all directions are made in the same unit, we can try an alternative: doing the fit using the normal distance of each data point from a line or a plane. Starting with a general definition of a line or plane, r·e = λ, we can derive a formula for the variance of the data, V = eᵀRe, where e is the unknown direction of the normal and R is a matrix that depends only on the data. The Einstein convention is used, in which repeated subscripts in an expression indicate a summation. The derivation works for an arbitrary number of dimensions, which I refer to as nD.
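A quick numerical check of V = eᵀRe. The derivation itself isn't reproduced here, so this is a sketch under an assumption: V is taken to be the mean squared normal distance of the points from the plane r·e = λ, with λ set to the value that minimizes it, λ = r̄·e (the projection of the centroid onto e). Under that assumption R comes out as the covariance matrix of the data about its centroid, R_jk = (1/N) Σ_i (x_ij − x̄_j)(x_ik − x̄_k). The function names are hypothetical, chosen for illustration.

```python
import numpy as np

def scatter_matrix(points):
    """Assumed form of R: R_jk = (1/N) * sum_i (x_ij - mean_j) * (x_ik - mean_k)."""
    centred = points - points.mean(axis=0)
    return centred.T @ centred / len(points)

def mean_sq_normal_distance(points, e):
    """Mean squared normal distance from the plane r.e = lam, with lam = centroid.e."""
    e = e / np.linalg.norm(e)          # the normal must be a unit vector
    lam = points.mean(axis=0) @ e      # the value of lambda that minimises V
    d = points @ e - lam               # signed normal distance of each point
    return np.mean(d ** 2)

rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 3))         # 50 points in 3D, same units on every axis
e = np.array([1.0, 2.0, -0.5])
e /= np.linalg.norm(e)

R = scatter_matrix(pts)
V_direct = mean_sq_normal_distance(pts, e)
V_quadratic = e @ R @ e                # V = e^T R e, i.e. e_j R_jk e_k
print(np.isclose(V_direct, V_quadratic))
```

Both routes agree because each normal distance is (r_i − r̄)·e, so its square summed over the points is eᵀ[Σ_i (r_i − r̄)(r_i − r̄)ᵀ]e.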


If we replace R by an arbitrary matrix, M, and let V = eᵀMe, we can use the Calculus of Variations to show that the best fit is an eigenvector. The direction, e, is a solution of the eigenvector equation (M + Mᵀ)e = μe, and the best solution for e is the eigenvector corresponding to the smallest eigenvalue, μ. Some of the details have been left out of the derivation of this result below to save space, but it shouldn't be too difficult to fill them in. It too is an nD proof.
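One way to fill in the details, written with the Einstein convention. The form of Φ here is an assumption: the factor ½ on the Lagrange multiplier μ is chosen so that the result matches the eigenvector equation (M + Mᵀ)e = μe as stated above.

```latex
\Phi = e_i M_{ij} e_j - \tfrac{1}{2}\mu\,(e_i e_i - 1)

\delta\Phi = \delta e_i\, M_{ij} e_j + e_i M_{ij}\,\delta e_j - \mu\, e_i\,\delta e_i
           = \delta e_i \left( M_{ij} + M_{ji} \right) e_j - \mu\,\delta e_i\, e_i

\delta\Phi = 0 \ \text{for arbitrary } \delta e_i
  \;\Longrightarrow\; (M_{ij} + M_{ji})\, e_j = \mu\, e_i,
  \quad\text{i.e.}\quad (M + M^{T})\, e = \mu\, e

V = e_i M_{ij} e_j = \tfrac{1}{2}\, e_i (M_{ij} + M_{ji})\, e_j
  = \tfrac{1}{2}\mu\, e_i e_i = \tfrac{1}{2}\mu
```

The last line is why the smallest eigenvalue gives the best fit: with |e| = 1, each stationary value of V is half the corresponding eigenvalue of M + Mᵀ, so the minimum variance belongs to the smallest μ.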


Note that the Lagrange multiplier, μ, has been included in Φ to allow for an arbitrary variation of e, since e is subject to the constraint that its magnitude is 1. M + Mᵀ is twice the symmetric part of the matrix M. One can show that R is symmetric, so R + Rᵀ = 2R and, as a consequence, e is one of the eigenvectors of R.
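Because R is symmetric, the fit reduces to a standard symmetric eigenproblem. The sketch below again assumes R is the covariance matrix of the data about its centroid; it fits a line to noisy 2D points and takes the normal as the eigenvector of R with the smallest eigenvalue, with λ taken as the centroid's projection onto e. `numpy.linalg.eigh` returns eigenvalues in ascending order, so the first column is the one wanted. The name `fit_normal` and the test data are hypothetical, for illustration only.

```python
import numpy as np

def fit_normal(points):
    """Fit the line/plane r.e = lam by minimising the mean squared normal distance.

    Assumes R is the covariance matrix of the points about their centroid.
    Returns the unit normal e, the offset lam and the minimum variance V.
    """
    centroid = points.mean(axis=0)
    centred = points - centroid
    R = centred.T @ centred / len(points)    # symmetric, so R + R^T = 2R
    eigvals, eigvecs = np.linalg.eigh(R)     # eigenvalues in ascending order
    e = eigvecs[:, 0]                        # eigenvector for the smallest eigenvalue
    lam = centroid @ e                       # offset of the best-fit line/plane
    return e, lam, eigvals[0]                # R e = V e, so the eigenvalue is V itself

# Noisy points near the line y = 2x + 1, whose unit normal is +/-(2, -1)/sqrt(5)
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)
pts = np.column_stack([x, y])

e, lam, V = fit_normal(pts)
print(e, lam, V)
```

The sign of e (and hence of λ) is arbitrary, since r·e = λ and r·(−e) = −λ describe the same line. Note that the eigenvalue of R equals V directly; in the (M + Mᵀ)e = μe convention it is μ/2.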
