Monday, January 10, 2011

We Need to Discount the Fit Somewhat

Our data is only an indication of the line and is subject to error, so we need to discount the fit accordingly. Statistically, one can expect the fitted line to deviate from the "true" line but to lie within a few standard deviations of it. The standard practice is to associate confidence intervals with the estimated values.
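To make the idea of an interval concrete, here is a minimal sketch. It assumes an ordinary least-squares fit of y on x in two dimensions, which is a simpler stand-in and not necessarily the fit used in this post, and it reports the slope together with a band of k standard errors around it. The function name slope_interval and the default k = 2 are illustrative choices only.

```python
import math

def slope_interval(xs, ys, k=2.0):
    """Least-squares slope of y on x with a +/- k standard-error band.

    Assumes at least three points and that the x values are not all equal.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    residual_var = ssr / (n - 2)
    # Standard error of the slope estimate.
    se = math.sqrt(residual_var / sxx)
    return slope, se, (slope - k * se, slope + k * se)

slope, se, (low, high) = slope_interval([0, 1, 2, 3, 4], [0.1, 0.9, 2.1, 2.9, 4.1])
print(slope, se, low, high)
```

The wider the scatter of the data about the fitted line, the larger the standard error and the more the fit should be discounted.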

One may also ask how to find the eigenvectors in more than two dimensions. Given a distribution of points, one can easily determine its center and its covariance matrix. A first estimate of the direction of the line is the direction from the center to the point farthest from it. One then inverts through the center those points whose projection onto the estimated direction is negative. The direction from the original center to the center of the reflected distribution gives an improved estimate of the direction of the line. One can then estimate the zeros of the moments for a best estimate of the direction of the line.
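The reflection step can be sketched in a few lines of plain Python. This is only an illustration under some assumptions: the function names are made up for this example, the points are not all identical, and repeating the reflection a few times with the updated direction is my own addition. The final step involving the zeros of the moments is not covered.

```python
import math

def center(points):
    """Componentwise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def estimate_direction(points, refinements=3):
    """Return (center, unit direction) for points scattered about a line."""
    c = center(points)
    offsets = [[p[k] - c[k] for k in range(len(c))] for p in points]
    # First estimate: direction from the center to the farthest point.
    u = normalize(max(offsets, key=lambda d: dot(d, d)))
    for _ in range(refinements):
        # Invert through the center every point with a negative projection.
        reflected = [d if dot(d, u) >= 0 else [-x for x in d] for d in offsets]
        # The center of the reflected points, seen from c, is the new estimate.
        u = normalize(center(reflected))
    return c, u

points = [(1 + t, 2 + 2 * t, 3 + 2 * t) for t in range(-5, 6)]
c, u = estimate_direction(points)
print(c, u)
```

The sign of the returned direction depends on which farthest point is found first, so only the direction up to sign is meaningful. Nothing here is specific to three dimensions; the same code runs for points of any dimension.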

Supplemental: Finding the centers of the points with positive and negative projections separately would probably work just as well for estimating the direction of the line. All that is needed is an estimate of two points on the line.
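A minimal sketch of this two-center variant follows. The function name two_center_direction and the rough_direction argument are illustrative; the sketch assumes the rough direction leaves at least one point on each side of the overall center.

```python
import math

def two_center_direction(points, rough_direction):
    """Split points by the sign of their projection and join the two centers."""
    dim = len(points[0])
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(dim)]
    positive, negative = [], []
    for p in points:
        projection = sum((p[k] - c[k]) * rough_direction[k] for k in range(dim))
        (positive if projection >= 0 else negative).append(p)
    # Centers of the two groups: two estimated points on the line.
    c_pos = [sum(p[k] for p in positive) / len(positive) for k in range(dim)]
    c_neg = [sum(p[k] for p in negative) / len(negative) for k in range(dim)]
    diff = [c_pos[k] - c_neg[k] for k in range(dim)]
    norm = math.sqrt(sum(x * x for x in diff))
    return c_pos, c_neg, [x / norm for x in diff]

points = [(t, 2 * t, 2 * t) for t in range(-5, 6)]
c_pos, c_neg, direction = two_center_direction(points, (1, 0, 0))
print(c_pos, c_neg, direction)
```

The rough direction only decides which side each point falls on, so a crude guess such as the direction to the farthest point should be enough.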
