The matrix used to find the eigenvector for the "Balanced Least Squares" fit is the covariance matrix without the averaging step, i.e., the scatter matrix: one gets the expected value of a quantity by dividing the sum of its values by the number of terms, and here that final division is omitted. Since dividing a matrix by a positive constant rescales its eigenvalues but leaves its eigenvectors unchanged, either matrix yields the same fitted direction. The proof essentially states that the direction of the line giving the best fit is an eigenvector of the covariance matrix of the data.
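A minimal sketch of this procedure in Python, using synthetic data for illustration (the sample points and all variable names are assumptions, not from the original):

```python
import numpy as np

# Illustrative data: points scattered around the line y = 2x.
rng = np.random.default_rng(0)
t = rng.uniform(-1, 1, 50)
points = np.column_stack([t, 2.0 * t]) + rng.normal(0, 0.05, (50, 2))

# Center the data, then form the scatter matrix: the covariance
# matrix without the final division by the number of points.
centered = points - points.mean(axis=0)
scatter = centered.T @ centered

# The best-fit direction is the eigenvector belonging to the largest
# eigenvalue; rescaling the matrix would change the eigenvalues but
# not this eigenvector.
eigvals, eigvecs = np.linalg.eigh(scatter)
direction = eigvecs[:, np.argmax(eigvals)]
```

The recovered `direction` should be nearly parallel to (1, 2), the direction of the underlying line.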
I forgot to mention the reason for using the difference between a point on the line and a data point as the deviation: under these circumstances the fitted line is a function of the data. The justification is that the data maps onto the line; the point of view is that the derived rule is a product of experience. This is less hypothetical than assuming a rule, deriving its consequences, and correlating the results with experience in order to justify the rule.
Supplemental note: Data reduction is a special case of e pluribus unum.