Saturday, October 13, 2012

Galton's Use Of The Normal Distribution

   Francis Galton wrote a number of papers in which he used the normal distribution in the study of human inheritance. The most mathematically complete paper appears to be "Family Likeness in Stature", published in the Proceedings of the Royal Society of London, Volume 40, pages 42-73. The publication is dated January 21, 1886. The work seems to have been done with the assistance of J. D. Hamilton Dickson, who was a tutor at St. Peter's College in Cambridge. Part of this paper is available in the collected papers, but part of the appendix and the tables at the end seem to be missing. The data for the inheritance of stature is from the Record of Family Faculties.
  I've been going over the mathematics used in the paper and have found a simple way to calculate Galton's measure of the deviation from the mean for the normal distributions. Column A in the table above contains a distribution of heights for the population in general and for two sets of brothers. One can convert the frequencies in the table to probabilities by dividing each entry by the sum of its column. Multiplying the heights by the probabilities and adding gives the average height for each column. One can also calculate the standard deviation for each column and convert it to something like a scale height, which can then be used to determine the dimensionless parameter t.
  Galton uses the cumulative probability distribution and quartile distances to characterize the spread of the distributions. For each distribution Galton found that the quartile distance qH = 1.7 inches. In general the height H = mH + t sH, where mH is the mean height and sH is the scaling unit. An explanation of the terms can be found in the simplified Mathcad calculation seen below.
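  The steps just described can be sketched in Python. This is only a rough illustration, not the Mathcad calculation itself: the frequency table is made up rather than taken from Galton's column A, the names weighted_stats, quartile_distance and scaled_deviation are hypothetical, and taking the quartile distance as the scaling unit is an assumption. The one fixed fact used is that for a normal distribution the quartile distance is about 0.6745 times the standard deviation.

```python
import math
from statistics import NormalDist

# Hypothetical frequency table (NOT Galton's data): height in inches and count.
heights = [64, 65, 66, 67, 68, 69, 70, 71, 72]
counts = [2, 6, 12, 19, 22, 19, 12, 6, 2]

def weighted_stats(values, freqs):
    """Return the mean and (population) standard deviation of a frequency table."""
    total = sum(freqs)
    probs = [f / total for f in freqs]          # frequencies -> probabilities
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    return mean, math.sqrt(var)

# Upper quartile of the standard normal, roughly 0.6745.
QUARTILE_FACTOR = NormalDist().inv_cdf(0.75)

def quartile_distance(sd):
    """Quartile distance of a normal distribution with standard deviation sd."""
    return QUARTILE_FACTOR * sd

def scaled_deviation(height, mean, scale):
    """Dimensionless t such that height = mean + t * scale."""
    return (height - mean) / scale

m_H, sd_H = weighted_stats(heights, counts)
q_H = quartile_distance(sd_H)            # assumed here to play the role of the scaling unit
t = scaled_deviation(70.0, m_H, q_H)

print(f"mean = {m_H:.2f} in, sd = {sd_H:.3f} in, quartile distance = {q_H:.3f} in")
print(f"t for a height of 70 in = {t:.3f}")
```

  Run in reverse, a quartile distance of 1.7 inches corresponds to a standard deviation of about 1.7 / 0.6745, or roughly 2.52 inches.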
  A better name for the scaling unit that Galton used might be "scale deviation". Computing the probabilities for one-inch intervals and multiplying by the sum of each A column gives these fits.
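  A minimal sketch of how such a fit could be computed, assuming a normal distribution and one-inch bins centered on whole inches. The mean of 68 inches and the total of 1000 are placeholders rather than values from the table; the standard deviation of 2.52 inches is what a quartile distance of 1.7 inches implies for a normal distribution. The function names are hypothetical.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def expected_counts(centers, mean, sd, total):
    """Expected count in each one-inch bin [c - 0.5, c + 0.5)."""
    return [
        total * (normal_cdf(c + 0.5, mean, sd) - normal_cdf(c - 0.5, mean, sd))
        for c in centers
    ]

# Placeholder inputs: mean and column total are illustrative, not Galton's.
centers = list(range(60, 77))      # bin centers, 60 to 76 inches
fit = expected_counts(centers, mean=68.0, sd=2.52, total=1000)

for c, n in zip(centers, fit):
    print(f"{c} in: {n:6.1f}")
```

  The fitted counts can then be set beside the observed frequencies in each column to judge how closely the normal curve follows the data.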
  Supplemental (Oct 13): The calculation was intended to give a simple estimate of the scaling unit. One could get technical about the precise definition of standard deviation and the differences among estimated, expected and observed deviations. For a single data point the observed deviation might be taken as zero, but the usual sample estimate of the standard deviation, which divides by n - 1, would be undefined.
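  The distinction shows up directly in Python's statistics module, used here only as an illustration. The population form divides by n and gives zero for a single point, while the sample form divides by n - 1, which leaves 0/0 for one point, and the library refuses to return a value. The single height of 68 inches is a placeholder.

```python
import statistics

single = [68.0]   # one placeholder observation, in inches

# Population (observed) deviation: divides by n, so one point gives 0.0.
observed = statistics.pstdev(single)

# Sample (estimated) deviation: divides by n - 1, so one point is rejected.
try:
    estimated = statistics.stdev(single)
except statistics.StatisticsError:
    estimated = None   # no estimate is possible from a single point

print(observed, estimated)
```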
  Supplemental (Oct 15): The table above is from the Google eBook edition of the Proceedings of the Royal Society that was cited.
