Saturday, June 22, 2019

Doing Histograms in Excel


  We've seen how one can generate a set of normally distributed random numbers by combining the Excel norm.inv and rand functions but creating a histogram of the results gives a good visual check on the data. The example below used mean value μ=0 and standard deviation σ=0.5. To save the results for a given set of random numbers it's convenient to work with just a copy of the values. One can also use a macro to simplify the copy and paste values routine.


For the histogram one needs a set of numbers to designate the sides of the bins and do counts of the number of random numbers in each bin. This can be done in two steps using the countif function to get a count for numbers greater than or equal to the left side of a bin then taking the difference between consecutive counts. Once entered for one line one can use drag and fill for the remaining lines but one needs to be careful about the initial and final values. Naming the fixed set of random numbers by a letter such as x makes formula simpler and easier to read.


If one compares the counts from the histogram wizard found in the analysis toolpak add-in one sees they are associated with the value for the right side of the bin.


To plot a histogram one needs to create a table containing double entries for each count. In the partial table below the offset function is used to copy values from the named bins range and cnt range above with the help of the floor function to read the correct values. The initial and final histogram values are set to zero.


So when plotted one gets the step function for the histogram.


The way the worksheet was set up it is automatically recalculated every time the copy and paste macro shortcut is used. The histogram and plot produced by the histogram wizard have to be redone each time the values in x are changed.

Using the above set of procedures makes it easier to select a good looking histogram and also shows the variation that naturally occurs.

Supplemental (Jun 23): The argument in favor of using the leftmost x value to designate a bin is that we read from left to right. The choice may be a cultural preference.

Supplemental (Jun 23): There's a lot of culture clash in the computer industry with differing file types and operating systems. Data structures and code can influence the choices programmers make which may seem strange to a mathematician. Usually the number line is read from left to right and tables are read from top to bottom. Using countif with "<" is an alternative choice for the histogram table but reversing the order of the difference doesn't change the results in the cnt column.

No comments: