Thursday, October 13, 2011

Angular Separation as the Distance Correlation

One can use other numbers besides powers of 2 for the number system of the random-number generator. Here 54 = 625 is the number of objects that are chosen from. A sample of 9001 numbers was generated. The generator seems to work better with larger numbers which is why 54 was used. The resulting numbers can be converted into their base 5 representation and the four resulting base 5 numbers could be used to represent a part of a sample drawn from 5 objects.


Instead of the autocorrelation of the string of numbers with shifted versions of itself, a dot product was taken and the resulting anglar separation was used as the distance correlation. The "demons" are still present and occur at the midpoint of the sample.


The average projection of the unit vectors of one member of the sample onto another is 3/4 so the average normal would be about √7/4. This corresponds to an angular separation of 41.41°.


As can be seen, the distribution of the angular separations agrees fairly well with the expected mean. One might expect some variation from it if the initial (reference) "direction" is atypical and therefore an outlier.

The simplest definition of a distance correlation for generator strings of the same length would probably be the angle between their unit vectors or,

dcorr(e1,e2) = acos(e1·e2).
 

Thursday, October 6, 2011

Getting Past the Demons

The problems that were encountered with the random-number generator resulted from trying to simplify it for the last post. I had been using larger numbers at first and reduction past a certain point started introducing more regularity in the output of the generator. If one wants the generator to produce numbers varying from 0 to n-1, the choice for α and μ are related by the formulas below.


The "demons" disappear in this version of the generator. The mirror symmetry problem in the output seems to have moved elsewhere. The simpler version of the generator raises the question of what we mean by statistical independence? Are mirror images of sequences allowed? If we treat sequences of numbers as the ordered numbers found in a vectors then two sequences are related if they are multiples of each other or reflected versions of each other. The vector B = mA which results by multiplying the vector A by m is related to A. The vector (y,z,x) is a permutation of the elements of (x,y,z) and so also related. But we have to allow these sequences since they are valid outputs for the generator. The question is really one of the frequency with which these sets of numbers occur. One has to look at a set of output sequences and determine whether or not the set is statistically likely. In creating a set of sequences we can compare each new entry with the existing set and reject it if it is too closely related.

Correlation was used to compare sequences of numbers. It tells us how well two sets of numbers are aligned. If they are too closely aligned it raises doubts about their statistical independence.

A reference for this random-number generator is Hamming, R.W., Numerical Methods for Scientists and Engineers, pp. 137-42.

Supplemental: The random-number generator's mechanism appears to be closer to that of the urn problem than that of shuffling. Shuffling permutes the order of the cards in a deck. Picking a card at random is an urn problem.

Tuesday, October 4, 2011

A Random-Number Generator

There are times when a programmer needs to pick a number at random and the random-number generator was introduce to fill the need. The trick used is to multiply two numbers together and select a central string of digits. Below is a simple random-number generator. mod is the modulo function which gives the remainder that results when one number is divided by another.


This generator produces numbers between 0 and 1023 inclusive. A plot of the first 8192 numbers that it produces is shown below. The pattern appears to be quite random.


To evaluate a random-number generator one can count the number of times some number recurs in a sample and plot the freqency of no occurances of a number, just one occurance, two, three, etc versus the number of occurances. The generator above yields a Poisson distribution.


To see how internally random the sequence of numbers is one can compare it with a shifted version of itself using the correlation function. This is known as autocorrelation. Correlation varies between -1 and 1. The correlations between the shifted sets of numbers is quite small.



The observed values for the correlations vary between -0.05 and 0.05 and they appear to be distributed according to a normal distribution.


The sequence of numbers does indeed appear to be random. But if one looks closely at the plot of the correlations one can see that the left and right sides are mirror images of each other. So random-number generators are only useful up to a given point and can repeat after some period.


Close-ups of the center of the correlation plot confirms that demons are likely to be present in random-number generators. One nice thing about them though is that the results are reproducible.