Understanding "randomness" – Dev

The best answers to the question “Understanding "randomness"” in the category Dev.

QUESTION:

I can’t get my head around this, which is more random?

rand()

OR:

rand() * rand()

I´m finding it a real brain teaser, could you help me out?


EDIT:

Intuitively I know that the mathematical answer will be that they are equally random, but I can’t help but think that if you “run the random number algorithm” twice when you multiply the two together you’ll create something more random than just doing it once.

ANSWER:

I guess both methods are as random although my gutfeel would say that rand() * rand() is less random because it would seed more zeroes. As soon as one rand() is 0, the total becomes 0

ANSWER:

Just a clarification

Although the previous answers are right whenever you try to spot the randomness of a pseudo-random variable or its multiplication, you should be aware that while Random() is usually uniformly distributed, Random() * Random() is not.

Example

This is a uniform random distribution sample simulated through a pseudo-random variable:

Histogram of Random()

        BarChart[BinCounts[RandomReal[{0, 1}, 50000], 0.01]]

While this is the distribution you get after multiplying two random variables:

Histogram of Random() * Random()

        BarChart[BinCounts[Table[RandomReal[{0, 1}, 50000] * 
                                 RandomReal[{0, 1}, 50000], {50000}], 0.01]]

So, both are “random”, but their distribution is very different.

Another example

While 2 * Random() is uniformly distributed:

Histogram of 2 * Random()

        BarChart[BinCounts[2 * RandomReal[{0, 1}, 50000], 0.01]]

Random() + Random() is not!

Histogram of Random() + Random()

        BarChart[BinCounts[Table[RandomReal[{0, 1}, 50000] + 
                                 RandomReal[{0, 1}, 50000], {50000}], 0.01]]

The Central Limit Theorem

The Central Limit Theorem states that the sum of Random() tends to a normal distribution as terms increase.

With just four terms you get:

Histogram of Random() + Random() + Random() + Random()

BarChart[BinCounts[Table[RandomReal[{0, 1}, 50000] + RandomReal[{0, 1}, 50000] +
                   Table[RandomReal[{0, 1}, 50000] + RandomReal[{0, 1}, 50000],
                   {50000}],
         0.01]]  

And here you can see the road from a uniform to a normal distribution by adding up 1, 2, 4, 6, 10 and 20 uniformly distributed random variables:

Histogram of different numbers of random variables added

Edit

A few credits

Thanks to Thomas Ahle for pointing out in the comments that the probability distributions shown in the last two images are known as the Irwin-Hall distribution

Thanks to Heike for her wonderful torn[] function

ANSWER:

Oversimplification to illustrate a point.

Assume your random function only outputs 0 or 1.

random() is one of (0,1), but random()*random() is one of (0,0,0,1)

You can clearly see that the chances to get a 0 in the second case are in no way equal to those to get a 1.


When I first posted this answer I wanted to keep it as short as possible so that a person reading it will understand from a glance the difference between random() and random()*random(), but I can’t keep myself from answering the original ad litteram question:

Which is more random?

Being that random(), random()*random(), random()+random(), (random()+1)/2 or any other combination that doesn’t lead to a fixed result have the same source of entropy (or the same initial state in the case of pseudorandom generators), the answer would be that they are equally random (The difference is in their distribution). A perfect example we can look at is the game of Craps. The number you get would be random(1,6)+random(1,6) and we all know that getting 7 has the highest chance, but that doesn’t mean the outcome of rolling two dice is more or less random than the outcome of rolling one.

ANSWER:

Neither is ‘more random’.

rand() generates a predictable set of numbers based on a psuedo-random seed (usually based on the current time, which is always changing). Multiplying two consecutive numbers in the sequence generates a different, but equally predictable, sequence of numbers.

Addressing whether this will reduce collisions, the answer is no. It will actually increase collisions due to the effect of multiplying two numbers where 0 < n < 1. The result will be a smaller fraction, causing a bias in the result towards the lower end of the spectrum.

Some further explanations. In the following, ‘unpredictable’ and ‘random’ refer to the ability of someone to guess what the next number will be based on previous numbers, ie. an oracle.

Given seed x which generates the following list of values:

0.3, 0.6, 0.2, 0.4, 0.8, 0.1, 0.7, 0.3, ...

rand() will generate the above list, and rand() * rand() will generate:

0.18, 0.08, 0.08, 0.21, ...

Both methods will always produce the same list of numbers for the same seed, and hence are equally predictable by an oracle. But if you look at the the results for multiplying the two calls, you’ll see they are all under 0.3 despite a decent distribution in the original sequence. The numbers are biased because of the effect of multiplying two fractions. The resulting number is always smaller, therefore much more likely to be a collision despite still being just as unpredictable.