Naively, you could take the random integer and compute the remainder of the division by the size of the interval. It works because the remainder of the division by D is always smaller than D. Yet it introduces a statistical bias

How does this introduce a statistical bias? That’s the first time I’ve heard this.

If the range of the integer is not evenly divisible by the size of the interval, the last interval-width bucket of values to be distributed will be underfilled in a uniform way, making a certain number of the maximum values in the range slightly less likely. (And a typical computer integer has either 2^31 or 2^32 different values depending on if it’s signed or not, both of which have exactly one distinct prime factor: 2.)

For instance, if your interval has width 5 and you’re uniformly generating nonnegative signed integers, you have 429,496,730 different integers which map to 0, 1, 2 and 3; but only 429,496,729 which map to 4.

This effect seems minor (one part in half a billion hardly sounds badly biased), but it is still detectable, and it gets much worse as the interval gets larger (consider an interval of size 2^31-1; in this interval, 0 is twice as likely as any other number to be generated).

How does this introduce a statistical bias? That’s the first time I’ve heard this.

If the range of the integer is not evenly divisible by the size of the interval, the last interval-width bucket of values to be distributed will be underfilled in a uniform way, making a certain number of the maximum values in the range slightly less likely. (And a typical computer integer has either 2^31 or 2^32 different values depending on if it’s signed or not, both of which have exactly one distinct prime factor: 2.)

For instance, if your interval has width 5 and you’re uniformly generating nonnegative signed integers, you have 429,496,730 different integers which map to 0, 1, 2 and 3; but only 429,496,729 which map to 4.

This effect seems minor (one part in half a billion hardly

soundsbadly biased), but it is still detectable, and it gets much worse as the interval gets larger (consider an interval of size 2^31-1; in this interval, 0 is twice as likely as any other number to be generated).Thanks, I really appreciate the detailed explanation!