1. 6

  2. 1

    “Too random” means “inappropriately distributed”.

    C++’s newer `<random>` library has distributions, and it’s great. If you want randomness that looks like how people’s height varies randomly, with that hump around the average, C++ gives you that, it’s called “normal” (and there are variations of that, yes). If you want randomness that looks like the number of customers entering a shop per minute (or I suppose requests to an API per second, absent crontabs), C++ gives you that, it’s called “poisson” (usually). No need to reinvent 19th century statistics just because you model something that isn’t uniformly distributed.
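
    A minimal sketch of the two cases above, using `std::normal_distribution` and `std::poisson_distribution` from `<random>`. The function names, the 170 cm mean and the 10 cm standard deviation are made up for illustration, not real data.

    ```cpp
    #include <cstddef>
    #include <random>
    #include <vector>

    // Draw n heights in cm from a normal distribution.
    // Mean 170 and standard deviation 10 are illustrative assumptions.
    std::vector<double> sample_heights(std::size_t n, unsigned seed) {
        std::mt19937 gen(seed);
        std::normal_distribution<double> height(170.0, 10.0);
        std::vector<double> out;
        out.reserve(n);
        for (std::size_t i = 0; i < n; ++i) out.push_back(height(gen));
        return out;
    }

    // Draw n per-minute customer counts from a Poisson distribution
    // whose mean is mean_per_minute (must be positive).
    std::vector<int> sample_arrivals(std::size_t n, double mean_per_minute,
                                     unsigned seed) {
        std::mt19937 gen(seed);
        std::poisson_distribution<int> arrivals(mean_per_minute);
        std::vector<int> out;
        out.reserve(n);
        for (std::size_t i = 0; i < n; ++i) out.push_back(arrivals(gen));
        return out;
    }
    ```

    Something like `sample_arrivals(60, 4.0, 42u)` would then give an hour of per-minute counts averaging roughly four customers.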

    1. 3

      Low-discrepancy sequences are a different topic than choice of distribution, though I agree people should also pay attention to that question. A low-discrepancy sequence is often a replacement for uniform sampling in cases where you want something more uniform, e.g. you want to draw 50 numbers, but with a high chance of them being fairly evenly spaced, which isn’t a property you get from small-sample draws from a uniform distribution. When you draw a lot of numbers you’ll get something that approaches uniform coverage, but in small samples it’s quite likely you’ll get clumping. That is not a problem if you actually want uniformly random draws (since that’s what they look like!), but it can be a problem if you’re really using them as a proxy for something else where coverage matters, like driving a Monte Carlo algorithm.
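
      To make “fairly evenly spaced” concrete, here is a small sketch using the base-2 van der Corput sequence, which is one well-known low-discrepancy sequence (my pick for illustration, not something the post prescribes). The helper names `first_points` and `largest_gap` are hypothetical.

      ```cpp
      #include <algorithm>
      #include <cstddef>
      #include <vector>

      // n-th van der Corput point in [0, 1): write n in the given base
      // and mirror its digits around the radix point.
      double van_der_corput(unsigned n, unsigned base = 2) {
          double result = 0.0;
          double denom = 1.0;
          while (n > 0) {
              denom *= base;
              result += (n % base) / denom;
              n /= base;
          }
          return result;
      }

      // The first `count` points of the base-2 sequence, starting at n = 1.
      std::vector<double> first_points(std::size_t count) {
          std::vector<double> pts;
          pts.reserve(count);
          for (std::size_t i = 1; i <= count; ++i)
              pts.push_back(van_der_corput(static_cast<unsigned>(i)));
          return pts;
      }

      // Largest gap between neighbours after sorting, counting the
      // distance from 0 to the first point and from the last point to 1.
      double largest_gap(std::vector<double> pts) {
          std::sort(pts.begin(), pts.end());
          double prev = 0.0;
          double worst = 0.0;
          for (double p : pts) {
              worst = std::max(worst, p - prev);
              prev = p;
          }
          return std::max(worst, 1.0 - prev);
      }
      ```

      For the first 50 points the largest gap comes out at 1/32, because the first 31 points already land on every multiple of 1/32. Running `largest_gap` on 50 draws from `std::uniform_real_distribution` will typically report a noticeably bigger hole, which is the clumping described above.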

      1. 2

        (Reply-to-self, because too late to edit.)

        Upon further thought, a better way of putting this might be the other way around (i.e. not the way this post’s title puts it). You can view low-discrepancy sequences as being not for when random numbers are too random, but for when a uniform (non-random) grid is too uniform. For example, if you use the uniform grid [0, 5, 10, 15 .. 100] to run some kind of algorithm on the interval [0 .. 100] without running it at every point, you have a bunch of weird regularities to it, like every single number you picked happening to be divisible by 5, which is not really representative of the overall properties of the interval [0 .. 100]. But if you instead sample [0 .. 100] randomly, you do get nicely random statistics of every kind, but a good likelihood of bad coverage of the interval due to clumping. So the goal is to come up with a deterministic sequence with good coverage that nonetheless “looks” random.

        In a sense this is saying the same thing, that we magically want some of the properties of both a uniform grid and uniformly-random sampling. But putting it this way leans towards saying this is a specially chosen type of grid that tries to minimize spurious regularities (like every number being divisible by 5), rather than a special type of random sampling.
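
        As a sketch of the “specially chosen grid” view, here is a deterministic replacement for the 21-point grid on [0 .. 100] based on the golden-ratio additive recurrence (each point is the fractional part of i times 0.618..., scaled to the interval). That particular sequence and the function name `golden_ratio_points` are my assumptions for illustration; any other low-discrepancy sequence would serve.

        ```cpp
        #include <cmath>
        #include <cstddef>
        #include <vector>

        // Deterministic points in [lo, hi): the i-th point is the
        // fractional part of i * (sqrt(5) - 1) / 2, scaled to the interval.
        std::vector<double> golden_ratio_points(std::size_t count,
                                                double lo, double hi) {
            const double step = (std::sqrt(5.0) - 1.0) / 2.0;  // about 0.618
            std::vector<double> pts;
            pts.reserve(count);
            for (std::size_t i = 1; i <= count; ++i) {
                const double frac =
                    std::fmod(static_cast<double>(i) * step, 1.0);
                pts.push_back(lo + frac * (hi - lo));
            }
            return pts;
        }
        ```

        `golden_ratio_points(21, 0.0, 100.0)` starts at roughly 61.8, 23.6, 85.4, so it is repeatable like a grid but none of the points sit on multiples of 5.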

      2. 1

        I agree with your comment: what the author is looking for is less variance in whatever they generate. So a fast, pragmatic way would be to take a uniform grid of whatever dimensionality, then add Gaussian jitter around the points on that grid.
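
        A rough one-dimensional sketch of that idea. The function name `jittered_grid`, the use of cell centres, the `sigma_fraction` parameter and the clamping to the interval are all my assumptions; higher dimensions would repeat the same step per axis.

        ```cpp
        #include <algorithm>
        #include <cstddef>
        #include <random>
        #include <vector>

        // Split [lo, hi] into `cells` equal cells, take each cell centre,
        // and add Gaussian noise whose standard deviation is
        // sigma_fraction times the cell width (sigma_fraction must be > 0).
        // Points that would leave [lo, hi] are clamped back onto the edge.
        std::vector<double> jittered_grid(std::size_t cells, double lo,
                                          double hi, double sigma_fraction,
                                          unsigned seed) {
            std::mt19937 gen(seed);
            const double width = (hi - lo) / static_cast<double>(cells);
            std::normal_distribution<double> jitter(0.0,
                                                    sigma_fraction * width);
            std::vector<double> pts;
            pts.reserve(cells);
            for (std::size_t i = 0; i < cells; ++i) {
                const double centre =
                    lo + (static_cast<double>(i) + 0.5) * width;
                const double p = centre + jitter(gen);
                pts.push_back(std::min(hi, std::max(lo, p)));
            }
            return pts;
        }
        ```

        A small `sigma_fraction` (say 0.25) keeps the points close to their cells, so coverage stays grid-like. A large one lets points wander across cells and the clumping comes back.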