1.

Written by @DRMacIver.

  1.

    I think @drmaciver uses prop testing in a very different way than I do. I don’t do a lot of prop testing, so it’s quite likely that I’m the oddball here, but statements like:

    Often you want to generate something much more specific than any value of a given type.

    And

    Especially if we’re only using these in one test, we’re not actually interested in these types at all, and it just adds a whole bunch of syntactic noise when you could just pass the data generators directly.

    What makes these odd to me is that I am generally generating the types my API expects, not special types just for testing. If I need a special type just for testing, that tells me my API hasn’t defined its types strictly enough (Correct by Construction). This post reads more like the author uses these testing types as contracts?

    1.

      One of the examples I like is related to parsers of any kind. Let’s pick one of the simplest cases: a word counter that looks for space characters to delimit words and counts how many words it has seen. Unicode offers 1,114,112 code points; exactly one of them is the ASCII space, and only 17 fit the ‘space/separator’ family.

      With a uniform distribution over all code points, generating a random string that contains more than two spaces is tricky; generating one that contains two spaces in a row is rarer still; and generating one that mixes either with combining marks that could change the contextual interpretation is rarer yet.

      What you may want to control is the distribution: 20% of characters are going to be space- or separator-related, 70% will be anything at all, and 10% will be combining marks (a sketch of such a generator follows at the end of this comment).

      If all you have is a plain string type, you can hardly optimize for specific character sets: in CSV, you’d want more commas, line breaks, and quotation marks than in the example above; if you’re dealing with HTML, you may want to throw in more brackets and HTML entities. Similarly, if you’re dealing with years, 1970 and 2000 may sometimes prove more interesting as central points than 0 (which is not representable in the Gregorian calendar anyway).

      At that point, the precision of your property tests, i.e. their ability to truly exercise your code, depends on generators that are fancier and better directed than what your type system lets you express. Either you beef up the type system, or you are stuck with less control than would be ideal.
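
      As a rough illustration, here is a minimal sketch of such a biased generator in Python with Hypothesis. The helper names and the exact 20/70/10 split are my own, Hypothesis has no built-in weighted union (so the weights are approximated by repeating strategies in a sampled pool), and newer Hypothesis versions spell the argument categories= rather than whitelist_categories=:

      from hypothesis import strategies as st

      # Character classes drawn from Unicode general categories:
      # 'Zs' = space separators; 'Mn', 'Mc', 'Me' = combining marks.
      separators = st.characters(whitelist_categories=("Zs",))
      anything = st.characters()
      combining = st.characters(whitelist_categories=("Mn", "Mc", "Me"))

      # Approximate a 20%/70%/10% mix by sampling one of ten strategies,
      # then drawing a character from whichever strategy was picked.
      weighted_char = st.sampled_from(
          [separators] * 2 + [anything] * 7 + [combining]
      ).flatmap(lambda s: s)

      # Strings biased toward separators and combining marks.
      biased_text = st.text(alphabet=weighted_char)

      Feeding biased_text into a word-counter property will exercise multi-space runs and combining-mark sequences far more often than a uniform string generator would.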

      1.

        Let’s pick one of the simplest cases: a word counter that looks for space characters to delimit words and counts how many words it has seen.

        The definition of this problem is already problematic, and not at all simple, though. You can’t assume that, out of 1,114,112 code points, “space-like” characters are the only word separators. “犬と猫” (Japanese), according to Google Translate, means “the dog and the cat.”

        (This makes your point stronger, but was also a troll).

      2.

        I think @drmaciver uses prop testing in a very different way than I do.

        Possibly, but observationally I think my way is pretty common among people who use property-based testing libraries that are less tied to types. It’s certainly very common in Python land, though there’s a bit of a biasing factor there: I wrote the main Python property-based testing library, so they may just be copying me. :-)

        What makes these odd to me is that I am generally generating the types my API expects, not special types just for testing.

        Note that I am not saying you shouldn’t test the whole range of types your API accepts. I’m saying that not every test of your API needs to do that, and that it is worth also testing more specific domains. @ferd’s parser example is a good one, but in general there will be properties that are only interesting on a subset of your domain (e.g. because they test a behaviour that handles a special case).

        Take the mean example. It restricts the domain largely because the property it’s testing is not valid outside that restricted domain. You still need to test what happens with NaN or empty lists, but what happens there is probably just an error condition that you need to test separately (a sketch of such a restricted test follows below).
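
        As a minimal sketch of that kind of restriction in Python with Hypothesis (the bounds and the test name are my own; the point is only that NaN, infinities, and the empty list are excluded from the generator and left to separate error-case tests):

        import math

        from hypothesis import given, strategies as st

        # The property is only meaningful on this restricted domain:
        # non-empty lists of finite, bounded floats.
        finite_floats = st.floats(
          allow_nan=False, allow_infinity=False, min_value=-1e9, max_value=1e9
        )

        @given(st.lists(finite_floats, min_size=1))
        def test_mean_is_finite(xs):
          assert math.isfinite(sum(xs) / len(xs))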

        1.

          One example:

          def fact(n: int) -> int:
            if n < 0:
              raise ValueError("factorial is undefined for negative n")
            elif n == 0:
              return 1
            else:
              return n * fact(n - 1)
          

          You’re properly handling the negative-number case, but if one of your property tests asserts f(n) = n!, you don’t want that specific test “failing” because Hypothesis tried n = -1.
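
          A minimal sketch of both tests with Hypothesis (the test names, the recursion-friendly upper bound, and the use of math.factorial as an oracle are my own choices):

          import math

          from hypothesis import given, strategies as st

          # Restrict the generator to the domain where f(n) = n! holds.
          @given(st.integers(min_value=0, max_value=300))
          def test_fact_matches_oracle(n):
            assert fact(n) == math.factorial(n)

          # Test the error case separately, with its own generator.
          @given(st.integers(max_value=-1))
          def test_fact_rejects_negatives(n):
            try:
              fact(n)
            except ValueError:
              return
            raise AssertionError("expected ValueError for negative input")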