1. 19

  2. 4

    I recently worked through a similar problem by setting up a property test with hypothesis, piping through two different implementations of a function with some generated data.

    I do wonder how well this sort of behaviour diff works on more complex structures. Hypothesis suffers a bit because you have to generate the “right” shape of data to hit differences, and behaviour is rarely spread uniformly over data shapes.
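For anyone who wants to try the idea, here is a minimal sketch of comparing two implementations on generated data. It uses the standard library's `random` module as a stand-in for Hypothesis strategies so that it runs without dependencies; `dedupe_old`, `dedupe_new`, and `find_difference` are hypothetical names invented for illustration.

```python
import random


def dedupe_old(items):
    """Original: drop duplicates, keeping first occurrences in order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def dedupe_new(items):
    """'Refactor': same elements, but sorting discards the original order."""
    return sorted(set(items))


def find_difference(old, new, trials=1000, seed=0):
    """Return the first generated input on which old and new disagree, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Small value range and short lists make duplicates and reorderings likely.
        data = [rng.randint(0, 5) for _ in range(rng.randint(0, 6))]
        if old(list(data)) != new(list(data)):
            return data
    return None


counterexample = find_difference(dedupe_old, dedupe_new)
print(counterexample)
```

The generator here is tuned to the bug (short lists, few distinct values), which is exactly the "right shape" concern: a different distribution could take far longer to hit the difference.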

    1. 3

      Yes; such an approach using hypothesis is described here. It is very closely related to this work.

      CrossHair uses symbolic execution rather than a randomized approach - it works better in some cases and worse in others. It can figure out some hard cases where you need, for example, a string that matches a specific pattern or two lists that are permutations of each other. But hypothesis will be more effective in other scenarios; for instance, CrossHair doesn’t effectively solve the problem given in the above post. (my attempt)
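To make the "string that matches a specific pattern" case concrete, here is a rough sketch of why plain random generation struggles there. It does not run CrossHair; it only shows a difference hidden behind a narrow branch. All names (`normalize_old`, `normalize_new`, `random_search`) are made up for this example.

```python
import random
import string

ALPHABET = string.ascii_lowercase + string.digits + "-"


def normalize_old(s):
    return s.upper()


def normalize_new(s):
    # Narrow branch: only inputs like "order-123" behave differently.
    if s.startswith("order-") and s[6:].isdigit():
        return s
    return s.upper()


def random_search(trials=10_000, seed=0):
    """Try uniformly random strings; return one that exposes a difference, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        s = "".join(rng.choice(ALPHABET) for _ in range(rng.randint(0, 12)))
        if normalize_old(s) != normalize_new(s):
            return s
    return None


found = random_search()
print(found)
print(normalize_old("order-42"), normalize_new("order-42"))
```

With a 37-character alphabet, the chance of a random string starting with `order-` is on the order of one in a few billion, so the search will almost always come back empty. A symbolic tool can, in principle, read the branch condition and solve for an input that satisfies it.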

      The ideal solution would be to combine the best of both approaches, and I’ve been chatting with Zac Hatfield-Dodds about how to make some kind of hypothesis-CrossHair integration happen. One of the main obstacles here is getting the maturity of CrossHair high enough, and I’m hoping easy-to-use tools like this one will help encourage more usage. :)

      1. 2

        At some point, it becomes a sequential Monte Carlo simulation; the inputs are so constrained that they must be selected from a probability distribution.

      2. 2

        This is super cool!

        I’d like to see something like this done at massive scale, analyzing millions of commits and their changes in the underlying Python. That might be revealing.

        1. 1

          Thank you! And, yeah, I’m hoping this will be a good way to exercise the system: point it at tons of “refactoring” commits and see whether it can notice stuff that people missed.

          Sadly, naive symbolic analysis like the kind used in CrossHair doesn’t scale up to large codebases well. There is a fair amount of research on tactics for scaling, but most of my time right now is focused on ensuring that the modeling is solid. (If anyone out there wants to collaborate, I am very interested!) The world needs more people bridging research and engineering, IMO.

        2. 1

          If you run CrossHair on code calling shutil.rmtree, you will destroy your filesystem.

          I wonder if there is a way of introducing a sandbox filesystem or similar to work around this?

          1. 1

            I’ve been thinking about this a little! Something like pyfakefs might work (and could potentially give you the deterministic behavior that CrossHair also requires). That said, I’m not sure I’d have enough confidence in the completeness of a tool like this to recommend people use it with CrossHair. Possibly?

            Of course, there are other kinds of effects besides the filesystem too: network, peripherals, etc. As I understand it, securely sandboxing Python is quite challenging. Interested in what ideas people here might have for me!
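As one small illustration of the "fake the effect" idea, here is a sketch that swaps `shutil.rmtree` for a recording stub using the standard library's `unittest.mock`. This is not pyfakefs and not a security sandbox; `cleanup` and `run_with_fake_rmtree` are hypothetical names, and the path is a placeholder.

```python
import shutil
from unittest import mock


def cleanup(path):
    """Code under analysis: would delete a directory tree for real."""
    shutil.rmtree(path)


def run_with_fake_rmtree(func, *args):
    """Run func with shutil.rmtree replaced by a stub that only records its target."""
    calls = []

    def fake_rmtree(path, *_args, **_kwargs):
        calls.append(path)

    with mock.patch("shutil.rmtree", side_effect=fake_rmtree):
        func(*args)
    return calls


recorded = run_with_fake_rmtree(cleanup, "/important/data")
print(recorded)
```

The patch only intercepts lookups of `shutil.rmtree` made while it is active. Code that did `from shutil import rmtree` beforehand, or that deletes files through `os` directly, slips past it, which is the completeness worry in a nutshell.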

          2. 1

            Really cool; it reminds me of GitHub’s Scientist, but much easier to use. Really needed for experimentation and hypothesis-driven development.