1. 1

    This is an excellent article suggesting a solution to a very real operational issue with dynamo style hashing.

    1. 1

      Many thanks, pims! If you or any other reader has a question that the article doesn’t answer, feel free to pester me. I probably won’t visit this Lobsters comment page often. My Twitter account, @slfritchie, will likely be quicker, or find me by email at scott @ wallaroolabs .com.

    1. 6

      Just a note: I found the term Model-Based Testing a bit distracting - then again, I come from a Rails background. I think “Generative Testing in Rust with QuickCheck” would have been a more helpful title for someone with no prior knowledge of QuickCheck.

      This also set me off into exploring QuickCheck. For those who don’t know, the most helpful thing I saw to help understand it was watching this video that showed off test.check, a QuickCheck implementation in Clojure: https://www.youtube.com/watch?v=u0TkAw8QqrQ

      Basically, it’s a way to generate random data and data structures (within certain bounds that you define) to be used as inputs in testing your application logic. Since I was also confused about this: it seems like people run QuickCheck as a step separate from their non-generative specs to identify specific edge cases, and then add those edge cases as regression tests to their overall test suite. In some generative testing libraries I saw after poking around, they’re even run as part of the test suite, though I’m not sure how I feel about that - couldn’t that result in missing a case locally that then fails on CI due to different inputs?
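
      To make that concrete, here is a minimal sketch of a property test, assuming the Rust quickcheck crate’s quickcheck! macro (API details vary by version) and a made-up property (reversing a vector twice returns the original):

      ```rust
      use quickcheck::quickcheck;

      quickcheck! {
          // quickcheck generates many random Vec<u32> inputs (within the
          // generator's size bounds) and checks the property for each one.
          fn prop_double_reverse(xs: Vec<u32>) -> bool {
              let mut ys = xs.clone();
              ys.reverse();
              ys.reverse();
              ys == xs
          }
      }
      ```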

      1. 5

        In the past, it was called specification-based (the QuickCheck paper said “specifications”), model-based, or contract-based… test generation, depending on which crowd you were listening to. Recently, most posts call it property-based testing. All of the names are accurate, given that each is technically true and has prior work behind it. The superset would probably be specification-based, since what each of them uses is a specification. Formal specifications are also the oldest of these techniques.

        Generative is ambiguous, since it sounds like it just means automated, and all the test generators are automated to some degree. So, we name them from where the process starts. As far as which name to use, even I started using property-based testing instead of specification-based testing as my default, to go with the flow. I still use the others if the discussion is already using those words, though. For instance, I might say…

        1. Spec-based if we’re talking formal specifications.

        2. Model-based if we’re talking Alloy models.

        3. Contract-based if we’re talking Design-by-Contract, Eiffel, or Ada/SPARK since that’s their language.

        4. Property-based if talking to people using popular languages or something since they’ll find helpful stuff if they Google with that.

        1. 2

          Thanks for the background information!

        2. 2

          There’s also been some work done to save failing inputs for later retest. I’ve used that to do test-driven development with properties.

          I know that’s supported in version 2 of the original QuickCheck, and I’m almost certain Python’s Hypothesis supports it too; I’m not sure about others.

          1. 2

            If you have a QuickCheck implementation that permits easy testing of a concrete test case, grab it and use it. Once upon a time, QC found a bug. Keep that concrete test case and add it to your regression test suite. Randomized testing means that you don’t really know when randomness will create that same concrete test case again. But if your regression suite includes the concrete test case, you are assured that your regression suite will always check that scenario.
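
            As a rough sketch of what that can look like in Rust (the Op enum and check_ops helper here are hypothetical stand-ins for whatever your QuickCheck model already uses to drive the system under test):

            ```rust
            // Hypothetical: `Op` is the operation type the QuickCheck model
            // generates, and `check_ops` replays a sequence of operations
            // against the real data structure and the model, returning true
            // if they agree.
            #[test]
            fn regression_duplicate_key_insert() {
                // Concrete counterexample originally found by QuickCheck,
                // pinned here so the regression suite always checks it.
                let ops = vec![Op::Insert(0, 192), Op::Insert(0, 200), Op::Get(0)];
                assert!(check_ops(&ops));
            }
            ```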

            In the Erlang QuickCheck implementations (the commercial version from Quviq AB in Sweden and also the open-source “PropEr” package), there’s a subtlety in saving a concrete test case. I assume it’s the same feature/problem with Rust’s implementation of QC. The problem is: re-executing the test assumes that the test model doesn’t change. If you’re actively developing & changing the QC model today, then you may unwittingly change the behavior of re-executing a concrete test that was added to your regression test suite last year. If you’re aware of that feature/problem, then you can change your process/documentation/etc. to cope with it.

            1. 2

              That’s probably because the first prototype for this required the random value as input to the value generator. I know that because I wrote it, and pushed for its inclusion in the second version of QuickCheck.

              Nowadays there are libraries that will generate the actual value in such a way that you can copy and paste into a source file.

              I’ve heard that Python’s Hypothesis keeps a database of failing inputs; I’m not sure if anything else has that feature.

              1. 2

                Randomness is only one place where things can go wrong with saved concrete test cases.

                For example (not a very realistic one), let’s extend the Rust example of testing a tree data structure. The failing concrete test case was: ([Insert(0, 192), Insert(0, 200), Get(0)])

                Let’s now assume that X months later, the Insert behavior of the tree changes so that existing keys will not be replaced. (Perhaps a new operation, Replace, was added.) It’s very likely that re-execution of our 3-step regression test case will fail. A slightly different failure would happen if yesterday’s Insert were removed from the API and replaced by InsertNew and Replace operations. I’m probably preaching to the choir, but … software drifts, and testing (in any form) needs to drift with it.
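
                A minimal sketch of that drift, with made-up names (Tree, insert, get): the pinned regression test silently encodes last year’s “insert replaces” semantics.

                ```rust
                // Hypothetical tree API used by the saved regression test.
                #[test]
                fn regression_duplicate_key_insert() {
                    let mut t = Tree::new();
                    t.insert(0, 192);
                    t.insert(0, 200);
                    // This assertion assumes the old semantics, where the second
                    // insert replaces the first value. If Insert later becomes
                    // "do nothing if the key exists" (with a new Replace
                    // operation for overwrites), get(0) returns Some(192) and
                    // this test fails even though the tree behaves as intended.
                    assert_eq!(t.get(0), Some(200));
                }
                ```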

                1. 1

                  That’s an excellent point; I have no idea how to automate that. You’d have to somehow notice that the semantics changed and flush all the saved test inputs, which sounds like more work than gain.

                  This is great info. Any other thoughts on how saved inputs could go wrong?

                  1. 2

                    Ouch, sorry I didn’t see your reply over the weekend. I can’t think of other, significantly different problems. I guess I’d merely add a caution that “semantics changed” drift between app/library code & test code isn’t the only type of drift to worry about.

                    If you change the implementation, and the property test is validating a property of the implementation itself, you have more opportunity for drift. Take a check that runs at the end of a test case: for a hash table, after deleting all elements, “all buckets in the hash have lists of length zero” could be a desirable property. The test actually peeks into the hash table data structure and checks all the buckets and their lists. The original implementation had a fixed number of buckets; a later version has a variable number of buckets. Some bit of test code may or may not actually be examining all possible buckets.
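
                    Roughly, with invented names (HashTable, bucket), that drift might look like this:

                    ```rust
                    // Hypothetical check used at the end of a property test. It
                    // was written when the table always had exactly 16 buckets;
                    // if a later version grows or shrinks the bucket array, this
                    // quietly inspects only the first 16 buckets rather than all
                    // of them.
                    fn all_buckets_empty(h: &HashTable) -> bool {
                        (0..16).all(|i| h.bucket(i).is_empty())
                    }
                    ```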

                    It’s a contrived example, and one that doesn’t apply only to historical, failing test cases. But it’s the best I can think of at the moment. ^_^

                    -Scott

          2. 1

            > In some generative testing libraries I saw after poking around, they’re even run as part of the test suite, though I’m not sure how I feel about that - couldn’t that result in missing a case locally that then fails on CI due to different inputs?

            This is a potential problem with property-based testing, but to turn the question around - if you’re writing unit tests by hand, how do you know you didn’t miss a case?

            That’s why you use them together.

            1. 2

              I understand using property-based testing to find edge cases, but including it in the test suite seems to introduce a lot of uncertainty as to whether your build will succeed, and potentially how much time the tests will take to run. Granted, finding edge cases is important regardless of when you find them; I’d just be more comfortable running the property-based tests as a separate step, though I’d be happy to be convinced otherwise.

              1. 1

                Correct me if I’m misunderstanding you. If the testing is part of the build cycle, a build failure will likely indicate the software didn’t work as intended, and you’ll also have a report waiting for you on what to fix. If it’s taking too much time, you can put a limit on how much time is spent on test generation during a build, per module, per project, or globally. For instance, it’s common for folks using provers like SPARK Ada’s or model checkers for C to put a limit of 1-2 minutes per file, so the drawback of those tools (potentially unbounded runtime) doesn’t hold the work up. And if it takes a lot of running time to verify the code, maybe the code or the tooling needs to change to fix that.
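
                With the Rust quickcheck crate, for instance, you can bound how many cases each property generates per run (the property below is just a placeholder; the crate also reads a QUICKCHECK_TESTS environment variable for the default):

                ```rust
                use quickcheck::QuickCheck;

                // Placeholder property: sorting is idempotent.
                fn prop_sort_idempotent(mut xs: Vec<u32>) -> bool {
                    xs.sort();
                    let once = xs.clone();
                    xs.sort();
                    xs == once
                }

                #[test]
                fn bounded_property_run() {
                    // Cap the number of generated cases so this stays cheap
                    // during a normal build; a longer run can be scheduled as
                    // its own step.
                    QuickCheck::new()
                        .tests(200)
                        .quickcheck(prop_sort_idempotent as fn(Vec<u32>) -> bool);
                }
                ```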

                1. 2

                  No, I think your understanding is correct, and that’s definitely part of the point of running specs in the build process. I guess I’m just operating from advice I got early on to keep specs as deterministic as possible. I don’t remember where I got this advice, but here’s a blog post: https://martinfowler.com/articles/nonDeterminism.html

                  He also recommends this, which is what I would instinctively want to do with property-based testing:

                  > If you have non-deterministic tests keep them in a different test suite to your healthy tests.

                  Though the nondeterministic tests Fowler is talking about seem to be nondeterministic for different reasons than one would encounter when setting out to do property-based testing:

                  • Lack of Isolation
                  • Asynchronous Behavior
                  • Remote Services
                  • Time
                  1. 2

                    Just going by the problem in his intro, I remember that many people use property-based testing as a separate pass from regression tests, with some PBT failures becoming new regression tests. The regression tests themselves are static. I’d guess they’re run before PBT as well, the logic being that one should knock out obvious, quick-to-test problems before running methods that spend lots of time looking for non-obvious problems. Again, I’m just guessing they’d do it in that order, since I don’t know people’s setups; it’s what I’d do.
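
                    One way to get that split in Rust, as a sketch: mark the property tests #[ignore] so a plain `cargo test` run executes only the static regression tests, then run the ignored ones as a separate pass with `cargo test -- --ignored`.

                    ```rust
                    // Static regression test: runs in the normal `cargo test` pass.
                    #[test]
                    fn regression_duplicate_key_insert() {
                        // replay a previously failing concrete case here
                    }

                    // Property-based pass: skipped by default, run separately with
                    // `cargo test -- --ignored` (e.g., as its own CI job).
                    #[test]
                    #[ignore]
                    fn prop_model_agrees_with_implementation() {
                        // QuickCheck run goes here
                    }
                    ```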

                    1. 2

                      Ah, okay, so separating regression tests from PBT does seem to be a common thing.

          1. 1

            Interesting stuff! I’d love to hear more about how Pony’s correctness affects the need for supervision trees. Is there no supervision hierarchy at all in a normal Pony app?

            1. 3

              (Caveat: I’m still not immersed in all of Pony & its best practices)

              No, not really. Actors don’t crash, so the fault-tolerance reasons for supervisor trees do not apply. However, another use for supervisors is to coordinate application startup (of course) but also application shutdown, both in a deterministic manner. In those cases, the nearest somewhat-related feature is the DisposableActor interface in the standard library, https://stdlib.ponylang.org/builtin-DisposableActor/ … but it does not provide the you-shutdown-before-that-other-actor coordination that a supervisor tree can.
