1. 14

    The author has a pretty explicit bias toward Earley parsing, and using LL(k) over LR(k) methods (which I mostly share). There’s a wealth of information here, but he’s missing a couple other significant strands of generalized parsing research outside of Marpa’s particular Earley variant (perhaps because they don’t fit within his narrative).

    GLR (Generalized LR):

    • Tomita, LR parsers for natural languages, 1984.
    • Tomita, Efficient parsing for natural language, 1986.
    • Scott and Johnstone, Generalised bottom up parsers with reduced stack activity, 2005. This segues into their later work with Earley and GLL.

    GLL (Generalized LL):

    • Scott and Johnstone, GLL Parsing, 2010.
    • Afroozeh and Izmaylova, Faster, Practical GLL Parsing, 2015.

    Outside of Earley, GLL seems like a very practical generalized parsing approach. Instaparse is one implementation.

    Earley / Shared Packed Parse Forests:

    • Scott, SPPF-style parsing from Earley recognisers, 2008 (and several related papers by Scott and Johnstone). Note: I’ve never been able to get the approach described in the paper implemented correctly, and not for lack of trying.

    Earley Intersection Parsing (not sure if there’s a canonical name for this):

    • Bar-Hillel, Perles, and Shamir, On formal properties of simple phrase structure grammars in Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung, 1961. Proves some important results about intersecting automata and context free grammars.
    • Chapter 13 (“Parsing as Intersection”) of Grune and Jacobs (also cited elsewhere in his timeline), particularly 13.4 (pgs. 437-439): This describes Bar-Hillel’s Intersection Parsing approach using contemporary CS & parsing terminology, and then suggests combining it with Earley parsing. While the basic intersection parsing method produces an astonishing number of dead-end nodes to garbage-collect later, Earley parsers limit searching to productions reachable from the start symbol. If the completer is modified to produce non-terminals in the intersection parsing format (which is easy), intersection parsing nicely handles producing the parse forest (with structural sharing, when ambiguity produces multiple derivations).

    I’ve been working on a C library for Earley Intersection parsing, and an awk-like, pattern-match-directed language based on it, but working on testing them thoroughly tends to lead me to working on theft instead.

    1. 3

      Thanks for details about alternatives.

      I have played around with Bison’s GLR option and have almost managed to create a C parser that does not require special handling of typedefs in the lexer (ambiguities still appearing at runtime for a few constructs).

      Grune and Jacobs’ Parsing Techniques - A Practical Guide is sitting in a pile waiting to be read (the first edition was manageable, the second looks daunting).

      1. 1

        I have a print copy of the second, and it’s never struck me as daunting – reading it cover to cover, perhaps, but it’s well suited to dipping into for just specific techniques, with a well-curated bibliography for further details.

      2. 2

        packrattle is another example of a GLL parser.

        The times I’ve read about Earley parsers, they seem like the same solution as GLL, but attacking the problem from a different angle. Hopefully someone in academia will eventually prove that.

        1. 2

          I’m particularly interested in Earley parsers because some use cases of mine assume ambiguity (reverse-engineering binaries, scraping damaged data), and the chart that Earley parsers build up supports several operations beyond just recognizing & building parse trees:

          • what terminals/nonterminals would advance the current parse(s) in progress? (i.e., autocomplete)
          • if we don’t assume where the starting position is, what overlapping instruction encodings are there?
          • if we inserted a fake nonterminal here, would enough of the parse have completed to match any instances of (some structure) between here and there?
          • incremental re-parsing of changing input, since earlier columns in the chart are constant once the parser has moved on.
        2. 2

          Thank you for adding this. I understand why everybody is excited about Earley parsing, though I personally don’t feel sufficiently grounded in it to share that excitement, but at the very least other lines of research deserve to be mentioned.

          1. 1

            I apologize for reviving this thread, but do you know if Joop Leo’s improvements for right recursion also allow optimizing indirect right recursion?

            1. 1

              Not sure, sorry.

          1. 5

            I’ve done this on a couple projects. It was particularly helpful when I was on an embedded project and had time values from a couple clock domains – mixing up a 1 msec resolution time value with a 1 sec time value (or, worse, our hardware’s timer that had 1024 ticks per second) could have led to some subtle bugs.

            Unfortunately, the lack of generics means that tracking multiple things (such as, what resolution is the timer AND is it an absolute timestamp or relative) can get verbose. Still, the struct value boxing should have zero runtime cost, so it can be a handy way to use C’s type system for extra dataflow constraints.
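            A minimal sketch of the boxing idea (all type and function names here are hypothetical, not from an actual project):

            ```c
            #include <stdint.h>

            /* Each clock domain gets its own single-field struct, so the compiler
             * rejects accidental mixing; the box compiles away to a plain integer. */
            typedef struct { uint32_t v; } msec_time;
            typedef struct { uint32_t v; } sec_time;

            static msec_time sec_to_msec(sec_time s) {
                return (msec_time){ .v = s.v * 1000u };
            }

            /* msec_time m = some_sec_time;  // compile error: incompatible struct types */
            ```

            Conversions then have to go through explicit functions like sec_to_msec, which is exactly where the unit logic belongs.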

            1. 7

              Something that seems curiously absent is recommending that any code that is accepting untrusted input and attempting to parse it should get lots of fuzzing. That’s one of the ideal use cases for fuzzing, and coverage-guided fuzzers like afl-fuzz and libfuzzer are really good at searching for input that provokes unusual behavior in parsers.

              1. 1

                This is interesting, but there’s an important caveat near the end:

                Don’t use the Meow hash for tiny inputs. The minimum block size for the Meow hash is 256 bytes, so if you’re feeding it 8 byte data, 16 byte data, etc., it’s going to waste all of its time padding the input, and you could have gotten much better performance by using a different type of hash.

                They mention upfront that it’s a “large data hash”; more specifically, it’s a hash algorithm that’s making trade-offs to be fast for larger binary assets, at the expense of working badly for shorter input. Maybe this is really compelling for your problem domain, maybe it isn’t, but it’s an important consideration. (I’m excited to have people focus on more niche hashing stuff like this, even if this one isn’t very personally relevant.)

                1. 9

                  Posted a new theft release, 0.4.4, which includes a couple bug fixes and Makefile improvements that shouldn’t wait until the 0.5.0 release is ready…And then 0.4.5 shortly after, because someone pointed out I’d missed the version in the pkg-config file. Oops.

                  Aside from multi-core, another major change in 0.5.0 is that I’m expecting to swap out the RNG. I’ve previously been using the 64-bit Mersenne Twister, which is statistically good enough, but rather slow, and aside from user code, theft spends most of its time in the RNG. I tried xoroshiro128+, but ran into some issues with output quality, and the author has since deprecated it in favor of xoshiro256** (note: same page, and no ‘ro’). xoshiro256** seems better (at least, it doesn’t have patterns/gaps in the output that cause my “should eventually find the minimal failure case” tests to fail), but I’m also strongly considering PCG, because I feel I understand why PCG works far better than I understand xoshiro256**. (I recommend O’Neill’s RNG talk, btw.)
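                  For context, the xoshiro256** step itself is tiny – this sketch follows Vigna’s public-domain reference code (seeding omitted; the state must be initialized to anything but all zeros, e.g. via splitmix64):

                  ```c
                  #include <stdint.h>

                  static inline uint64_t rotl64(uint64_t x, int k) {
                      return (x << k) | (x >> (64 - k));
                  }

                  /* One step of xoshiro256**: state is 4x 64-bit words, never all zero. */
                  static uint64_t xoshiro256ss_next(uint64_t s[4]) {
                      const uint64_t result = rotl64(s[1] * 5, 7) * 9;
                      const uint64_t t = s[1] << 17;
                      s[2] ^= s[0];
                      s[3] ^= s[1];
                      s[1] ^= s[2];
                      s[0] ^= s[3];
                      s[2] ^= t;
                      s[3] = rotl64(s[3], 45);
                      return result;
                  }
                  ```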

                  Work: Among other things, working on some succinct data structure implementations, and a library which builds on them to create a highly space-efficient trie. Both will eventually be open-sourced.

                  1. 3

                    It’s cool you work somewhere that lets you open-source stuff like that. Unless you’re an independent contractor/consultant working for yourself. Well, that’s even cooler. ;)

                  1. 8

                    Focusing on theft again, after taking a break for several months. I’m well into a massive restructuring (to add multi-core support), which feels like it’s been going on forever. Now the main stuff is working, but I’m in the long tail of making sure error handling, config hooks, etc. work, now that some of them may be split between multiple address spaces.

                    Other stuff that should make it in the next release:

                    • Failure tagging: If the user code sets an identifier for a failure, it can avoid shrinking paths that would change into other known failures, so that common failures don’t shadow rarer ones. I have a proof of concept of this already, but it needs to be redone to work with multi-core, because that splits its state between processes.

                    • Possibly replacing the RNG with something faster than Mersenne Twister 64, since outside of user code, theft spends essentially all of its time in the RNG.

                    • A better interface for passing config to user hooks, and moving more common configuration things into official settings (for example: running until a time limit, rather than a number of test trials).

                    • Better benchmarking of multi-core behavior (especially shrinking), and tuning with servers with 30+ cores in mind.

                    • A function that provides a CLI interface, with argument parsing.

                    • Improving the bloom filter implementation. (That will also become its own library.)

                    • Lots of misc. improvements to shrinking.

                    Other things that are on the horizon, but will probably be pushed until after the next release so it doesn’t take another year:

                    • Coverage-directed test case generation. I have a proof of concept using clang’s SanitizerCoverage interface – I don’t want to make theft depend on a specific compiler, etc., but know what the interface would look like to be able to utilize that sort of info, whether it comes from clang or something more project-specific.

                    • Better structural inference during shrinking (and coverage-directed generation). This should make shrinking faster.

                    • Some sort of persistence API for generator input.


                    • Various performance improvements to our regex engine (with a focus on frontloading work at compile-time).

                    • Implementing some succinct data structures, for efficiently querying a (currently) massive in-memory data set. I should be able to eventually open-source the parts that aren’t highly specific to our use case.

                    1. 1

                      Re RNG. I always used xorshift if speed over quality was the goal. Plus, it’s tiny. A quick look led me to someone saying xoshiro-256 is faster with good statistical properties. So, there’s a few to try.

                      1. 2


                      2. 1

                        Coverage-directed test case generation. I have a proof of concept using clang’s SanitizerCoverage interface – I don’t want to make theft depend on a specific compiler, etc., but know what the interface would look like to be able to utilize that sort of info, whether it comes from clang or something more project-specific.

                        Oh wow, exciting! I actually started looking into the theft sources with the intention of doing this, great to hear you’re working on it yourself. Did you have any successes yet finding bugs that did not show up without feedback?

                        1. 2

                          I haven’t hooked it up into theft yet, I just confirmed that the interface will work like I expect – and there isn’t much point in starting until the multi-core stuff is done. It’s almost a complete rewrite (which is why it’s taken so long).

                          I want to add a more general coverage feedback hook, which could also be used with (say) line number info from Lua’s debug hook, hashes of runtime data, or something else that can be easily monitored and would correlate to meaningfully different behavior.

                      1. 2

                        There’s a slightly cleaned up version of my entry as a github project, hopscotch. (spoiler alert, etc.)

                        1. 1

                          Great entry, thanks for sharing! The obfuscated version is particularly good. witchery abounds…

                        1. 5

                          Studying succinct/compact data structures, with a focus on trees/tries. I’ve been reading Gonzalo Navarro’s Compact Data Structures and various referenced papers. I’ve been building up prototype implementations in C along the way, some may end up in a library eventually. (The author’s book on string search algorithms is also excellent.)

                          Working on some small improvements to my Thinkpad Carbon X1 / Void Linux ansible config repo, since now a couple people are using it and have given me feedback. (And, yes, I am very happy with this laptop, and it works very well with Linux.)

                          Getting ready for some vacation time.

                          1. 14

                            Another benefit (which doesn’t seem to be in the post): when you’re building a decision table, you may realize there are cases where you don’t know how they should be handled, or (oh no!) cases you’ve been handling two different ways, depending on the order different code paths check the variables.

                            1. 7

                              This is how I have used them in the past. Sometimes I’ve even directly encoded the table as a lookup table, mapping the actions to functions.
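                              For illustration, a sketch of encoding a (made-up) two-condition table as a function lookup table in C – the conditions and actions here are hypothetical:

                              ```c
                              typedef const char *(*action_fn)(void);

                              static const char *reject(void)   { return "reject"; }
                              static const char *escalate(void) { return "escalate"; }
                              static const char *approve(void)  { return "approve"; }

                              /* Indexed directly by the two condition bits; every cell must be
                               * filled in, so unhandled combinations are immediately visible. */
                              static action_fn const table[2][2] = {
                                  /*              !has_quota  has_quota */
                                  /* !is_admin */ { reject,   escalate },
                                  /*  is_admin */ { escalate, approve  },
                              };
                              ```

                              A dispatch like table[is_admin][has_quota]() then replaces a tangle of nested ifs, and the table itself documents the policy.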

                            1. 7

                              I bought a new laptop, a 6th gen. Thinkpad Carbon X1, and I’m setting it up with Void Linux. (Replacing a Macbook Pro – I’ve been wanting to switch to a non-Apple laptop for a while.) My “laptop” Ansible playbook drove most of the setup, but I still need to make it suspend when I close the lid.

                              A couple years ago, I wrote some blog posts about using Ansible to configure laptops (1, 2). I’ve been meaning to update my example repo – while the concepts are all the same, Ansible has had some interface changes over the years. It’s usually good about messages saying e.g. “sudo: was deprecated, use become:”, but it’d be nice having an up-to-date example, and it would document getting Void running on this Thinkpad.

                              I’m going to make Void packages for a couple things I’ve installed locally – mscgen, Alloy, TLA+, and drivers for my printer/scanner (a Brother HL-L2380DW).

                              Aside from that, I’ve been working on a C library for parsing/scraping ambiguous data using Earley parsing and parse forest grammars. (Related, I gave a talk at Papers We Love SF about the Earley parsing paper a couple weeks ago.) This has been off and on for a while, I’m still adjusting the API to account for important use cases it supports beyond general parsing. I’m also working on an LL(1) parser generator, a clone of djb’s redo, and learning radare2 by working through some reverse engineering exercises.

                              1. 0

                                I don’t really understand this. Sure, it’s cool to optimize something so well, but I don’t see the point of going to so much effort to reduce memory allocations. The time taken to run this, what it seems like you would actually care about, is all over the place and doesn’t get reduced that much. Why do we care about the number of allocations and GC cycles? If you care that much about not “stressing the GC”, whatever that means, then better to switch to a non-GC language than jump through hoops to get a GC language to not do its thing.

                                1. 11

                                  On the contrary, I found this article a refreshing change from the usual Medium fare. Specifically, this article is actually technical, has few (any?) memes, and shows each step of optimization alongside data. More content like this, please!

                                  More to your point, I imagine there was some sort of constraint necessitating it. The fact that the allocation size dropped so drastically fell out of using a pooled allocator.

                                  1. 4

                                    Right at the beginning of the article, it says:

                                    This data is then used to power our real-time calculations. Currently this import process has to take place outside of business hours because of the impact it has on memory usage.

                                    So: They’re doing bulk imports of data, and the extra allocation produces so much overhead that they need to schedule around it (“outside of business hours”). Using 7.5GB may be fine for processing a single input batch on their server, but it’s likely they want to process several data sets in parallel, or do other work.

                                    Sure, they could blast the data through a DFA in C and probably do it with no runtime allocation at all (their final code is already approaching a hand-written lexer), but completely changing languages/platforms over issues like this has a lot of other implications. It’s worth knowing if it’s manageable on their current platform.

                                    1. 3

                                      They’re doing bulk imports of data, and the extra allocation produces so much overhead that they need to schedule around it

                                      That’s what they claim, but it sounds really weird to me. I’ve worked with plenty of large data imports in GCed languages, and have never had to worry about overhead, allocation, GC details, etc. I’m not saying they don’t have these problems, but it would be even more interesting to hear why these things are a problem for them.

                                      Also of note - their program never actually used 7.5GB of memory. That’s the total allocations over the course of the program, virtually all of which was surely GC’ed almost immediately. Check out the table at the end of the article - peak working set, the highest amount of memory actually used, never budged from 16kb until the last iteration, where it dropped to 12kb. Extra allocations and GC collections are what dropped. Going by the execution time listing, the volume of allocations and collections doesn’t seem to have much noticeable effect on anything. I’d very much like to know exactly what business goals they accomplished by all of that effort to reduce allocations and collections.

                                      1. 1

                                        You’re right – it’s total allocations along the way rather than the allocation high water mark. It seems unlikely they’d go out of their way to do processing in off hours without running into some sort of problem first (so I’m inclined to take that assertion at face value), though I’m not seeing a clear reason in the post.

                                        Still, I’ve seen several cases where bulk data processing like this has become vastly more efficient (from hours to minutes) by using a trie and interning common repeated substrings, re-using the same stack/statically allocated buffers, or otherwise eliminating a ton of redundant work. If anything, their timings seem suspicious to me (I’d expect the cumulative time to drop significantly), but I’m not familiar enough with the C# ecosystem to try to reproduce their results.

                                      2. 1

                                        From what I understood, the 7.5GB of memory is total allocations, not the amount of memory held resident, that was around 15 megs. I’m not sure why the memory usage requires running outside business hours.

                                        EDIT: Whoops, I see you responded to a similar comment that showed up below when I was reading this.

                                      3. 2

                                        The article doesn’t explain why they care, but many garbage collectors make it hard to hit a latency target consistently (i.e. while the GC is running its longest critical section). Also, garbage collection is somewhat expensive (though usually better optimized for short-lived allocations than malloc), and re-using memory makes caches happier.

                                        Of course, there’s a limit to how much optimization one needs for a CSV-like file in the hundreds of MBs…

                                        1. 1

                                          Maybe their machines don’t have 8gb of free memory lying around.

                                          1. 2

                                            As shown in the table, they don’t use anywhere close to 8gb of memory at a time. This seems like a case that .NET is already very good at, even at a baseline level.

                                        1. 2

                                          I often do, but don’t try to force it. Sometimes I just don’t have spare mental energy for it, especially when I’m doing deep work at my day job. Hobbies that get me away from tech help ground me and keep me from burning out (I particularly like cooking and bicycling). I’m usually not interested in programming for its own sake, so much as for particular projects I want to work on.

                                          Also, remember that coding means different things to different people. You might find another branch of programming fits you better. I don’t find front-end web development enjoyable at all (I’m not actively involved in that space, and the tooling changes quickly, so it feels like starting from scratch every single time), but find playing with microcontrollers / Arduinos and making generative art bots to be a lot of fun. I’m especially interested in some particular problem domains, like parsing, networking, and testing, which have had more or less overlap with my jobs over the years.

                                          1. 3

                                            This is consistent with my experience on my last couple projects – using contracts along with property-based testing and/or fuzzing seems to have an especially high payoff (bugs found, with input that reproduces them) for the effort involved.

                                            The property tests were conceptually all stateful tests – the test library (theft) generated and executed a sequence of API calls, updating a straightforward in-memory model of the thing under test along the way, and checked that the result from each API call made sense, based on the model so far. If the code and in-memory model ever diverged, or if any contracts* failed, then it found something worth further investigation. It also used shrinking to remove most of the irrelevant details from the input.

                                            This approach is incremental, and combines well with other approaches – there’s no need to thoroughly cover everything in contracts before they start catching things. Properties can check the overall expectations of the code (“no series of valid operations leads to data loss”; “no input leads to pathologically bad performance”), but also the contracts will help catch tangential issues, like implicit assumptions about initialization. To some degree, contracts sidestep thinking about the overall properties of the code, because contracts will catch any inconsistencies uncovered by combining operations at random.

                                            * Well, asserts, because this was in C – but I was using asserts for internal consistency contracts, not just p != NULL; and whatnot.
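                                             As a small (hypothetical) example of what those internal-consistency asserts can look like for, say, a ring buffer:

                                             ```c
                                             #include <assert.h>
                                             #include <stddef.h>

                                             struct queue { size_t head, tail, count, capacity; };

                                             /* Called on entry/exit of each queue operation: catches corrupted
                                              * state produced by *any* randomly generated sequence of operations,
                                              * not just violations of the property being checked directly. */
                                             static void queue_invariants(const struct queue *q) {
                                                 assert(q != NULL);
                                                 assert(q->capacity > 0);
                                                 assert(q->count <= q->capacity);
                                                 assert(q->head < q->capacity);
                                                 assert(q->tail < q->capacity);
                                             }
                                             ```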

                                            1. 9

                                               While the syntax is quite different, I think people really underappreciate how simple Erlang’s syntax is. It takes no more than a few hours to learn everything about it. There are so few constructs that even if some of them are odd, it’s still so much smaller than almost any other language I’ve used. I think people tend to get way too caught up on its oddness rather than this strength of simplicity.

                                              1. 1

                                                I actually agree with you about stuff like that. Most programmers don’t, though. Being C-like helped C++ and Java win converts more easily. As in, you always get more adoption giving a crowd what they’re used to starting out.

                                                1. 1

                                                  Also, the syntax is really unsurprising if you’ve toyed with Prolog in the past.

                                                  1. 1

                                                    The author explicitly mentions that as a negative given it’s a weird language hardly any mainstream coders would’ve messed with. Instantly creates an obstacle to adoption for the masses.

                                                  2. 1

                                                    Yes, I played a bit with Erlang some time ago, and I was really surprised by how beautiful and simple it is.

                                                  1. 10

                                                    This one’s mine: source (hint) That was a LOT of fun to write :D

                                                    1. 2

                                                      I was going to take a quick glance at it to see if anything jumped out at me. Yeah, to heck with that lol. Great art, though! Love it!

                                                      1. 2

                                                        If you’re trying to pick apart an IOCCC entry, generally the first thing to do is pass it through the C preprocessor, and then through a code auto-formatter like astyle.

                                                        With my entry, that should bring several other layers of obfuscations into focus. ;)

                                                    1. -3

                                                      The authors obviously don’t know anything about R.

                                                      1. 3
                                                        1. 0

                                                          What is the difference between array and vector languages? Vector languages operate on slices of arrays (which might be called array languages).

                                                          1. 1

                                                            I’ve always heard the terms used interchangeably.

                                                      1. 4

                                                        I very frequently do the sort of testing described here as “FSM validation” in C (using theft). Even if I didn’t use property-based testing for anything else, the results I’ve gotten from that approach alone would more than justify the time I’ve spent learning property-based testing tools.

                                                        1. 3

                                                           Appreciate the feedback, since I want more corroboration about that. I know it worked well with formal specs, but people will want to see it on code alone, too.

                                                           @ferd seemed to think it was an Erlang-only technique. These things used to be called “specification” or “model” based testing rather than property. The use of state-based modeling with pre- or post-conditions in formal specifications was also pretty much the default, due to FSMs and Dijkstra’s methods being popular. People had done randomized testing of them many times, but most research tried to avoid it to reduce execution time due to hardware costs, or thinking they could outdo random. Here’s an example from 1997 with FSM modeling plus heuristic and random testing. One with Statecharts. Another one from 2009 with UML. Recent results with “property”-based testing create a bandwagon effect where lots of these tools are being built, including for old languages. QuickCheck style being most common.

                                                          The tooling never went mainstream, though. That’s likely because it usually came with formal specs that were hard for people to learn. There were good reasons for that for sure. However, the newer ones are just giving a purpose-built, lightweight spec that only generates tests. The simplicity and working within the language itself is probably driving a lot of adoption. The methods are old, though. One more benefit while we’re at it is that model-checkers and provers can work with FSM models better than most. So, one can get property-based testing now plus some stronger stuff later if they or a contributor has the resources.

                                                          1. 2

                                                            It doesn’t feel like a fundamentally new technique, but most of the writing I’ve seen that explicitly describes it building on property-based testing tooling has been Erlang-flavored. The commercial Erlang QuickCheck’s documentation suggests testing stateful systems that way, and the library has some built-in support, so it’s probably more familiar within the Erlang world. (IIRC, they call it “statem” testing, for “state machine”.)

                                                            Hypothesis also has API support for stateful testing now, and its docs refer to similar support in ScalaCheck.

                                                            theft doesn’t have any explicit API hooks for this use case yet, mostly because I haven’t figured out the right abstractions for a C API, but it definitely supports it. Typically, I have theft generate an array of structs, each of which describes an operation in the API I’m testing – if I were testing a hash table, it would generate things like:

                                                            { .type = OP_SET, .key = "key001", .value = 23 },
                                                            { .type = OP_GET, .key = "key005", },
                                                            { .type = OP_DELETE, .key = "key001", },

                                                            and then a simple switch-case loop calls the code under test with each operation, checks that the result makes sense according to a simple in-memory model (typically just an array, with dead-simple linear search), and then updates the model with new bindings. After the whole series of operations, it would compare the end state of the model to the hash table. If deleting a binding has caused other earlier, colliding bindings to get lost, or growing the table has corrupted data, it would fail the test, shrink the steps, and present a small series of steps to reproduce the bug.
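                                                             To make that concrete, here’s a stripped-down sketch of such a driver loop – the op layout matches the fragment above, but the model and helper names are made up, and the calls into the real hash table under test are only marked by comments:

                                                             ```c
                                                             #include <string.h>

                                                             enum op_type { OP_SET, OP_GET, OP_DELETE };
                                                             struct op { enum op_type type; const char *key; int value; };

                                                             /* Dead-simple model: parallel arrays with linear search. */
                                                             #define MODEL_MAX 64
                                                             struct model {
                                                                 const char *keys[MODEL_MAX];
                                                                 int values[MODEL_MAX];
                                                                 int present[MODEL_MAX];
                                                                 size_t count;
                                                             };

                                                             static int model_find(const struct model *m, const char *key) {
                                                                 for (size_t i = 0; i < m->count; i++)
                                                                     if (m->present[i] && strcmp(m->keys[i], key) == 0)
                                                                         return (int)i;
                                                                 return -1;
                                                             }

                                                             /* Apply one generated op to the model; in the real test, the same
                                                              * op is also applied to the hash table under test, and the two
                                                              * results are compared before updating the model. */
                                                             static void model_apply(struct model *m, const struct op *op) {
                                                                 int i = model_find(m, op->key);
                                                                 switch (op->type) {
                                                                 case OP_SET:      /* also: ht_set(ht, op->key, op->value) */
                                                                     if (i >= 0) {
                                                                         m->values[i] = op->value;
                                                                     } else {
                                                                         m->keys[m->count] = op->key;
                                                                         m->values[m->count] = op->value;
                                                                         m->present[m->count] = 1;
                                                                         m->count++;
                                                                     }
                                                                     break;
                                                                 case OP_GET:      /* also: compare ht_get(ht, op->key) to model */
                                                                     break;
                                                                 case OP_DELETE:   /* also: ht_delete(ht, op->key) */
                                                                     if (i >= 0) { m->present[i] = 0; }
                                                                     break;
                                                                 }
                                                             }
                                                             ```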

                                                             I’d already been doing that before I encountered Erlang QuickCheck – I was trying to use PBT like a lightweight model checker, but one that would catch implementation bugs (because, hey, I mostly work in C). I learned a bit about model checkers from dabbling in Alloy after Guillaume Marceau’s presentation at !!Con 2014. Describing the approach as a lightweight, probabilistic model checker still makes more sense to me than “state machine testing”, because I might not be testing a state machine at all – I’ve used it for cache invalidation logic, filesystem code, stress-testing a garbage collector, fuzzing a compiler, and several other things.

                                                            I suspect this technique is so effective because there are lots of codebases where individual pieces have been pretty well tested, but combining them in unusual ways uncovers slight misalignments, and randomly chaining these pieces finds combinations that compound the misalignments and trigger surprising failures.

                                                        1. 6

                                                          I like how concise and unopinionated this article is (or rather that the opinions are clearly marked, with a good attempt made to separate the information from them).

                                                          I’m pretty bad with tests, essentially doing only acceptance, diff and a lot of manual tests (for what could be automanual). I have yet to find tests that aren’t end-to-end and can survive major refactors (that are complete rewrites or close to it) or even language changes. I’d also like them to not discourage me from changing a bad API (that’s likely to invalidate a lot of unit tests).

                                                          And while I’m wishing, something that can go from manual (done from a REPL or debugger) to automanual more naturally would be nice too. Even creating a second file in a project meant to be small feels cumbersome.

                                                          Before reading this, I used to think property tests meant fuzzing and now learned that fuzz tests don’t actually check the output!

                                                          1. 6

                                                            And while I’m wishing, something that can go from manual (done from a REPL or debugger) to automanual more naturally would be nice too. Even creating a second file in a project meant to be small feels cumbersome.

                                                            Ooh, this is totally a thing – obscure, but a thing. I’ll find some references and add a section on it when I get a chance.

                                                            1. 5

                                                              and now learned that fuzz tests don’t actually check the output!

                                                              It’s one of those terms that don’t have a precise definition – it depends on which researchers or tool developers you’re talking to. What they have in common is throwing random data at something and watching for an effect. That used to not involve a check, because the problems they looked for often caused crashes. If the problems don’t, you might need checks to spot them. So fuzzing != no checks, but it’s common not to have them if the target is C or something similar.

                                                              Lots of random input is the defining trait.

                                                              1. 6

                                                                There’s definitely a point where if you’re using a fuzzer, but adding lots of assertions about the output being reasonable, it starts to feel more like property-based testing.

                                                                I did an internal tech talk about property-based testing at work a couple months ago, and made the case that PBT tooling belongs in one’s toolbox because it can be applied to a range of test styles: along with typical PBT “tactics” (to use hwayne’s term), the test library can also run properties that are closer to fuzzing or model-checking. New tactics and a variety of approaches can plug into existing PBT tooling.

                                                                The core interface* is arguably specifying how to generate input for a property (reproducibly), and how to classify that input as interesting or not (by running some test code and checking for failures/errors). This framing is pretty general, and mainly ensures the PBT library has the info it needs to shrink interesting input. Plugging in different testing tactics can lead to tests with very different trade-offs:

                                                                • “random input” + “it doesn’t crash”: Classic fuzzing.
                                                                • “structured input” + “these two implementations agree”: Comparing the code under test against a reference implementation, a naive/inefficient version, or the same codebase without an experimental optimization. Taking something easier to verify, and using it as a foothold to check something more complex.
                                                                • “structured input” + “resource limits are sufficient”: Searching for input that uses disproportionately large amounts of memory, CPU time, or whatever.
                                                                • “structured input” + “roundtrip a conversion”: encode some data, decode it (say, pack and unpack data for serialization to disk), and check if anything got lost in translation. A classic tactic.
                                                                • “a sequence of operations against an API” + “call API and check results, update in-memory model”: This works more like a model checker, and an in-memory dictionary can be an easy stand-in for a database, filesystem, or other model of the state of the outside world when testing complex logic in a vacuum. While it isn’t exhaustive the way (say) TLA+ can be, and it’s difficult to apply to problems that are inherently nondeterministic due to concurrency, it’s a great way to cheaply stress-test logic and discover surprising interactions. There are lots of APIs where the individual operations make sense, but have subtle misalignments between them that can compound and spiral out of control when combined in certain ways. As a bonus, shrunken failing input for these properties tends to have a narrative: “when you do this, then this, then this, then this, it fails in this way”.
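                                                                As a small illustration of the roundtrip tactic above: here’s the property for a minimal run-length encoder. A PBT library would generate the inputs and shrink any failures; a tiny LCG stands in for the generator here, and all of the names are made up for the sketch:

                                                                ```c
                                                                #include <assert.h>
                                                                #include <stdbool.h>
                                                                #include <stddef.h>
                                                                #include <stdint.h>
                                                                #include <string.h>

                                                                /* encode runs as (count, byte) pairs; worst case doubles the size */
                                                                static size_t rle_encode(const uint8_t *in, size_t len, uint8_t *out) {
                                                                    size_t o = 0;
                                                                    for (size_t i = 0; i < len; ) {
                                                                        size_t run = 1;
                                                                        while (i + run < len && in[i + run] == in[i] && run < 255) { run++; }
                                                                        out[o++] = (uint8_t)run;
                                                                        out[o++] = in[i];
                                                                        i += run;
                                                                    }
                                                                    return o;
                                                                }
                                                                static size_t rle_decode(const uint8_t *in, size_t len, uint8_t *out) {
                                                                    size_t o = 0;
                                                                    for (size_t i = 0; i + 1 < len; i += 2)
                                                                        for (uint8_t n = 0; n < in[i]; n++) { out[o++] = in[i + 1]; }
                                                                    return o;
                                                                }

                                                                /* the property: decode(encode(x)) == x, over many pseudo-random x */
                                                                static bool roundtrip_holds(uint32_t seed, int trials) {
                                                                    for (int t = 0; t < trials; t++) {
                                                                        uint8_t buf[64], enc[128], dec[64];
                                                                        seed = seed * 1103515245u + 12345u;           /* LCG step */
                                                                        size_t len = (seed >> 16) % (sizeof(buf) + 1);
                                                                        for (size_t i = 0; i < len; i++) {
                                                                            seed = seed * 1103515245u + 12345u;
                                                                            buf[i] = (uint8_t)((seed >> 16) & 0x3);   /* tiny alphabet so runs occur */
                                                                        }
                                                                        size_t enc_len = rle_encode(buf, len, enc);
                                                                        size_t dec_len = rle_decode(enc, enc_len, dec);
                                                                        if (dec_len != len || memcmp(buf, dec, len) != 0) { return false; }
                                                                    }
                                                                    return true;
                                                                }
                                                                ```

                                                                The “these two implementations agree” tactic is almost the same shape – the check just becomes a comparison against a reference implementation instead of a memcmp with the original input.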

                                                                Controlling the input generation also means that it can be steered towards particular areas of the state space, in a way that feels more direct than (say) temporarily using asserts to convince afl-fuzz that those branches are boring.

                                                                * I have the most experience using theft on C (I’m the author), but it seems like this applies to most other PBT libraries.

                                                                1. 1

                                                                  I see. Thanks for the clarification!

                                                                2. 4

                                                                  And while I’m wishing, something that can go from manual (done from a REPL or debugger) to automanual more naturally would be nice too.

                                                                  I might have misunderstood you, but something like python’s doctests could be what you’re looking for. You can basically use the REPL to manually test your code and how to use it, you paste that in the docstrings, and doctests runs it again and makes sure it behaves like it should.

                                                                  1. 1

                                                                    something like python’s doctests could be what you’re looking for

                                                                    Good suggestion. I quite like the idea of doctests, but here are some things that dissuaded me from adopting them, at least right away.

                                                                    • I usually don’t want to see my tests when editing code. I guess I want tests either in a separate file (which also goes against my wish to have fewer files) or maybe grouped together at the end (or beginning) of the file.
                                                                    • Only the very first few (manual) tests tend to focus on a single function.
                                                                    • In a REPL or debugger, I usually have a lot of state set up. In the debugger (say, when post-mortem debugging), I don’t have the steps to recreate this state! Of course, I could spend time cooking up an example, but that takes more thinking than just a copy-paste (and I’m trying to find something that lets me put in progressively more effort for more certainty).

                                                                    Right now, I know that a lot of my REPL commands are just lost to the history file and could probably be made automatic instead.

                                                                1. 12

                                                                  This post was really eye-opening. I was expecting another “it leads to coupling n stuff” but this gave me a way to think about what’s been bothering me. Thanks for that.

                                                                  Historically, I don’t think inheritance was “meant” to be that way. If you look at the method wars period (late 80s, early 90s), people were drawing all sorts of relations between classes: refines, implements, abstracts, generalizes, serves, etc. Inherits was the only one that became a language construct, at least for a long time. I wonder if that was a major mistake, since it restricted “powerful” relationships to whatever the language designers knew about.

                                                                  If that’s the case, I’d want to explore OOP more in dynlangs.

                                                                  1. 2

                                                                    If that’s the case, I’d want to explore OOP more in dynlangs.

                                                                    I’ve found Lua handy for exploring object systems – instead of having one built in, it presents a carefully chosen set of metatable hooks (including some similar to __getitem__ in python and method_missing in ruby). These can be used to construct several kinds of object systems.

                                                                  1. 8

                                                                    Preparing release 1.4.0 of greatest: (Edit: it’s posted.)

                                                                    • Add a flag to call abort() on test failure, so if tests are being run inside gdb, it will break on the failure with the call stack and other runtime info.

                                                                    • Add a function to set optional test name suffixes, so if a test is being run with parameter(s), it can print test results as the_test_ABC, the_test_DEF, etc. This is particularly useful when tests are being run in shuffled order. The test-name-based filtering includes this optional suffix in its matching.

                                                                    • Several other misc. improvements, especially work on the documentation.

                                                                    After that’s out, I’m going to work on a new release for heatshrink. It’s essentially done, and has been stable for years, but it wouldn’t hurt to flesh out the documentation more. Also, it’d give the project some recent activity – people seem to consider projects that haven’t had changes in 3 years stale, even when they’re really tightly scoped and feature complete.

                                                                    Strangest request: someone connecting with me on LinkedIn specifically to ask me to write a C# wrapper… Unlikely, though I will probably add blocking, bitstream-compatible implementations in C and Python to make porting easier. The majority of heatshrink’s complexity comes from being able to suspend and resume on IO, between any bytes.

                                                                    I will probably be back to working on theft again in a week or two – I have a massive restructuring underway to add multicore support. Since it interacts with nearly everything, there has been a ton of integration work, but it’s starting to come together.

                                                                    Work: Wrapping up loose ends so that we can start something really exciting soon. Also, working on a couple internal presentations about recent successes.

                                                                    1. 2

                                                                      The majority of heatshrink’s complexity comes from being able to suspend and resume on IO, between any bytes.

                                                                      Been a while since I heard someone mention that feature. Might be worth a write-up or detailed comment on what makes it complex or how it’s done.

                                                                      1. 9

                                                                        It implements LZSS, one of the LZ77 family of data compression algorithms. Everything is based on a state machine, with an internal buffer for the data being processed. I chose LZSS because it does all of its processing using that buffer (the “sliding window”), so its CPU time and memory usage are very predictable.

                                                                        As a friend described the API: “you put in data, then turn the crank until nothing more comes out, repeat.” When you’re done, you call a function to notify it that the end of the stream has been reached, and keep turning the crank until processing is finished. It isn’t that surprising – the whole point of compressing / decompressing is to change the size of the data – but the assumption that the caller will keep calling to sink more data from a stream means that at every step it has to track its current state / where it left off, whether there’s enough data buffered to move on to the next step, and how much room is left in the output buffer.
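                                                                        To show the shape of that sink/poll/finish style without the LZSS details (this is not heatshrink’s actual API – it’s a toy transformer that just doubles every byte, with all names invented for the sketch):

                                                                        ```c
                                                                        #include <assert.h>
                                                                        #include <stdbool.h>
                                                                        #include <stddef.h>
                                                                        #include <stdint.h>
                                                                        #include <string.h>

                                                                        enum poll_res { POLL_MORE, POLL_EMPTY };

                                                                        struct doubler {
                                                                            uint8_t buf[8];            /* buffered, not-yet-processed input */
                                                                            size_t buf_len, buf_pos;
                                                                            int pending;               /* second copy of a byte, -1 if none */
                                                                            bool finishing;
                                                                        };

                                                                        /* accept as much input as fits; caller re-sinks the rest after polling */
                                                                        static size_t doubler_sink(struct doubler *d, const uint8_t *in, size_t len) {
                                                                            size_t room = sizeof(d->buf) - d->buf_len;
                                                                            size_t n = len < room ? len : room;
                                                                            memcpy(&d->buf[d->buf_len], in, n);
                                                                            d->buf_len += n;
                                                                            return n;
                                                                        }

                                                                        /* emit output until the buffer fills or we run out of work */
                                                                        static enum poll_res doubler_poll(struct doubler *d, uint8_t *out, size_t cap, size_t *out_len) {
                                                                            size_t o = 0;
                                                                            while (o < cap) {
                                                                                if (d->pending >= 0) { out[o++] = (uint8_t)d->pending; d->pending = -1; }
                                                                                else if (d->buf_pos < d->buf_len) {
                                                                                    uint8_t b = d->buf[d->buf_pos++];
                                                                                    out[o++] = b;
                                                                                    d->pending = b;    /* resume here if out of output room */
                                                                                } else { break; }
                                                                            }
                                                                            if (d->buf_pos == d->buf_len) { d->buf_pos = d->buf_len = 0; }
                                                                            *out_len = o;
                                                                            return (d->pending >= 0 || d->buf_pos < d->buf_len) ? POLL_MORE : POLL_EMPTY;
                                                                        }

                                                                        static void doubler_finish(struct doubler *d) {
                                                                            d->finishing = true;       /* a real codec flushes trailing state here */
                                                                        }

                                                                        /* "put in data, then turn the crank until nothing more comes out" */
                                                                        static size_t run_doubler(const uint8_t *in, size_t in_len, uint8_t *out) {
                                                                            struct doubler d = { .pending = -1 };
                                                                            size_t total = 0, used = 0;
                                                                            while (used < in_len) {
                                                                                used += doubler_sink(&d, &in[used], in_len - used);
                                                                                for (;;) {                 /* turn the crank */
                                                                                    uint8_t chunk[4];      /* deliberately tiny output buffer */
                                                                                    size_t n;
                                                                                    enum poll_res res = doubler_poll(&d, chunk, sizeof(chunk), &n);
                                                                                    memcpy(&out[total], chunk, n);
                                                                                    total += n;
                                                                                    if (res == POLL_EMPTY) { break; }
                                                                                }
                                                                            }
                                                                            doubler_finish(&d);
                                                                            for (;;) {                     /* drain anything buffered at the end */
                                                                                uint8_t chunk[4];
                                                                                size_t n;
                                                                                enum poll_res res = doubler_poll(&d, chunk, sizeof(chunk), &n);
                                                                                memcpy(&out[total], chunk, n);
                                                                                total += n;
                                                                                if (res == POLL_EMPTY) { break; }
                                                                            }
                                                                            return total;
                                                                        }
                                                                        ```

                                                                        Even in this trivial transform, `pending` and the buffer indices exist purely so a call can stop between any two output bytes and pick up where it left off – in a real LZSS codec that per-call bookkeeping spreads through every state of the machine, which is where the bulk of the code goes.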

                                                                        I’m not saying it’s incredibly complicated, but a blocking, single-pass LZSS implementation is like a page of code; the suspend/resume logic is well over half of heatshrink, line-wise. There are also optimizations for doing decompression or compression on memory-constrained embedded systems – for example, the indexing in the encoder is simple, but either significantly faster or uses an order of magnitude less memory than other approaches I’ve seen. If you’re unpacking data out of ROM on a system with only a few hundred bytes of RAM to spare, approaches like gzip’s big in-memory hash table are a non-starter.

                                                                        1. 1

                                                                          Appreciate the detailed description. That’s very interesting. :)