1. 13

    Suggestions that aren’t already in other comments:

    • Have check-in / check-out habits, so you know by habit when you’ve started and are done for the day.
    • While I use the same desk and one of my external monitors for both my desktop and my docked work laptop, I switch keyboards when I’m working. It’s a constant, tactile reminder, and unplugging and switching it is part of my check-in / check-out habits. (I’ve never seen anyone else suggest this.)
    • Learn the preferred communication styles of your coworkers and try to meet them halfway – some people are much easier to work with once you know whether they prefer to hash things out over a quick voice call, via text chat and issues/PRs, etc.

    In general, I’ve found setting boundaries, having a regular routine, and over-communicating helpful, but the specifics will vary between people, and between teams. It makes a big difference whether a team / company has a wide time-zone spread, for example.

    1. 4

      That keyboard suggestion is really interesting. Makes me think about what other good priming strategies there are…

    1. 6

      k/q don’t have (f g h) and have fewer “words”, so it’s easy to take a look and see whether it’s more discoverable.

      k)mode:*>#:'=:
      q)mode:{first idesc count each group x}
      

      but having 200 primitives shouldn’t be a problem; it’s just a lot to (re)learn. 眾數 (“mode”)!

      Anyway.

      If I need the mode, why wouldn’t I just write the following?

      y{~{.\:+/y=/y
      

      That seems perfectly clear to me, and doesn’t require fiddling with “expert level” at all.

      I think stuff like this:

      (i. >./) & (#/.~) { ~.
      

      is one of those things where, if you needed to compute a trillion modes a day, you might try to think about how to reduce the time and space of the operation, and so it would be worth spending forty minutes on it.

      btw, this is faster: ~. {~ [: (i. >./) #/.~
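
      For readers who don’t know J or k: here’s roughly the same computation (pick the most frequent value) spelled out in C, purely for comparison – a toy sketch, not anything from the thread:

      #include <stddef.h>

      /* Return the most frequent value in xs (first one wins ties).
       * Assumes n > 0. O(n^2) for clarity; the J/k versions above
       * effectively group equal values and count each group instead. */
      static int mode(const int *xs, size_t n) {
          int best = xs[0];
          size_t best_count = 0;
          for (size_t i = 0; i < n; i++) {
              size_t count = 0;
              for (size_t j = 0; j < n; j++) {
                  if (xs[j] == xs[i]) { count++; }
              }
              if (count > best_count) { best_count = count; best = xs[i]; }
          }
          return best;
      }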

      1. 5

        As a data point: While reading the blog post, I tried writing it in k, and came up with exactly the same answer you did (albeit with slight syntactic changes for kona). It only took a few minutes.

        1. 2

          It doesn’t really make sense to compare APLs to logographic languages. Natural languages are optimized for single-pass parsing.

          A more apt comparison (and the most popular one, I believe) might be between the notation of APL and mathematical notation. But even then most mathematics can be read without the insane backtracking and potential for error that APL presents.

          1. 2

            I don’t agree that “natural” languages are optimised for anything: “L’invité qu’il a dit des folies”, or “The horse raced past the barn fell” or even “为了环保运动必须深化” all demonstrate how sometimes a lack of clarity early in the sentence, accidentally or not, can force the reader to backtrack in a “natural” language.

            I understand APL this way: from left to right, top to bottom. If you understand it a different way it may be helpful to explain to people how you are (or came to be) successful in APL instead of trying to mix up my metaphors.

            1. 2

              Certainly you can construct natural language sentences that require backtracking. But my statements about the features of natural languages are uncontroversial (ask any linguist) and backed up by significant evidence. In fact, you can easily observe this yourself through the following experiment: take a program in an APL and a sentence in Chinese or even English and see how much someone can understand when you black out a few characters/words.

              So I’m a bit surprised as to why you seem to be arguing whether natural languages are like that instead of whether APL is like that.

              Comparing APLs to Chinese or other logographic languages is disingenuous because it suggests that Chinese, a language used by billions for thousands of years, shares the flaws of the APLs, a family of niche languages used by at most hundreds for a few decades; it attempts to link a dislike of the design of the APLs with a dislike of logographic languages in general. As I’ve said, this comparison is superficial, and it is much easier to understand the flaws and benefits of APLs through a less synthetic comparison, e.g. with mathematical notation or other programming languages. It’s folly to “mix up the metaphors” of natural language and computer language.

            2. 1

              Based on this, would it make sense to make an APL variant where you still type the syntax as in the ASCII variants, but apply a rich formatting environment to it? Do you think there might be better layout techniques for reading the code?

          1. 10

            I’ve been building a C testing framework for work and heard about Snow on Lobsters, so I’m planning to peruse its features for inspiration. The one I’m building isn’t as macro-heavy/macro-driven, but I think there are a number of advantages to leveraging macros, so I want to see what I can add.

            1. 5

              You should have a look at greatest, which has worked out great for me in the past. I don’t do a lot of C, but dropped my own little testing setup for greatest, and haven’t looked back.

              1. 2

                I’ll check it out, thanks for the link. At a glance, my framework does look similar.

                Probably worth mentioning: I’m sort of targeting this at folks who develop software using the traditional full-cycle SDLC and have to live through that cycle many, many times. As a result, I also have a goal of formally supporting requirements engineering. Basically, that means that as a test engineer writes a test for a developer to implement against, they can optionally map it to a requirement (by ID), a type of requirement (functional, non-functional, performance, etc.), or a set of requirements (multiple IDs). On a very large project with many moving parts, support for this in a test tool can be invaluable.

                The nice side benefit of this is that if you’re using a tool like Git, you can scroll through major points in the Git history and clearly see the progression of development not just by what tests are passing, but also by how many of the requirements solicited from the customer/stakeholder are satisfied. Eventually, I’ll support generating metrics from the tests in a common business/professional format (such as an Excel spreadsheet, so managers can create visualizations and whatnot).

                I think it’ll be useful for developers because they don’t just have to point at a results table and say “I’m passing all the tests”; they can point at a results table and say “I’m passing all the tests, and here’s also proof that the passing tests fully cover all the initial requirements laid out, therefore the product you asked for is built” (and of course if they don’t like what they got, they can go talk to the requirements engineer :P )

                1. 6

                  Hi, greatest author here. :)

                  Something that might be useful in your situation: with greatest, if you have formal requirements IDs, you could use them as tags in the test function names, and then run tests based on those – you can use the -t command line switch in the default test runner to run tests based on test name / substring match. (Similarly, -x skips matching tests, so, for example, you could skip tests with __slow in the name.) If you name tests, say, TEST handle_EEPROM_calibration_read_failure__ENG_471(void) { /* ... */ }, then ./tests -t ENG_471 would run that. (Running tests by name also helps with quick feedback loops during development.)
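
                  A minimal sketch of how that fits together (the requirement ID and test/suite names here are hypothetical; the macros are the ones from greatest’s README, but double-check against the version you use):

                  #include "greatest.h"

                  /* Hypothetical requirement ID (ENG_471) tagged in the test name. */
                  TEST handle_EEPROM_calibration_read_failure__ENG_471(void) {
                      /* ... exercise the code under test ... */
                      ASSERT_EQ(1, 1);   /* placeholder assertion */
                      PASS();
                  }

                  SUITE(calibration_suite) {
                      RUN_TEST(handle_EEPROM_calibration_read_failure__ENG_471);
                  }

                  GREATEST_MAIN_DEFS();

                  int main(int argc, char **argv) {
                      GREATEST_MAIN_BEGIN();   /* handles -t, -x, seeding, etc. */
                      RUN_SUITE(calibration_suite);
                      GREATEST_MAIN_END();
                  }

                  With that in place, ./tests -t ENG_471 runs just the tests whose names mention that ID.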

                  I did some automotive embedded work several years ago. We had a whole requirement traceability system that involved scraping requirement IDs out of comment headers for tests, which eventually fed into coverage reports.

                  1. 1

                    Oh wow, that’s pretty cool. That tagging system can certainly be useful for more than just the requirement IDs but ya, that would work. Being able to filter tests by the tags is also really neat and I hadn’t thought of that as a feature.

              2. 1

                I’d be curious to see what someone could come up with if the test framework didn’t use the C preprocessor and used something else instead. Might be a fun exercise. But then again, maybe I’m just not liking the preprocessor lately.

                1. 1

                  What would it look like to drive tests for C programs in, say, Lua? It seems like a wonderful idea, but I’m not sure if the boilerplate stuff can be automated in such a way as to make it feasible…
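
                  As a very rough sketch of what the plumbing might look like (plain Lua C API; the function under test, the glue function, and the script name are all made up for illustration):

                  #include <stdio.h>
                  #include <lua.h>
                  #include <lauxlib.h>
                  #include <lualib.h>

                  /* Hypothetical function under test. */
                  static int add(int a, int b) { return a + b; }

                  /* Glue: expose it to Lua so a test script can call it. */
                  static int l_add(lua_State *L) {
                      int a = (int)luaL_checkinteger(L, 1);
                      int b = (int)luaL_checkinteger(L, 2);
                      lua_pushinteger(L, add(a, b));
                      return 1;   /* one return value */
                  }

                  int main(void) {
                      lua_State *L = luaL_newstate();
                      luaL_openlibs(L);
                      lua_register(L, "add", l_add);
                      /* tests.lua would hold the actual assertions, e.g.:
                       *   assert(add(2, 2) == 4, "add is broken")        */
                      int err = luaL_dofile(L, "tests.lua");
                      if (err) { fprintf(stderr, "%s\n", lua_tostring(L, -1)); }
                      lua_close(L);
                      return err ? 1 : 0;
                  }

                  The boilerplate is mostly the per-function glue; generating that automatically is the part I’m unsure about.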

                  1. 1

                    I’m not sure either, but it still might be an interesting exercise (or mini research project). Maybe I should be the one to look into it since I’m the one that spoke up. ;)

                    1. 1

                      Actually, this sounds like something @silentbicycle has probably already tried. Might be worth checking in with him first. :)

              1. 6

                Working on implementing an Earley parser using this amazing tutorial. I’ve read that site about a dozen times, and last night it finally clicked. Money quote from the author:

                Most compiler texts are obsessed with slaying the compiler dragons of the 60s with 70s technology.

                If you have any sort of appreciation for algorithms or parsing, you should study it.

                1. 4

                  The coverage in Dick Grune’s Parsing Techniques: A Practical Guide is great, and his page for the first edition includes PDFs. If you’re interested in parsers in general, it’s definitely worth picking up.

                  I’ve also been working on implementing Earley parsers, and the Earley/intersection parsing approach seems particularly compelling. It’s described in chapter 13 of the second edition; it doesn’t seem to appear in the first.

                  1. 2

                    So Earley will recognize all context-free grammars, including ambiguous ones? I have two problems with that:

                    1. Doesn’t it have to return a “forest” of all possible derivations of ambiguous grammars? What do you do with that? That doesn’t seem to be solving the problem; it’s more like pushing it to a later stage.
                    2. Context free grammars aren’t powerful enough to represent most languages we use (C, C++, shell, Ruby, Perl 5 and 6, etc.). So even if Earley magically solved problem #1, it still wouldn’t be an ideal algorithm.

                    I’ve written about this at length in various comments, but I think CFGs are the wrong abstraction for human programming languages. They’re simultaneously not powerful enough and too inefficient.

                    My argument is analogous to the one here regarding CFGs and network protocols. This is admittedly a fringe claim, whereas “CFGs should be used to parse programming languages” is the claim you will read in nearly every textbook:

                    https://lobste.rs/s/uyjzjc/science_insecurity_meredith_l_patterson#c_pzjzxh

                    One of the langsec people responded on HN: https://news.ycombinator.com/item?id=16126234

                    I read that paper about “calc-regular languages”, but it was immature – which the authors admitted – and it still can’t handle some common constructs.

                    Anyway, my overall point is that we should use formalism, but formalism should match practice, not the other way around. Practice shouldn’t be put in the straitjacket of a theory (especially when that theory doesn’t provide you with good efficiency or correctness guarantees, as CFGs don’t).

                    (FWIW I have a very messy wiki page about the difficulties with parsing here: https://github.com/oilshell/oil/wiki/Parsing-is-Difficult)

                    I don’t believe any of them are addressed by Earley parsing. The main bugaboo I have now is that every language has at least TWO parsers. One for the compiler/interpreter, and a totally separate one for linting/formatting. The former ignores whitespace; the latter must recognize whitespace. Even a brand new language like Julia has two parsers!!!

                    No system based on CFGs solves that problem. I do NOT want parse trees (which include the full derivation of the string from the grammar); I want ASTs or “lossless syntax trees”.

                    1. 2

                      Well, I’m not happy about how Earley detects ambiguity TBH. But I am looking for something that is simple enough to take the question off the table so I can continue moving forward for now.

                      You ended up using Pratt parsing for Oil, correct? The choice of a parser for my project is still very much up in the air; the most important consideration is whether I can build the grammar immediately before parsing a module, which isn’t a great use case for a lot of existing tools. I don’t need the ability to mutate the parser while I’m parsing, as I don’t mind cheating by pre-processing to locate all the grammars in use.

                      From my research, Pratt parsing looks really underutilized, but I wasn’t able to discern whether it was painful to build the rules programmatically after parsing patterns and such, and whether it’d be substantially different from PEGs. For example, if I’m building a for x in iterable statement, would matching the in keyword be painful, especially if the metagrammar specification was declarative in nature? I won’t be writing the led() and nud() methods by hand here, they’ll be generated from specifications provided by the user. Also, I was initially put off by all the precedences on everything, then later realized those are implicit in most grammar specifications anyway, so that is pretty unfounded.

                      1. 2

                        Also remember you can always use more than one parser. sklogic used to point that out on HN when people were choosing between one that succeeds/fails fast and one that’s easy for error messages. Something like that.

                        His tool on the “combinatorylogic” GitHub account is a LISP that embeds a bunch of DSLs he mainly uses for program analysis. So, his double-parser strategy might be well-informed. ;)

                        1. 1

                          Oil is best described as 3 interleaved recursive descent parsers + one Pratt parser for expressions, with 15 or so lexer modes. [1]

                          Pratt parsing is really best suited for expression languages, not a whole programming language, as Crockford does in his widely read article (which I reviewed here [2]).

                          Although Pratt parsing isn’t a grammar-based approach, there is something called an “operator precedence grammar” which is a subset of CFGs, and what it handles roughly corresponds to that set of languages:

                          https://en.wikipedia.org/wiki/Operator-precedence_grammar

                          Bob Nystrom has a good way of putting it:

                          http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/

                          If recursive descent is peanut butter, Pratt parsing is jelly. When you mix the two together, you get a simple, terse, readable parser that can handle any grammar you throw at it.

                          I would add the caveat that this technique is really good for implementing existing languages. If you are designing a language from scratch, it might be helpful to use some kind of parser generator to guide the design.

                          In summary, I would say the parsing method you use depends very heavily on the language. And many languages are compositions of multiple languages. I don’t have experience with Earley but I’m sure the same thing is true: it’s good for some things and not good for other things.

                          [1] http://www.oilshell.org/blog/2017/12/17.html

                          [2] http://www.oilshell.org/blog/2016/11/02.html
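
                          To make the nud/led/binding-power idea discussed above concrete, here’s a toy precedence-climbing expression parser over single digits, + and * – a sketch of the technique, not Oil’s code:

                          #include <stdio.h>

                          static const char *p;   /* cursor into the expression text */

                          static int binding_power(char op) {
                              switch (op) {
                              case '+': return 10;
                              case '*': return 20;   /* binds tighter than '+' */
                              default:  return 0;    /* not an operator: stop */
                              }
                          }

                          /* Parse an expression whose operators bind more tightly than min_bp. */
                          static int parse_expr(int min_bp) {
                              int left = *p++ - '0';               /* "nud": a single-digit number */
                              while (binding_power(*p) > min_bp) { /* "led": infix operators */
                                  char op = *p++;
                                  int right = parse_expr(binding_power(op));
                                  left = (op == '+') ? left + right : left * right;
                              }
                              return left;
                          }

                          int main(void) {
                              p = "1+2*3+4";
                              printf("%d\n", parse_expr(0));   /* prints 11 */
                              return 0;
                          }

                          A full Pratt parser dispatches nud/led through per-token tables instead of hard-coded switches, which is what makes generating those rules from a declarative spec plausible.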

                        2. 2

                          Have you heard about parsing with derivatives?

                          I recently explored the possibility of doing this with Clojure’s spec. Here’s an example:

                          (s/conform ::s-expression "(abc  def)")
                          ⇒
                          [[:sub {:left  \(
                                  :nest  [[:atom [\a \b \c]]
                                          [:whitespace [\space \space]]
                                          [:atom [\d \e \f]]]
                                  :right \)}]]
                          

                          Notice that in the third line, the whitespace has been captured. A linter could examine whitespace and a compiler/interpreter could throw it out. Specs can be bidirectional too, so a formatting tool could take a data structure like the above, adjust white space, and unform it, producing formatted source code.

                          Regarding returning a parse forest, the author of parsing with derivatives goes into that here. On the next slide, the author alludes to the possibility of modifying the grammar during parsing (!) which may be of particular interest to you.

                      1. 8

                        High line coverage doesn’t necessarily imply covering enough of the state space. (Branch coverage is only slightly better in this regard.) As the post notes, testing with groups of input that correspond to different equivalence classes is more meaningful, because it exercises the code under test in several different ways.

                        There’s a problem hiding in plain sight, though: how often do we know all the edge cases?

                        Property-based testing, fuzzing, model checking, static analysis, and symbolic interpretation help here: Property-based testing and fuzzing do guided random exploration of the state space and report edge cases (known and unknown). Model checking, static analysis, and symbolic interpretation reason about possible behavior within the state space as a whole (possibly with some upper size bound).
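
                        Even without a library, the fuzzer-ish end of that looks something like the following hand-rolled sketch (the toy codec stands in for real code under test; real property-testing tools add generation control, shrinking, and reporting):

                        #include <assert.h>
                        #include <stdint.h>
                        #include <stdlib.h>
                        #include <string.h>

                        /* Toy codec standing in for the code under test: XOR with 0x5A. */
                        static size_t encode(const uint8_t *in, size_t len, uint8_t *out) {
                            for (size_t i = 0; i < len; i++) { out[i] = in[i] ^ 0x5A; }
                            return len;
                        }
                        static size_t decode(const uint8_t *in, size_t len, uint8_t *out) {
                            return encode(in, len, out);   /* XOR is its own inverse */
                        }

                        int main(void) {
                            srand(12345);   /* fixed seed, so any failure reproduces */
                            for (int trial = 0; trial < 10000; trial++) {
                                uint8_t in[64], enc[64], dec[64];
                                size_t len = (size_t)(rand() % (int)sizeof(in));
                                for (size_t i = 0; i < len; i++) { in[i] = (uint8_t)rand(); }

                                /* The property: decode(encode(x)) == x, and nothing crashes. */
                                size_t enc_len = encode(in, len, enc);
                                size_t dec_len = decode(enc, enc_len, dec);
                                assert(dec_len == len && memcmp(in, dec, len) == 0);
                            }
                            return 0;
                        }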

                        1. 1

                          If you know a line is run during a test suite, it tells you nothing about whether it’s well tested.

                          However, if you know the line is not executed during a test suite, that does indeed tell you something about how well tested it is.

                          I don’t work towards 100% coverage - I work away from having code I know isn’t tested. It looks like the same thing from a distance, but it’s an important distinction to me.

                          1. 2

                            Absence of coverage can definitely be a useful insight. Line coverage can be an absence of evidence, but negative line coverage is evidence of absence. :) It just seems to lose something in translation when that coverage becomes a metric that triggers failures in CI, though.

                            (Also, if the test input generated via property-based tests still isn’t covering parts of a program, it’s a sign the generator needs more variety.)

                          2. 1

                            I like to move this back to the root. Eliminate edge cases as much as possible.

                          1. 6

                            I’m actively working on multi-core support for theft again. While I’m still integrating the code for shrinking failures, running individual tests parallelizes nicely: in some early/casual benchmarks, running 10,000 trials of a CPU-bound test on a 4-core system indeed runs ~4x as fast as before. Shrinking should also be well-suited to parallel exploration. Once that’s working, I’ll do more substantial benchmarking, on a system with 40 cores.

                            It may be a couple more weeks before the next release, but in the time I’ve found for it, I’m making steady progress.

                            Incidentally, taking an existing project and restructuring it so that individual units of work can be relocated to different processes, moved around a network, etc. is an interesting exercise. A lot of implicit decisions become explicit. Thinking about what actually needs to be passed along in messages to the next phases (rather than everything staying in scope by default, due to sequential computational inertia) draws attention to implementation details hiding in plain sight. This restructuring is also a bit tedious in C, because, well, C isn’t Erlang. Pattern matching and tagged tuples would help a lot…
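
                            A schematic of the shape that takes in plain C (not theft’s actual code): a tagged message over a pipe between a supervisor and a forked worker, so everything a unit of work needs is explicit in the message instead of implicitly in scope.

                            #include <stdio.h>
                            #include <stdint.h>
                            #include <unistd.h>
                            #include <sys/wait.h>

                            /* Everything the worker needs must fit in the message --
                             * nothing is shared implicitly across the process boundary. */
                            struct msg {
                                uint8_t tag;     /* 1 = run a trial, 0 = shut down */
                                uint64_t seed;   /* the input for this unit of work */
                            };

                            int main(void) {
                                int to_worker[2], from_worker[2];
                                pipe(to_worker);
                                pipe(from_worker);

                                if (fork() == 0) {   /* worker */
                                    struct msg m;
                                    while (read(to_worker[0], &m, sizeof(m)) == (ssize_t)sizeof(m) && m.tag) {
                                        uint8_t result = (m.seed % 7 == 0);   /* stand-in for running a trial */
                                        write(from_worker[1], &result, 1);
                                    }
                                    _exit(0);
                                }

                                /* supervisor */
                                for (uint64_t seed = 1; seed <= 5; seed++) {
                                    struct msg m = { .tag = 1, .seed = seed };
                                    uint8_t result;
                                    write(to_worker[1], &m, sizeof(m));
                                    read(from_worker[0], &result, 1);
                                    printf("seed %d -> %u\n", (int)seed, (unsigned)result);
                                }
                                struct msg stop = { .tag = 0, .seed = 0 };
                                write(to_worker[1], &stop, sizeof(stop));
                                wait(NULL);
                                return 0;
                            }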

                            I started learning x86-64 assembly a couple weeks ago, and started doing Project Euler for practice. I’m still only a couple problems in, because I’ve focused on better understanding details like the costs of different approaches. The early problems have just the right level of difficulty for useful self-study. It also gave me some context that was useful for Meltdown & Spectre-related investigations last week.

                            Finally, I am working on a few submissions for the IOCCC. :D

                            1. 2

                              While both are probably a few years out of date, here’s a post about building a 20-core (40-thread) server with 64 GB of RAM, with roughly the same budget. Building with nine Raspberry Pis doesn’t seem that efficient, unless your goal is also to “make[] parallel concepts explicit”.

                              (I haven’t done this yet, but it’s tempting, especially as I’m working on multi-core support for theft…)

                              1. 3

                                Between fighting a cold and jury duty, not so much reading lately. I just started re-reading Alexey Radul’s thesis, Propagation Networks: A Flexible and Expressive Substrate for Computation, though.

                                For networking, here are a couple of different books I’ve found helpful. Each has a pretty different angle; at least one will probably resonate with you.

                                If you’re working in C, I wrote socket99 to make it simple to set up several common cases in the BSD sockets interface, using C99 designated initializers. (For example, opening a socket for an async TCP server.) The BSD socket API is especially gnarly.
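
                                For a sense of what socket99 is wrapping, here’s roughly the long-hand version of one of those common cases (a non-blocking TCP server socket) using the plain BSD/POSIX calls:

                                #include <fcntl.h>
                                #include <netdb.h>
                                #include <string.h>
                                #include <sys/socket.h>
                                #include <unistd.h>

                                /* Open a non-blocking TCP listening socket on the given port.
                                 * Returns the fd, or -1 on error. */
                                static int open_server(const char *port) {
                                    struct addrinfo hints, *ai = NULL;
                                    memset(&hints, 0, sizeof(hints));
                                    hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
                                    hints.ai_socktype = SOCK_STREAM;    /* TCP */
                                    hints.ai_flags = AI_PASSIVE;        /* for bind() */
                                    if (getaddrinfo(NULL, port, &hints, &ai) != 0) { return -1; }

                                    int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
                                    int one = 1;
                                    if (fd == -1 ||
                                        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) == -1 ||
                                        bind(fd, ai->ai_addr, ai->ai_addrlen) == -1 ||
                                        listen(fd, 16) == -1 ||
                                        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK) == -1) {
                                        if (fd != -1) { close(fd); }
                                        freeaddrinfo(ai);
                                        return -1;
                                    }
                                    freeaddrinfo(ai);
                                    return fd;
                                }

                                socket99 collapses that into one designated-initializer config struct plus a single call (see its README for the exact fields).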

                                (This is paraphrased from a comment I left on metafilter a while ago, FWIW.)

                                Also, feoh recommended The Linux Programming Interface. While it isn’t primarily a networking book, it does explain the socket interfaces and such in depth, among many other things. Stevens’ Advanced Programming in the UNIX Environment covers a lot of the same material, in a less Linux-specific way (which can be a pro or con). While they mostly overlap, one or the other might seem clearer for particular topics.

                                1. 4

                                  Some favorites:

                                  1. 1

                                    If I’m not mistaken, Bernstein had a talk about this exact issue at some point and also made up a strong hash for hashmaps to prevent DoS…

                                    1. 2

                                      siphash: https://131002.net/siphash/

                                      Switching out Lua’s hash algorithm would be a very localized change. It’s one of relatively few languages that expect to be embedded in another codebase, with customizations.

                                    1. 5

                                      Here’s a study that suggests that lines of code is a better predictor of the number of bugs/faults in a codebase than a lot of other historical OO-related metrics.

                                      I would be very interested in reading other studies about this, rather than just anecdotes.

                                      1. 1

                                        Those metrics might be useful for maintenance though. Nothing is worse than jumping around 20 files and classes with virtual methods to figure out what’s going on.

                                      1. 14

                                        A property test may assert that, for arbitrary input, the code under test won’t crash, throw exceptions, have assertions fail, etc. This is the most fuzzer-like (fuzziest?) side of property testing.

                                        A property test might instead assert that, for some arbitrary series of calls to a stateful API, the return values along the way make sense and the end state is reasonable. It might check that, for any arbitrary ordering, a sequence of interactions in a distributed system won’t violate any invariants. This looks less like fuzzing, more like a model checker. Failures found by these tests usually have a much clearer narrative than those found by a fuzzer.

                                        Property-based testing overlaps with fuzzing, sure, but it’s not the same thing – each has different trade-offs, emphasizes different things. Property-based testing usually needs a bit more test-specific setup, but is usually better at shrinking bugs to minimal counter-examples (since input generation is better defined), and can focus generated input more to specific parts of the state space.

                                        Model checkers often have total coverage, and can test an abstract design rather than an implementation. Fuzzers usually make few or no assumptions about the input (and may be entirely directed by heuristics and coverage data). Property-based testing can fluidly switch styles per test. They’re all valuable, and understanding how they differ helps to apply each more effectively.

                                        1. 2

                                          It might check that, for any arbitrary ordering, a sequence of interactions in a distributed system won’t violate any invariants. This looks less like fuzzing, more like a model checker.

                                          If your language supports contracts, you can convert any invariant error into a crash so your fuzzer now does PBT, too! They’re definitely different things that have different purposes, but they share enough in common that both can learn a few tricks from each other.

                                          1. 8

                                            Not necessarily any invariant, even though it could cover many.

                                            For example, a property test can use a model-based approach where you model the values of state machines on 3 different computers as a local data structure and take guided-random steps through operations. The model you use to validate the output of the system can take into consideration the perceived state of the entire system, as an external observer.

                                            For example, if you have 2 books for sale, sell both, and the model considers 2 of the 3 nodes to each have one book, but the local result from one of the nodes says it doesn’t have it (or another one has too many), then the local invariants of each machine may individually be respected while the system-wide one is broken.

                                            Property-based testing could detect such a case, but unless your language’s design by contract can handle such a global view, you’ll have trouble dealing with it. You’d get closer to it if what you fuzzed was a program that did model checking of the system under test, but then you’re making quite specific harnesses and getting pretty close to what property-based testing does already. I’d say you’re not necessarily using the tool the way it was intended, whereas that use case is a fairly direct one in Erlang’s quickcheck (and related open source/free implementations)
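
                                            A toy, library-free version of that book example, just to show the shape of a system-wide invariant that no single node can check locally (all names hypothetical):

                                            #include <assert.h>
                                            #include <stdlib.h>

                                            #define NODES 3

                                            static int node_stock[NODES];   /* the "real" per-node state */
                                            static int model_total;         /* the observer's model of total stock */

                                            static void sell_one(int node) {
                                                if (node_stock[node] > 0) { node_stock[node]--; model_total--; }
                                            }

                                            int main(void) {
                                                srand(42);
                                                node_stock[0] = 1; node_stock[1] = 1; node_stock[2] = 0;
                                                model_total = 2;

                                                for (int step = 0; step < 1000; step++) {
                                                    sell_one(rand() % NODES);   /* guided-random steps through operations */

                                                    /* Local invariant each node could check on its own. */
                                                    for (int n = 0; n < NODES; n++) { assert(node_stock[n] >= 0); }

                                                    /* System-wide invariant only the model can see: the nodes'
                                                     * stock must still add up to what the observer expects. */
                                                    int sum = 0;
                                                    for (int n = 0; n < NODES; n++) { sum += node_stock[n]; }
                                                    assert(sum == model_total);
                                                }
                                                return 0;
                                            }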

                                        1. 11

                                          You might also find the explanation of bloom filters in “Cache Efficient Bloom Filters for Shared Memory Machines” helpful, particularly the description of how to make bloom filters dynamically grow as they start to get too full.

                                          My C property-based testing library, theft, uses a dynamic blocked bloom filter to check whether it has already run a property test with a particular combination of argument(s). This eliminates a lot of redundant activity, and also helps track how often duplicates are getting generated.
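
                                          For reference, the basic insert/check mechanics of a plain (fixed-size, non-blocked) bloom filter look like this – a sketch, not theft’s implementation:

                                          #include <stdbool.h>
                                          #include <stddef.h>
                                          #include <stdint.h>
                                          #include <string.h>

                                          #define BLOOM_BITS (1u << 16)   /* 64 Kbit filter */
                                          static uint8_t bloom[BLOOM_BITS / 8];

                                          /* FNV-1a, varied by seed, standing in for k independent hashes. */
                                          static uint32_t hash(const void *data, size_t len, uint32_t seed) {
                                              const uint8_t *p = data;
                                              uint32_t h = 2166136261u ^ seed;
                                              for (size_t i = 0; i < len; i++) { h = (h ^ p[i]) * 16777619u; }
                                              return h % BLOOM_BITS;
                                          }

                                          static void bloom_insert(const void *data, size_t len) {
                                              for (uint32_t k = 0; k < 4; k++) {
                                                  uint32_t bit = hash(data, len, k);
                                                  bloom[bit / 8] |= (uint8_t)(1u << (bit % 8));
                                              }
                                          }

                                          /* May return false positives, never false negatives. */
                                          static bool bloom_maybe_contains(const void *data, size_t len) {
                                              for (uint32_t k = 0; k < 4; k++) {
                                                  uint32_t bit = hash(data, len, k);
                                                  if (!(bloom[bit / 8] & (1u << (bit % 8)))) { return false; }
                                              }
                                              return true;
                                          }

                                          int main(void) {
                                              const char *key = "some generated input";
                                              bloom_insert(key, strlen(key));
                                              return bloom_maybe_contains(key, strlen(key)) ? 0 : 1;
                                          }

                                          A blocked bloom filter additionally confines each key’s probes to a single cache-line-sized block.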

                                          1. 4

                                            I know silentbicycle knows this, but I’m working on a set of bloom filters written in Rust. I already have an implementation of a Blocked Bloom Filter based on the paper linked above.

                                          1. 7

                                            I did this too, except I pipe it through cowsay instead: git = !exec cowsay git

                                            1. 12

                                              I’m going to be at Strange Loop & PWLConf this week!

                                              I did more work on theft this weekend: I’m making progress on adding multi-core support (what’s faster than generating and running tests on 1 core? doing it on 40 cores in parallel!). I got theft running property tests on its own concurrent planner/scheduler, which found a couple subtle bugs by generating arbitrary event interleavings.

                                              Work: Writing code to compile a huge graph data structure to generated C code. While I can’t say anything about what it represents, I seem to have discovered some pathological cases in clang related to large switch/case bodies. In particular, changing the formatting was the difference between building in 3m45sec and 7h20m. When my main work is done, I want to write a script that generates similarly structured (but IP-free) C code and post a bug report for clang.

                                              1. 4

                                                changing the formatting was the difference between building in 3m45sec and 7h20m.

                                                What on earth? I thought I knew how compilers worked.

                                                1. 3

                                                  The parser is often the slowest pass - it’s the only one that traverses every single byte of the input.

                                                  I’d love to hear more about what happened here.

                                                  1. 6

                                                    Parsing isn’t the problem – while it will be clearer when I get around to writing that generator script (which may not be for a bit), the general idea is this:

                                                    Slow version: There’s a switch-case statement, which has about 100,000 cases. Each of the case bodies is braced (i.e., has no variables that escape the scope), calls a function with an ID and a bool that sets or clears one or more flag bits on a struct, and a subset of the case bodies end in a goto that jumps to an earlier case body. (The rest end in a break; there are no cycles or fall-throughs.) The generated .c file is about 125 MB. This version took over 7 hours to compile (on a new MBP laptop).

                                                     Medium-fast version: There are no gotos; each case body calls a function named “case_X” for each case body that it would previously goto, walking up the chain of earlier case bodies (i.e., there is duplication). There are forward references to every (static) function generated at the beginning of the file; the functions all appear after the huge switch-case function. This version took about 75 minutes to compile, and is about 300 MB. (Parsing isn’t the problem!)

                                                    Fast version: Same as the medium fast version, except the switch-case body is broken up into several functions, and there’s a wrapper that says if (id < 10000) { return switch_fun_lt_10000(id); } else if (id < 20000) { return switch_fun_lt_20000(id); } else ... up to 100,000. (This could be broken up into a binary tree, but cache locality probably makes up for it. I’ll benchmark later.) I shouldn’t need to do that, but this version only takes 3m45sec to compile. Reducing the number of cases in the switch statement helps quite a bit.

                                                    Something I haven’t tried yet is just indexing into a numeric array of function pointers; it should be functionally identical. I started using switch-case because it’s more declarative, and should have the same result.

                                                    There seems to be something super-linear (> O(n)) in the handling of switch-case (probably some sort of analysis, checking everything against everything else), and it gets noticeably worse as the number of case labels increases past 10,000 or so. Adding gotos between them adds more burden to the analysis. It should be able to rule out interactions between them, though.
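
                                                     For comparison, the function-pointer-table shape mentioned above looks like this (schematic with two IDs; the real generated code would have 100,000 handlers):

                                                     struct ctx { unsigned flags; };

                                                     /* One small handler per ID (the generator would emit these)... */
                                                     static void case_0(struct ctx *c) { c->flags |= 1u << 0; }
                                                     static void case_1(struct ctx *c) { c->flags |= 1u << 1; }

                                                     /* ...and dispatch becomes an array index instead of a huge switch. */
                                                     static void (*const handlers[])(struct ctx *) = { case_0, case_1 };

                                                     static void dispatch(struct ctx *c, unsigned id) {
                                                         handlers[id](c);
                                                     }

                                                     int main(void) {
                                                         struct ctx c = { 0 };
                                                         dispatch(&c, 1);
                                                         return (int)c.flags;   /* exits with 2 */
                                                     }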

                                                    1. 1

                                                      That’s fascinating. I look forward to the bug thread when they find the cause. :)

                                              1. 4

                                                More work on theft this weekend:

                                                • Investigating using clang’s coverage tracing to make theft’s search phase smarter. Most of its work so far has gone into its shrinking phase, once it has found failures. Fuzzers like afl-fuzz and libfuzzer do the opposite – work has gone into making them smarter at searching for bugs, but they make little effort to shrink what they find into a simple bug report. Ideally, theft would have a general hook to inform theft about coverage info as a property test runs, rather than anything too specific to clang’s implementation (and I’d rather not completely reimplement afl-fuzz either!).

                                                 • Started investigating structural inference algorithms. I think sequitur may be a good fit: While building up property test input, generators request random bits in specific sizes, e.g. [11, 3, 8, 1, 3, 8, 1, 3, 8, 1, 3, 8, 1, ...]. Treating the repeated groups (here, [3, 8, 1]) as a single unit should make searching for simplifications smarter. (There’s a small illustrative sketch after this list.)

                                                • Preparing to add multi-core searching and shrinking. So far, it’s mostly been removing things from the API that are optional, have better alternatives now, and would greatly increase the surface area of the concurrency support. In particular, custom shrinker support is gone (autoshrinking works sufficiently well that I’d rather focus on improving it further; writing good custom shrinkers requires understanding some really subtle issues), and custom hashing callbacks are also gone because they can be handled better internal to autoshrinking.

                                                • Adding failure tagging: previously, theft couldn’t distinguish between failures that had different root causes, so bugs caused by simpler input could shadow others. Usually, the rarer / more complex bugs are more interesting. If a failure ID is set before returning THEFT_TRIAL_FAIL, then the shrinker will focus on failures of that type and untagged failures (which could be entirely new). This is already implemented in the single-process case, but use with forking overlaps with the multi-core work.

                                                • Improved the adaptive weighting for tactics used when shrinking failing input. Now it’s better about reducing the frequency of ineffective operations.
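
                                                 A toy first step toward that – counting repeated adjacent pairs (digrams) in the request-size stream, which is what sequitur’s digram rule generalizes; not the full algorithm:

                                                 #include <stdio.h>

                                                 int main(void) {
                                                     /* Example request-size stream from a generator run. */
                                                     int s[] = { 11, 3, 8, 1, 3, 8, 1, 3, 8, 1, 3, 8, 1 };
                                                     int n = (int)(sizeof(s) / sizeof(s[0]));

                                                     /* Report each adjacent pair that occurs more than once. */
                                                     for (int i = 0; i + 1 < n; i++) {
                                                         int count = 0, seen_before = 0;
                                                         for (int j = 0; j + 1 < n; j++) {
                                                             if (s[j] == s[i] && s[j + 1] == s[i + 1]) {
                                                                 count++;
                                                                 if (j < i) { seen_before = 1; }
                                                             }
                                                         }
                                                         if (count > 1 && !seen_before) {
                                                             printf("(%d, %d) x%d\n", s[i], s[i + 1], count);
                                                         }
                                                     }
                                                     /* (3, 8) and (8, 1) repeat; sequitur replaces repeated digrams
                                                      * with rules and iterates, eventually grouping [3, 8, 1]. */
                                                     return 0;
                                                 }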

                                                1. 1

                                                   Have you been applying theft to theft? Lots of CompSci folks in the space do that, with the KLEE people finding some bugs that way.

                                                  1. 2

                                                    Applied to itself as a whole, not yet. There are several tests with a contrived property that stresses the shrinker in a certain way, and I check that it successfully finds the real local minimum quickly. I’m going to be using the non-concurrent/single process version of it to verify some new code that supports the concurrent mode, though.

                                                    1. 1

                                                      Cool. Good stuff.

                                                      1. 2

                                                        Now I have! As part of the concurrency support, I’m making a new scheduler/planner API that can coordinate meaningful next steps between several dozen independent worker processes. It’s not integrated into theft yet, but I’m running a property test on it as a standalone API (“if it runs N trials with M concurrent workers, with controlled interleaving, can it finish the tests without getting stuck or triggering asserts?”), and it’s found a couple bugs that my integration tests missed.

                                                        1. 1

                                                           Good work! Keep an eye out Monday, since I’m submitting something relevant you might want to try. It also caught unexpected concurrency bugs.

                                                1. 2

                                                  Continuing development on theft:

                                                  • Moving functionality that was implemented in C control flow (for loops, switch/case, conditionals) into a separate API, called “planner”, to coordinate concurrent work done by one or more processes in a worker pool, as mediated through a supervisor communicating with async IO. (Did somebody mention Erlang?) The code is a pretty simple state machine, but has a couple points where it needs to decide between multiple potentially useful options. It also lets me test quite a bit of overall logic in isolation from all the multi-process Unix syscall stuff. I wrote about some details on the randomized-testing google group yesterday.

                                                  • If any of you are using theft, I’d love to get usability feedback, or hear more about how you’re using it.

                                                    Otherwise, I’m trying to catch up on addressing fairly small PRs against other projects, since pretty much all my personal project time has gone into theft for a while.

                                                  Work: Fuzzing, and helping to fix stuff I’ve found. (deliberately vague)

                                                  1. 5

                                                    Writing is formalizing ideas in a different dimension than programming, and if something feels really handwavy in that description, that’s useful information. I’d expect, say, specific command line arguments or the particulars of a schema / config format / whatever to evolve quickly early on, but the overall goals and fundamental design should have a solid foundation. I find writing helpful for quantifying uncertainty – figuring out where the gaps are, and what experiments might help draw things into focus. That can be useful at any stage.

                                                    1. 5

                                                        I really like the “not everything is an X” and “extremist programming” points. Especially how they contrast with a different Perlis quote: “It’s better to have 100 functions that operate on 1 data structure (X) than 10 functions that operate on 10 data structures”.

                                                      “Everything is an X” gives you a lot of benefits, but it’s also where a lot of languages fall down:

                                                      • Shell: everything is a string. But even booleans and numbers are strings, which makes them annoying to deal with.
                                                      • Java: everything is an object; there are no first-class functions. Except that objects often don’t compose. They’re harder to compose than functions. I like Steve Yegge’s characterization of Java as “Legos that don’t actually fit together”.
                                                      • Lisp: everything is a cons cell, even code is a bunch of cons cells. Downside: hash tables are awkward, arrays are awkward, mutation is common but awkward, etc. Python/Ruby/Perl/JavaScript took a lot of lessons from Lisp (dynamic typing, garbage collection), but made concessions to reality / how computers actually work.
                                                      • C: everything is a pointer/offset. You can cast anything to void*. Mistakes are very expensive.
                                                      1. 4

                                                          As a specific data point: Lua is another “Everything is an X” language (everything is a “table”, a (possibly nested) key/value hash table that recognizes when it’s being used as an array), but the escape hatch of conveniently doing things in C that are better done in C is pervasive in the language design. It seems like a sweet spot.

                                                        Tcl is another “Everything” language – everything is a string – but strings seem a bit too loosely structured. Still, Tcl works curiously smoothly as a shell, and has good support for quoting/wrapping other arbitrary data (such as with expect).

                                                        1. 3

                                                            I think some of these are inconsistency problems, like “everything is an x, except for y,” in situations where there isn’t a really good reason for y to be different; it was just convenient at one point.

                                                            In shell, I think the way that process exit codes behave differently from other values (which are always strings) is weird and makes it harder to script.

                                                          I think in Java that the weirdness around primitives being set entirely apart from object types is a good example of “everything is an object, except …” being a problem.

                                                          On a different note, I think “booleans and numbers are strings” isn’t the biggest problem for programming in Bourne shell. IME a much bigger problem is that the “everything is a string” rule doesn’t accommodate arbitrarily nested collections very well, so it’s very hard to structure data.

                                                          1. 2

                                                            Not everything is an object in Java, and first-class procedures are very easy to encode as objects anyway (verbosely, of course). This just happens to be a foreign style for the Java community, and thus doesn’t get much support from either the standard library or the rest of the library ecosystem.

                                                            It’s also funny that you say more recent dynamic languages “make concessions to how computers actually work”, considering that Lisp implementations are sometimes competitive in performance with the likes of ML, Eiffel, etc. (of course, by selectively turning off the dynamism), whereas the main Python and Ruby implementations are consistently one or two orders of magnitude slower (and the only way to make things any faster is “call C, C++ or whatever”).

                                                          1. 8

                                                            I’m getting ready for a new release of theft, my property-based testing library for C.

                                                            Version 0.4.0 (CHANGELOG):

                                                            Most significantly it adds support for shrinking crashes and infinite loops, and improves dataflow for the hooks – this should make certain custom behaviors easier to set up. For example, there’s less indirection necessary to pass a log level to the property test function, which can be bumped way up when re-running the failing test after it finishes shrinking its input.

                                                              There are a few more API changes, but the major version is still 0, and the changes should help with usability long-term. The details are in the changelog.

                                                            I’m running final tests now on several platforms, with and without valgrind. This takes a bit, because part of the test involves intentionally crashing thousands of times.

                                                            Work this week is going to involve a lot of fuzzing and property-based testing, working on stressing some exciting new code in novel ways and shaking out obscure bugs before deployment.