1. 28
  1. 11

    Similar to Fibonacci, a piece of code that enlightened me was the simple PEG implementation, which is represented as a pair of mutually recursive procedures. I used to think of parsers as really difficult to understand, and hard to write. But the simplicity of PEG was an eye opener.

    def unify_key(key, text, at):
       if key not in grammar:
           return (True, at + len(key)) if text[at:].startswith(key) else (False, at)
       rules = grammar[key]
       for rule in rules:
           res, l = unify_rule(rule, text, at)
           if res: return (res, l)
       return (False, 0)
    
    def unify_rule(rule, text, at):
        for token in rule:
              result, at = unify_key(token, text, at)
              if not result: return (False, at)
        return (True, at)
    

    Given a grammar as below, we try to unify the input string with the starting key – here expr. If To unify a key with the given string (unify_key), we check if the key is in the grammar. If it was not, then it is a plain token match, and we do simple string match. If not, we get the production rules defined in the grammar corresponding to the key, and try each rule one by one using unify_rule. We return as soon as any rule succeeds. The unify_rule is also simple. It gets the parts of the rule, and tries to match them in sequence using unify_key. If any of these fails, then return failure.

    grammar = {
        'expr': [
            ['term', 'add_op', 'expr'],
            ['term']],
        'term': [
            ['fact', 'mul_op', 'term'],
            ['fact']],
        'fact': [
            ['digits'],
            ['(','expr',')']],
        'digits': [
            ['digit','digits'],
            ['digit']],
        'digit': [[str(i)] for i in list(range(10))],
        'add_op': [['+'], ['-']],
        'mul_op': [['*'], ['/']]
    }
    

    A driver

    import sys
    res, till = unify_key('expr', sys.argv[1], 0)
    assert(till == len(sys.argv[1]))
    print(res)
    

    Using it:

    $ python3 peg.py '1+10/(3+2)'
    True
    $ python3 peg.py '1+10/x(3+2)'
    Assertion failure
    

    While this implementation can have really bad performance due to back tracking, all one needs to do to make it linear is to decorate the procedures with @functools.lru_cache(maxsize=None), which memoizes the parsing, and makes parsing linear. While this does not implement the complete PEG syntax, it implements enough to be complete in terms of the grammars that can be represented (see Ford 2004 for details).

    Edit: Thanks @asrp for finding that the original had issues.

    1. 2

      Tried it with something like

      print unify_rule(['expr'], '1+10/(3+2)', 0)
      

      There seems to be quite a few typos in here

      -   if key not in grammar: return (text[at:].starts_with(key), len(rule))
      +   if key not in grammar: return (text[at:].startswith(key), len(key))
      
      -      if not result: return False
      +      if not result: return (False, 0)
      
      -  return (True, len)
      +  return (True, l)
      

      There must be more errors since it still only matches the first character of the expression.

      I found confusing for both rule names and string token to be encoded as strings. The same goes for concatenation and ordered choice to both be encoded lists.

      Thanks for bringing up this example, though!

      1. 2

        Sorry about that, and thank you so much for taking time to run it. I took the code from an old presentation, and seems I made errors while typing it in.

        I have fixed it.

        1. 1

          Thanks for the new version! I understand what its trying to do better now.

      2. 1

        Thanks for the example! While PEGs are not familiar to me (I’ll need to look into them), there’s a lot of potential for minds to be blown in the parser world. Playing a few times with parser combinators did exactly that to me - if that’s something that’s not as familiar to you, then here’s a nice example in Scala.

        1. 1

          The parser combinators are a way to represent PEG directly in code. So you will find them quite familiar. I love combinator based parsing, they are quite powerful and enlightning – A while back, I did a few posts (1, 2, 3, 4, 5, 6, 7) on building a small concatenate language using Parsec (The original monadic combinator library in Haskell) for parsing. You may find them interesting.

      3. 6

        For me, the single piece of code that had the largest impact was the ddmin from Andreas Zeller which is used to isolate failure causing input from much larger input.

        import random
        import string
        
        def complement(s, i, l): return s[:i] + s[i + l:]
        
        def some_complement_is_failing(s, npartitions, testfn):
            subset_length = len(s) // npartitions
            items = range(0, len(s), subset_length)
            complements = (complement(s, i, subset_length) for i in items)
            return next((i for i in complements if testfn(i)), None)
        
        def update_input(s, npartitions, fn):
            v = some_complement_is_failing(s, npartitions, fn)
            if v:  return v, max(npartitions - 1, 2)
            else:  return s, min(npartitions * 2, len(s))
        
        def ddmin(s, fn):
            npartitions = 2
            while s:
                s, n1 = update_input(s, npartitions, fn)
                if npartitions == len(s): break
                npartitions = n1
        return s
        

        Given a failure causing input, it is able to narrow the string down to the simplest sequence of characters necessary to cause the same failure. It is not limited to just strings but to any set of input conditions, and has been extended numerous times – The same algorithm lies behind git bisect. However, the core is still just this much.

        As an example, run the following code: The test succeeds if the input contains “()”. So, we progressively trim the input until we find the smallest input that can cause the test to succeed.

        def test(s):
            print("%s %d" % (s, len(s)))
            return set('()') <= set(s)
        
        def fuzzer(): return ''.join(random.choices(string.digits +
                                                    string.ascii_letters +
                                                    string.punctuation, k=1024))
        
        mystr = fuzzer()
        minimized = ddmin(mystr, test)
        print("%s => %s" % (mystr, minimized))
        

        Edit: here is a better ddmin code than the one above. The above is kind of mangled by me.

        1. 1

          (late reply) Thanks a lot for the link to this code! I ended up playing with it here:

          https://github.com/oilshell/blog-code/tree/master/ddmin

          (I tried your version first, but couldn’t get it to run.)

          I put it in my blog-code repo to share with somebody… I won’t write a blog post until I’ve actually used it for something. But it was a fun exercise to learn about this interesting algorithm!

          It’s interesting how much Python has changed since 2005 or so, when that code was written! Or maybe it was 2002 or earlier, and the latest timestamp is just 2005.

          1. 1

            Ah thank you for the reply!.

            Any idea what was wrong with my version?

            1. 1

              I ran it but I don’t remember what the error was! Since you linked to the original I decided to work with that instead.

              I like rewriting code to understand it, and the original was obviously re-writable :)

              1. 1

                Ah I see!. As you can see from my example, I like rewriting code to understand it too :)

                You seem to have kept the original function mostly as is. Is that better to understand rather than the multiple function structure I have adopted?

        2. 3

          I can’t read that page. I see it as a purple background with a faint red texture on it. Does chrome on Android have a high contrast feature?

          1. 4

            Ugh, with JavaScript disabled it’s even worse. The text isn’t even readable.

            1. 4

              Luckily Firefox has Reader View (and similar for other browsers.)

              I’m not against making things look pretty, but no default way to read simply plain text is just unforgivable.

              1. 3

                Thanks for the feedback, I’ll go through it with our web team to improve things in the future! It’s a shame for our authors if their content cannot be read.

            2. 3

              There’s some sort of scroll monitoring script that turns the background white, do you have javascript enabled?

              1. 1

                Thanks for replying! The second time I tried the site, it suddenly turned white and I could read it.

            3. 2

              The APL one definitely blew my mind.

              1. 1

                I did stop reading when I reached the common mis-attribution of who did the bulk of the work on the Apollo code.

                1. 4

                  What’s the misattribution in question?

                  1. 3

                    Author here: would love to hear more on this? Fact-checking always appreciated. If you’re referring to Margaret Hamilton’s contribution, the article states her position as leading the team, which does not directly mean that she wrote the code in question. Fair enough, I’ll make note to add the actual author of the critical pieces as well.

                    1. 1

                      I have to get a good book on the Apollo Code (I bet someone here has a recommendation) but I recall that Hamilton was elevated to leadership position at a more mature stage of the project. Once you put a name to a project, you are making it about a personality (rather than the code) and the question of fair attribution comes up.

                      1. 3

                        I think “mis-attribution” is an overstatement. But it’s true that Hamilton did not singlehandedly write the AGC code (nor did anyone; it was a team effort). The single person who probably most deserves credit for the fault tolerance feature is probably MIT’s Hal Laning. He designed the executive and waitlist systems. Hamilton deserves as much credit as anyone, but I wish it was expressed that she was a member of a team rather than a one-person effort.

                        1. 1

                          @craigstuntz thanks for that reference! Is there a good book on the Apollo automation engineering efforts? I’m interested in both the hardware and software and not just the AGC but the whole of it.

                          1. 2

                            I can tell you about some books I’ve actually read; these might not be the best for your specific interests. “How Apollo Flew to the Moon,” by David Woods. But it doesn’t go into the AGC in minute detail. “The Saturn V F-1 Engine: Powering Apollo into History,” by Anthony Young, is good, but doesn’t cover controls at all. There are a couple of books specifically about the AGC which I haven’t read.

                    2. 3

                      You could just give him the specifics. Then he can update the article. I’m curious what yours are since I’ve read several different versions of the story.

                      1. 2

                        This story and the story of Rosalind Franklin are both curious to me. We could start a project - ideally primary sources - to do some archaeology. For DNA all I can think of papers. For the Apollo code it has to be early documentation - all US Gov docs have lists of contributors and responsible people.

                        1. 5

                          I was asking because I did it before to address the possibility that a bunch of people did work and she just put her name on it. She seems to have for her team. So, I started saying Hamilton’s team. The problem occurs enough that I started doing that in general.

                          Now, as to Apollo, I did have some kind of paper that was specific. I faintly recall it saying they were responsible for one or two key modules but they did all the verification. They were the best at making robust software. So, if I’m remembering right, the paper talked like they were handed the code other people wrote to find and fix the errors on top of their own modules. That’s impressive. So is the stacks of spec and code in assembly in the picture. The “Lessons from Apollo” article here has a description of the environment and mindset that led to that as well. Later, she and another woman spent whole career developing CASE tools (HOS and 001 Toolkit) for designing systems with no interface errors that generated code for you. It was like binary trees, Prolog, and Occam combined which is why usability and performance sucked despite it succeeding on robustness and generality.

                          So, that’s what I learned when I dug into it. If I get the papers again, I’ll send you the one with the attributions. No promises that I’m going to do another deep dive into that soon, though.

                          1. 3

                            That would be a very interesting project indeed! Part of the beauty and definitely one of the main reasons the Apollo program was so successful IMHO is specifically the way the work and communication was organized. To this day the project stands as a testament to how such a large-scale project should be carried out effectively. While I’m not privy to the inner working of NASA, there seems to be evidence that some of the organizational systems were lost later and this led to sever inefficiencies. It’s a pity, but luckily it offers us a wealth of learning opportunities.

                            1. 2

                              On Hamilton’s side, it seemed like they mostly just let people do their work their own way. The work was also highly-valued and interesting. So, they came up with innovative solutions to their problems. This doesn’t usually happen in process-heavy or micro-managing environments.

                    3. 1

                      I’ve never seen that one image in the Game of Life article. That looked wild. Do you have a link to that implementation and its parameters so we can replicate it?

                        1. 1

                          I saw some experimentation back when I looked at it with gliders or simple patterns. I didnt know they’d taken it this far with these complex designs. Pretty amazing. Thanks for the link.