1. 41

  2. 8

    Neat. It’s nice to have some features like pattern matching in C-likes. It’s also very awesome to have a language spec for a language like this. For same reason, C– has ended up being used a lot in schools for teaching compilers. It would be great to include ML-like features in compiler courses for C-style languages, like pattern matching and unions.


    Mae Myrddin yn yr enw Cymraeg am Merlin. Ti'n dweud e fel “Muh-rh-th-in” mewn Saesneg! Mae dd yn swndo mor fel “th” galed mewn Cymraeg. Nagw e'n swndo fel d.

    Myrddin’s the Welsh name for a Merlin character. You pronounce it as “Muh-rh-th-in”. The double d is actually a letter, which sounds more like a hard th in English. It doesn’t sound anything like d.

    1. 3

      Cool, I’ve been looking for implementations of pattern matching compilers.


      If that’s it, it doesn’t look too long at 839 lines. As far as I understand, compiling to decision trees results in the fastest but largest code, and backtracking automata is the opposite tradeoff.

      https://hal.inria.fr/inria-00000763 (page 2)

      People say the decision tree requires exponential space in the number of patterns but I don’t quite get that?

      Also checking for pattern matching exhaustiveness is apparently NP-complete?


      1. 4

        Yes, that’s it. Note that it doesn’t try to be especially clever about optimizing things. Still, I’ve found that in practice, either the matches are often shallow, with the wildcards (captures) clustered near the end. So far, it hasn’t mattered too much, although I also haven’t put so much work into generating fast code.

        1. 3

          People say the decision tree requires exponential space in the number of patterns but I don’t quite get that?

          If there are N things to test, a decision tree might need height up to N to make decisions. A complete height-N binary tree has 2^N nodes. In practice not every decision path will be full-height, but depending on how you construct the tree, in the worst case you can still end up with size roughly on that order. One bit of intuition for why is that you can have exponentially many duplicate tests: somewhere down the tree, it might turn out that 6 of the 8 branches need to make exactly the same test, so this test is duplicated 6 times. Ways to avoid this include heuristics to ensure the tests needed by more paths end up further up the tree and therefore not duplicated, and explicit sharing between branches, i.e. a decision DAG rather than a proper decision tree.

          1. 1

            OK yeah, so then the way I think about it is that the branching factor of the tree is related to the number of patterns/cases… but the height of the tree is related to the number of variables tested (the depth of the patterns).

            The height is the thing that is exponential, so it’s the number of variables tested. You might need to test a single variable against like 20 things, but that doesn’t make it exponential.

            I guess it just doesn’t seem like a big deal in practice to be exponential this way. Am I wrong? Are there some examples from real code where just doing the naive thing results in a lot of bloat? I feel the patterns are commonly at most 3 or 4 levels deep, but I don’t have tons of experience with ML.

            I guess this could answered with some program analysis…

        2. 3

          Re. Your note about the pronunciation. Bengali has 5 (I think) d-like sounds. I wonder if your “dd” is then like the bengali dh “ধ”. This is pronounced like this

          1. 3

            Hmm, not quite. The dd doesn’t sound very much like d at all. At least not to my Welsh ears. Here’s a north Wales pronunciation -> https://youtu.be/nnH33183y2M?t=17

            1. 2

              Wikipedia’s Welsh phonology page groups it as a voiced dental fricative, like Greek δ. Your sample doesn’t sound identical to how I would pronounce Greek δ, but much closer than an English ’d' in either case.

              1. 3

                Yep, agreed, that is closer. When I make that noise, I push my tongue up behind my teeth, leaving a gap between the roof of my mouth and my tongue, and then sharply exhale, allowing my tongue to vibrate. I don’t know that helps much, but hopefully it does! It’s also important to note that the noise of a word in Welsh puts a lot of pressure on the first syllable - as dd is at the end in Myrddin, then it has a much softer sound than if you pronounce the letter alone.

                1. 1

                  That does actually sort of help explain why it sounds a little different, I think. I pronounce Greek δ as you describe, except with the tongue pressing against the gap between slightly parted upper and lower teeth. Whereas it sounds like your tongue is against the upper teeth, not against a gap between upper and lower? The linked Wikipedia article on voiced dental fricative briefly mentions an “interdental” variant (which is what I use). When I attempt both varieties, the interdental version sounds “harder” to me, i.e. if there’s a spectrum from the softer th in “think” to the harder th in “that”, both are on the harder end of the spectrum, but interdental is further there. Maybe?

                  Getting even more off-topic in this off-topic thread, but the first time I heard Welsh on a train, I thought it was Swedish. I’m not 100% sure why, because there is no actual linguistic relationship. The closest I can figure out by listening to some stuff on YouTube is that there’s a certain upwards intonation at the end of words like “bydd” that is very similar to an upwards intonation in the Swedish tonal system on words that end in “-iet” (example here). There is something else about the phonology and/or intonation that really minds me of Swedish though (specifically Swedish, not Danish… I lived in Denmark for years and Welsh sounds nothing like that).

        3. 7

          Of all the random programming projects on the internet, Myrddin is one of the few that I think about routinely. I’m really excited to see the QBE backend go in!

          Actually, this makes me wonder whether I should target QBE over LLVM for my own personal project.

          Congrats to Ori on a first release!

          1. 2

            Remember, this is 0.1, not 1.0. Its a point of reference, but not a stability promise in terms of API or a guarantee there are not bugs. Myrddin currently has a small userbase, so there will be issues.

            Pull requests will be welcomed.

            1. 8

              Yes. More or less, I stopped hitting compiler bugs when writing code. New users tend to find new bugs. As a result, this release was created to address the shortage of chaos monkeys.

              1. 3

                That’s an incredible reason to hit a 0.1 release. Kudos to you for keeping this going for so long!

            2. 2

              Nice changelog :)

              1. 1

                This is an amazing engineering effort. Congratulations on the release!