1. 8

    If humans are going to be reading, supporting, and re-writing code, I don’t see why we’d want to eschew one’s strongest language, say, English, in favor of one that reads like hieroglyphics.

    Look at what humans already do in domains that require precision, as programming does. “reads like hieroglyphics” is a fair description of a lot of mathematical notation, yet mathematicians have long preferred symbology to English - if you think programmers shouldn’t, then you should be able to explain why mathematicians do. Lawyers write “English” but in a famously stilted style, full of clunky standardized constructions, to the point that it’s sometimes considered a distinct dialect (“legalese”), and one could reasonably ask whether a variant that used a symbol-based language would be more readable. And of course it bears remembering that the most popular human first language is notated not with an alphabet but with ideograms, where distinct concepts generally have their own distinct symbols.

    Even within programming, humans who are explaining an algorithm to other humans tend not to use English but rather a mix of “pseudocode” (sometimes likened to Python) and mathematical notation.

    Erlang is notorious for being ‘ugly,’ but I wonder what that’s all about. Truly. I like to think most of my Erlang code is composed of English sentences, lumped together into paragraphs, and contained in solitary modules. It’s familiar, unsurprising, and quite beautiful when one’s naming is in top form.

    Really? Most languages allow sensible plain English names for concepts. The part of Erlang that’s notoriously ugly - and the part that makes it read very unlike English - is its unusual punctuation style. If the author really finds Erlang English-like to read, I can only assume this is because they’re much more familiar with Erlang than other languages, rather than the language being objectively more English-like than, say, Python or Java.

    1. 12

      Erlang’s punctuation style is more like English than most C-likes out there. Let’s take this statement:

      I will need a few items on my trip: if it’s sunny, sunscreen, water, and a hat; if it’s rainy, an umbrella, and a raincoat; if it’s windy, a kite, and a shirt.

      In a C-style language (here, Go) it might look something like

      switch weather.Now() {
      case weather.Sunny:
          sunscreen()
          water()
          hat()
      case weather.Rainy:
          umbrella()
          raincoat()
      case weather.Windy:
          kite()
          shirt()
      }
      

      In Erlang it could just be:

      case weather() of
           sunny -> sunscreen(), water(), hat();
           rainy -> umbrella(), raincoat();
           windy -> kite(), shirt()
      end.
      

      You even get to keep the , for enumeration/sequences, ; for alternative clauses, and . for full stops!

      1. 2

        That example is a little misleading. In C and C++ it could be written like this, which arguably reads more like English than either Erlang or Go:

        if ( weather() == Sunny ) {
            sunscreen(), water(), hat();
        }
        else if ( weather() == Rainy ) {
            umbrella(), raincoat();
        }
        else if ( weather() == Windy ) {
            kite(), shirt();
        }
        
        1. 1

          all you need is the full stop (.) instead of {} and then you have all the tokens of Erlang!

      2. 6

        Mathematicians use a seamless hybrid of prose and formula:

        For every v ∈ V, there exists w ∈ V such that v + w = 0.

        Similarly in code, some parts are more like formulae, some are more like prose, and some are more like tables or figures… and it’s interesting to consider separate syntaxes for these different types of definitions.

        Have a look at the Inform 7 manual’s section on equations, for an example. Here is a (formal, compiling, working) definition of what should happen when the player types push cannonball (I’ve used bullet lists in Markdown to get indentation without monospace):

        • Equation - Newton’s Second Law

          • F=ma
        • where F is a force, m is a mass, a is an acceleration.

        • Equation - Principle of Conservation of Energy

          • mgh = mv^2/2
        • where m is a mass, h is a length, v is a velocity, and g is the acceleration due to gravity.

        • Equation - Galilean Equation for a Falling Body

          • v = gt
        • where g is the acceleration due to gravity, v is a velocity, and t is an elapsed time.

        • Instead of pushing the cannon ball:

          • let the falling body be the cannon ball;
          • let m be the mass of the falling body;
          • let h be 1.2m;
          • let F be given by Newton’s Second Law where a is the acceleration due to gravity;
          • let v be given by the Principle of Conservation of Energy;
          • let t be given by the Galilean Equation for a Falling Body;
          • say “You push [the falling body] off the bench, at a height of [h], and, subject to a downward force of [F], it falls. [t to the nearest 0.01s] later, this mass of [m] hits the floor at [v].”;
          • now the falling body is in the location.

        (Yes, the Inform 7 compiler will solve equations for you. Why aren’t normal programming languages capable of this kind of high school math? Are we living in some kind of weird bubble?)
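
        The arithmetic in that example is easy to reproduce by hand; what Inform 7 adds is the rearranging. Here is a plain Python sketch (the cannon ball’s mass is an assumed value, since the example doesn’t state one):

        ```python
        import math

        g = 9.80665  # m/s^2, standard gravity
        m = 0.4      # kg, assumed mass of the cannon ball
        h = 1.2      # m, height of the bench, from the example

        F = m * g                 # Newton's Second Law, with a = g
        v = math.sqrt(2 * g * h)  # mgh = mv^2/2, solved for v by hand
        t = v / g                 # v = gt, solved for t by hand

        print(f"force {F:.2f} N; hits the floor at {v:.2f} m/s after {t:.2f} s")
        ```

        Note that v and t had to be solved for manually here, which is precisely the step the Inform 7 compiler performs on the equations as written.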

        1. 2

          Why aren’t normal programming languages capable of this kind of high school math?

          They are. Inform 7 uses English words as syntax, but otherwise is a normal programming language. Why do you think this can’t be solved in, say, Python?

          1. 5

            Of course it can be solved in any general purpose programming language, but none of them feature dimensional analysis or algebraic equation solving out of the box in a convenient and natural way… yet Inform 7 does, oddly.

            I find this interesting because that stuff would seem to be the most obvious use case for computing machines from, like, a 1930s perspective.

            1. 1

              Ah, my apologies for completely misunderstanding what you said.

              I completely agree. Python (and maybe BASIC?) come close, but even they fall dramatically short of the “language as code” quality that Inform 7 achieves. I wonder what keeps Inform 7 from becoming a more general-purpose programming language?

        2. 4

          Also, English isn’t everyone’s strongest language. As natural languages go, it’s pretty complex and inconsistent. If you want your code to be understood by people who are more familiar and comfortable with other natural languages, then your own familiarity with English isn’t necessarily such an advantage in writing code.

          1. 2

            Was going to say something along these lines. These discussions tend to be extremely anglocentric.

            Additionally, many of us work with distributed teams, where there’s no one “strongest language” anyway.

        1. 14

          A property test may assert that, for arbitrary input, the code under test won’t crash, throw exceptions, have assertions fail, etc. This is the most fuzzer-like (fuzziest?) side of property testing.

          A property test might instead assert that, for some arbitrary series of calls to a stateful API, the return values along the way make sense and the end state is reasonable. It might check that, for any arbitrary ordering, a sequence of interactions in a distributed system won’t violate any invariants. This looks less like fuzzing, more like a model checker. Failures found by these tests usually have a much clearer narrative than those found by a fuzzer.

          Property-based testing overlaps with fuzzing, sure, but it’s not the same thing – each has different trade-offs and emphasizes different things. Property-based testing usually needs a bit more test-specific setup, but is usually better at shrinking bugs to minimal counter-examples (since input generation is better defined), and can focus generated input more on specific parts of the state space.

          Model checkers often have total coverage, and can test an abstract design rather than an implementation. Fuzzers usually make few or no assumptions about the input (and may be entirely directed by heuristics and coverage data). Property-based testing can fluidly switch styles per test. They’re all valuable, and understanding how they differ helps to apply each more effectively.
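
          Both styles can be sketched in a few lines of plain Python, with no particular library assumed (a toy run-length encoder stands in for the code under test): the loop below is a fuzz-like “must not crash” check plus a model-like round-trip property.

          ```python
          import random

          def encode(xs):
              # toy run-length encoder under test
              out = []
              for x in xs:
                  if out and out[-1][0] == x:
                      out[-1] = (x, out[-1][1] + 1)
                  else:
                      out.append((x, 1))
              return out

          def decode(pairs):
              return [x for x, n in pairs for _ in range(n)]

          rng = random.Random(0)
          for _ in range(500):
              xs = [rng.randint(0, 3) for _ in range(rng.randint(0, 20))]
              encoded = encode(xs)           # fuzz-like check: encode must not raise
              assert decode(encoded) == xs   # property: decode round-trips encode
          print("ok")
          ```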

          1. 2

            It might check that, for any arbitrary ordering, a sequence of interactions in a distributed system won’t violate any invariants. This looks less like fuzzing, more like a model checker.

            If your language supports contracts, you can convert any invariant error into a crash so your fuzzer now does PBT, too! They’re definitely different things that have different purposes, but they share enough in common that both can learn a few tricks from each other.

            1. 8

              Not necessarily any invariant, though it could cover many.

              For example, a property test can use a model-based approach: you model the values of state machines on 3 different computers as a local data structure, and take guided-random steps through operations. The model you use to validate the system’s output can take into consideration the perceived state of the entire system, as an external observer.

              For example, if you have 2 books for sale, sell both, and the model considers 2 of the 3 nodes to each have one book, but the local result from one of the nodes says it doesn’t have it (or another one has too many), then the local invariants of each machine may individually be respected while the system-wide one is broken.

              Property-based testing could detect such a case, but unless your language’s design-by-contract support can handle such a global view, you’ll have trouble dealing with it. You’d get closer to it if what you fuzzed was a program that did model checking of the system under test, but then you’re making quite specific harnesses and getting pretty close to what property-based testing does already. I’d say you’re not necessarily using the tool the way it was intended, whereas that use case is a fairly direct one in Erlang’s QuickCheck (and related open source/free implementations).
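
              The model-based idea can be sketched with nothing but the standard library; here a toy single-node inventory stands in for the distributed system, and the “model” is the external observer’s own bookkeeping (real tools like Erlang’s QuickCheck add guided generation and shrinking on top of this):

              ```python
              import random

              class Store:
                  """System under test: a toy inventory that tracks stock."""
                  def __init__(self):
                      self.stock = 0
                  def add(self, n):
                      self.stock += n
                  def sell(self):
                      if self.stock > 0:
                          self.stock -= 1
                          return True
                      return False

              def run_model_test(seed, steps=200):
                  rng = random.Random(seed)
                  real, model = Store(), 0  # `model`: the observer's bookkeeping
                  for _ in range(steps):
                      if rng.random() < 0.5:
                          n = rng.randint(1, 5)
                          real.add(n)
                          model += n
                      else:
                          if real.sell():
                              model -= 1
                          else:
                              # refusing a sale is only valid when the model says "empty"
                              assert model == 0, f"refused a sale with {model} in stock"
                      # system-wide invariant, checked after every step
                      assert real.stock == model

              for seed in range(20):
                  run_model_test(seed)
              print("ok")
              ```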

          1. 4

            My recollection is that QuickCheck has code to generate minimal test cases from the input that goes awry, which is a cool feature compared to simply throwing random data at a function.

            1. 3

              QuickCheck generates from the types of the inputs to a function. Fuzzing is for anything that takes user input… so I guess it’s really any function taking that string of bytes.

              1. 5

                Ok, but once QuickCheck finds a problem, it tries to generate an example that’s as small as possible, which is kind of cool:

                https://stackoverflow.com/questions/16968549/what-is-a-shrink-with-regard-to-haskells-quickcheck

                1. 5

                  Haskell’s QuickCheck does that. The Erlang variants use combinators that you can write and compose, and let you guide the distribution of inputs you want to have rather than just taking ‘types’ in there.

                  You can then, for example, decide that rather than sending any string, you’re going to take strings that contain 20% emoji, 5% ASCII, 10% sequences that include combining characters, and the rest is taken in linebreaks, escape sequences, and quotes.

                  This turns out to give you an approach that, while definitely reminiscent of fuzzing, sits a bit closer to regular tests in terms of how you approach system design (you can even TDD with properties), whereas I’m more familiar with traditional fuzzers being used as a means of validation.
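
                  A weighted generator of that kind can be sketched in plain Python, mimicking the frequency-style combinators described above (the weights and character ranges here are just the illustrative ones from the comment):

                  ```python
                  import random

                  def weighted_char(rng):
                      # (weight, generator) pairs, in the spirit of QuickCheck's frequency
                      buckets = [
                          (20, lambda: chr(rng.randint(0x1F600, 0x1F64F))),  # emoji
                          (5,  lambda: chr(rng.randint(0x20, 0x7E))),        # printable ASCII
                          (10, lambda: chr(rng.randint(0x0300, 0x036F))),    # combining marks
                          (65, lambda: rng.choice('\n\r\x1b"\'')),           # breaks, escapes, quotes
                      ]
                      weights = [w for w, _ in buckets]
                      gens = [g for _, g in buckets]
                      return rng.choices(gens, weights=weights)[0]()

                  rng = random.Random(0)
                  sample = ''.join(weighted_char(rng) for _ in range(40))
                  print(repr(sample))
                  ```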

                  1. 3

                    QuickCheck generates from the types of the inputs to a function

                    QuickCheck can generate input based on the types: it has a typeclass called Arbitrary, which provides an arbitrary function that we can think of as a “generic” random generator for those types which implement it (this typeclass is also where shrink is defined).

                    We can also write completely standalone generators when we want something more specific, like evenGen :: Gen Int which only generates even Ints, and we can use these in properties using the forAll function, e.g. forAll evenGen myProperty.

                     There are two other things to consider as well:

                     • Properties can have preconditions, which are implemented using rejection sampling. For example myProperty n = isEven n ==> foo will only evaluate foo if isEven n is True. If we generate an n which isn’t even, the test is skipped. If too many tests get skipped, QuickCheck tells us. We could achieve a similar thing with boolean logic, e.g. myProperty n = not (isEven n) || foo, but in this case we’re replacing skips with passes, which might give us false confidence in the results (e.g. we might get 100% of tests passing, but never actually generate an input which passes the precondition)

                     • We can use newtype to give a different name and Arbitrary instance to existing types. QuickCheck comes with a NonEmpty alias for lists, NonZero aliases for numbers, etc. The important difference between using a newtype and using a normal function (like evenGen) is that we can ensure some invariant when shrinking: e.g. shrinking an even number shouldn’t give us odd numbers.
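
                     The precondition-as-rejection-sampling idea can be sketched without QuickCheck itself; this hypothetical toy Python runner skips (rather than passes) rejected inputs and complains when the generator wastes too much work, which is essentially what ==> does:

                     ```python
                     import random

                     def check(prop, gen, precond, tries=200, max_skip_ratio=0.9):
                         rng = random.Random(0)
                         skipped = 0
                         for _ in range(tries):
                             x = gen(rng)
                             if not precond(x):
                                 skipped += 1  # skipped, not passed: no false confidence
                                 continue
                             assert prop(x), f"property failed for {x!r}"
                         if skipped / tries > max_skip_ratio:
                             raise RuntimeError("too many inputs rejected; write a better generator")

                     # property: the sum of two even numbers is even, checked only on even inputs
                     check(prop=lambda n: (n + n) % 2 == 0,
                           gen=lambda rng: rng.randint(-100, 100),
                           precond=lambda n: n % 2 == 0)
                     print("ok")
                     ```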

                  2. 3

                    So do many coverage-guided fuzzers, e.g. afl-tmin. Doesn’t really have much to do with the way the fault was discovered in the first place.

                  1. 4

                    I think @drmaciver uses prop testing in a much different way than I do. I don’t do a lot of prop testing, so it’s quite likely that I am the oddball here, but statements like:

                    Often you want to generate something much more specific than any value of a given type.

                    And

                    Especially if we’re only using these in one test, we’re not actually interested in these types at all, and it just adds a whole bunch of syntactic noise when you could just pass the data generators directly.

                    What made me find these odd is that I am generally generating the types my API expects, not special types just for testing. If I need a special type just for testing, it tells me that my API hasn’t defined its types strictly enough (Correct by Construction). This post reads more like the author uses these testing types as contracts?

                    1. 5

                      One of the examples I like is related to parsers of any kind. Let’s pick one of the simplest cases: a word counter that looks for all space characters to delimit words and counts how many words have been seen. 1,114,112 code points are usable; a single one of them is the space, and 17 fit the ‘space/separator’ family.

                      Generating a random string with a uniform distribution that contains more than 2 spaces is tricky. Generating a random string that contains two of them in a row is rarer still. Generating one that mixes either with combining marks that could change the contextual interpretation is yet another challenge.

                      What you may want to control is the distribution: 20% of characters are going to be space or separator-related, 70% will be anything at all, and 10% will be combining marks.

                      If all your generator knows is ‘any string’, you can hardly optimize for specific character sets: in CSV, you’d want more commas, linebreaks, and quotation marks than in the example above; if you’re dealing with HTML, you may want to throw in more brackets and HTML entities. Similarly, if you’re dealing with years, 1970 and 2000 may sometimes prove to be more interesting as central points than 0 (which is not representable in the Gregorian calendar anyway).

                      At that point, the precision of your property tests, or their ability to truly exercise your code, depends on generators that are likely fancier and better-directed than what your type system lets you express. Either you need to beef up the type system, or you’re stuck with less control than would be ideal.
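
                      A small experiment in plain Python (with a hypothetical naive counter standing in for the buggy parser) shows how much the distribution matters for the word-counter case: a space-heavy generator trips the bug almost immediately, while a near-uniform one typically needs far more tries:

                      ```python
                      import random

                      def count_words(s):
                          # naive counter under test: assumes single-space separators
                          return s.count(' ') + 1

                      def count_words_model(s):
                          return len(s.split())  # oracle: split() collapses whitespace runs

                      def gen_string(rng, p_space, length=10):
                          # each character is a space with probability p_space, else a letter
                          return ''.join(' ' if rng.random() < p_space else rng.choice('ab')
                                         for _ in range(length))

                      def trials_until_failure(p_space, seed=0, limit=100_000):
                          rng = random.Random(seed)
                          for i in range(1, limit + 1):
                              s = gen_string(rng, p_space)
                              if count_words(s) != count_words_model(s):
                                  return i  # found a counterexample on trial i
                          return None

                      print(trials_until_failure(0.20))   # space-heavy: fails fast
                      print(trials_until_failure(0.001))  # near-uniform: far slower
                      ```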

                      1. 0

                        Let’s pick one of the simplest cases: a word counter that looks for all space characters to delimit words and counts how many words have been seen.

                        The definition of this problem is already problematic, and not at all simple, though. You can’t assume that out of 1,114,112 code points, “space-like” characters are the only word separators. “犬と猫” (Japanese), according to Google Translate, means “the dog and the cat.”

                        (This makes your point stronger, but was also a troll).

                      2. 4

                        I think @drmaciver uses prop testing in a much different way than I do.

                        Possibly, but observationally I think my way is pretty common among people who use property-based testing libraries which are less tied to types. Certainly it’s very common in Python land, but there’s a bit of a biasing factor there in that I wrote the main Python property-based testing library, so they may just be copying me. :-)

                        What made me find these odd is that I am generally generating the types my API expects, not special types just for testing.

                        Note that I am not saying that you shouldn’t test the whole range of types your API accepts. I’m saying that not every test of your API needs to do that and it is worth testing more specific domains. @ferd’s parser example is a good one, but in general there will be properties that are only interesting in a subset of your domain (e.g. because they test a behaviour that handles a special case).

                        Take the mean example. It restricts the domain in large part because the property it’s testing is not valid outside of that restricted domain. You still need to test what happens with NaN, or empty lists, but what happens there is probably just an error condition that you need to test separately.

                        1. 2

                          One example:

                          def fact(n: int):
                            if n < 0:
                              raise ValueError
                            elif n == 0:
                              return 1
                            else:
                              return n * fact(n - 1)
                          

                          You’re properly handling the negative number case, but if one of your property tests is f(n) = n!, you don’t want that specific test “failing” because Hypothesis tried n = -1.
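
                          One way to express that, sketched with just the standard library (Hypothesis would do the same with a strategy like integers(min_value=0)): restrict the generator to the domain where the property holds, and give the error branch its own test.

                          ```python
                          import math
                          import random

                          def fact(n: int):
                              if n < 0:
                                  raise ValueError
                              elif n == 0:
                                  return 1
                              else:
                                  return n * fact(n - 1)

                          rng = random.Random(0)
                          for _ in range(200):
                              n = rng.randint(0, 20)  # restricted domain: n >= 0
                              assert fact(n) == math.factorial(n)

                          # the error condition gets its own test instead of
                          # "failing" the f(n) = n! property
                          try:
                              fact(-1)
                          except ValueError:
                              pass
                          else:
                              raise AssertionError("expected ValueError for negative input")
                          print("ok")
                          ```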

                        1. 8

                          Erlang uses it to allow runtime update of code with zero downtime. OTP has hooks for transforming the old data formats to new data formats on live systems.

                          1. 7

                            Why is this enabled by dynamic typing? In my view it might be possible to achieve the same with static types. For example by restricting code updates to the same type signature as the code they are replacing (for erlang that would be the types of the exported functions) and allowing those hooks to transform one incoming type to a different outgoing type if the types needed to be changed.

                            1. 2

                              I think the type signatures might go a bit crazy when you have so many generic functions. If the VM type-checked upgrade code against old and new types, it could be quite neat.

                              1. 6

                                There are a number of approaches to deal with strong static types with hot-swap upgrades. Cloud Haskell (basically an effort to copy Erlang’s actor model features to Haskell) uses Haskell’s Typeable library. It’s a bit messy, but it does allow for type-safe hot replacement of distributed system components without having to e.g. transmit values as JSON or something.

                                1. 3

                                  Erlang works such that you could make a reference to an anonymous function in a module at version 1, send it over to a process that uses version 2 of the module to wrap it in another one, and pass it to a process that runs version 1 again. For that period of time, until version 1 is unloaded, this is a valid operation that should be possible to execute and unwrap.

                            2. 6

                              Haskell, quintessentially strongly typed, has the “plugins” library which allows runtime hot-loading. I think this is sort of orthogonal to how the language is typed.

                              1. 4

                                Erlang is strongly typed too. I think you meant static :)

                                1. 1
                                2. 3

                                  Java can do it. C can do it (and thus everything using dll/so/dylib libraries). C# can do it. So all mainstream statically typed languages are capable of a runtime update of code with zero downtime. They are not designed to do it, so it does not work as nicely as in Erlang, but static typing is not part of the problem.

                                  1. 1

                                    That depends whether you define “problem” as “a property making it impossible to do something” or “a property making it a pain in the ass to do something”. IMO static type systems don’t make it impossible to hot-reload code, but they sure as heck make it hard, which is why statically-typed languages tend to only get that feature implemented once they’re really mature.

                                1. 1

                                  “[Property-based testing] is a high-investment, high-reward kind of deal.” I can’t think of a more Erlang-y notion than that (in a good way).

                                  1. 3

                                    I do stand by that comment. In the (current) preface I mention that I’ve spent years toying with property-based testing [on and off] without feeling competent. I’ve seen talks about it, asked questions of people who knew more than I do, and a lot of the examples – toy examples – always felt very limited or highlighted some basic bug (0.0 =/= 0 but Erlang allows both!). Stateless tests are fairly okay to write, but stateful ones were more of a challenge.

                                    Eventually I decided to take a deep dive and test non-trivial projects with it [albeit personal toy projects] until I could find ways to deal with asynchronous code, complex outputs and inputs, and figure out, with limited outside help, what seemed to stick or not.

                                    I’m hoping that this ‘book’ helps share my experience and therefore makes the investment required lower, but truth is that there is a kind of habit that you get in “thinking in properties” similar to what you get in programming languages or specific paradigms, and that takes practice. There are tips and tricks, but you only think comfortably within a paradigm once you’ve forced yourself to hit and go through a few walls with it.

                                    1. 1

                                      but truth is that there is a kind of habit that you get in “thinking in properties” similar to what you get in programming …

                                      I find that the things that make good contracts also happen to overlap with good properties to test. That doesn’t solve the “properties are hard” problem, but it does reframe it in a way that’s often, in my (rather limited) experience, easier to reason about.

                                  1. 3

                                    It’s almost like a pipeline with structured exception handling… :P

                                    1. 9

                                      Those are not exceptions though. It does this on return values that are specific tuples.

                                      1. 1

                                        Point. I meant it in the general “huh, crap, that function didn’t take the happy path with the supplied arguments” sense rather than the literal PL construct of exceptions. My apologies. :)

                                      2. 2

                                        It’s almost like a less thought out and less useful attempt at expressing monads.

                                        1. [Comment removed by author]

                                          1. 17

                                            You know what I like about your post?

                                            Let me show you the problem the tool is trying to solve. Got it? Good. Here’s the tool and how it solves it.

                                            Sooo many posts fail at that first part. Thanks!

                                            1. 3

                                              I personally enjoyed your post; that’s a very cool language feature. To deflect comments mentioning that this is essentially a specific sub-case of monads and do-notation (which we’ve seen in one form or another many times in many languages), you perhaps could have mentioned this fact in your post and preempted such comments.

                                              While the GP’s post was a bit curt, it’s not wrong, and it’s valuable to the discussion because not everyone will recognize that this is a specific case of a more general concept of interest to the languages community.

                                              1. 2

                                                For what it’s worth, I meant my comment in the sense of squinting at it and going “oh, nifty”.

                                                I don’t know much about monads by that name, mostly because of the smugness of Haskell weenies and how much I can’t possibly give a shit about whatever they want to say when they fail to put in the effort to educate.

                                                Even if it’d help.

                                                ~

                                                In that same spirit, the reason I said “pipeline with structured exception handling” is that your link was the first time it clicked with me what with was doing.

                                                We’d normally write

                                                "foo"
                                                |> bar()
                                                |> baz()
                                                |> quux()
                                                

                                                 And hope no exceptions/BadMatches occurred, or maybe (yuck) wrap the thing in a try/rescue block.

                                                With with, you declare a pipeline, and if a stage fails, it tries matching down in the else clause. That’s really neat.
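
                                                 That pattern can be sketched in a few lines of Python (parse and halve are hypothetical stages): each stage returns an ok/error pair, and a tiny bind helper short-circuits on the first failure, which is the same Result-monad idea Elixir’s with captures.

                                                 ```python
                                                 # hypothetical stages; each returns an ("ok", value) or ("error", reason)
                                                 # pair, like Elixir's {:ok, v} / {:error, r} tuples
                                                 def parse(s):
                                                     return ("ok", int(s)) if s.isdigit() else ("error", f"not a number: {s}")

                                                 def halve(n):
                                                     return ("ok", n // 2) if n % 2 == 0 else ("error", f"{n} is odd")

                                                 def bind(result, stage):
                                                     # run the next stage only on "ok"; otherwise short-circuit the error
                                                     tag, value = result
                                                     return stage(value) if tag == "ok" else result

                                                 def pipeline(s):
                                                     result = ("ok", s)
                                                     for stage in (parse, halve):
                                                         result = bind(result, stage)
                                                     return result

                                                 print(pipeline("42"))  # ('ok', 21)
                                                 print(pipeline("41"))  # ('error', '41 is odd')
                                                 ```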

                                                1. 7

                                                  I can’t possibly give a shit about whatever they want to say when they fail to put in the effort to educate.

                                                  Do 3+ years and a thousand pages get me permission to bend your ear?

                                                  1. 3

                                                    Well, you did write a book to help teach on it, and you haven’t (here at least) gone with “lol it’s just a monad [git gud]”. So, sure! :)

                                                    1. 1

                                                      I save the git gud for /r/darksouls

                                                  2. 3

                                                    I don’t know much about monads by that name, mostly because of the smugness of Haskell weenies and how much I can’t possibly give a shit about whatever they want to say when they fail to put in the effort to educate.

                                                    Wat? I guess you’re not aware of the glut of monad tutorials? One may not read them, or may think they are poorly written, but given the size of the community, there is quite a lot of educational material developed by it.

                                                    1. 7

                                                      Alright, let me clarify my English there and I’ll stop derailing the thread:

                                                      I am aware there is a glut of monad tutorials – not least because apparently every time a Haskeller finally groks what a monad is and how they work they immediately write another tutorial on it.

                                                      I am also aware that the Haskell community writ large has made lots of efforts to educate.

                                                      My assertion is better quoted as:

                                                      Whenever I read a post that smugly monad-splains how whatever is being talked about is some inferior version of a concept from Haskell, especially without examples given, I automatically lose interest.

                                                      Imagine if every time you were chatting in public about a Haskell concept (say, lifting) that a Javascript developer popped out and said “yeah that’s kinda like currying in Javascript but used by fewer people and nobody gives a shit because we’re busy raising 5M USD seed rounds with Meteor and Mongo and doing cocaine and banging models at our plug-in camp at Burning Man fuck youuuuu”.

                                                      Do you see how annoying that is? Not least because they didn’t provide any code demonstrating their point, and because they clearly have at least some truth to their statement – not that you’ll ever know, because they didn’t stick around to explain the interesting bits of what they said (like how JS has almost certainly produced more revenue than Haskell, ever)?

                                                      1. 3

                                                        Imagine if every time you were chatting in public about a Haskell concept (say, lifting) that a Javascript developer popped out and said “yeah that’s kinda like currying in Javascript but used by fewer people and nobody gives a shit because we’re busy raising 5M USD seed rounds with Meteor and Mongo and doing cocaine and banging models at our plug-in camp at Burning Man fuck youuuuu”.

                                                        That’s called “every single day” :)

                                                        I can sort of see your point. Where I disagree is that this Elixir concept is literally a less powerful version of monads, so bringing them up, to me, is clearly related.

                                                        1. 4

                                                          You can make an instance of Monad that subsumes the baked-into-the-language thing they did here. I don’t know or care what “less powerful” means, but good PL design doesn’t require reifying instantiations of basic patterns in the implementation.

                                                          If you don’t care, ok, but I’d like for our industry to be less mired in mediocrity for the next 5 decades.

                                                          1. 4

                                                            Could one of you write an example, in Haskell and a vulgar language of your choosing (ideally Elixir), of what you’re getting at?

                                                            Especially if you are going to toss around terms like “mired in mediocrity”.

                                                            1. 5

                                                              Here’s the example given in the post:

                                                              do
                                                                a <- Math.divide 1 0
                                                                b <- Math.divide a 4
                                                                return ("success: " ++ show b)
                                                              

                                                              The else bit doesn’t translate as nicely, but it also doesn’t generalize as well.

                                                              Here’s depth first, exhaustive, lazy search

                                                              do
                                                                a <- [1 .. 100]
                                                                b <- [1 .. 100]
                                                                guard (a + b == 50 && a * b > 250)
                                                                return (a, b) 
                                                              

                                                              Point being that the structure of do subsumes more than just the behavior of with since there’s a somewhat more general pattern you could abstract over. The first example is super useful and if you had to pick just one way to use this pattern it’s a pretty good choice.

                                                              Otoh, it’s also very nice to have convenient access to the whole set of similar patterns.
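
                                                              For readers without Haskell handy, here is a rough Python sketch of the shared pattern both examples instantiate. The `bind_maybe`/`bind_list` helpers are hypothetical stand-ins for Haskell's `>>=`, not any real library:

```python
# Maybe-style bind: stop the whole chain at the first failure (None).
def bind_maybe(value, f):
    return None if value is None else f(value)

def divide(a, b):
    # Hypothetical stand-in for the post's Math.divide: None on division by zero.
    return None if b == 0 else a / b

# Mirrors: a <- divide 1 0; b <- divide a 4; return ("success: " ++ show b)
result = bind_maybe(divide(1, 0),
                    lambda a: bind_maybe(divide(a, 4),
                                         lambda b: "success: %s" % b))
assert result is None  # the division by zero short-circuits the chain

# List-style bind: try every combination (nondeterministic search).
def bind_list(values, f):
    return [y for x in values for y in f(x)]

# Mirrors: a <- [1..100]; b <- [1..100]; guard (...); return (a, b)
pairs = bind_list(range(1, 101),
                  lambda a: bind_list(range(1, 101),
                                      lambda b: [(a, b)] if a + b == 50 and a * b > 250 else []))
assert (10, 40) in pairs  # 10 + 40 == 50 and 10 * 40 == 400 > 250
```

                                                              Both examples thread values through a single `bind` shape; Haskell's do-notation is sugar for exactly that, with the Monad instance deciding which `bind` applies.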

                                                              But I’ll say no more. Chris actually thinks very hard about how to talk about all of this and probably will do a much better job.

                                                              1. 4

                                                                Oh probably, let me stew on how best to condense the point and why it matters (there are a lot of reasons, but not all of them are obvious if you haven’t been breathing it)

                                                    2. 1

                                                      Really? This is what you’re taking away from this?

                                                      You make it sound like that’s the wrong message to take away…are you saying it’s not this? It certainly lacks the power of being able to define what “bind” does, so it’s much more limited. There doesn’t really seem to be that much to discuss, to me. It’s sugar for some case statements.

                                                      I’ll concede the way I expressed myself might be sub-optimal, but I don’t think that makes with more interesting.

                                                      1. 7

                                                        is there a place where middlebrow dismissals are ever the right takeaway? Maybe instead of insulting the language designers for not completely implementing a feature the way you’d like it, you could just go punch a couch pillow and use swear words and stomp around the house for a few hours?

                                                1. 13

                                                  Oh no. Unicode.

                                                  Unicode is fraught with difficulties, but it’s still a godsend. Try working on a system that’s non-ASCII based and you realize how simple Unicode is in comparison. Code pages are horrific to manage. (If you are using the Shift In and Shift Out codes, then you’ve reached a special place in programming hell.)

                                                  1. 10

                                                    Unicode is a good thing (with some questionable historical decisions, but still better than the rest). I think the biggest problem is the preconception many programmers have about strings where they nicely fit the “one character per array index” model that just does not work well with real world text data. Of course the problem tends to show up while using Unicode, so blaming Unicode is a lot easier!

                                                    1. 3

                                                      Indeed. The notion of char * with only ASCII has taught the wrong abstraction to a lot of people.

                                                      1. 3

                                                        And a generation of Java programmers grew up with the idea that a 16-bit char represents a character (it did in UCS-2 of course). I think it is often also a lack of proper education in character sets/encodings in text books and undergraduate programs.

                                                        1. 1

                                                          There are still a handful of languages that use an array-of-32-bit-chars model, which retains the traditional equation of strings and character arrays, while also being able to represent all of Unicode. The String in the Haskell prelude comes to mind, although there are plenty of alternatives.

                                                          1. 3

                                                            UTF-32 and other 32-bit encoding schemes are fixed-width, but only per code point. Because of combining marks, a grapheme cluster (“a character”) may require multiple code points to be represented, which breaks the alignment.

                                                            Many textual representations can be written either way (e.g. ‘ç’ can be either ‘c + ¸’, or the pre-combined ç), but some representations just don’t fit that scheme, such as emoji where skin color is applied through combining tone modifiers, for example. Another one is the zero-width joiner, which can be used to join arbitrary characters or emoji when supported (the family emoji is possibly made up of two adult emoji and one or two child emoji).

                                                            You really have to use your Unicode library’s functions and algorithms, and think in terms of grapheme clusters in many cases. Strings as arrays just don’t really make sense anymore.
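
                                                            That difference is easy to see from Python’s standard `unicodedata` module; a minimal sketch:

```python
import unicodedata

composed = "\u00e7"     # 'ç' as the single precomposed code point U+00E7
decomposed = "c\u0327"  # 'c' followed by U+0327 COMBINING CEDILLA

# Both render as the same grapheme, but code-point counts differ.
assert len(composed) == 1
assert len(decomposed) == 2

# NFC normalization folds the pair back into the precomposed form.
assert unicodedata.normalize("NFC", decomposed) == composed
```

                                                            Counting graphemes (rather than code points) needs the full segmentation algorithm from UAX #29, which is exactly the kind of thing you want a Unicode library for.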

                                                            1. 1

                                                              Racket does UTF-8 in unsigned int32. So just because you use a 32-bit int doesn’t mean UTF-32… that is, unless my understanding is insufficient, and UTF-8 implemented this way is equivalent!

                                                              1. 1

                                                                You can store Unicode directly, using code points as larger integers. By definition (because of joiners and combining marks), this cannot give you fixed-width characters.

                                                                UTF-8 as an encoding could certainly be stored in larger integers, but if you do it that way, you still have to implement the encoding using the same code units, which means it ends up longer than the other variants. If not, the encoding is just not UTF-8.

                                                    2. 5

                                                      Unicode is terrible because the set of all human languages put together is terrible, and different languages interact strangely, with odd rules for specific characters that conflict with rules in other languages. If you want to support all human languages, Unicode is about as good as you can do.

                                                      It’s still awful to handle correctly, and I would be willing to bet that no software system gets it fully right.

                                                      1. 2

                                                        Some encodings are as easy as ascii, others are really silly, but not necessarily much worse than unicode. Why do people always have to excuse unicode by comparing it to the worst of them all? Is the bar really that low?

                                                        1. 6

                                                          It’s not the encoding that’s the godsend, it’s the fact that you don’t have to convert between encodings when crossing system boundaries.

                                                          1. 6

                                                            Some encodings are as easy as ascii, others are really silly, but not necessarily much worse than unicode.

                                                            Could you give an example? Most other encodings require codepage switching (for 8-bit encodings), which is terrible, or have relatively high storage costs per character.

                                                            Also, what are you referring to with unicode, since unicode specifies multiple character encodings? I would argue that UTF-8 is one of the most sensible encodings:

                                                            • ASCII is valid UTF-8.
                                                            • Since ASCII is valid UTF-8, most XML/HTML tags use only 1 byte per character.
                                                            • Since most codepoints are relatively small, UTF-8’s variable-length encoding encodes text much more compactly than UTF-32.
                                                            • UTF-8 is self-synchronizing.
                                                            • UTF-8 is very easy to decode.

                                                            Of course, because it is variable-length, it does not provide constant-time indexing or length, but a fixed-width encoding would require more than 16 bits per character for a reasonable character inventory.
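
                                                            Most of those bullet points can be checked directly from Python; a small illustrative sketch:

```python
# ASCII text is byte-for-byte valid UTF-8.
assert "hello".encode("utf-8") == b"hello"

# Variable length: 1 byte for ASCII, more for higher code points...
assert len("e".encode("utf-8")) == 1
assert len("\u00e9".encode("utf-8")) == 2   # é
assert len("\u20ac".encode("utf-8")) == 3   # €

# ...whereas UTF-32 always spends 4 bytes per code point.
assert len("e".encode("utf-32-le")) == 4

# Self-synchronizing: continuation bytes always match 0b10xxxxxx,
# so a decoder can re-find a character boundary mid-stream.
encoded = "\u20ac".encode("utf-8")
assert all(byte & 0b11000000 == 0b10000000 for byte in encoded[1:])
```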

                                                            1. 2

                                                              And, as ferd points out above, the entire concept of constant-time indexing into a sequence of Unicode codepoints is silly for most applications. (You’re almost certainly interested in grapheme clusters instead.) So the fact that UTF-8 doesn’t facilitate it isn’t much of a loss.

                                                        1. 1

                                                          I guess I missed the iPhone 1970 bug the first time around. Was it really only 1970? Or 1971 too? Because some of the explanations involving time underflow don’t really make sense. But if it were a code signature failure, that could happen for any day that’s too old. Of course it’s hard to find a more authoritative source for what happened than a gizmodo article “I googled 1970 and here’s what I learned about unix”.

                                                          1. 9

                                                            There were quite a few sources, but none that showed up as well on a slide. Here’s a few:

                                                            • This iOS date trick will brick any device – “A Reddit thread offers a handful of possible causes of the issue. The most reasonable appears to be that it all comes down to time zones and that setting the date to January 1st, 1970 causes the internal clock to be set to less than zero, which confuses the device and crashes it.”
                                                            • Setting the date to 1 January 1970 will brick your iPhone, iPad or iPod touch “The precise cause of the issue has not been confirmed, although speculation points to the way iOS stores date and time formats meaning that 1 January 1970 is stored as a value of zero or less than zero, causing every other process that requires the time stamp to fail.”
                                                            • Don’t Set Your iPhone Back to 1970, No Matter What – “The bug appears to only affect 64-bit iOS devices, meaning iPhone 5S, iPad Air, and iPad Mini 2 and newer are affected. It’s almost certainly related to the same Unix glitch that caused Facebook to wish people a happy 46 years on the service; the date 1/1/70 has an internal value of zero on a Unix system, which in this case is leading to a software freakout.”

                                                            The kind of common guess going around is that it could have to do with code signatures, either too old or too new. There’s been no official explanation of the exact details and everything is a guess so far. I personally like the underflow theory (and I do mention it is unsure and that I only wish it to be true). I also used the 32-bit overflow (year 2106) as an example, even though the bug affected 64-bit devices only. It didn’t feel particularly dishonest given the wording around it.

                                                            1. 3

                                                              Oh, thanks. Sorry if I was unclear, I didn’t mean you needed well sourced links for the talk. I’m just curious about the exact cause and was hoping somebody had figured it out.

                                                          1. 9

                                                            There’s an interesting cycle in the learning of a language. Few sources use or mention rebar3 because they come from a time before it. We (Tristan Sloughter and myself) developed it during our time at Heroku, and even though we have both left, we still maintain it.

                                                            Books like Learn You Some Erlang not only pre-date rebar3, they initially were written in part or entirely before any build system whatsoever (aside from custom makefiles) even existed.

                                                            That being said, I’m glad the current ecosystem is proving welcoming and motivating for developers :) Keep on trucking Damien.

                                                            1. 3

                                                              The entire community and industry is indebted to you and Tristan. You keep on trucking too!

                                                            1. 4

                                                              BTW, this company still exists (under the name AdGear), still uses Erlang very heavily, and was recently acquired by Samsung. People interested in working on low-latency Erlang systems should consider applying for a job. (Disclaimer: I work there. I apologize if the recruitment pitch is inappropriate.)

                                                              1. 4

                                                                Author here! Did not know about the acquisition, nice to hear!

                                                                1. 1

                                                                  BTW, this company still exists (under the name AdGear), still uses Erlang very heavily, and was recently acquired by Samsung. People interested in working on low-latency Erlang systems should consider applying for a job. (Disclaimer: I work there. I apologize if the recruitment pitch is inappropriate.)

                                                                  Do you know if the company is remote/freelance friendly?

                                                                  1. 1

                                                                    It has had remote employees in the past, but presently the team working on the systems mentioned in the blog are all located in Montreal. I’m not commenting on this in any official capacity, of course, but I think remote is possible for the right people and roles.

                                                                1. 4

                                                                  I have to agree, that was a confusing explanation for a simple difference. I personally like Erlang’s minimalism, where:

                                                                  Example = "one". % binds the variable
                                                                  Example = "one". % pattern match: succeeds
                                                                  Example = "two". % pattern match: fails with a badmatch error
                                                                  

                                                                  The runtime errors are usually sufficient to catch these mistakes in most situations.

                                                                  I would like to see the implications of Elixir’s auto-rebinding in a broader scope rather than the narrow example of case statements. While I know those are used heavily in Erlang programs, I’m curious about the risks introduced by rebinding in a larger more complex application with lots of moving parts.

                                                                  This is likely something that would require using Elixir for a period of time, which I haven’t yet tried, rather than being easy to communicate through blog post code samples.

                                                                  Regardless, in those cases I would prefer the stricter immutability of variables that Erlang offers. Especially considering the programming style of using multiple processes and the use of concurrency in even basic programs.

                                                                  1. 2

                                                                    I would personally prefer it if Elixir had Erlang’s matching by default, but optional rebinding with ^ rather than what they’re doing right now.

                                                                  1. 3

                                                                    What a pile of straw men. I can’t help thinking that, intentionally or accidentally, the childish presentation is making readers turn down their critical thinking.

                                                                    Every one of those things, when not taken to the caricatured extremes of the story, has made me a better programmer. Learning Python and C. Focusing on getting good at one thing, at times even at the expense of helping others. Calling out bad things when they’re bad, and accepting the endemic badness I can’t change at the time. Adopting new things early.

                                                                    Trying to be a real programmer has made me a better programmer. Looking at the human side of problems… it’s often the right thing to do, but by no means always. Sometimes the system really does encapsulate the important aspects, whereas looking at any one user’s face would lead you astray. I think the advice here is misleading at best.

                                                                    1. 8

                                                                      Of course they’re strawmen. This is written in an attempt to emulate The Little Prince, a children’s book written in that style in the first place. It’s a story, not a paper on the best practices of engineering.

                                                                      It’s not like reading books, trying frameworks, or being an expert, ops person, or architect is useless or a despicable goal. Pointing out errors is valuable, trying to improve is valuable, going technical is valuable, and so is staying up to date. But there comes a point, at least in my personal experience, where it becomes easy to forget why these things are valuable, where they become an exercise in futility on their own. They become a burden driven by anxiety to perform or prove myself, rather than a healthier objective of accomplishing something or improving myself. I personally have the habit of taking a good thing and pushing it too far, wrapping myself in it. Then I tend to err towards the caricature and have to check myself, making sure I stay more grounded.

                                                                      And the human face to keep in mind is not always just a user’s. God help me, jobs could get depressing fast in that case. It can be your team, peers, coworkers, someone you’re looking to help, directly or not. And then again, this is only about feeling a more long-term fulfillment, in the case illustrated here.

                                                                      There’s no doubt to me that focusing only on the tech side of things can be fun and extremely entertaining, but for me, it only lasts a while. I don’t think it’s a viable path for me in the long term. The satisfaction is not long-lasting enough, and it becomes harder and harder for it to be a suitable replacement for working on something I perceive as actually worthwhile to someone other than whoever is reaping heaps of money off of it.

                                                                      This story, to me, was a lighthearted way of showing that disconnect as it creeps in.

                                                                      1. 1

                                                                        And yet, I think that I’ve met every single one of those strawmen and have sometimes had to make a conscious effort to avoid becoming one; eg, ensuring that my constructive feedback was actually helpful in moving forward and not off-putting.

                                                                    1. 7

                                                                      This sounds like a very cumbersome way to do things.

                                                                      1. [Comment removed by author]

                                                                        1. 8

                                                                          Do you honestly think that’s the TL;DR of the article? Do you think every person using Erlang and calling into process_info/1,2 knows about this issue?

                                                                          1. 6

                                                                            I think it makes sense because nothing is shared. The info must be copied. What is unclear is the procedure by which it is done: does it use some sort of hidden signal/message kind of deal that needs to be scheduled, or does it just go ahead and read into the memory of other processes before copying it, for example.

                                                                            1. 2

                                                              A quick glance at the code tells me the latter is the case.

                                                              I agree this makes sense, given how Erlang is expected to work (“share nothing”). Still, it might be a surprise in production when trying to find out which processes are causing a slowdown or whatever, and then suddenly memory shoots up to the sky without apparent reason.

                                                                            2. 2

                                                                              Given the semantics of Erlang I do not see how it could work any other way.

                                                                              1. 4

                                                                By this logic every “surprising” behavior from Erlang that is not accepted as a bug could be reduced to “Given the semantics of Erlang I do not see how it could work any other way.” Not sure how that’s helpful.

                                                                                1. 2

                                                                                  I guess I do not understand why this would be considered surprising behaviour at all.

                                                                                  1. 2

                                                                                    All I read is “I do not have empathy for other people”

                                                                                    1. 2

                                                                      Fair enough. I think it is fair to assume people understand the basic semantics of their language, but to each their own.

                                                                                      1. 11

                                                                        While it is fair to say that the following is not clear from the blog post, not everyone running an Erlang program knows how to program in Erlang, let alone its semantic details. There’s a fair number of people running systems like RabbitMQ/Riak/Ejabberd in production who have no idea about Erlang, but they still need to debug them when things go wrong. Those people might be issuing commands to, for example, get information about processes. Said people expect a printout of whatever Erlang thinks should go into process_info. I bet that if their system suddenly crashes because it ran out of memory from calling said debugging facilities, they will be surprised.

                                                                                2. [Comment removed by author]

                                                                                  1. 3

                                                                                    You still have to copy the data to move it between processes.

                                                                            1. 21

                                                                              The worst things in this article are not the “anti-patterns” but the author’s belief in a “meritocracy” (as if skill were the only thing rewarded in any company, and not biases) and the rejection of a candidate due to “cultural fit” (a great way to form a gross monoculture). I really hope the software industry soon learns to think of these as awful ideas.

                                                                              1. 12

                                                                                Sometimes cultural fit is just a nice way of saying the person would be toxic to the team. I’ve seen it more than a few times with the title-obsessed.

                                                                                1. 4

                                                                                  How could you both have a meritocratic environment and reject people based on a perceived culture fit?

                                                                                2. 4

                                                                                  It’s strongly normative towards a very particular method of software development. I don’t think that it’s useful to generalize from that, as I’ve worked with very productive people who do not share the “roll up your sleeves and apply your mind-bursting sagacity by working 27 hour days” mentality.

                                                                                  1. 1

                                                                                    Isn’t culture by definition singular? How can a company have more than one culture for developers?

                                                                                    1. 5

                                                                                      “Monoculture” doesn’t mean “one culture.” It means “a culture composed of only one type of thing.” The term comes from agriculture.

                                                                                    2. 1

                                                                                      The author’s use of words is unfortunate here; both “meritocracy” and “culture fit” are now bad words in the tech industry (and for good reasons that I’ll save for another comment on another day).

                                                                                      But reading the article, it’s clear that the author is not advocating these in the way that we read them. In the article:

                                                                                      • Meritocracy - valuing people over job titles, particularly with removing one of the bigger biases out there
                                                                                      • Culture fit - rejecting someone that is title obsessed, someone that thinks a bigger title means more power

                                                                                      Both of these seem reasonable, and I don’t see anything else in the article promoting meritocracy or suggesting that he’s building a monoculture.

                                                                                    1. 10

                                                                                      I think there’s one interesting property queues introduce to systems, and the article seems to take it for granted. Without queuing your capacity (let’s say measured in requests/s) must be greater than or equal to your peak traffic. With queuing your capacity merely has to be greater than or equal to your average traffic. If you’re in an industry where your average traffic is not equal to your max traffic, queues might be a win.

                                                                                      Another important aspect is separation of concerns. If you can keep your frontend from even needing to know about all the downstream databases and api calls that need to be made you’ve not only introduced better logical decoupling, but there’s a good chance you’ve improved the availability of your frontend. It’s relatively easy to keep workers from getting overloaded since they pull work rather than having it pushed on them.

                                                                                      However, at the end of the day, if your average traffic exceeds your capacity, you’re going to be in trouble. Queuing can buy you time (hours or a small number of days) to increase capacity. You’d better think twice, and then twice again, before you tell a customer: “Sorry, we’re not good enough at our job. Send your transactions elsewhere.”
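
                                                                                      The capacity argument can be sketched with a toy simulation (all numbers here are made up for illustration): a queue absorbs bursts so long as the average arrival rate stays under capacity, but the backlog grows without bound once the average exceeds it.

```python
def backlog_over_time(arrivals, capacity):
    """Queue depth per tick, serving up to `capacity` jobs each tick."""
    backlog, history = 0, []
    for arriving in arrivals:
        backlog = max(0, backlog + arriving - capacity)
        history.append(backlog)
    return history

# Bursty traffic, average 5 < capacity 6: the queue always drains.
assert backlog_over_time([10, 0, 10, 0, 10, 0], 6) == [4, 0, 4, 0, 4, 0]

# Steady overload, average 7 > capacity 6: backlog grows every tick.
assert backlog_over_time([7, 7, 7, 7, 7, 7], 6) == [1, 2, 3, 4, 5, 6]
```

                                                                                      The first case is the peak-versus-average win; the second is where latency climbs until users notice, as the author’s reply below discusses.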

                                                                                      1. 7

                                                                                        Author here.

                                                                                        The property you mention only holds as long as you are allowed extremely high latencies on the responses that get queued up. Most systems have a qualitative threshold where either timeouts will be shorter, users will give up (or get disconnected) before then, or they will perceive your quality of service as low and unacceptable.

                                                                                        While the queue will keep the system up to your average load, it won’t keep its quality to an acceptable level. In fact, having a queue may end up ruining the quality of service to requests taking place during slow times in your day, in a fashion similar to bufferbloat: because the peak load tasks are in the queue, they take precedence over the non-load periods' tasks. These tasks end up feeling the slowdown that happens over the peak period, and in the end, everyone has a worse experience.

                                                                                        This can be solved in two ways: replace the queue with a stack (you reduce the overall per-task latency, but the oldest tasks absorb all of the accumulated time – that’s how phone support systems work, and why people who have waited long tend to wait even longer), or increase the capacity available at any given time.

                                                                                        The question of using a queue to help handle temporary overload is still a complex one when you have to care about quality of service. Maybe you won’t get to tell the customer “Sorry, we’re not good enough at our job. Send your transactions elsewhere”, and they’ll instead go “Sorry, your service is too slow, we’ll send our transactions elsewhere.”

                                                                                        1. 2

                                                                                          I think the implications of shedding load vs. back pressure vs. high latencies depend on your use case, and I agree that you basically have to pick one. In my ad-tech work, we queue conversions for processing: shedding load would imply data corruption, back pressure would slow down the customer’s e-commerce site and break their checkout process, and high latency just means we’re delayed in reporting conversions.

                                                                                          Sometimes high latency is the least-bad choice.

                                                                                          1. 2

                                                                                            High latency can be a way of implementing back pressure, if your upstream’s load balancer is sensible. Shedding load can be mixed with back pressure via things like circuit breakers, or if your service does more than one thing. For example, if your service optionally returns many kinds of search results, but must collate the results it does return, you can shed load by dropping some of the queries to downstreams, and implement back pressure by slowing down by however much you need in order to keep up with collating the results.

                                                                                            Fundamentally, back pressure is just signaling back to your upstream that you’re having trouble, and you can do that as well as throw away work if you want.

                                                                                            1. 1

                                                                                              Sometimes, yes. I also used to work in ad stuff (RTB), where if you’re analyzing incoming bid requests, you’re better off shedding load ASAP, because you have 100ms (including network time) to make a decision without risking being penalized by the ad network.

                                                                                              There’s no one-size-fits-all solution there, of course. But in the case you mention, your allowed latency is bounded by your memory size. If you let the queue reach that size, you will lose everything in it – or, if it’s a persistent queue, everything that hits your service while it’s down because it ran out of memory.

                                                                                        1. 1

                                                                                          The red arrow (and the red arrow behind it once the first red arrow is addressed, and the one behind that…) are the only things that should ever be addressed for optimization purposes. Nothing else matters if the throughput is capped.

                                                                                          1. 1

                                                                                            The red arrow should be spotted and addressed first. If, from then on, you test your system and can never get the throttle on the red arrow to trigger, there are one or more potential bottlenecks sitting above it in the stack, and those should then be addressed in order to reach your optimal throughput.

                                                                                            EDIT: it seems you edited your own post to account for that while I was writing mine, disregard this.

                                                                                          1. 4

                                                                                            A distributed database meant to work in a mostly partitioned state, used to synchronize gif directories: https://github.com/ferd/figsync, based on a simpler database purpose-built for this project (https://github.com/ferd/peeranha)

                                                                                            1. 8

                                                                                              The few things that bother me about this article are these apparent contradictions:

                                                                                              And then, when it was all working, I refactored out the duplicated code. And I refactored again. And in the end, the whole thing was simpler and shorter than what I would have done with generics

                                                                                              And then:

                                                                                              When writing the Go function, I started at the top and typed until I got to the bottom. And that was it. There aren’t very many ways to write this function in Go.

                                                                                              You don’t need generics, just refactor until it works; also, you don’t really need to think about how to rewrite things, it should be obvious.

                                                                                              Other bits, like:

                                                                                              So again, in the end, Go turned out to be a language for solving real problems rather than a language filled with beautiful tools, and so you build real solutions rather than finding excuses to use your beautiful tools. […] But if you’re trying to solve specific, practical problems in the forms professional developers typically encounter, Go is quite nice.

                                                                                              Seem to carry the implicit assumption that it’s a mutually exclusive problem, or that having beautiful tools leads to not solving the problem appropriately. That if you use fancy tools, the problem somehow doesn’t get fixed, and masturbatory practices take place instead.

                                                                                              I’m not sure I can see myself agreeing.

                                                                                              1. 3

                                                                                                So, when you are actually writing Go code, translating what you intend to do into actual code is much as the author describes: you just write it. That doesn’t mean your initial intent isn’t flawed. I have never written a program knowing exactly how I wanted to do everything from the get-go. Once I learn more about the problem I’m solving by actually writing code to solve it, then I discover flaws in my old approaches and refactor to fit an updated strategy.

                                                                                                The author could have wanted a generic behavior, but discovered at the end of writing the program that the behavior wasn’t actually as generic as they predicted in the beginning, and had common elements that could be factored together. I hypothesize this because the same thing just happened to me at work a week ago! =)

                                                                                                Have you written much Go? Because I notice all the same things as the author. I don’t spend nearly as much time deciding what approach to use on a function by function basis, and spend more time architecting the entire program. For me, it’s not so much a masturbatory need to use beautiful tools of a language that slows me down, but the paradox of choice.

                                                                                                1. 2

                                                                                                  I haven’t written much Go, but I’ve had that feeling in different languages at different times. For me the epiphany came when writing Erlang, and figuring out that all of a sudden I had found a language that fit exactly the way I think: how I plan my programs (bubbles and arrows on a whiteboard), and a declarative style where errors can be handled outside of the ‘happy path’.

                                                                                                  For me, reading and writing Go has always been a disagreeable experience that I felt was crufty, risky, and unnecessary.

                                                                                                  Before anything, let me add that this is neither to the merit of Erlang nor to the detriment of Go. I strongly believe that the language you feel comfortable with is a happy coincidence, and that the following elements must all be met for you to feel at home with a language:

                                                                                                  • The language fits the problem space you’re working in
                                                                                                  • The language semantics fit the way you construct solutions and algorithms (recipes => imperative, composition and black boxes => functional, taxonomy-based => objects or typeclasses, and so on, for some straw examples)
                                                                                                  • The language’s organisational model for code and abstractions fits your own internal one
                                                                                                  • The community has approaches that tend to match yours (so libraries feel more intuitive)
                                                                                                  • The existing teaching material has been written with people like you in mind (different levels of formalism, tone, examples chosen, assumed background knowledge, etc.)
                                                                                                  • Experience with similar paradigms in the past
                                                                                                  • etc.

                                                                                                  The problem with arguments such as “this language is really well designed for solving problems” is that it carries the heavy assumption of “when I am using it” added to the end of the sentence, where ‘I’ is the big variable.

                                                                                                  I didn’t notice the same things as you or the author did when reading about or using Go, which is why these sentences confused me a bit.

                                                                                                  1. 2

                                                                                                    Well, the first thousand lines (or some N lines of code) you write in any language are spent checking and double checking the documentation, so when I first started writing Go I didn’t notice those things either. But unlike, for example C++, it was very easy to get to a point where I remember everything important about the language and could blast out code.

                                                                                                    I definitely agree that what language you write best in depends on what kind of coder you are. Go seems to resonate with systems programmers like myself, because it’s spiritually similar to C with just enough stuff changed to make it easier (slices and maps being painfully absent from C). Then throw in the latest ideas about concurrency as primitives, and you have an optimal language for a systems programmer.

                                                                                                    Also, +1 for Erlang, it’s a great language. Idiomatic Go concurrency is fairly similar to the Erlang design principles. =)

                                                                                                    As for the dynamic language people who are adopting Go so enthusiastically, I think they are just thrilled they can write performant code without having to use something heavy-handed like C++ or Java. ;)

                                                                                                2. 1

                                                                                                  I read this differently. I believe the author was refactoring because he’s new to Go and still coming to terms with the paradigm, just as there’s a learning curve in any language. He started by trying to write complex generic code, when a more idiomatic Go solution turned out to be more straightforward. This is a strength of Go - good solutions tend to minimize complexity and emphasize practicality.