1. 5

    The comment on how Swift make prototyping harder by forcing developers to express correct types is spot-on, and would apply to other strongly typed languages. One could argue that you should solve the problem on paper first and sketch out the types before writing the implementation, but I find it a good example of how dynamic languages shift our expectations in terms of programming ergonomics.

    1. 19

      I’d be very curious to hear what situations you’ve encountered where you were prototyping a solution that you understood well enough to turn into code, but not precisely enough to know its types? I’ve personally found that I can’t write a single line of code – in any language, static or dynamic – without first answering basic questions for myself about what kinds of data will be flowing through it. What questions do you find the language is forcing you answer up-front that you would otherwise be able to defer?

      1. 7

        When I have no idea where I’m going I sometimes just start writing some part of the code I can already foresee, but with no clue how anything around it (or even that part itself) will end up looking in the final analysis. I have no data structures and overall no control flow in mind, only a vague idea of what the point of the code is.

        Then with lax checking it’s much easier to get to where I can run the code – even though only a fraction of it even does anything at all. E.g. I might have some function calls where half the parameters are missing because I didn’t write the code to compute those values yet, but it doesn’t matter: either that part of the code doesn’t even run, or it does but I only care about what happens before execution gets to the point of crashing. Because I want to run the stuff I already have so I can test hypotheses.

        In several contemporary dynamic languages, I don’t have to spend any time stubbing out missing bits like that because the compiler will just let things like that fly. I don’t need the compiler telling me that that code is broken… I already know that. I mean I haven’t even written it yet, how could it be right.

        And then I discover what it is that I even wanted to do in the first place as I try to fill in the parts that I discover are missing as I try to fill in the parts that I discover are missing as I try to fill in the parts that I discover are missing… etc. Structures turn out to repeat as the code grows, or bits need to cross-connect, so I discover abstractions suggesting themselves, and I gradually learn what the code wants to look like.

        The more coherent the code has to be to compile, the more time I have to spend stubbing out dummy parts for pieces of the code I don’t even yet know will end up being part of the final structure of the code or not.

        It would of course be exceedingly helpful to be able to say “now check this whole thing for coherence please” at the end of the process. But along the way it’s a serious hindrance.

        (This is not a design process to use for everything. It’s bottom-up to the extreme. It’s great for breaking into new terrain though… at least for me. I’m terrible at top-downing my way into things I don’t already understand.)

        1. 4

          That’s very interesting! If I’ve understood you correctly, your prototyping approach seems to allow you to smoothly transform non-executable sketches into executable programs, by using the same syntax for both. So instead of sketching ideas for your program on a napkin, or on a whiteboard, or in a scratch plaintext file, you can do that exploration using a notation which is both familiar to you and easy to adapt into an actual running program. Would it be correct to say that by the time you actually run a piece of code, you have a relatively clear idea of what types of data are flowing through it, or at least the parts of it that are actually operational? And that the parts of your program whose types you’re less confident about are also the parts you aren’t quite ready to execute yet?

          If so, then I think our processes are actually quite similar. I mainly program in languages with very strict type systems, but when I first try to solve a problem I often start with a handwritten sketch or plaintext pseudocode. Now that I think about it, I realize that I often subconsciously try to keep the notation of those sketches as close as possible to what the eventual executable code might look like, so that I’ll be able to easily adapt it when the time comes. But either way, we’re both bypassing any kind of correctness checking until we actually know what it is we’re doing, and only once we reach a certain level of confidence do we actually run or (if the language supports it) typecheck our solution.

          Let me know if I’ve missed something about your process, but I think I understand the idea of using dynamic languages for prototyping much more clearly now. What always confused me is that the runtime semantics and static types (whether automatically checked or not) of a program seem so tightly coupled that it would be nearly impossible to figure one out without the other, but you seem to be suggesting that when you’re not sure about the types in a section of your program, you’re probably not sure about it’s exact runtime semantics either, and you’re keeping it around as more of a working outline than an actual program to be immediately run. So even in that early phase, types and semantics are still “coupled,” but only in the sense that they’re both incomplete!

          1. 3

            If I’ve understood you correctly, your prototyping approach seems to allow you to smoothly transform non-executable sketches into executable programs, by using the same syntax for both.

            Yup.

            Would it be correct to say that by the time you actually run a piece of code, you have a relatively clear idea of what types of data are flowing through it, or at least the parts of it that are actually operational?

            Well… it depends. For the parts that are most fully written out, yes. For the parts that aren’t, no. Neither of which are relevant when it comes to type checking, of course. But at the margins there is this grey area where I have some data structures but I only know half of what they look like. And at least one or two of them shift shape completely as the code solidifies and I discover the actual access patterns.

            If so, then I think our processes are actually quite similar.

            Sounds like it. I’d wonder if the different ergonomics don’t still lead to rather different focus in execution (what to flesh out first etc.) so that dynamic vs static still has a defining impact on the outcome. But it sure sounds like there is a deep equivalence, at least on one level.

            Now that I think about it, I realize that I often subconsciously try to keep the notation of those sketches as close as possible to what the eventual executable code might look like

            Seems natural, no? 😊 The code is ultimately what you’re trying to get to, so it makes sense to keep the eventual translation distance small from the get-go.

            So even in that early phase, types and semantics are still “coupled,” but only in the sense that they’re both incomplete!

            I had never thought about it this way, but that sounds right to me as well.

        2. 4

          You didn’t ask me but I’ll answer anyway because I’d like your advice! I am currently prototyping a data processing pipeline. Raw sensor data comes in at one end then is processed by a number of different functions, each of which annotates the data with its results, before emitting the final blob of data plus annotations to the rest of the system. As a concrete example, if the sensor data were an image, one of the annotations might be bounding boxes around objects detected in the image, another might be some statistics, etc.

          At this stage in the design, we don’t know what all the stages in the pipeline will need to be. We would like to be able to insert new functions at any stage in the pipeline. We would also like to be able to rearrange the stages. Maybe we will reuse some of these functions in other pipelines too.

          One way to program this is the “just use a map” style promoted by Clojure. Here every function takes a map, adds its results to the map as new fields, then passes on the map to the next function. So each function will accept data that it doesn’t recognize and just pass it on. This makes everything nicely composable and permits the easy refactoring we want.

          How would this work in a statically typed system? If the pipeline consists of three functions A, B then C, doesn’t B have to be typed such that it only accepts the output of A and produces the input of C? What happens when we add another function between B and C? Or switch the order so A comes last?

          What would the types look like anyway? Each function needs to output its input plus a bit more: in an OOP language, this quickly becomes a mess of nested objects. Can Haskell do better?

          Since I cannot actually use Clojure for this project, I’d welcome any advice on doing this in a statically typed language!

          1. 3

            In my experience statically typed languages are generally very good at expressing these kinds of systems. Very often, you can express composable pipelines without any purpose-built framework at all, just using ordinary functions! You can write your entire pipeline as a function calling out to many sub-functions, passing just the relevant resources through each function’s arguments and return values. This approach requires you to explicitly specify your pipeline’s dependency graph, which in my experience is actually extremely valuable because it allows you to understand the structure of your program at a glance. The simplicity of this approach makes it easy to maintain and guarantees perfect type safety.

            That said, based on your response to @danidiaz, it sounds like you might be doing heavier data processing than a single thread running on a single machine will be able to handle? In that case, depending on the exact kind of processing you’re doing, it’s still possible that you can implement some lightweight parallelism at the function call level without departing too much from modeling your pipeline as an ordinary sequence of function calls. Ordinary (pure) functions are also highly reusable and don’t impose any strong architectural constraints on your system, so you can always scale to a more heavily multi-threaded or distributed environment later without having to re-implement your individual pipeline stages.

            If you do have to run your system across multiple processes or even multiple machines, then it is definitely harder to express a solution in a type-safe way. Most type systems don’t currently work very well across process or machine boundaries, and a large part of this difficulty stems from that it is inherently challenging to statically verify the coherence of a system whose constituent components might be independently recompiled and replaced while the system is running. I’m not sure how your idiomatic Clojure solution would cope with this scenario either, though, so I’d be curious to learn more about exactly what the requirements of this system are. These kinds of questions often turn out to be highly dependent on subtle details, so I’d be interested to hear more about your problem domain.

            1. 2

              You can write your entire pipeline as a function calling out to many sub-functions, passing just the relevant resources through each function’s arguments and return values.

              That’s basically what Cleanroom does it its “box structures” that decompose into more concrete boxes. Just functional decomposition. It has semi-formal specifications to go with it plus a limited set of human-verifiable, control-flow primitives. The result is getting things right is a lot easier.

              1. 1

                Thank you. Just to emphasize, we are talking about prototyping here. The system I am building is being built to explore possibilities, to find out what the final system should look like. By the time we build the final system, we will have much stricter requirements.

                I am working on an embedded system. We have limited processing capability on the device itself. We’d like to do as much processing as we can close to the sensors but, we think, we will probably need to off load some of the work to remote systems (e.g. the “cloud”). We also haven’t fixed precisely what on-board processing capability we will have. Maybe it will turn out to be more cost-effective to have a slightly more powerful on-board processor, or maybe it will be helpful to have two independent processors, or maybe lots of really cheap processors, or maybe we should off-load almost everything. I work in upstream research so nothing is set in stone yet.

                Furthermore, we don’t know precisely what processing we will need to do in order to achieve our goals. Sorry for being vague about “processing” and “goals” here but I can’t tell you exactly what we’re trying to do. I need to be able pull apart our data processing pipeline, rearrange stages, add stages, remove stages, etc.

                We aren’t using Clojure. I just happen to have been binge watching Rich Hickey videos recently and some of his examples struck a chord with me. We are using C++, which I am finding extremely tedious. Mind you, I’ve been finding C++ tedious for about twenty years now :)

              2. 2

                Here every function takes a map, adds its results to the map as new fields, then passes on the map to the next function.

                Naive question: why should functions bother with returning the original map? Why not return only their own results? Could not the original map be kept as a reference somewhere, say, in a let binding?

                1. 4

                  If your functions pass through information they don’t recognize - i.e. accept a map and return the same map but with additional fields - then what is to be done is completely decoupled from where it is done. You can trivially move part of the pipeline to a different thread, process or across a network to a different machine.

                  You’re absolutely right though, if everything is in a single thread then you can achieve the same thing by adding results to a local scope.

                  At the prototyping stage, I think it’s helpful not to commit too early to a particular thread/process/node design.

            2. 6

              I may be too far removed from my time with dynlangs but I’ve always liked just changing a type and being able to very rapidly find all places it matters when things stop compiling and get highlighted in my editor.

              1. 5

                The comment on how Swift make prototyping harder by forcing developers to express correct types is spot-on, and would apply to other strongly typed languages

                Quick bit of notation: “strongly typed” is a subjective term. Generally people use it to mean “statically typed with no implicit type coercion”, but that’s not universal. People often refer to both C and Python as strongly typed, despite one having implicit coercion and the other not having static types.

                1. 1

                  Thanks for the clarification!

              1. 8

                The proposed solutions provided in the article all swarm around the idea of finding a third-party source randomness that everyone can agree on. Almost all the proposed solutions on the reddit thread do the same. (Props to this person for walking to the beat of a different drummer.)

                I think they (or we) can do better! But, I don’t know how, yet.

                I think the solution should be cryptographic in nature… So, I’ll try to get close to the answer by clicking anything in Wikipedia’s Cryptography Portal and connected nodes and sharing anything that looks remotely related.

                These look really promising:

                These look … interesting? Hopefully unnecessary.

                1. 5

                  What about this?

                  1. Each party generates a secret string
                  2. Each party publishes the hash of their string to the entire group
                  3. Once all hashes have been published, each party publishes their original string to the entire group
                  4. The random number is the hash of the concatenated strings

                  There’s nothing in this protocol enforcing that the secret strings be random, but I believe that it’s game-theoretically in each party’s interest to do so, so as to avoid e.g. dictionary attacks. I can’t see how any party gains anything by generating a string that is anything except truly random, ensuring a random hash of the concatenated strings.

                  Am I thinking about this correctly?

                  EDIT: Ah, I see, this is basically the “commitment scheme” idea mentioned in the Wikipedia article you posted. Cool!

                  1. 1

                    I came up with a variant of this, but instead of strings, each person picks a number, which is then hashed, then the resulting number is the sum of the chosen numbers, mod 20, plus 1.

                    Another thing you could do is send a message, and the millisecond it was received at, mod 20, plus 1, is your number. You would have to trust that the internet makes the timing random enough, and that you can trust your chat system, but usually you can.

                  2. 4

                    They don’t need to agree on the source of randomness, it just needs to be reasonably beyond manipulation by dev1. Like @asrp says, stuff like Bitcoin IDs will work. You could also hash the fifty first characters of the first article of a newspaper published the day of. Just as long as you announce the method before the event happens, and that the data source is sufficiently out of your control, you’re good.

                    1. 2

                      It depends on what they mean by their constraint 2. If there’s no input from the other devs or any third party then the only remaining source of information is the first developer and so I think it cannot be done.

                    1. 9

                      Working on the typechecker for Nickel, a type-safe, memory-safe, effect-safe intermediate language I’m building for use as a backend for functional compilers. Think Rust with a more expressive type system, or Haskell with precise control over memory management and mutation.

                      It turns out that if you don’t care about ergonomics or readability at all, you can create some very powerful low-level abstractions – perfect for an intermediate language! :)

                      1. 3

                        This seems like it would be a perfect intermediate language for my nascent programming language, Forest.

                        I’m going to watch your progress eagerly. Have you had any interest from other programming language implementers?

                        1. 3

                          I’m glad you’re interested! Nickel is very, very new, and you’re actually the first language developer to mention that you’d be interested in using Nickel for your compiler. When my project is a bit farther along, I’d love to talk with you more about how Forest’s abstractions might map on to Nickel’s execution model and type system. Feedback from an actual language implementer (other than myself) would be extremely helpful for refining Nickel’s design!

                          1. 1

                            Would love to go through that with you, and since Nickel is actually aiming to implement a lot of what I was going to do in the Forest compiler, I could be interested in contributing :)

                            What do you think is the best forum to talk this through? I could open an issue on Nickel and talk about the stuff from the README that seems like it would fit Forest, if you like.

                            1. 1

                              I’ve set up a Gitter for Nickel – come and chat!

                      1. 11

                        This is interesting! In the language of functional programming, one way I look at this is that J automatically lifts functions into what Haskell considers to be the applicative functor of single-argument functions. What the OP writes in J as

                        mean =: +/ % #
                        

                        can be written in Haskell as

                        mean = (/) <$> sum <*> length
                        

                        which basically means “divide the result of sum by the result of length, with the argument kept implicit throughout.” It’s pretty cool that it’s implicit in J (a lot of powerful language features amount to implicitly lifting code into monadic / applicative contexts; see exceptions, coroutines, continuations, backtracking, and even state and IO in a fundamental sense), and it’s also pretty cool that Haskell can model it as a library feature!

                        Pedantic note: strictly speaking, to make the types in the above code work out, you have to replace length with (fromIntegral <$> length) or equivalently (fromIntegral . length), but that’s incidental.

                        1. 5

                          Sounds like a language that doesn’t try to waste my time :) Speaking of which, i think it might be interesting to think about what an “iteration” in the language might look like - something to address “I changed some stuff, what is the result?”.

                          Now, there are people who use “tests” for this purpose, but that term comes with a lot of baggage, much of which results in my time being wasted, at least in the short term. So perhaps it’s possible to factor out the quality-assurance/maintainability/code-monkey-metrics aspects and find something that a) is in-between a REPL and a test suite and b) has a simple, baggage-free term for a name.

                          Perhaps “examples”? As in, some sort of collection of inputs and their previous/expected/possible results together with short-term time-saving automation. Could be as simple as recording stdout on each run and showing a diff on the next re-run. Or something very clever with generators and fuzzing.

                          (Edit) Adding on to the diff idea, having useful “diffs” for (nested) maps, lists, and primitive types out-of-the-box might already go a long way.

                          1. 3

                            I think this is a fantastic idea. I can’t speak for other developers, but even when I don’t write full tests for a piece of code, I always run it on some example input, in a REPL or a scratch executable file, and “eyeball” the results to see if they match my expectations. Even though these “examples,” as you call them, don’t come with a machine-checkable criterion for their correctness, they are still valuable in their own right as a first step towards validating that a codebase behaves as intend, and also serve as a kind of executable documentation that future developers can use to understand how a project’s APIs work in practice. Current tooling seems to lack first-class support for this kind of example code, and in some cases (e.g. REPLs) actually actively works to ensure it never lasts beyond a single session, but I think it would be valuable to treat it as an important persistent artifact of the development process; an asset to stand alongside code, documentation, and tests.

                            1. 15

                              I think history has shown that there is no warning label large enough to dissuade stubborn users from using a language intended for small-scale work for inappropriately large, complex projects: see PHP, JavaScript, Python, or even to some extent Bash. At a technical level Schlub may be excellent for the quick-and-dirty use case you intend it for, but how do you prevent the social phenomenon of Schlub codebases growing far beyond their intended limits? How do you force users to confront the moment when it’s time to cut their losses and rewrite their prototype using more industrial-strength tools?

                              1. 9

                                Or even Excel spreadsheets (Excel happens to be the most popular non-programmer programming language ever written).

                                1. 5

                                  A great way would be incentivising the refactor.

                                  For example, if the language had a “transparent” FFI to Python (kinda disagree about it being on the same level as PHP and JavaScript for this, and it’s flexible enough), then you could move stuff over to a “serious” language without having to rebuild your toolchain.

                                1. 9

                                  While I really love the overall message of this post – that it is usually most useful when discussing language features to drop the word “explicit” in favor of more precise words that capture specific aspects of “explicitness” – I’m not sure I agree with the idea that the definition of “explicit” presented in the article can ever be useful at all. In particular, according to the article, “explicit is not local,” which seems to propose that “explicit” be shorthand for “deducible about a program from global analysis of its source code.” If this is the definition being proposed here, then I’m not sure I understand how any feature could ever be “implicit.” In principle every property of a program (that can be known at all) can be known by looking at its source code; after all, the source code is the program!

                                  I find myself wondering: what kind of properties of a program would not be explicit under this definition, except for properties that are unknowable either in theory because of decidability issues, or in practice because they are only determined by complicated and unpredictable runtime behavior? And if properties not amenable to static analysis are the only “implicit” properties, then doesn’t “explicit/implicit” become a synonym for “static/dynamic”? If so, why not just talk about “static/dynamic” instead?

                                  1. 5

                                    Here are some reasons for not loving Ada:

                                    • It is not type safe, even if you only use features that were meant to be type safe. In other words, Ada’s type system fails at being a type system. “Almost safe” languages (Ada, Eiffel, etc.) are actually more dangerous than C, since at least C programmers are permanently aware that they are walking through a semantic minefield, whereas users of “almost safe” languages often think they aren’t.

                                    • Much of Ada’s type system exists to automate the insertion of runtime safety checks, just like much of pre-templates C++’s type system exists to automate the insertion of runtime type coercions and cleanup operations. Now, automation is nice, but, are types the right tool for this job? I do not think so. Macros offer a more compelling alternative.

                                    • With Ada, you can pick at most two between safety, efficiency and an implementation-independent semantics. Safely getting rid of the onerous runtime checks requires external analyses not evidently justified by the Ada specification. A formal semantics for the whole of Ada (not just the subset without dynamic resource management) is very much missing.

                                    1. 5

                                      at least C programmers are permanently aware that they are walking through a semantic minefield, whereas users of “almost safe” languages often think they aren’t.

                                      This is a very good point, and in my experience it also applies to functional purity. In an inherently impure language like C or Java, you’re always aware that shared mutable state is present and consequently you’re on guard for it, so you generally won’t build abstractions that depend on purity in order to work. In a language like Haskell or Elm, on the other hand, you know that the compiler actually offers strong purity guarantees at the type level, so you can use functional abstractions on a large scale with confidence. The real problem lies in the middle, when you’re using a language or a library that is pure by convention but will accept impure functions as well, and you build functional abstractions on it that break in really subtle and unexpected ways when mutability and nondeterminism come into play. I’d say an example of the latter category is some modern JS frameworks like React, which depend heavily on the programmer keeping track of what subset of their code has to be pure and what subset can be imperative, all in a language that has no compile-time ability to distinguish the two.

                                      1. 2

                                        It is not type safe, even if you only use features that were meant to be type safe. In other words, Ada’s type system fails at being a type system. “Almost safe” languages (Ada, Eiffel, etc.) are actually more dangerous than C, since at least C programmers are permanently aware that they are walking through a semantic minefield, whereas users of “almost safe” languages often think they aren’t.

                                        That example doesn’t support the point because it’s using an Unchecked package, which exist solely to side step the type system. They’re the Ada equivalents of Haskell’s Unsafe functions.

                                        Much of Ada’s type system exists to automate the insertion of runtime safety checks, just like much of pre-templates C++’s type system exists to automate the insertion of runtime type coercions and cleanup operations. Now, automation is nice, but, are types the right tool for this job? I do not think so. Macros offer a more compelling alternative.

                                        I don’t think I agree with the premise that Ada’s type system exists to automate the insertion of runtime safety checks. It can be viewed in that way, but so can any other type system. In reality, an Ada compiler is free to skip any runtime check that it can detect is unnecessary.

                                        1. 2

                                          That example doesn’t support the point because it’s using an Unchecked package, which exist solely to side step the type system.

                                          Re-read the code. The Conversion function doesn’t use anything from any Unchecked package. The Unchecked_Conversion function is only imported for comparison purposes.

                                          It can be viewed in that way, but so can any other type system.

                                          That would be wrong. For certain type systems, e.g. ML’s, it is a theorem that legal programs can be striped of all type information without altering the dynamic semantics of the language.

                                          In reality, an Ada compiler is free to skip any runtime check that it can detect is unnecessary.

                                          Yes, but it is in general undecidable whether a specific check can be safely elided. Things wouldn’t be so bad with a formal semantics for Ada, allowing programmers to verify by themselves that specific checks can be safely elided.

                                          EDIT: Fixed identifier. Added last sentence.

                                        2. 2

                                          I would like to add another reason: not much choice between implementations of Ada.

                                          1. 2

                                            This one doesn’t bother me at all, although I can see why others would consider it a problem.

                                        1. 10

                                          Structure and Interpretation of Classical Mechanics should be right up your alley. Authored with a familiar name and also involves Scheme.

                                          1. 7

                                            Another truly excellent book in this loose “series” is “Functional Differential Geometry”, which uses Scheme as a notation to express mathematical concepts precisely. The book is free online, and I’m currently having a wonderful time working through it as a first text on differential geometry.

                                          1. 13

                                            Except that, even though you knew the type, you still knew nothing about the structure of that JSON

                                            That’s nonsense, use Generics or TemplateHaskell and make a config that fits the pattern you want and you get automatic JSON parsing from your types in addition to getting to know the structure of the JSON based on the structure of the data type. This isn’t a property some Haskellers actually want, but the option is there if you do.

                                            What we wound up doing was nesting pattern matching to get at the value we wanted. The deeper the JSON nesting, the deeper our pattern matching. We wrote functions to wrap this up and make it easier. But in general, it was a slog. The tools Haskell gives you are great at some things, but dealing with arbitrarily nested ADTs is not one of them.

                                            This is utterly unnecessary for both the JSON handling itself: https://github.com/bitemyapp/bloodhound/blob/master/src/Database/V5/Bloodhound/Types.hs#L1951-L1965

                                            And for working with nested types in general: you can use lenses, but even with a very nested API like Elasticsearch’s just being able to compose accessors is generally enough.

                                            I’m familiar with this guys’ schtick, he did some Haskell back in like 2010 with a Happstack based application and nothing was particularly nice. If I was stuck with Happstack, I’m not sure I would want to use Haskell either.

                                            Rich had some good points about positional semantics. When defining an ADT, positional semantics make it hard to read the code. If you’ve got seven arguments to your ADT’s constructor, you’ve probably got them in the wrong order and you’ve forgotten one.

                                            First, I avoid passing a bunch of the same type in consecutive order to a data constructor.

                                            I avoid Text -> Text -> Int -> Int -> MyType and replace it with: HostName -> Alias -> PortNumber -> WorkerCount -> MyType or some-such, further, if there’s more a few arguments to a data constructor I’ll use record syntax to construct the value:

                                            MyType {
                                                 hostname = myHostname
                                              ,  alias = myAlias
                                              ,  portnumber = myPortnumber
                                              ,  workercount = myWorkercount
                                              }
                                            

                                            Then ordering doesn’t matter. I usually set record fields strict, especially if I plan to use the record syntax, so that I can’t forget anything.

                                            The fact is, in real systems, these ADTs do accrete new arguments.

                                            I can’t say I recall the last time Value got a new constructor or one of its data constructors got a new argument, but I’d sure be grateful that my type system would be there to tell me everywhere in my project I need to fix the code if it did.

                                            I don’t have the patience for the rest of this.

                                            — Went from Clojure -> Haskell

                                            1. 8

                                              Don’t waste your breathe dude there is a much simpler rebuttal: static and dynamic types are a false dichotomy so any attempt to discount one in context of the other is an invalid argument

                                              Instead weight the specific features of the type systems against each other. In the case of Haskell vs Clojure, GHC gives you much more feedback than the entire Clojure ecosystem of spec/schema etc so if the criteria for choosing a language is type safety then Haskell wins.

                                              1. 12

                                                static and dynamic types are a false dichotomy

                                                Absolutely. People have written some useful stuff about this:

                                                https://existentialtype.wordpress.com/2011/03/19/dynamic-languages-are-static-languages/

                                                1. 2

                                                  I actually don’t totally agree with that article. I’m channeling Foucault here: our understanding of type systems is dependent on the context of type theory. How do we know our type theory is the most true? Sure it allows us to prove lots of things, but that is a pragmatic measure. If type theory has pragmatic value, then our type systems should be measured pragmatically as well. Clojure actually wins in many respects because you get lots of monadic interfaces (the collection abstraction) that allow for reusuable code without the difficulty of the IO monad etc. This is not true in many other dynamic languages, e.g. Python.

                                                  So the “false dichotomy” is really apparent when you compare the monadic-ness of Python, C, Haskell, and Clojure. Clearly Clojure and Haskell fall into the “monadic” category and Python and C into the “non-monadic”. This “monadic-ness” distinction tells us more about how composable functions and data structures are than the “static” or “dynamic” distinction, which means that “static” and “dynamic” categories aren’t the most useful categories to put languages into.

                                                  1. 3

                                                    How do we know our type theory is the most true?

                                                    Is this really the question at stake? If a language tells me a variable is a certain type, I tend to believe it. Truth doesn’t seem to be an appropriate axis for evaluating type systems.

                                                    There’s definitely lots of variety regarding what a type system can tell you. The static/dynamic distinction is about when you can know the type information. Static languages allow you to know the types at compile time, whereas dynamic languages force you to wait until runtime.

                                                    1. 1

                                                      Is this really the question at stake? If a language tells me a variable is a certain type, I tend to believe it. Truth doesn’t seem to be an appropriate axis for evaluating type systems.

                                                      Within the context of your language, a variable has a type; this is the truthiness of that type. Within the context of type theory, the type system of that language has a certain truthiness to it. Within the context of the real world, type theory has a truthiness to it. The implied question in most static vs dynamic debates is, how true is my type system? Which begs the question, how true is type theory?

                                                      The answer is that it doesn’t matter. What matters: how useful is type theory generally? How useful is the specific type system I’m using? The static vs dynamic distinction doesn’t do much to help us answer this usefulness question. Understanding when types become relevant in a language definitely does help you, so I agree with you on that, but there are other more interesting aspects of type systems too.

                                                      1. 3

                                                        Maybe it’s just me but I really can’t follow.

                                                        Within the context of your language, a variable has a type;

                                                        Sure, no qualms yet.

                                                        this is the truthiness of that type.

                                                        Completely lost. What is the truthiness? I don’t know what your pronoun “this” is referring to.

                                                        I think I will continue to be lost until you can explain what it means for a type system to be false. “True” does not signify anything to me here.

                                                  2. 1

                                                    That is the most absurd thing I’ve read. Static typing is necessarily more restrictive because you have to express yourself in a way that can be verified by the type checker at compile time. Thanks to Godel, we know that the set of provable statements is a subset of all valid statements. Since dynamic languages do not attempt to prove correctness, they allow you to make many interesting statements that are impossible to make in a static language. Here’s a trivial example:

                                                    (eval ’(defn add-one [x] (inc x)))

                                                    (add-one 10)

                                                    You should also read this in depth rebuttal to the link you posted http://tratt.net/laurie/blog/entries/another_non_argument_in_type_systems.html

                                                    1. 14

                                                      That whole rebuttal rests on a conflation of dynamic types and static types. This is the standard stance for folks arguing from the dynamic camp (specifically, they’re two of the same sort of thing) and the opposite is so basic to the mental space of the static camp that it almost always goes unspoken.

                                                      In summary, if Bob’s article had explicitly stated that he was solely considering static expressivity… I would have agreed with it.

                                                      So, yes, Bob did not state that explicitly but it is taken without need for explicitness within static typing literature.


                                                      My take is that there are two “kinds of power” any formal language may be judged upon and that they are in tension. They are: the power to create and the power to analyze (deconstruct).

                                                      The more power to create a formal system affords the greater richness of expression exists and the more complex of “values” (whatever that might mean) can be constructed.

                                                      The more power to analyze a formal system affords the greater richness of semantics the values of the language have and the greater information can be extracted from each one.

                                                      Types are an alternative semantics to runtime evaluation. Thus, asking for types is asking for greater power to analyze and comes at a cost of power to construct.

                                                      From here, an argument must move on to tradeoffs between power to analyze and power to construct. Why wouldn’t you want to just side entirely with power to construct?

                                                      Most arguments of the static typing apologist end up being arguments for the value of power to analyze. Essentially, having more parallel semantic interpretations makes the system you’re working with more stable (fewer bugs, easier refactoring, partial verified documentation) and also opens up doors for other systems to exploit those other interpretations (typeclass resolution, servant, dependent types at large).

                                                      So which is better?

                                                      This is just a value judgement now. Nobody can answer for anyone else.

                                                      But I think that power to construct/power to analyze tradeoffs happen constantly so it’s a good concept to have an eye on.

                                                      1. 1

                                                        I pretty much agree with what you’re saying, and that’s ultimately the tradeoff between static and dynamic typing. Depending on your situation one or the other might be preferable.

                                                        1. 4

                                                          I think an extension on this is that the technology already exists to make those tradeoffs locally within a single program. Haskell excels at this

                                                          • the ability to embed DSLs and effect contexts lets you pick and choose this power tradeoff
                                                          • the existence of the IO monad (as much as it is sometimes maligned) means that there’s an “escape hatch” to compare all of your more analyzable structures against
                                                          • the existence of the Dynamic type as well as other “semi-dynamic” types like Json gives you opportunity for greater power at the cost of analyzability

                                                          I think what’s most interesting is that the third point about is rarely explored. The real place where this exploration seems to happen is in row typing/structural typing a la Purescript or OCaml, but we could be building more and more sophisticated ways of working with the Json type without constraining its type.

                                                          I think the lack of this development is why Haskell is really bad for exploratory data analysis, but it’s mostly the lack of exploration/development rather than a total failure, I believe.

                                                          On the other hand, gradual typing systems also are an attack at the middle ground but I’d place a smaller bet on them. Ultimately, I think it’s easier to “forget” analytical power and regain constructive power than it is to go the other way. I’d be willing to put some money on this being a very fundamental logical tradeoff.

                                                          So: where are the research projects for EDA in Haskell? I’d love to see them.

                                                          1. 2

                                                            While this is all true, the actual workflow is quite different than it is in a language like Clojure. A lot of the time I don’t know the solution up front, and I don’t know what the types are going to be in the end. It’s actually valuable to me to be able to work with an incomplete solution. With Clojure, I can use the REPL to explore the shape of the solution. I can also do a workflow that’s very similar to type driven development with Spec as seen here.

                                                            Personally, I find that I generally want a specification at the API level, as opposed to having to structure all my code around types. For me, Clojure default is simply more ergonomic. Meanwhile, Spec provides me both with a sufficient guarantees regarding what the code is doing, and a meaningful semantic specification.

                                                      2. 3

                                                        Any dynamic procedure can be implemented in any turing-complete static language; you just have to express your uncertainty about its runtime behavior in the type system. Even in a total language, you can build your entire program out of partial functions just by modeling that partiality with Maybe return values. Even in a pure language, you can build your entire program out of impure functions just by modeling that impurity with an IO monad or something similar. Even in a language with strong data types, you can write all the dynamically-typed code you want just by creating a large sum type of possible cases and using those overly-broad types everywhere (which will almost certainly require modeling some partiality as well). In general, you can write arbitrarily dynamic code if you’re willing to get very few useful guarantees from your type system. That may sound bad, but it’s no worse than writing in a dynamic language, where you get no static guarantees at all from your language (except maybe some very broad invariants about, for example, memory safety).

                                                        1. 1

                                                          And everything you can do in a language like Haskell, can be done using assembly. Yet, I think we’ll agree that ergonomics of writing assembly are not the same. It’s simply not interesting to discuss what’s possible in principle, it’s the actual workflow that the language facilitates that matters. The way you express yourself in a dynamic language is very different from the way you express yourself in a static one.

                                                  3. 7

                                                    I watched Rich’s talk a couple of times, and I think that positional issues can’t be dismissed. Even if you have record syntax for data, to my knowledge you don’t have this for types. Every time I pull up one of the more complex Haskell libraries, I first have to spend a bunch of time deciphering what I need to put in each position of the types. You end up having to rely on convention and hoping people don’t reorder.

                                                    named arguments are somewhat self-documenting and help with the debugging scenario as well.

                                                    I like that ADTs end up breaking all over the place in code when I change things, but I also like how my Python code and things likes dictionaries mean that I can work on objects by concentrating only on the parts that “matter”. It’s a tension, but I don’t think it’s a dicothomy and we can have solutions.

                                                    ADTs that represent objects (Money or Complex) are definitely of the kind where you want to reconfirm everything, but what about ADTs for configuration objects? Does adding another configuration parameter really require auditing everything? The canonical solution might be to decompose that a bit further to avoid this, then you run into the acretion problem. I’m writing wrappers upon wrappers for things.

                                                    It’s a shame that Rick decided to be so rant-y, because there were some valid points in there. The common ADT patterns force us down a very specific kind of static verification that end up shoehorning us into verbose patterns. I disagree about overall maintenance, but it does lead us to boilerplate.

                                                    1. 7

                                                      The issue with losing positional arguments in types is that it’s desirable to be able to talk about partial application without creating new type names (as equality in types gets more complex you run into hairy, very difficult problems quickly, so: if you solve partial application with aliases, how do aliased types equate?).

                                                      For instance, Scala has the worst of both worlds: positional types and very, very painful partial application.

                                                      With respect to your earlier statements though I agree a lot. ADTs are overly strict in that they represent a demand of both existence and completeness. Most Python idioms rely on an informal “structural subtyping” formalism (what people call duck typing sometimes) which is verrrry nice.

                                                      This works very well in the statically typed world: see OCaml or Purescript’s record types (and open variants).

                                                      1. 1

                                                        Most Python idioms rely on an informal “structural subtyping” formalism (what people call duck typing sometimes) which is verrrry nice.

                                                        I guess you’re referring to stuff like __iter__ and __str__ etc? It’s been a long time since I used Python, but I did like those “duck typed interfaces”.

                                                        But did you know there’s something similar in Clojure, with “protocols”? :)

                                                        1. 2

                                                          I meant to point more at the heavy use of dictionaries in Python, though the more I think about it the less it seems like a clear example.

                                                          Clojure really is one of the clearest examples. Idiomatic Clojure just involves passing dicts around and assuming you’ve got the right keys, hoping that there’s a solution (is there an intersection between the keys produced and the keys needed across all points in dataflow?). This is quintessentially “duck typing” and I think it’s a practice that’s well explored in the world of row types.

                                                      2. 2

                                                        It’s a shame that Rick decided to be so rant-y

                                                        How exactly was he ranty?

                                                      1. 4

                                                        Aren’t the safety features designed to carry zero runtime cost? The costs are incurred at compile time, so I fail to see how performance of Rust programs would suffer compared to C due to any added safety.

                                                        1. 2

                                                          They have runtime cost because the language won’t let you write the most performant solution. And like, there’s mundane stuff like bounds checking.

                                                          1. 4

                                                            The language does let you write the most performant solution, you just might need to use unsafe, depending on what problem you’re trying to solve. I do a lot work in Rust on text search and parsing, for example, and I rarely if ever need to write unsafe to get performance that is comparable to C/C++ programs that do similar work. On the contrary, I wrote some Rust code that does compression/decompression, and in order to match the performance of the C++ reference implementation, I had to use quite a bit of unsafe.

                                                            And like, there’s mundane stuff like bounds checking.

                                                            Well, firstly, bounds checking is trivial to disable on a case-by-case basis (again, using unsafe). Secondly, bounds checking so rarely makes a difference at all. Notice that the OP asked “nearly as fast as C.” I think bounds checking meets that criteria.

                                                            1. 1

                                                              I wrote some Rust code that does compression/decompression, and in order to match the performance of the C++ reference implementation, I had to use quite a bit of unsafe.

                                                              Why specifically? That kind of code is common in real world. Figuring out how to do it safely in Rust or with an external tool is worthwhile.

                                                              1. 2

                                                                Unaligned loads and stores. Making it safe seems possible, but probably requires doing a dance with the code generator. That’s the nice part about Rust. If I want to do something explicit and not rely on the code generator, I have the option to do so.

                                                                It’s worth pointing out that the library encapsulates unsafe, so that users of the library need not worry about it (unless there’s a bug). That’s the real win IMO.

                                                                1. 2

                                                                  That makes sense. I recall that some projects mocked up assembly in their safe language or prover giving the registers and so on types to detect errors. You thought about doing such a routine in a Rust version of x86 assembly to see if borrow checker or other typing can catch problems? Or would that be totally useless you think?

                                                                  1. 3

                                                                    It sounds interesting, and I hope someone works on it, but it’s not personally my cup of tea. :-) (Although, I might try to use it if someone else did the hard work building it! :P)

                                                                    1. 3

                                                                      I’ll keep that in mind. For your enjoyment, here’s an example of what’s possible that’s more specialist rather than just embedding in Rust or SPARK. They go into nice detail, though, of what benefits it brought to Intel assembly, though.

                                                                      https://lobste.rs/s/kc2brf/typed_assembly_language_1998

                                                                      They also had a compiler from a safe subset of C (Popcorn) to that language so one could mix high-level and low-level code maintaining safety and possibly knocking out abstraction gaps in it.

                                                                    2. 1

                                                                      “typed assembly language” makes me think of LLVM IR :P

                                                              2. 3

                                                                Bounds checking almost never matters on modern CPUs. Even ignoring the fact that Rust can often lift the bounds-check operation so it’s not in the inner loop, the bounds-check almost never fails, so the branch predictor will just blow right past. It might add one or two cycles per loop, but maybe not.

                                                            2. 4

                                                              Or, in many cases: Rust.

                                                              1. 0

                                                                I didn’t mention it for “more readable code” as question asked. Rust has a steep learning curve like Ada does. I didn’t mention it either. Neither seems to fit that requirement.

                                                                1. 1

                                                                  Can you compare and contrast your own personal learning experience between Ada and Rust?

                                                                  1. 3

                                                                    Lots most of my memory to a head injury. It’s one of those things I don’t remember. I’m going to have to relearn one or both eventually. I have a concept called Brute Force Assurance that tries to reduce verification from an expensive, expert problem to a pile of cheap labor problem. The idea is a language like Scheme or Nim expresses the algorithms in lowest-common denominator form that’s automatically converted to equivalent C, Rust, Ada, and SPARK. Their tools… esp static analyzers or test generators in C, borrow-checker in Rust, and SPARK’s prover… all run on the generated code. Errors found there are compared against the original program with it changed for valid errors. Iterate until no errors and then move on to next work item.

                                                                    I’ll probably need to either have language experts doing it together or know the stuff myself when attempting it. I’m especially interested in how much a problem it will be if underlying representations, calling conventions or whatever are different. And how that might be resolved without modifying the compilers by just mocking it up in the languages somehow. Anyway, I’ll have to dig into at least Rust in the future since I probably can’t recreate a borrow-checker. Macroing to one from a Wirth- or Scheme-simple language will be much easier.

                                                                1. 3

                                                                  A term of service doesn’t magically allow you to take millions from other people’s wallet just because they sent a bunch of internet packets to your server…

                                                                  1. 2

                                                                    I agree. Using Wikipedia for advertising purposes is by far the most egregious breach of internet norms here. Anyone or anything being able to activate Google Home is a well-known feature/mistake, and is in fact its explicitly intended behavior, so BK hijacking it, while irritating, shouldn’t really come as a shock to anyone. Threatening Wikipedia’s impartiality, however, is genuinely new (AFAIK) and unintended, and that’s a little terrifying.

                                                                  1. 6

                                                                    The most useful thing I’ve ever read about indexing is that indices can be interpreted as pointing between elements, not at elements. In other words, indices are points, and items are intervals between those points. In this view, the difference between 0-indexed and 1-indexed system isn’t really a difference in the interpretation of the indices so much as a difference in the behavior of the indexing operator – does it choose the cell to the left or right of the index?

                                                                    Phrased this way, it seems pretty natural to me to choose the cell on the right (that is, index from 0), since it’s very common to refer to regions and intervals of all kinds by an origin and a positive “extent” beyond that origin – think rectangles in graphics libraries (x, y + width, height) or intervals of time in common speech (e.g. “the 1990s”). It’s very rare to refer to an interval by a point on its right and to then assume it to have negative extent, and that’s exactly what 1-indexing does.

                                                                    Thinking of intervals as points between indices with an origin at 0 has a lot of practical benefits. Most importantly, it becomes immediately obvious why, when representing ranges of elements, lower bounds should be inclusive and upper bounds should be exclusive; the items between the lower and upper bound are the items contained in the range, and that range includes the element to the right of the lower bound but not the element to the right of the upper bound. This also makes it crystal clear why, in such a representation of ranges, the length of the range is the difference between the upper and lower bounds. Indices-as-points also makes modular arithmetic and floor and clamping operations a lot easier in my experience, because you can get back to thinking of your indices as ordinary numbers on a number line and stop thinking of them as buckets. Finally, this whole interpretation works very nicely in image processing applications, where it is complementary to the interpretation that pixel values are samples of a function at points, which is often a very useful way of looking at things.