Threads for chriswarbo

  1. 1

    Could something like this be accomplished in Zig via comptime?

    1. 1

      In theory, you could do it with any static language if you write a verification-condition generator to integrate with Why3 or Boogie. They do the proving. A language with metaprogramming might need an initial pass that does compile-time evaluation. Metaprogramming and dynamic languages are difficult in general. Worst case, you can use subsets and/or annotations to aid the analyses.

      1. 2

        That reminds me of the different approaches to handling declarative dependencies in Nix (in my case that’s mostly Haskell libraries with version bounds):

        • One approach is to have our Nix function (e.g. buildHaskellPackage) implement a constraint solver, which reads in version bounds from each version of each dependency, and picks a mutually-compatible set of dependencies to build.
        • A more practical approach is to just shell-out to an existing solver (cabal in Haskell’s case) and parse its output.

        Whether such build-time analysis is performed “within” the language via macros, or externally as a separate step of the build process, the same solvers can be called and the end result is the same (for checkers like this, there’s also nothing to parse: if a solution is found then we throw it away and carry on to the next step, if not we exit with an error message).

        I used to dislike the thought of performing I/O at compile-time, but I’m seeing more and more compelling use-cases: shelling out to competent solvers is one; “type providers” like F#’s are another (where types are generated from some external source of truth, like a database schema, ensuring out-of-date code fails to build). One I’ve used recently was baking data into a binary, where a macro read it from a file (aborting on any error), parsed it, built a data structure with efficient lookups, and wrote that into the generated AST to be compiled. This reduced the overhead at runtime (this command was called many times), and removed the need for handling parse errors, permission-denied failures, etc. at runtime.
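        (To sketch that last trick with Template Haskell: the file name, format and parser here are invented for illustration, and error handling is pared down, but the shape is the same; any exception thrown during the splice aborts the build.)

        {-# LANGUAGE TemplateHaskell #-}
        module Baked where

        import Language.Haskell.TH (Exp, Q)
        import Language.Haskell.TH.Syntax (addDependentFile, lift, runIO)

        -- Runs at compile time: read the file, parse it, and splice the
        -- resulting list into the program as a literal.
        bakedPairs :: Q Exp
        bakedPairs = do
          addDependentFile "data.csv"   -- recompile whenever the file changes
          contents <- runIO (readFile "data.csv")
          let parse line = case break (== ',') line of
                             (k, ',':v) -> (k, v)
                             _          -> error ("bad line: " ++ line)
          lift (map parse (lines contents))

        -- In another module: table = Data.Map.fromList $(bakedPairs)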

        1. 2

          Yeah, integration with external tools can help in all kinds of ways. The simplest is static analysis to find code-level issues the compiler can’t find. I really like your idea of baking the data into a binary. It’s like the old idea of precomputing what you can, mixed with synthesis of efficient data structures. That’s pretty awesome.

          Actually, I’ve been collecting, and occasionally posting, stuff like that for formal methods. Two examples were letting someone specify a data structure in a functional way, or modify/improve loops. Then an external tool does a pass to make an equivalent, high-performance imperative implementation of either, and dumps it out as code. Loop and data structure examples. Imperative/HOL’s technique, if generalizable, could probably be applied to languages such as Haskell and Rust.

    1. 2

      I see it uses SMT solvers to check the annotations. I know that Z3 is quite impressive, but am interested in how scalable this would be. Does the language design ensure these checks are kept ‘local’ (i.e. adding N definitions/calls adds O(N) time to the build), or can the constraints interact over a longer distance (potentially causing quadratic or exponential time increase)? I’d also like to know if there are common failure-modes, e.g. where some common code pattern can’t be solved without extra annotations.

      For example, the classic “hello world” for dependent types is to define Vector n t (lists containing n elements of type t), and the function append : Vector n t -> Vector m t -> Vector (plus n m) t whose output length is the sum of the input lengths. Woe betide anyone who writes plus m n instead, as they’d have to prove that plus commutes!
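      To make that concrete, here’s a minimal sketch in Haskell (GADTs and a type family standing in for full dependent types):

      {-# LANGUAGE DataKinds, GADTs, TypeFamilies #-}

      data Nat = Z | S Nat

      -- Type-level addition, defined by recursion on the first argument.
      type family Plus (n :: Nat) (m :: Nat) :: Nat where
        Plus 'Z     m = m
        Plus ('S n) m = 'S (Plus n m)

      data Vector (n :: Nat) t where
        VNil  :: Vector 'Z t
        VCons :: t -> Vector n t -> Vector ('S n) t

      -- Accepted: the recursion mirrors the definition of Plus exactly.
      append :: Vector n t -> Vector m t -> Vector (Plus n m) t
      append VNil         ys = ys
      append (VCons x xs) ys = VCons x (append xs ys)

      Declaring append with Vector (Plus m n) t instead makes the same code fail to type-check, since the checker can’t see that Plus commutes without a proof.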

      1. 3

        With SMT-based verification, each property is proved separately and assumed true when proving others, so there’s no problem with locality. The problems are very different from those of dependent types. SMT is undecidable in general, so unless the tool is very careful to stay within a decidable logic (like Liquid Haskell is), you’ll very often see the solver just time out with no solution, which is pretty much undebuggable. It’s also difficult to prove anything through loops (you have to write loop invariants), etc.

        1. 2

          In this case, SMT isn’t undecidable: a friend pointed out the write!(solver,"(set-logic QF_UFBV)\n").unwrap(); line to me, which means “quantifier free with uninterpreted functions and bitvectors”. It’s decidable, just super hard.

          1. 2

            Yeah, if you can get away with quantifier-free, everything is much better. But you can’t do much with C code without quantifiers. zz probably doesn’t support verifying loops.

            1. 2

              I agree, unless ZZ has its own induction system for loops, using SMT only to prove the inductive hypothesis. It’s a lot of work though.

        2. 2

          Does the language design ensure these checks are kept ‘local’ (i.e. adding N definitions/calls adds O(N) time to the build), or can the constraints interact over a longer distance (potentially causing quadratic or exponential time increase)?

          I’m also interested in this in terms of API stability. If interactions happen at a longer distance it can be difficult to know when you are making a breaking change.

          1. 2

            Z3 “knows” addition commutes, so that’s no problem. The usual trouble with dependent types is that you define addition yourself, so it is hard for Z3 to see that your addition is in fact addition.

            1. 1

              Yes, I gave that example for dependent types (the type requiring a different sort of branching than the computation) since I’m more familiar with them.

              The general question was about the existence of pathological or frustrating cases in this SMT approach (either requiring much more time, or extra annotations) arising from very common assertion patterns, or where the “obvious” approach turns out to be an anti-pattern.

          1. 3

            It is always super weird to see someone quote me. Really cool! But also super weird. I’m not fully adjusted to that yet ;)

            If you find Haskell’s type system interesting, I’d also recommend checking out an ML! The module system in ML is really cool, powerful, and unique. I’m shocked more languages haven’t adopted similar module systems.

            Also, a couple of quick pedantic nitpicks:

            I was reading through some of Hillel Wayne’s blog posts, and one of them discussed the Curry-Howard correspondence, which somehow proves a 1:1 association between aspects of a mathematical proof and aspects of a type system. You can convert a proof into a type, if your type system supports it. I think Haskell’s type system respects this. I don’t think Python’s type system does.

            I don’t think I explained the CHC very well in that. All sound static type systems are “proofs”, where what you are proving is “this is well-typed”. This is true for both Haskell and Python (mypy). But different languages can encode different consequences of being well-typed, like if you can express the type of sorted lists, you can say “I’ve proven this is well-typed ⇒ this sort function correctly sorts the list.” Haskell’s type system is more powerful than most other commercial languages, but it’s not powerful enough to encode arbitrary theorems. For that you need dependent typing a la Idris or Coq.

            Also proving type-safety for arbitrary theorems is really, really hard. Tradeoffs!

            Python has hypothesis, but having used it a tiny bit for an open-source contribution, I don’t think it’s the same without the Haskell type system.

            Gotta defend hypothesis a bit here! It sounds like you were using quickcheck in the context of learning from a book, while you were using hypothesis for a real-world project. In that context quickcheck is gonna seem more elegant. But Hypothesis is world-class and a lot of more modern PBT libraries follow its example. It’s a lot better at generating inputs like “two dictionaries where the values of the first dict are a superset of the keys in the second.”

            1. 2

              OCaml seems really cool, and I’ve heard it’s used in industry; is it the most popular ML for practitioners? I also know of ReasonML, used by Facebook to create Facebook Messenger. But I think there’s still so much to learn about Haskell, and between going broad and learning about type systems, I’d want to go deep into Haskell and become intermediate, or even advanced if I’m fortunate enough to have the time and resources to do so!

              Hmm, I didn’t know that Idris or Coq would be powerful enough to encode arbitrary theorems. I think if I were to study more into type theory (an interesting subject), I’d try to learn those!

              Yeah, I think Corbin mentioned that Hedgehog was closer to hypothesis than QuickCheck was; I think I was too quick in dismissing hypothesis. I do like the notion of using generators over types, and the integrated shrinking (when I wrote this I assumed that QuickCheck had integrated shrinking; I think that’s actually a Hedgehog or maybe hypothesis innovation). I definitely like your blog post’s point about creating integration tests from a combination of hypothesis and pycontracts; I’ll give that a shot for my next project :D

              1. 2

                QuickCheck does shrinking. The Arbitrary typeclass has a shrink :: a -> [a] method which should return a list of smaller versions of its argument.

                If you don’t want to create an instance of Arbitrary, you can use the function forAllShrink. Its type is a more polymorphic version of (RandomSeed -> a) -> (a -> [a]) -> (a -> Bool) -> Bool: the first argument is a generating function, the second is a shrinking function and the third is the property to check.
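                For instance (a hypothetical property, passing the generator and shrinker explicitly with no Arbitrary instance involved):

                import Test.QuickCheck

                prop_reverseTwice :: Property
                prop_reverseTwice =
                  forAllShrink (listOf (choose (0, 100 :: Int)))   -- generator
                               (shrinkList shrinkIntegral)         -- shrinker
                               (\xs -> reverse (reverse xs) == xs) -- property

                main :: IO ()
                main = quickCheck prop_reverseTwice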

            1. 4

              There are some accurate observations in this article, and it seems that the author has gained a real appreciation for strong, static type systems. That’s great!

              There are also a number of inaccuracies and basic misunderstandings here. Most of the “small nits” section is… frustrating to read.

              Haskell has no null definition of type Char

              I don’t understand what the author wants. A String is a sequence of Chars; that sequence can be empty. What would an empty Char even be? And why isn’t Maybe Char satisfactory for their use-case?

              Int and Integer are two different types

              Yes, they are. Integer is a dynamically-resized integer, equivalent to Python’s integer type. Int is a fixed-size integer covering at least the range [-2^29, 2^29 - 1], similar to C’s long. If that range is exceeded, the behaviour is undefined.

              ghci> maxBound :: Int
              9223372036854775807
              ghci> maxBound :: Integer
              
              <interactive>:3:1: error:
                  • No instance for (Bounded Integer)
                      arising from a use of ‘maxBound’
              

              Haskell can raise errors partway through evaluating an error message.

              I’m not sure what they mean, but it seems the author is just discovering lazy evaluation. In Haskell, expressions are not evaluated until their results are needed. In map (+1) [2,3,undefined], the thing that “needs” the result is the print function, which GHCi implicitly calls on every line entered.

              print will print the first thing of the list, (+1) 2. This must be evaluated first, and then it’s printed. Then print waits for the next thing in the list, (+1) 3, which also needs to be evaluated before being printed. Finally print needs (+1) undefined to be evaluated, which causes the program to stop.
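              For instance, in GHCi (the exact exception text varies by GHC version):

              ghci> map (+1) [2,3,undefined]
              [3,4,*** Exception: Prelude.undefined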

              1. 2

                Haskell can raise errors partway through evaluating an error message.

                I’m not sure what they mean, but it seems the author is just discovering lazy evaluation.

                I have to admit, I hit this occasionally and I’ve been programming in Haskell for years. I don’t like unhelpful error messages, so if I find myself writing a partial function I’ll at least add an explicit branch for the unwanted case (rather than leaving it to a generic “unmatched pattern” error), and I’ll put in a call to error "..." with something useful and pertinent about what happened. Sometimes it might be useful to append (some representation/summary of) the inputs to the error message, to make it clear what went wrong (e.g. "foo/bar" isn't an absolute path). Sometimes those inputs can raise their own errors (e.g. if we branched on whether the first character is /, there would still be an error lurking in something like List.intercalate "/" ["foo", undefined]).
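                Here’s a minimal sketch of that last pitfall (the function and message are invented for illustration): branching on the first character only forces part of the string, so rendering the error message can itself hit a lurking bottom.

                import qualified Data.List as List

                requireAbsolute :: [String] -> String
                requireAbsolute parts = case List.intercalate "/" parts of
                  path@('/':_) -> path
                  path         -> error (show path ++ " isn't an absolute path")

                -- requireAbsolute ["foo", "bar"]     ==> our message, as intended
                -- requireAbsolute ["foo", undefined] ==> Prelude.undefined instead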

                As a good rule of thumb: when writing error handling (even if it’s just reporting, rather than recovery), be extra cautious and don’t necessarily assume that your program’s invariants still hold (after all, something had to cause the error!).

              1. 19

                As somebody who first learned Python, and then Haskell, and then left both, this review reminds me of many of my own experiences.

                I was reading through some of Hillel Wayne’s blog posts, and one of them discussed the Curry-Howard correspondence, which somehow proves a 1:1 association between aspects of a mathematical proof and aspects of a type system. You can convert a proof into a type, if your type system supports it. I think Haskell’s type system respects this. I don’t think Python’s type system does.

                Yes, but undefined happens to be a value of every type, in Haskell; this corresponds to being able to prove any statement, including the false ones. So while it is indeed possible to turn proofs into programs, it’s not possible to turn Python or Haskell programs into useful valid proofs.

                (Somebody may pedantically note that every Python or Haskell program gives a proof, just not of the fact that we might hope that they prove; they usually have Python-shaped or Haskell-shaped holes in them.)
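                As a two-line illustration of that hole, here is an accepted “proof” of every proposition:

                anything :: a          -- under Curry-Howard: "every proposition holds"
                anything = undefined   -- accepted, so Haskell-as-a-logic is inconsistent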

                I don’t think you can do anything like [fmap] with Python or any other practitioner’s language that I’m familiar with.

                I think that this is part of the roots of the Haskeller memetic complex. Because Haskell has the lightweight syntax of an ML, and its prelude is rich in algebraic tools, suddenly there is a blindness as to the fact that the tools have been there all along. For example, in Python 3:

                >>> plusOne = lambda x: x + 1
                >>> lmap = lambda f: lambda xs: [f(x) for x in xs]
                >>> nada = object()
                >>> mmap = lambda f: lambda x: nada if x is nada else f(x)
                >>> dot = lambda f: lambda g: lambda x: f(g(x))
                >>> zf = dot(lmap)(mmap)(plusOne)
                >>> zf([1, 2, 3])
                [2, 3, 4]
                >>> zf([1, nada, 3])
                [2, <object object at 0x7ff65c5db0d0>, 4]
                

                Even the base error type, bottom or ⊥, is defined as an evaluation that never completes successfully, either because a computation failed, or because there’s an infinite loop. … I don’t think Python’s BaseException has a built-in notion of infinite loops evaluating to an error condition. This appears super powerful to me, because the big error type introduced by going over network connections are network timeouts, which appear to most programs as infinitely long computations.

                There are many opinions on this, but I am of the opinion that the sort of incompleteness generated by an unresponsive network peer, known as FLP impossibility or perhaps FLP “incompleteness” of network requests, is not the same as the incompleteness that Turing, Gödel, Tarski, Lawvere, and others showed via diagonalization. They are connected, in that the FLP result implies that a Turing machine which simulates many computers on a network may encounter a Turing-complete subproblem which prevents the simulated network peers from responding.

                In Alice-and-Bob framing, suppose Alice is waiting for Bob to reply over the network. The FLP impossibility of Alice having to wait forever for Bob’s reply is connected to the Turing undecidability of whether a simulation of Alice and Bob will hang forever while simulating Bob’s preparation of their reply.

                Haskell has Test.QuickCheck, a property-based testing framework that generates values to check a declared property. Python has hypothesis, but having used it a tiny bit for an open-source contribution, I don’t think it’s the same without the Haskell type system.

                I think that this is another memetic root; specifically, the root is type-driven dispatch. To compare apples to apples, a better analogue of QuickCheck in Python might be PayCheck, while Hedgehog is closer to Hypothesis in Haskell. As soon as the one-dimensional comparison becomes two-dimensional, the flexibility of Hypothesis, and crucially the ability to select arbitrary strategies for generating test cases, far outweighs the convenience of simple type-checking. This makes sense, in a certain way; if Haskell’s type system were sound in the first place, then QuickCheck would be little more than a totality checker. Hypothesis’ own author recommends Hedgehog over QuickCheck, and in my language, Monte, I implemented both a classic unit-test runner and also a Hypothesis-style property test runner.
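                For a flavour of that generator-first style in Haskell itself, here’s a small sketch using the hedgehog package: the generator is a first-class value chosen per property, with shrinking built into it rather than supplied separately.

                import Hedgehog
                import qualified Hedgehog.Gen as Gen
                import qualified Hedgehog.Range as Range

                prop_reverse :: Property
                prop_reverse = property $ do
                  xs <- forAll (Gen.list (Range.linear 0 100)
                                         (Gen.int (Range.linear 0 1000)))
                  reverse (reverse xs) === xs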

                Ultimately, I don’t think [indentation] is a big issue. Haskell has code auto-formatters like haskell-formatter, much like Python’s black. Use it and move on. Bikeshed about something else, you know?

                Since you didn’t mention it, and none of your code samples seem to use it, and to give an example of something to bikeshed over, it’s worth pointing out that Haskell supports curly-brace-oriented syntax as well as indentation. This is a recurring topic in programming language design: We want code to be readable, and that leads to caring about whitespace.

                1. 9

                  This makes sense, in a certain way; if Haskell’s type system were sound in the first place, then QuickCheck would be little more than a totality checker.

                  I don’t think that follows unless you’re using incredibly specific types. There are many total functions of type Ord a => [a] -> [a], but not all of them would pass the tests of a sorting function.

                  1. 3

                    This is a fair observation. QuickCheck test selection indeed is driven not just by types, but by typeclasses and associated behaviors, and I was remiss to dismiss it so glibly.

                    The main desire is that one might express algebraic laws, and ultimately all of these testing tools are only approximations to actual equational reasoning. That doesn’t preclude a type system from giving useful annotations.

                  2. 4

                    Wow, amazing comment! I understand about half of this. I wish I had more than one upvote to give you.

                    I am really curious as to whether Python’s NoneType would map to Haskell’s bottom in that regard. Would that be the Python-shaped hole you’re talking about?

                    I did see a tutorial by David Beazley on implementing a lambda calculus in Python. I didn’t think it was all that practical tbh, and I think he said the same in the video: https://www.youtube.com/watch?v=pkCLMl0e_0k

                    Could you give a brief summary of how FLP impossibility and Turing incompleteness are different? I know of “Turing completeness” and that it’s a thing, but my knowledge ends there. I haven’t heard of FLP impossibility before. Bookmarked that website.

                    Huh, the author of hypothesis says QuickCheck doesn’t have integrated shrinking. From his statement on generators rather than type classes, I understand your point on how Hedgehog and Hypothesis are more similar.

                    Hmm, there weren’t a whole lot of examples of curly-brace oriented syntax. Most examples used indentation. The Haskell wikibook does mention it though: https://en.wikibooks.org/wiki/Haskell/Indentation#Explicit_characters_in_place_of_indentation

                    (I do care about whitespace and clean code; I read through “Clean Code” my second year out of college, but in my experience as a practitioner so far I’d rather use a tool and automated process to have a consistent way of doing things with multiple people, in order to spend political capital elsewhere.)

                    1. 10

                      (Phew this got long, time to turn it into a blog post ;) )

                      I am really curious as to whether Python’s NoneType would map to Haskell’s bottom in that regard.

                      It’s important to distinguish between the empty type and the unit type. In (type) theory there are no values of the empty type, and there is one value of the unit type. In a language without _|_, like Agda, we might define them something like this:

                      data Empty : Type
                      
                      data Unit : Type where
                          unit : Unit
                      

                      This says that Empty is a Type, that Unit is a Type, that unit is a Unit. Since we’ve given no constructors (or “introduction forms”) for Empty, there’s no way to make one. Likewise we can write function types like Int -> Empty which have no values (since there’s no way for any function to return a value of type Empty), we can write tuple types like Pair Int Empty that have no values, etc. We can also write function types like Empty -> Int which do have values (e.g. const 42), but which can never be called (since there are no Empty values to give as an argument). Incidentally, the fact that we can have types with no values is one of the reasons we can’t have a value with type forall a. a (in Haskell) or (a : Type) -> a (in Agda): they claim to return a value of any type, but we might ask for an Empty.

                      The unit type is trivial, literally. Since there’s only one value of this type (unit), whenever we know that a value will have type Unit we can infer that the value must be unit, and hence (a) if it’s a return value, we can always optimise the function to simply unit and (b) if it’s an argument, it has no effect on the return value (there’s nothing to branch on). In this sense, the unit type contains 0 bits of information (compared to Bool which contains 1 bit).

                      As an aside, Either is called a “sum type” because it contains all the values from its first argument plus those of its second: Either Unit Unit is equivalent to Bool (with values Left unit and Right unit). Pairs are called “product types” since they contain every combination of their first argument and second arguments, which is the number of values in each multiplied together. In this sense Empty acts like zero: the type Either Empty a doesn’t contain any Left values, so it’s equivalent to a, just like 0 + a = a for numbers. Likewise Pair Empty a contains no values, since there’s nothing to put in the first position, just like 0 * a = 0 for numbers. Either Unit a acts like 1 + a, since we have all of the values of a (wrapped in Right) plus the extra value Left unit; note that this is also the same as Maybe a, where Right acts like Just and Left unit acts like Nothing. Pair Unit a is the same as a since each a value can only be paired with one thing, and we know that thing is always unit; this is just like 1 * a = a with numbers.
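                      (A fast-and-loose Haskell rendering of that arithmetic, ignoring bottom, with Data.Void’s Void playing the role of Empty:)

                      import Data.Void (Void, absurd)

                      -- Either () () has exactly two values, i.e. one bit, like Bool:
                      toBool :: Either () () -> Bool
                      toBool (Left  ()) = False
                      toBool (Right ()) = True

                      -- Either Void a ~ a  (0 + a = a): the Left case is impossible.
                      unLeft :: Either Void a -> a
                      unLeft (Left  v) = absurd v
                      unLeft (Right x) = x

                      -- ((), a) ~ a  (1 * a = a): the first component is always ().
                      unPair :: ((), a) -> a
                      unPair ((), x) = x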

                      Back to your question: things get confusing when we try to apply this to “real” languages, especially since a lot of the terminology is overloaded. In Haskell the type Unit is written () and the value unit is also written ().

                      In Haskell the type Empty is called Void, but that is not the same as void in languages like Java! In particular, remember that we cannot have a function which returns Empty, so void being Empty would mean we could not implement methods like public static void putStrLn(String s). Such methods certainly do exist, since they’re all over the place in Java, but what happens if we call them? They will run, and return a value back to us (hence it can’t be Empty). What is the value we get back? We know, without even running the method, that we’ll get back null. That’s just like the Unit type (where we know the value will be unit without having to run the code). Hence Java’s void acts like Unit, and null acts like unit.

                      If we follow a similar line of reasoning in Python, we find that None acts like unit (or Java’s null) and hence NoneType is like Unit (or Java’s void). AFAIK Python doesn’t have anything which acts like Empty (Haskell’s Void) since, lacking any values, Empty is only useful for type-level calculations (of the sort which get erased during compilation), which Python doesn’t tend to do.

                      That just leaves “bottom”. There are a couple of ways to think about it: we can be “fast and loose” where we ignore bottom as a nuisance, which mostly works since bottom isn’t a “proper” value: in particular, we can’t branch on it; hence any time we do a calculation involving bottoms which results in a “proper” value, we know that the bottoms were irrelevant to the result, and hence can be ignored. Any time a bottom is relevant, we have to abandon our nice, logical purity in any case, since catching them requires IO, so why bother complicating our pure logic by trying to include them?

                      Alternatively we can treat bottom as an extra value of every type, in a similar way to null inhabiting every type in Java. From this perspective Haskell’s Void type does contain a value (bottom), and the () type contains two values (() and bottom). In this sense we might think of Java’s void corresponding to Haskell’s Void, but I find this line of thinking difficult to justify. In particular, we can branch on null in Java (in fact we should be doing such “null checks” all the time!), whereas (pure) Haskell doesn’t even let us tell whether we have () or bottom.

                      As a final thought, whenever comparing dynamic and static languages, it’s important to not conflate different concepts which just-so-happen to have similar names. In particular I find it useful to think of dynamically typed languages as being “uni-typed”: that is, every value has the same type, which we might write in Haskell as a sum like data DynamicValue = S String | I Int | L List | O Object | E Exception | .... Python is actually even simpler than this, since “everything is an object” (not quite as much as in Smalltalk, but certainly more than e.g. Java), so Python values are more like a pair of a class (which itself is a Python value) and a collection of instance properties.

                      This is important because “dynamic types”, like in Python, aren’t directly comparable to “static types” like those of Haskell; they’re more comparable to “tags” (e.g. constructors). In particular, think about a Python function branching based on its argument’s “dynamic type”; this is the same as a Haskell function branching on its argument’s constructor. What does this tell us about “bottom” in Python? From this perspective, there isn’t one (at least not reified; an infinite loop might be comparable, but it’s not something we can e.g. assign to a variable). Python’s None is just a normal value like any other, in the same way that Haskell’s Nothing is a normal value (we might sometimes use it to stand for an error, but that’s not inherent to the system); likewise Python’s exceptions are also normal values (we can assign them to variables, return them from functions, etc.); the idea of “throwing and catching” (or raise/except for Python) is actually a perfectly normal method of control flow (it’s actually a limited form of delimited continuation), and this is orthogonal to error representation and handling.
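                      That “branching on the constructor” view is easy to spell out (a toy sketch, with the constructors abbreviated; nothing like a full unitype):

                      data Dynamic = S String | I Int | L [Dynamic] | None

                      -- The moral equivalent of Python's isinstance checks:
                      typeName :: Dynamic -> String
                      typeName (S _) = "str"
                      typeName (I _) = "int"
                      typeName (L _) = "list"
                      typeName None  = "NoneType"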

                      This makes raising an exception in Python very different to triggering a bottom in Haskell, since Haskell provides no way to branch on a bottom (or whether we even have one). In Python we can raise a value (with the exception “tag”) as an alternative to using return, and we can catch those values, inspect their tag and value, etc. to determine what our overall return value will be, with none of this being visible by the caller. To do anything like that in Haskell we need some “proper” data to inspect and branch on, hence why I say None, exceptions, etc. are just normal values (even though they’re used to represent errors, and we treat them specially because of that; but the same could be said about Either String values in Haskell, for example). Consider that in Haskell the only way to even check if a bottom exists is to use IO, but the idea behind IO is that it’s like State RealWorld, i.e. every IO a value is a pair of (RealWorld, a), so conceptually we never “actually” see a bottom; it’s more like triggering a bottom changes the RealWorld, and we’re branching on that.
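                      (That State RealWorld intuition as a two-line sketch; GHC’s real definition uses an unboxed State# RealWorld token, but the shape is the same:)

                      data RealWorld  -- opaque token, never constructed by user code

                      newtype IO' a = IO' (RealWorld -> (RealWorld, a))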

                      1. 4

                        Thank you for the clarification on None vs bottom! I think it definitely helps me understand how much more different Python is from Haskell.

                        I like the idea of saying data DynamicValue = S String | I Int | L List | O Object | E Exception for dynamic languages, and I think from a type theory perspective I finally understand what people say when they say “In {Ruby, Python}, everything is an object”.

                        I hate try/catch logic in Python. It works, but I much prefer the functorial structure espoused by Haskell.

                        I agree on the description on IO! The book does mention IO is technically without effects, since it is merely a description of what IO actions should be taking place instead of imperatively executing IO.

                        1. 4

                          I like the idea of saying data DynamicValue = S String | I Int | L List | O Object | E Exception for dynamic languages

                          That’s a good intuitive starting point, but it’s not the whole story. The “unitype framework” is a way of understanding dynamic types from a static typing perspective, but we can also start from a “dynamic typing perspective” and see where it gets us. A dynamically-typed language is one where you can manipulate type definitions at runtime, and test for type membership at runtime. So saying DynamicValue = S String | I Int ... will always be incomplete, because we might not know something’s type even after creating it!

                          An intuition I like here is that dynamic types are boolean predicates: x is of type T iff T(x). This means we can define the type of, say, all functions with format as an optional keyword, or the type of all iterables that aren’t strings, or the type of all types that appear as return values in functions.

                          (The downside of this is it gets hellishly confusing very quickly, and allows for a lot of spooky action at a distance. Also, there’s no guarantee that checking type membership is total. That’s one reason even highly dynamic languages shy away from going deep in dynamic types.)

                          I think from a type theory perspective I finally understand what people say when they say “In {Ruby, Python}, everything is an object”.

                          Everything can also be an object in a static system, too. My rule of thumb for what “everything is an object” means: are object methods also objects? Is the call stack an object? If you are using message passing, are the messages objects? Can you send messages to other messages?

                          1. 1

                            RE: “Everything is an object”, I usually put it in quotes since the meaning is usually context-dependent. A colleague once said that to me regarding Java, and I replied that Java classes aren’t objects yet Python classes are; and Python’s if branches aren’t objects yet Smalltalk’s are (at least, the closures are); and so on.

                            RE: Unitypes, etc. yes the static/dynamic distinction can be built up from the dynamic side too (I tend not to think that way since fixed points and things trip me up). The important thing is that people aren’t talking past each other, e.g. it’s fine to argue about whether “foo can/can’t do bar”, but not so useful when each person is using “foo” and “bar” to mean different things ;)

                        2. 3

                          The overloaded word “void” across languages is confusing. In Rust and TypeScript (and others?) they call it “never”, which I think makes it really clear: values of this type never actually exist. And “this function returns never” is like a pun: “this function never returns” and “this function’s return type is ‘never’”.

                          1. 3

                            This was wonderfully clear. Thank you.

                      1. 3

                        Tiling WMs never appealed to me for one simple, shallow reason: I like seeing nice decorations on my windows. Having windows overlap doesn’t bother me. Indeed, I occasionally arrange them that way on purpose. I have 11 virtual desktops (KDE), and I map my numpad keys to them, so every desktop is a single keystroke away (modifier keys are not even needed). I can have 30+ windows, many of those having several tabs (web browser, terminal), and everything’s within reach via numpad, Alt-Tab and Ctrl-Tab. This setup has worked very well for me for many years. (Not to be construed as me saying that nobody should use tiling WMs.)

                        1. 3

                          My (tiling) window borders are a single pixel wide; just enough to highlight the focused window in blue, with unfocused in grey. I also don’t see my desktop background much, since windows are arranged to cover the whole screen (usually just one fullscreen window at a time, in my case).

                        1. 2

                          How does a tiling wm work for me? It’s my understanding that I have to set it up “my way”, according to each task in my daily flow. What if I don’t have a flow?

                          My usual GNOME flow on my desktop is just the meta key to find an app and launch it, alt-tabbing my way to an open one, and meta+arrow to move my current app to this or that corner. When I hook up my Windows laptop from work, I might also need to move my thing to another window; ctrl-meta-arrow does it. I usually open Terminator, where I split things into terminal panes with ctrl-e or ctrl-o and then with arrows, as by default it’s just split in two panes. I open IntelliJ, and there I sometimes want to fiddle with its panes with a mouse; same with Firefox or Chrome dev tools. Occasionally a mail app is open, and comms like Slack or Teams.

                          Sometimes I need to adjust the width of the window manually, with the mouse, because I’m really interested in those logs, or that image. Or open an unusual app like calc or excel or whatnot.

                          Luckily, GNOME and apps remember where they’ve been last, usually.

                          What would a tiling wm bring me? What would my flow look like? Do I have to set up config for each window?

                          What benefits would I see? Would it bring me discipline? How much work to set up initially and how much work later on? What happens when I open a new app?

                          I can’t find this information online. Does anybody have a video or article or presentation at hand to share?

                          1. 3

                            How does a tiling wm work for me? It’s my understanding that I have to set it up “my way”, according to each task in my daily flow. What if I don’t have a flow?

                            If you don’t have a particular flow, you can use whatever the tiling window manager does by default. Usually new programs will open full screen. If there is already a window open, it will split the screen either vertically or horizontally. And so on. Some tiling WMs let you configure how this splitting happens.

                            What would a tiling wm bring me? What would my flow look like? Do I have to set up config for each window?

                            A tiling WM would probably help you spend less time moving windows around. In your case, if you need to look at lots of terminals, or have an IDE + terminal + browser open all at once, you can do that in a way that uses all the screen real estate without having to manually drag and resize windows.

                            I don’t know what your flow would look like. Personally I usually have an Emacs window and a terminal or browser open side by side on one workspace. This lets me look at my code and documentation at once easily. On another I will have my chat apps, usually Slack and FB Messenger open side by side, maybe Hangouts too. That way I can see all new messages at a glance. I find this is more useful when plugged into a larger monitor where I have more screen real estate than on my laptop.

                            You don’t have to set up a config at the beginning. You can use it purely interactively and if you find yourself using certain layouts all the time, your WM probably gives you a way of setting up layouts and restoring them or binding them to keybindings.

                            What benefits would I see? Would it bring me discipline? How much work to set up initially and how much work later on? What happens when I open a new app?

                            You will spend less time manually moving windows around and once you find layouts you like, you can save and restore them. I don’t know if it would bring you discipline (I don’t know what you mean by that). You should be able to get set up running without doing any configuration.

                            A few days ago there was a post here about Regolith, which is an Ubuntu spin with a beginner-friendly tiling setup that you can use to see if you like tiling. If you use macOS I recommend Moom.

                            1. 1

                              Thanks. I might try it sometime, but I’m not convinced. I don’t hear a “killer app” for me.

                              It may be “the dishwasher effect”. Before I had one (grew up without, then went on living without), I never thought it was a big deal. Once I bought one, I never thought I’d go back.

                              But from my safe “defaults only on everything” perspective, I don’t see the incentive.

                              1. 2

                                That’s totally fine. If you are fine using the defaults on whatever system you have and that works for you, then you don’t need to be bothered. Tiling window managers (more specifically, the automated window management they allow) are attractive to me, but they don’t have to be for you.

                            2. 3

                              I’ve seen a couple of comments talking about configuration paralysis, but I haven’t really experienced that. I just have everything full-screen all the time; I only switch to a “proper” tiling arrangement occasionally, e.g. if I’ve opened the GIMP; then switch back. Note that this isn’t per-app or configuration-dependent, it just rearranges whatever windows happen to be open on the current desktop.

                              In general, I quickly stopped caring about the size and shape of windows. The only per-app configuration I have is which virtual desktop my auto-start programs appear on, which is the same as I had on floating WMs.

                              1. 2

                                What would a tiling wm bring me? What would my flow look like?

                                Nothing. You would need to arrange every window at your desired size and move them across the work areas, spending a lot of time learning new keystrokes. And this every time you need two different applications in view at the same time.

                                1. 2

                                  Not certain if this is a disgruntled or a troll reply :)

                                  1. 2

                                    Sorry, no trolling intended. It’s my opinion, based on three years of using i3. Sure, I may be wrong.

                              1. 27

                                Interesting post. The main issue of Nix is its onboarding curve and lack of simple-to-grok documentation.

                                There’s a few things in this post worth responding to:

                                Now, you may ask, how do you get that hash? Try and build the package with an obviously false hash and use the correct one from the output of the build command! That seems safe!

                                Nix has the prefetch-* commands that can do this for you and output either the hash, or a full Nix expression that refers to that thing.

                                I could avoid this by making each dependency its own Nix package, but that’s not a productive use of my time.

                                It depends. My personal view recently has been that Nix should adopt a more Bazel-like model, in which the Nix language is also used for describing the actual software builds rather than just wrapping external package managers.

                                I have implemented this for Go (buildGo.nix / docs) and Common Lisp (buildLisp.nix), and with the Go one specifically external dependencies can be traversed and automatically transformed into a Nix data structure.

                                For example, here’s the buildGo.nix packaging of golang.org/x/net (from here):

                                { pkgs, ... }:
                                
                                pkgs.buildGo.external {
                                  path = "golang.org/x/net";
                                  src = builtins.fetchGit {
                                    url = "https://go.googlesource.com/net";
                                    rev = "c0dbc17a35534bf2e581d7a942408dc936316da4";
                                  };
                                
                                  deps = with pkgs.third_party; [
                                    gopkgs."golang.org".x.text.secure.bidirule.gopkg
                                    gopkgs."golang.org".x.text.unicode.bidi.gopkg
                                    gopkgs."golang.org".x.text.unicode.norm.gopkg
                                  ];
                                }
                                

                                This makes every subpackage available as an individual Nix derivation, which also means that those builds are cached across different software using those dependencies.

                                this is at least 200 if not more packages needed for my relatively simple CRUD app that has creative choices in technology

                                For most mainstream languages generators have been written to wrap 3rd-party package managers automatically. For some languages (e.g. Python), the nixpkgs tree actually contains derivations for all packages already so it’s just a matter of dragging them in.

                                Oh, even better, the build directory isn’t writable.

                                This isn’t true for Nix in general. The build directory is explicitly writable and output installation (into /nix/store) usually (in Nix’s standard environment) happens as one of the last steps of a build.

                                It might be that this was a case of some language-specific tooling implementing such a restriction, in which case there’s probably also a flag for toggling it.

                                You know, the things that handle STATE, like FILES on the DISK. That’s STATE. GLOBALLY MUTABLE STATE.

                                The conceptual boundary is drawn differently here. In some sense, we look at the artefacts of realised derivations (i.e. completed “builds”) as a cache. The hashes you see in the /nix/store reference the inputs, not the outputs.

                                Also, nothing written to the store is mutated afterwards so for any given store there is mutability, but it is append-only.

                                As a side effect of making its package management system usable by normal users, it exposes the package manager database to corruption by any user mistake, curl2bash or malicious program on the system.

                                I’m not sure what is meant by this.

                                Edit 1: Ah, on the above this tweet adds some background (I think):

                                It doesn’t matter how PURE your definitions are; because the second some goddamn shell script does anything involving open(), you lose. The functional purity of your build is gone.

                                By default, Nix sandboxes builds which means that this is not a problem. Only explicitly declared dependencies are visible to a builder, and only the build directory and output path are writeable. Users can enable various footguns, such as opting out of sandboxing or whitelisting certain paths for passthrough.

                                1. 6

                                  By default, Nix sandboxes builds which means that this is not a problem.

                                  Only on Linux, unfortunately. The author seems to be on a Mac, which is probably why they didn’t know about the sandboxing.

                                  1. 3

                                    It seems like sandboxing is available on Mac (thanks puck for pointing this out), but for users running Nix in single-user mode (which OP might be doing) there is currently some extra hoop-jumping required to make it work correctly.

                                    1. 1

                                      I was thinking of this line from https://nixos.org/nix/manual/:

                                      In addition, on Linux, builds run in private PID, mount, network, IPC and UTS namespaces to isolate them from other processes in the system (except that fixed-output derivations do not run in private network namespace to ensure they can access the network).

                                      It looks like on Mac it’s just a chroot to hide store paths, but you can still curl install.sh | bash in your build. I didn’t know it even had that much sandboxing on Mac though, so thanks for pointing it out.

                                  2. 4

                                    You know, the things that handle STATE, like FILES on the DISK. That’s STATE. GLOBALLY MUTABLE STATE.

                                    The conceptual boundary is drawn differently here. In some sense, we look at the artefacts of realised derivations (i.e. completed “builds”) as a cache. The hashes you see in the /nix/store reference the inputs, not the outputs.

                                    Also, nothing written to the store is mutated afterwards so for any given store there is mutability, but it is append-only.

                                    I really like this aspect of Nix: it’s like all packages exist in some platonic library of babel, and we copy a few of them into our /nix/store cache as they’re needed. This style of reasoning also fits with the language’s laziness, the design of nixpkgs (one huge set, whose contents are computed on-demand) and common patterns like taking fixed points to allow overrides (e.g. all of the function arguments called self).
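                                    That fixed-point trick is easy to sketch in Haskell too (a toy example with invented names): a package set is written as a function of its own final result, so an override to one attribute is seen by everything defined in terms of self.

                                    import Data.Function (fix)

                                    data Pkgs = Pkgs { foo :: String, bar :: String }

                                    -- A package set as a function of its own final result:
                                    basePkgs :: Pkgs -> Pkgs
                                    basePkgs self = Pkgs
                                      { foo = "foo-1.0"
                                      , bar = "bar built against " ++ foo self
                                      }

                                    withOverride :: Pkgs -> Pkgs
                                    withOverride self = (basePkgs self) { foo = "foo-2.0" }

                                    main :: IO ()
                                    main = putStrLn (bar (fix withOverride))
                                    -- prints: bar built against foo-2.0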

                                    A similar idea applies to content-addressable storage like IPFS, which I’m still waiting to be usable with Nix :(

                                    1. 2

                                      Nix should adopt a more Bazel-like model, in which the Nix language is also used for describing the actual software builds rather than just wrapping external package managers.

                                      Would that involve “recursive Nix” to allow builders to use Nix themselves, in order to build sub-components?

                                      1. 3

                                        Recursive Nix is not necessary. For some languages this can already be done. E.g. the buildRustCrate function reimplements (most of) Cargo in Nix and does not use Cargo at all. This is in contrast to buildRustPackage, which relies on Cargo to do the builds.

                                        You can convert a Cargo.lock file to a Nix expression with e.g. crate2nix and build crates using buildRustCrate. This has the same benefits as Nix has for other derivations: each compiled crate gets its own store path, so builds are incremental, and crate dependencies with the same version/features can be shared between derivations.

                                        1. 2

                                          No, I’m not using recursive Nix for these. In my opinion (this might be controversial with some people) recursive Nix is a workaround for performance flaws of the current evaluator and I’d rather address those than add the massive amount of complexity required by recursive Nix.

                                          What’s potentially more important (especially for slow compilers like GHC or rustc) is content-addressed store paths, which allow for early build cutoff if two differing inputs (e.g. changes in comments or minor refactorings) yield the same artefact. Work is already underway towards that.

                                        2. 2

                                          Can you please edit documentation somewhere to note the existence of the prefetch commands and how to use them?

                                          Does that buildGo.nix thing support Go modules?

                                          1. 7

                                            Can you please edit documentation somewhere to note the existence of the prefetch commands and how to use them?

                                            nix-prefetch-url is part of Nix itself and is documented here, nix-prefetch-git etc. come from another package in nixpkgs and I don’t think there’s any docs for them right now.

                                            Nix has several large documentation issues and this being undocumented is a symptom of them. The two most important ones that I see are that the docs are written in an obscure format (DocBook) that is not conducive to a smooth writing flow and that the docs are an entirely separate tree in the nixpkgs repo, which means that it’s completely unclear where documentation for a given thing should go.

                                            The community disagrees on this to various degrees and there is an in-progress RFC (see here) to determine a different format, but that is only the first step in what is presumably going to be a long and slow improvement process.

                                            Does that buildGo.nix thing support Go modules?

                                            I’ve never used (and probably won’t use) Go modules, but I believe Go programs/libraries written with them have the same directory layout (i.e. are inspectable via go/build) which means they’re supported by buildGo.external.

                                            If your question is whether there’s a generator for translating the Go module definition files to Nix expressions, the answer is currently no (though there’s nothing preventing one from being written).

                                            1. 1

                                              Is there a way to get a hash of a file without making it available over HTTP?

                                              1. 6

                                                Yep!

                                                /tmp $ nix-store --add some-file 
                                                /nix/store/kwg265k8xn9lind6ix9ic22mc5hag78h-some-file
                                                

                                                For local files, you can also just refer to them by their local path (either absolute or relative) and Nix will copy them into the store as appropriate when the expression is evaluated, for example:

                                                { pkgs ? import <nixpkgs> {} }:
                                                
                                                pkgs.runCommand "example" {} ''
                                                  # Compute the SHA256 hash of the file "some-file" relative to where
                                                  # this expression is located.
                                                  ${pkgs.openssl}/bin/openssl dgst -sha256 ${./some-file} > $out
                                                ''
                                                

                                                Edit: Oh also, in case the question is “Can I get this hash without adding the file to the store?” - yes, the nix-hash utility (documented here) does that (and supports various different output formats for the hashes).

                                          2. 1

                                            For example, here’s the buildGo.nix packaging of golang.org/x/net (from here):

                                            Proxy error (the link, obviously).

                                            Edit: Back up!

                                            1. 2

                                              Hah, sorry about that - I’m running that web UI on a preemptible GCP instance and usually nobody manages to catch an instance cycling moment :-)

                                            2. 1

                                              Oh, even better, the build directory isn’t writable.

                                              This isn’t true for Nix in general. The build directory is explicitly writable and output installation (into /nix/store) usually (in Nix’s standard environment) happens as one of the last steps of a build.

                                              It might be that this was a case of some language-specific tooling implementing such a restriction, in which case there’s probably also a flag for toggling it.

                                              It’s most likely caused by the derivation either trying to build inside a store path, e.g. cd "${src}" && build, or inside a copy of a store path (which preserves the read-only flags), e.g. cp -a "${src}" src && cd src && build. We can see if that’s the case by looking at the build script in the failing .drv file: they’re plain text files, although they’re horrible to read without a pretty-printer like pretty-derivation. This is probably quicker than trying to get hold of and inspecting the failing derivation in nix repl, since it may be buried a few layers deep in dependencies.

                                              I actually make this mistake a lot when writing build scripts; I usually solve it by putting chmod +w -R after the copy. If someone else has written/generated the build script it may be harder to override; although in that case it would presumably break for every input, so I’d guess the author might be calling it wrong (AKA poor documentation, which is unfortunately common with Nix :( )

                                              It might be a symptom of the Koolaid, but I find this a feature rather than a bug: Nix keeps each download pristine, and forces my fiddling to be done on a copy; although the need to chmod afterwards is admittedly annoying.

                                            1. 4

                                              I’ve found that the Nixpkgs approach to Haskell packages (choosing a subset of Hackage packages, picking one version of each which mostly work together) can often require a lot of overriding. Even worse, the error messages only appear after (re)compiling a bunch of dependencies, making each iteration painfully slow.

                                              IOHK’s Haskell infrastructure takes the opposite approach: running cabal or stack to solve the dependencies, then fetching and building whichever versions were chosen (unless they’re already in /nix/store, of course). I’ve found this works quite well: it requires a larger up-front build time, but is more reliable so doesn’t need hand-holding.

                                              1. 1

                                                I haven’t seen this before. Cool. Starred it and scheduled for a read on the commute.

                                              1. 1

                                                I’ve been using XMPP for years; I’ve not heard of any of these clients other than Gajim, so it’s nice to see development continuing.

                                                Back in the day I used XMPP for microblogging, via the XMPP gateway on identi.ca. I also used gateways to Yahoo messenger and MSN messenger to talk to friends, before everyone switched to Facebook/whatever. It would be interesting to know if there are any stable gateways to such protocols running (bonus points if they don’t require signing up to Facebook/whatever).

                                                After identi.ca turned off their XMPP gateway, I wrote a simple Python script to do the same thing on my own Web site (minus the social aspect, of course). It was nice to throw random thoughts into a Pidgin window and have them archived on my site. I added a stripped-down version of this to a repo of small but useful Python programs for learners (in the ‘Microblogger’ directory).

I also made a pub/sub library, and a couple of XMPP service discovery clients as a way to learn GTK+ and Qt (I can’t find their code at the moment). Probably wildly out-of-date with current best-practices now ;)

                                                1. 4

                                                  This is a case of improper data modeling, but the static type system is not at fault—it has simply been misused.

The static type system is never at fault; it behaves just as the programmer tells it to. But this is kind of handwaving over the very point this article attempts to address. This particular case of “improper data modeling” would never be a problem on dynamically-typed systems.

                                                  Bad analogy time: it is pretty much like advocating the use of anabolic steroids, because they make you so much stronger, but when the undesired side effects kick in, you blame the hormonal system for not keeping things in balance.

                                                  1. 9

                                                    Bad analogy time: it is pretty much like advocating the use of anabolic steroids, because they make you so much stronger, but when the undesired side effects kick in, you blame the hormonal system for not keeping things in balance.

To me it feels like that’s exactly what proponents of dynamic typing often do: “I can write all code super fast”. Then, when people point out the issues that arise when it is accidentally misused (by another programmer, or by the same programmer in the future), it is always “you should’ve used more hammock time to think about your problem real hard” or “you should’ve written more tests to make sure it properly handles invalid inputs”.

                                                    1. 5

You are not wrong, and this is just proof that the debate around type systems is still too immature. There is certainly a component of dynamism in every computer system that programmers crave, and it usually lives outside the bounds of the language environment, at the operating-system level. Dynamically-typed languages claim to offer that dynamism inside their own environment, but most programs don’t take advantage of it.

There is no known argument on either side that would definitively bury its contender. Programmers sometimes seem too afraid of some kind of Tower of Babel effect that would ruin the progress of Computer Science, and I believe that the whole debate around static and dynamic type systems is just a reflection of that.

                                                    2. 2

                                                      This particular case of “improper data modeling” would never be a problem on dynamically-typed systems.

                                                      I think this is addressed in the appendix about structural vs nominal typing. In particular, very dynamic languages like Python and Smalltalk still allow us to do such “improper data modelling”, e.g. by defining/using a bunch of classes which are inappropriate for the data. Even if we stick to dumb maps/arrays, we can still hit essentially the same issues once we get a few functions deep (e.g. if we’ve extracted something from our data into a list, and it turns out we need a map, which brings up questions about whether there’ll be duplicates and how to handle them).

                                                      Alternatively, given the examples referenced by the author (in the linked “parse, don’t validate” post) it’s reasonable to consider all data modelling in dynamically-typed systems to be improper. This sounds a bit inflammatory, but it’s based on a core principle of the way dynamically-typed languages frame things: they avoid type errors in principle by forcing all code to “work” for all values, and shoehorning most of those branches into a sub-set of “error” or “exceptional” values. In practice this doesn’t prevent developers having to handle type errors; they just get handled with branching like any other value (with no compiler to guide us!). Likewise all dynamic code can model all data “properly”, but lots of code will model lots of data by producing error/exceptional values; that’s arguably “proper” since, after all, everything in a dynamic system might be an error/exception at any time.

Side note: when comparing static and dynamic languages, it’s important to remember that using “exceptions” for errors is purely a convention; from a language/technology standpoint, they’re just normal values like anything else. We can assign exceptions to variables, make lists of exceptions, return exceptions from functions, etc.; it’s just quite uncommon to see. Likewise “throwing” and “catching” are just a control-flow mechanism for passing around values; they don’t have anything to do with exception values or error handling, except by convention. I notice that running raise 42 in Python gives me TypeError: exceptions must derive from BaseException, which doesn’t seem very dynamic/Pythonic/duck-typical; yet even this “error” is just another value I can assign to a variable and carry on computing with!
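A quick Python illustration of that point (this is just standard language behaviour; the names are mine):

    # Exceptions are ordinary values until something chooses to raise them.
    err = ValueError("out of range")      # constructed, not thrown
    errors = [err, KeyError("missing")]   # stored in a list like any value

    def check(x):
        # Returning an exception instead of raising it: unusual, but legal.
        return x if x >= 0 else ValueError("negative input")

    result = check(-1)
    print(isinstance(result, Exception))  # True, and no control flow happened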

                                                      1. 1

The point I was trying to make is that, in the example mentioned in the article, the reason the type description was inaccurate at first is simply that the programmer must provide the checker with information about UserId‘s subtype. On a dynamically-typed system, as long as the JSON type supports Eq, FromJSON and ToJSON, you are fine, and having to accurately determine UserId‘s subtype would never be a problem.

So I do understand the appeal of static typing for building units, but not systems, especially distributed ones; this is why I believe the article is myopic. That said, dynamic-language advocates do a terrible job of defending themselves. Having to process JSON payloads is the least of your problems if you are dealing with distributed systems; how to accurately type-check across independent snippets of code in a geographically-distributed network over which you have no control is a much more interesting problem.

                                                        1. 2

                                                          On a dynamically-typed system, as long as the JSON type supports Eq, FromJSON and ToJSON, you are fine, and having to accurately determine UserId‘s subtype would never be a problem.

That’s not true. At some point your dynamically-typed system will make an assumption about the type of the value (the UserId in this case) that you’re applying some function to.

                                                          1. 1

For practical purposes, it is true. The system internals do need to resolve the dependency on that interface, either with fancy resolution mechanisms or by attempting to call the function in a shoot-from-the-hip fashion. But it is not common among dynamic language implementations for the programmer to need to specify the type, so it is not a problem.

                                                    1. 1

                                                      Mine’s at http://chriswarbo.net

                                                      It’s been through a few iterations:

                                                      • I started with blogspot, as a quick way to get things up
                                                      • A few years later I wanted to self-host, so I got a domain, an EC2 instance and copied the content into a CMS
• After a few more years I was fed up with maintaining the CMS’s permissions, CMS updates, PHP updates, etc., so I turned it all into Markdown and used a static site generator (Hakyll)
                                                      • I eventually found myself working around Hakyll too much, so I replaced it with Pandoc and a Makefile
                                                      • When my Makefile became unwieldy I switched to using Nix

Since I mostly write programming things, I made a couple of Pandoc “filters” for executing arbitrary programs during rendering. This is useful for keeping things “honest”, e.g. checking that the code in a blog post actually runs, and splicing the real output into the document. Running things via Nix has meant that old posts are still reproducible, and I don’t need to care whether something’s installed globally on my system (e.g. I wrote a bunch of posts about PHP, which still execute properly in their Nix sandbox, despite me not having/using PHP for 5 years).

                                                      I’m currently updating those scripts (and hence the site) to use the latest Pandoc (from 1.x to 2.x).

                                                      I’ve been told that LightSail would be a cheaper option than EC2, whilst still giving me a machine with SSH access for the few other things I use it for (unlike, e.g. S3). I still haven’t gotten around to switching it over.

                                                      I am interested in hosting things using IPFS or similar, but their mutability options aren’t great (e.g. IPNS) and I found that the default Go implementation would eat all of my RAM :(

                                                      1. 2

                                                        (I don’t use proprietary GitHub things like “bots” and “pull requests”, so I’ll just talk about the “conventional commits” stuff in general)

I like the idea of standardising “unimportant” things like commit message format, since it’s one less thing I have to think about right now, but may help me out in the future. Tooling is the best way to standardise these things, since following the standard becomes the path of least resistance (compared to, e.g. forcibly skipping git hooks).

                                                        One thing I don’t see from clicking around the links is how I’d run this “commitizen” stuff from Magit; the use of an interactive CLI (rather than, say, stdio, env vars or arguments) makes me think it would be tricky.

                                                        Incidentally, I was thinking earlier today how I’m terrible at keeping changelogs and if there’s a simple way to enforce it. This would do the job nicely!

                                                        1. 1

                                                          Obligatory plug for ASV as a benchmark runner, tracker and regression-finder; and my Nix plugin which generalises it to non-Python projects :)

                                                          1. 6

                                                            I have also thought about this from another perspective. If you have an application you make for yourself, it can be worth it to improve the performance. If you do this, and it costs you 10 minutes, but it saves you 10 seconds every time you use the program, you need to use it 60 times before it pays off.

On the other hand, if you have a site or program that is intensively used by a lot of people, it becomes more attractive to improve performance. Around 500 million tweets are sent per day. Twitter is horribly slow to load on my phone, even over a perfect connection (a tweet takes about 10 seconds to load, while, for example, Wikipedia pages load in less than 1). Every second that Twitter spends loading costs humanity at least 500,000,000/60/60/24/365 ≈ 15.9 years per day. The actual number is probably way, way higher, since most tweets are read multiple times.

                                                            Now, I don’t claim that this is anywhere near as important as saving lives, but it’s still a baffling figure. If you work on an application that is used repeatedly by even some users, it’s often worth it to improve the performance (of course, it’s often not worth it to optimize it if the delay isn’t noticeable or if there are other steps in the process which take way longer). If you think about things this way it’s ridiculous how much time is wasted by not optimizing performance in some systems.

                                                            1. 1

                                                              I have also thought about this from another perspective. If you have an application you make for yourself, it can be worth it to improve the performance. If you do this, and it costs you 10 minutes, but it saves you 10 seconds every time you use the program, you need to use it 60 times before it pays off.

                                                              In this situation performance is also important, but what is usually more important is that the task is automated and that you can free up a human for other things.

I usually stick to the rule of thumb that I do things manually the first three times, but if something pops up a fourth time, I start automating it. And if I get that far, I usually also start optimizing things, because I will usually reach said 60 uses without question.

                                                              1. 3

                                                                In this situation performance is also important, but what is usually more important is that the task is automated and that you can free up a human for other things.

                                                                Two other bonuses:

• We can do a bunch of sanity checks, etc. in the automated solution, which would seem like too much hassle when doing it manually. Counterpoint: automated scripts might do stupid things that a person wouldn’t, e.g. rm -rf $DIR/* when $DIR hasn’t been set (a cheap guard against this is sketched after this list).
                                                                • Once something’s automated, we can work at a higher level of abstraction, e.g. using a script can ensure that generated files follow some particular naming pattern and directory structure; that makes it easier to make tools for browsing or querying those files. Counterpoint: easy to end up with a house of cards, which does bad things if some underlying job breaks.
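As promised above, a minimal sketch of the kind of guard that prevents the rm -rf disaster (the variable name and error message are just for illustration):

    #!/usr/bin/env bash
    set -euo pipefail                # abort on errors and on unset variables
    DIR="${DIR:?DIR must be set}"    # ':?' exits with a message if unset/empty
    rm -rf "${DIR:?}"/*              # guard the expansion at the use site too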
                                                                1. 2

                                                                  Counterpoint: easy to end up with a house of cards, which does bad things if some underlying job breaks.

Another rule of thumb: Only use shell scripting at the top level of your processing (for example, to start things up), or use shell scripting all the way down to the second-lowest level, like Slackware does. My present self regularly thanks my past self for sticking to this.

                                                            1. 3

Git-for-Windows has a nice checkbox in its installer to install “common unix tools” on the PATH. It comes with a big warning that it’ll overwrite some existing command names in cmd (names that nobody in Windows land ever used anyway), so I think many don’t choose it. But you get a lot of bang for the buck from just one installer (meaning, in your case, just one ticket): sh, bash, ls, rm, whereis and quite a number of other common commands. These are just the base tools; it doesn’t go all the way to top or ed, etc. It’s not all-in-one like Busybox, but for your situation it’s a pretty nice bang for the buck: if you tend to be efficient doing stuff in a terminal (I mean, “command prompt”), you won’t lose much time initially figuring out cmd’s or PowerShell’s own peculiarities.

                                                              Or get a ticket for installing an ssh client (“putty” is one that I know of in Win land) and a cheap Linux box online.

                                                              Is this a dev role? Even big-corps very often tend to give devs admin-powers on their own machines, and try to ensure some security via network rules/config.

If on Win7, they’re bound to migrate to 10 before long; some reason or other will eventually create pressure to do so. That opens the possibility of WSL down the road.

But by then you might already be highly productive in your env; most of this is formed habits, after all. I say this as an at-home Linux user and freelancer who, at the current client corp, also needs to use Win10, and who formed muscle memories in earlier days of a Win-only existence. Before Win8 it wasn’t all that bad anyway; I happily hacked go/node/python in Sublime back in the day. Of course the terminals-and-shells situation was, and is, atrocious.

                                                              Get some sort of (lean & mean, non-Enterprise-bloated) language going on there that will let you easily and joyously recreate any programs or tooling missing over the weeks and months: sounds like a place where there will be much X waiting on Y waiting on Z anyway, and that’s one way to stay sane and not feel drained at the end of the day. Plus gives a nice old-school computing retro feeling of “my machine can do nothing so I can do anything”.

                                                              1. 5

                                                                Or get a ticket for installing an ssh client (“putty” is one that I know of in Win land) and a cheap Linux box online.

                                                                Taking company data to a cheap Linux box online is likely to get you fired.

                                                                1. 2

                                                                  Very quickly.

                                                                2. 3

                                                                  Even big-corps very often tend to give devs admin-powers on their own machines

                                                                  Think bigger. Think “need to ask the second in command in the org for permission to have your IO ports turned on.” Think “even the fourth in command visits your campus maybe once or twice a year.”

                                                                  No, they will not have local admin. I think I shall have to embrace the Windows somehow.

                                                                  1. 1

That is not about embracing Windows. It’s about educating superiors that local administration is not a wish but a must, and that security can be done correctly even with some people having local admin rights. I am not sure what kind of job you do, but any serious development on Windows is simply impossible without local administration.

                                                                    1. 2

Most large corporations are OK with sacrificing 90% of productivity in the name of conformance with company processes. It’s OK if you spend a month doing nothing while waiting for something essential to be installed. It’s not uncommon to meet people who have spent 18 months doing nothing due to an organizational glitch that wasn’t of their making or their fault at all. With their sheer size, connections and monopolies, they have access to such ludicrous deals that they can afford to hire 10 people instead of 1. So, your “impossible” (which rounds 0.1 down to 0) is “good enough” for them.

                                                                  2. 3

                                                                    Or get a ticket for installing an ssh client (“putty” is one that I know of in Win land) and a cheap Linux box online.

                                                                    Don’t send things outside the company intranet/firewall.

                                                                    I worked somewhere with Windows desktops, but Linux servers for Web stuff (separate VMs for dev, staging and production). The dev VMs were meant for testing, but I did all my development on them too. As a bonus, everything I was using (usually just Emacs) would stay running when my desktop machine powered off each evening (thanks to GNU screen).

                                                                    I actually had local admin access, so I installed cygwin and ran X full screen over SSH. If you can’t do that, a terminal might be friendly enough.

                                                                    1. 1

                                                                      Get some sort of (lean & mean, non-Enterprise-bloated) language going on there that will let you easily and joyously recreate any programs or tooling missing over the weeks and months

Reminds me of this talk which I really enjoyed. As for the language, I suggest Racket, which comes with a massive stdlib (including a GUI library) and offline docs for all of it.

                                                                    1. 14

Great talk; just one note:

                                                                      Because, barring magic, you cannot send a function.

This is trivial in Erlang, even between nodes in a cluster, and commonly used; it’s not some obscure language feature. So there we go: I officially declare Erlang to be Dark Magic.

                                                                      1. 4

                                                                        I suppose it’s not that “you cannot send a function”, but more like “you cannot send a closure, if the language allows references which may not resolve from other machines”. Common examples are mutable data (if we want both parties to see all edits), pointers, file handles, process handles, etc.

                                                                        I’m not too familiar with Erlang, but I imagine many of these can be made resolvable if we’re able to forward requests to the originating machine.

                                                                        1. 7

                                                                          It’s possible to implement this in Haskell, with precise control over how the closure gets de/serialized and what kinds of things can enter it. See the transient package for an example. This task is a great example of the things you can easily implement in a pure language, but very dangerous in impure ones.

                                                                        2. 1

                                                                          I don’t know Erlang, but… I can speculate that it doesn’t actually send the functions. It sends their bytecode representation. Or a pointer to the relevant address, if the two computers are guaranteed to share code. I mean, the function has to be transformed into a piece of data somehow.

                                                                          1. 13

                                                                            it doesn’t actually send the functions. It sends their bytecode representation

                                                                            How is that different from not actually sending an integer, but sending its representation?

                                                                            In Erlang any term can be serialized, including functions, so sending a function to another process/node isn’t different from sending any other term. The nodes don’t need to share code.

                                                                            1> term_to_binary(fun (X) -> X + 1 end).
                                                                            <<131,112,0,0,2,241,1,174,37,189,114,105,121,227,76,88,
                                                                              139,139,101,146,181,186,175,0,0,0,7,0,0,...>>
                                                                            
                                                                            1. 1

                                                                              Would that also work if the function contains free variables?

                                                                              That is what’s the result of calling this function:

                                                                              fun(Y) -> term_to_binary(fun (X) -> X + Y end) end.
                                                                              

                                                                              (Sorry for the pseudo-Erlang)

                                                                              1. 5

                                                                                Yep!

                                                                                1> F1 = fun(Y) -> term_to_binary(fun (X) -> X + Y end) end.
                                                                                #Fun<erl_eval.7.91303403>
                                                                                2> F2 = binary_to_term(F1(2)).
                                                                                #Fun<erl_eval.7.91303403>
                                                                                3> F2(3).
                                                                                5
                                                                                

                                                                                … or even

                                                                                1> SerializedF1 = term_to_binary(fun(Y) -> term_to_binary(fun (X) -> X + Y end) end).
                                                                                <<131,112,0,0,3,96,1,174,37,189,114,105,121,227,76,88,139,
                                                                                  139,101,146,181,186,175,0,0,0,7,0,0,...>>
                                                                                2> F1 = binary_to_term(SerializedF1).
                                                                                #Fun<erl_eval.7.91303403>
                                                                                3> F2 = binary_to_term(F1(2)).
                                                                                #Fun<erl_eval.7.91303403>
                                                                                4> F2(3).
                                                                                5
                                                                                

                                                                                The format is documented here: http://erlang.org/doc/apps/erts/erl_ext_dist.html

                                                                              2. 0

                                                                                How is that different from not actually sending an integer

                                                                                Okay, it’s not. It’s just much more complicated. Sending an integer? Trivial. Sending a plain old data structure? Easy. Sending a whole graph? Doable. Sending code? Scary.

Sure, if you have a bytecode compiler and eval, sending functions is a piece of cake. Good luck doing that without explicit language support, though; in C, for instance.

                                                                                1. 6

                                                                                  You can do it, for instance, by sending a DLL file over a TCP connection and linking it into the application receiving it. It’s harder, it’s flakier, it’s platform-dependent, and it’s the sort of thing anyone sane will look at and say “okay but why though”. It’s just that Erlang is designed to be able to do that sort of thing easily, and accepts the tradeoffs necessary, and C/C++ are not.

                                                                                  1. 3

                                                                                    The Morris Worm of 1988 used a similar method to infect new hosts.

                                                                          1. 1

                                                                            My first thought was static linking, which the authors mention; but then my next thought was build systems like Nix where every binary is hard-coded to its particular dependencies. Is the sharing on a NixOS system still sufficient to merit not statically linking (given the arguments in this paper)?

                                                                            I appreciate that this is a separate issue to the interpreter hardening.

                                                                            1. 1

                                                                              Very nice. Is the (b ~ Double) trick needed because () gets picked by defaulting rules; and if so, is this only a problem in GHCi (which does more defaulting) or would it also affect .hs files given to GHC?

                                                                              Nitpick: I would define KB as 1000, MB as 1000000, KiB as 1024 and MiB as 1024*1024. I would definitely avoid the names KBi and MBi, which look like typos for the latter ;)
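Spelled out (with lowercase names, since Haskell values can’t begin with a capital letter), the suggestion would look something like this hypothetical snippet:

    -- Hypothetical definitions for the suggested names (not from the post):
    kB, mB, kiB, miB :: Double
    kB  = 1000           -- kilobyte (SI, decimal)
    mB  = 1000000        -- megabyte (SI, decimal)
    kiB = 1024           -- kibibyte (binary)
    miB = 1024 * 1024    -- mebibyte (binary)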

                                                                              1. 2

                                                                                Very nice. Is the (b ~ Double) trick needed because () gets picked by defaulting rules

                                                                                No. I described in the blog post the transition steps:

                                                                                • Bytes -> Double
                                                                                • Bytes -> b
                                                                                • (b ~ Double) => Bytes -> b

                                                                                It’s needed because otherwise type inference doesn’t work. The defaulting is a different error/issue that is an artifact of GHCi trying to help the inferencer, but it doesn’t work if you use it outside of that context.

                                                                                You need this trick to make the syntactic pattern work with type inference at all, it’s not particular to my demonstration using GHCi. I wouldn’t go to the effort of writing a blog post about it otherwise. This demonstration/trick links up with a related issue in other languages where you can’t really have return type polymorphism. You’re absolutely permitted this in Haskell but it’s easy to break inference if you don’t use type families, functional dependencies, associated types, or this trick. What all of these techniques have in common is that they let you create a relationship or equality between types that helps type inference. Haskell has no reason to assume that just because you used a numeric literal as a function that you mean the particular instance Bytes -> Double. The argument type of a function does not imply the return type, it could be anything.
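To make the shape of the trick concrete, here’s a minimal, self-contained sketch (the newtype and function names are assumed for illustration; this isn’t the post’s code):

    {-# LANGUAGE TypeFamilies #-}   -- enables equality constraints like (b ~ Double)

    newtype Bytes = Bytes Double

    -- The signature advertises a polymorphic return type, but the equality
    -- constraint pins b to Double, so inference can solve call sites like
    -- 'megabytes (Bytes 1500000) / 2' without annotations. A bare
    -- 'Bytes -> b' would be rejected: nothing forces b to be Double.
    megabytes :: (b ~ Double) => Bytes -> b
    megabytes (Bytes n) = n / 1000000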

                                                                                This article about functional dependencies describes the problem and motivates a particular solution: https://wiki.haskell.org/Functional_dependencies

                                                                                Nitpick: I would define KB as 1000, MB as 1000000, KiB as 1024 and MiB as 1024*1024. I would definitely avoid the names KBi and MBi, which look like typos for the latter ;)

                                                                                Seems reasonable. I’m not trying to publish a library here so I’m not going to change it, but anybody that feels inspired to do so should probably follow your suggestion here.

                                                                              1. 7

                                                                                See also https://github.com/zimbatm/shab which is a templating language written in 6 lines of bash. It just wraps the template in a heredoc, turning the language inside out. I am very proud of that hack :p
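In case anyone’s curious, here’s a minimal sketch of how the heredoc trick works (my own reconstruction, not shab’s actual source):

    # Wrap the template in a heredoc and let the shell expand ${variables}.
    # Beware: command substitutions in the template will also be executed!
    render() {
      eval "cat <<__RENDER__
    $(cat "$1")
    __RENDER__"
    }

    # Usage: if greeting.tpl contains 'Hello, ${name}!' then:
    name=world
    render greeting.tpl   # prints: Hello, world!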

                                                                                1. 2

I also went with heredocs initially; the problem is they’re really slow: https://www.oilshell.org/blog/2016/10/18.html.

                                                                                  1. 3

                                                                                    That overhead is not really noticeable in my experience. In my mind shab is mostly useful to generate a config file from a few variables. Hopefully it will never be used to render a web page or anything big.

                                                                                    Imagine a context like a docker container where bash is already available. Just add 6 lines and you have a templating language.

                                                                                  2. 1

Heh, that looks scarily handy. I especially like the Nix example, where the template syntax needs escaping to stop Nix from doing the same job itself, without any need for shab at all :P

                                                                                    1. 1

hehe yeah. I wish it were possible to resume Nix evaluation, so it could also be used as a templating language where variables are resolved later, at runtime.