1. 20

    The “lacks” of Go in the article are highly opinionated and given without any context about what you’re trying to solve with the language.

    Garbage collection is something bad? Can’t disagree harder.

    The article ends with a bunch of extreme opinions like “Rust will be better than Go in every possible task”.

    There’re use cases for Go, use cases for Rust, for both, and for none of them. Just pick the right tool for your job and stop bragging about yours.

    You love Rust, we get it.

    1. 2

      Yes, I would argue GC is something that’s inherently bad in this context. Actually, I’d go as far as to say that a GC is bad for any statically typed language. And Go is, essentially, statically typed.

      It’s inherently bad since GC dictates the lack of destruction mechanisms that can be reliably used when no references to the resource are left. In other words, you can’t have basic features like the C++ file streams that “close themselves” at the end of the scope, when they are destroyed.

      That’s why Go has the “defer” statement: it’s there because of the GC. Otherwise, destructors could be used to defer cleanup tasks to the end of a scope.
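
      To make the comparison concrete, here’s a minimal Rust sketch of the scope-based cleanup I mean (the file name is just an example):

      use std::fs::File;
      use std::io::Write;

      fn main() -> std::io::Result<()> {
          {
              // `file` owns the OS handle; no GC is involved. The compiler
              // inserts the destructor (Drop) call at the end of this scope,
              // which closes the file deterministically.
              let mut file = File::create("example.txt")?;
              writeln!(file, "hello")?;
          } // <- closed here, every time, without a `defer`
          Ok(())
      }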

      So that’s what makes a GC inherently bad.

      A GC, however, is also bad because it “implies” the language doesn’t have good resource management mechanisms.

      There was an article posted here, about how Rust essentially has a “static GC”, since manual deallocation is almost never needed. The same goes for well-written C++: it behaves just like a garbage collected language, no manual deallocation required, all of it is figured out at compile time based on your code.

      So, essentially, a GC does what languages like C++ and Rust do at compile time… but it does it at runtime. Isn’t this inherently bad? Doing something at runtime that could be done at compile time? It’s bad from a performance perspective and also bad from a code validation perspective. And it has essentially no upsides, as far as I’ve been able to tell.

      As far as I can tell, the main “support” for GCs is that they’ve always been used. But that doesn’t automatically make them good. GCs seem to be closer to a hack that makes a language easier to implement than to a feature for a user of the language.

      Feel free to convince me otherwise.

      1. 11

        It’s inherently bad since GC dictates the lack of destruction mechanisms that can be reliably used when no references to the resource are left.

        Why do you think this would be the case? A language with GC can also have linear or affine types for enforcing that resources are always freed and not used after they’re freed. Most languages don’t go this route because they prefer to spend their complexity budgets elsewhere and defer/try-with-resources work well in practice, but it’s certainly possible. See ATS for an example. You can also use rank-N types to a similar effect, although you are limited to a stack discipline which is not the case with linear/affine types.
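
        To sketch the discipline I mean (in Rust, whose ownership is affine rather than fully linear; the type and method names below are made up for illustration, and a GC’d language could layer the same kind of types on top of its collector):

        // A made-up resource type used to illustrate an affine discipline:
        // once `close` consumes the value, the compiler rejects further use.
        struct Connection {
            id: u32,
        }

        impl Connection {
            fn open(id: u32) -> Connection {
                Connection { id }
            }

            // Takes `self` by value, consuming the connection.
            fn close(self) {
                println!("closing connection {}", self.id);
            }

            fn send(&self, msg: &str) {
                println!("({}) {}", self.id, msg);
            }
        }

        fn main() {
            let conn = Connection::open(1);
            conn.send("hello");
            conn.close();
            // conn.send("again"); // compile error: `conn` was moved by `close`
        }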

        So, essentially, a GC does what languages like C++ and Rust do at compile time… but it does it at runtime. Isn’t this inherently bad?

        No, not necessarily. Garbage collectors can move and compact data for better cache locality and elimination of fragmentation concerns. They also allow for much faster allocation than in a language where you’re calling the equivalent of malloc under the hood for anything that doesn’t follow a clean stack discipline. Reclamation of short-lived data is also essentially free with a generational collector. There are also garbage collectors with hard bounds on pause times which is not the case in C++ where a chain of frees can take an arbitrary amount of time.

        Beyond all of this, garbage collection allows for a language that is both simpler and more expressive. Certain idioms that can be awkward to express in Rust are quite easy in a language with garbage collection precisely because you do not need to explain to the compiler how memory will be managed. Pervasive use of persistent data structures also becomes a viable option when you have a GC that allows for effortless and efficient sharing.
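
        Here’s a tiny sketch of the kind of structural sharing I mean, written out in Rust where the sharing has to be spelled out with Rc; with a GC the same persistent list needs none of that plumbing:

        use std::rc::Rc;

        // A persistent singly linked list: pushing a new head shares the
        // entire tail with the old list instead of copying it.
        enum List {
            Nil,
            Cons(i32, Rc<List>),
        }

        fn push(head: i32, tail: &Rc<List>) -> Rc<List> {
            Rc::new(List::Cons(head, Rc::clone(tail)))
        }

        fn sum(list: &List) -> i32 {
            match list {
                List::Nil => 0,
                List::Cons(x, rest) => *x + sum(rest),
            }
        }

        fn main() {
            let empty = Rc::new(List::Nil);
            let a = push(1, &empty);
            let b = push(2, &a); // `b` shares all of `a`'s cells
            println!("{} {}", sum(&a), sum(&b)); // prints "1 3"
        }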

        In short, garbage collection is more flexible than Rust-style memory management, can have great performance (especially for functional languages that perform a lot of small allocations), and does not preclude use of linear or affine types for managing resources. GC is hardly a hack, and its popularity is the result of a number of advantages over the alternatives for common use cases.

        1. 1

          What idioms are unavailable in Rust or in modern C++, because of their lack of GC, but are available in a statically typed GC language?

          I perfectly agree with GC allowing for more flexibility and more concise code as far as dynamic languages go, but that’s neither here nor there.

          As for the theoretical performance benefits and real-time capabilities of a GCed language… I think the word theoretical is what I’d focus my counterargument on, because they don’t actually materialize. The GC overhead is too big, in practice, for those benefits to outshine languages without runtime memory management logic.

          1. 9

            I’m not sure about C++, but there are functions you can write in OCaml and Haskell (both statically typed) that cannot be written in Rust because they abstract over what is captured by the closure, and Rust makes these things explicit.

            The idea that all memory should be explicitly tracked and accounted for in the semantics of the language is perhaps important for a systems language, but to say that it should be true for all statically typed languages is preposterous. Languages should have the semantics that make sense for the language. Saying a priori that all languages must account for some particular feature just seems like a failure of the imagination. If it makes sense for the semantics to include explicit control over memory, then include it. If it makes sense for this not to be part of the semantics (and for a GC to be used so that the implementation of the language does not consume infinite memory), this is also a perfectly sensible decision.

            1. 2

              there are functions you can write in OCaml and Haskell (both statically typed) that cannot be written in Rust because they abstract over what is captured by the closure

              Could you give me an example of this?

              1. 8

                As far as I understand and have been told by people who understand Rust quite a bit better than me, it’s not possible to re-implement this code in Rust (if it is, I would be curious to see the implementation!)

                https://gist.github.com/dbp/0c92ca0b4a235cae2f7e26abc14e29fe

                Note that the polymorphic variables (a, b, c) get instantiated with different closures in different ways, depending on what the format string is, so giving a type to them is problematic because Rust is explicit about typing closures (they have to talk about lifetimes, etc).

                1. 2

                  My God, that is some of the most opaque code I’ve ever seen. If it’s true Rust can’t express the same thing, then maybe it’s for the best.

                  1. 2

                    If you want to understand it (not sure if you do!), the approach is described in this paper: http://www.brics.dk/RS/98/12/BRICS-RS-98-12.pdf

                    And probably the reason why it seems so complex is because CPS (continuation-passing style) is, in general, quite hard to wrap your head around.

                    I do think that the restrictions present in this example will show up in simpler examples (anywhere you are trying to quantify over different functions with sufficiently different memory usage, but the same type in a GC’d functional language); this is just a particular thing that I have on hand because I thought it would work in Rust but doesn’t seem to.

                    1. 2

                      FWIW, I spent ~10 minutes trying to convert your example to Rust. I ultimately failed, but I’m not sure if it’s an actual language limitation or not. In particular, you can write closure types in Rust with 'static bounds which will ensure that the closure’s environment never borrows anything that has a lifetime shorter than the lifetime of the program. For example, Box<FnOnce(String) + 'static> is one such type.

                      So what I mean to say is that I failed, but I’m not sure if it’s because I couldn’t wrap my head around your code in a few minutes or if there is some limitation of Rust that prevents it. I don’t think I buy your explanation, because you should technically be able to work around that by simply forbidding borrows in your closure’s environment. The thing I actually got hung up on was the automatic currying that Haskell has. In theory, that shouldn’t be a blocker because you can just introduce new closures, but I couldn’t make everything line up.

                      N.B. I attempted to get any Rust program working. There is probably the separate question of whether it’s a roughly equivalent program in terms of performance characteristics. It’s been a long time since I wrote Haskell in anger, so it’s hard for me to predict what kind of copying and/or heap allocations are present in the Haskell program. The Rust program I started to write did require heap allocating some of the closures.
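
                      For reference, this is the shape of the type-erased, heap-allocated closure type I was reaching for; a toy, not an attempt at the gist’s program:

                      // Each call returns a heap-allocated closure behind a trait object.
                      // The closure owns `prefix` (no borrows), so the 'static bound holds.
                      fn greeter(prefix: String) -> Box<dyn Fn(String) -> String + 'static> {
                          Box::new(move |name| format!("{}{}", prefix, name))
                      }

                      fn main() {
                          let hello = greeter("Hello, ".to_string());
                          let hi = greeter("Hi, ".to_string());
                          println!("{}", hello("world".to_string()));
                          println!("{}", hi("lobsters".to_string()));
                      }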

        2. 5

          It’s inherently bad since GC dictates the lack of destruction mechanisms that can be reliably used when no references to the resource are left. In other words, you can’t have basic features like the C++ file streams that “close themselves” at the end of the scope, when they are destroyed.

          Deterministic freeing of resources is not mutually exclusive with all forms of garbage collection. In fact, this is shown by Rust, where reference counting (Rc) does not exclude Drop. Of course, Drop may never be called when you create cycles.

          (Unless you do not count reference counting as a form of garbage collection.)
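
          A small sketch of both halves of that claim (the names are only for illustration):

          use std::cell::RefCell;
          use std::rc::Rc;

          struct Noisy(&'static str);

          impl Drop for Noisy {
              fn drop(&mut self) {
                  println!("dropping {}", self.0);
              }
          }

          fn main() {
              // Drop runs deterministically when the last Rc clone goes away.
              let a = Rc::new(Noisy("a"));
              let b = Rc::clone(&a);
              drop(a);
              drop(b); // "dropping a" is printed here

              // But a reference cycle keeps the count above zero forever,
              // so this value's Drop never runs (it is leaked).
              struct Node {
                  _payload: Noisy,
                  next: RefCell<Option<Rc<Node>>>,
              }
              let n = Rc::new(Node {
                  _payload: Noisy("cycle"),
                  next: RefCell::new(None),
              });
              *n.next.borrow_mut() = Some(Rc::clone(&n));
          } // "dropping cycle" is never printed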

          1. 2

            Well… I don’t count shared pointers (or RC pointers or w/e you wish to call them) as garbage collected.

            If, in your vocabulary, that is garbage collection then I guess my argument would be against the “JVM style” GC where the moment of destruction can’t be determined at compile time.

            1. 8

              If, in your vocabulary, that is garbage collection

              Reference counting is generally agreed to be a form of garbage collection.

              I guess my argument would be against the “JVM style” GC where the moment of destruction can’t be determined at compile time.

              In Rc or shared_ptr, the moment of the object’s destruction also cannot be determined at compile time. Only the destruction of the Rc itself (put differently, the reference count decrement) can be determined at compile time.

              I think your argument is against tracing garbage collectors. I agree that the lack of deterministic destruction is a large shortcoming of languages with tracing GCs. It effectively brings manual resource management back in through the backdoor. You don’t have to convince me :). I once wrote a binding to Tensorflow for Go. Since Tensorflow wants memory aligned on 32-byte boundaries on amd64 and Go allocates (IIRC) on 16-byte boundaries, you have to allocate memory in C-land. However, since finalizers are not guaranteed to run, you end up managing memory objects with Close() functions. This was one of the reasons I rewrote some fairly large Tensorflow projects in Rust.
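
              Roughly the kind of wrapper I wanted, sketched in Rust, with libc::malloc/free standing in for Tensorflow’s aligned allocator (so this assumes the libc crate):

              use libc::{c_void, free, malloc};

              struct CBuffer {
                  ptr: *mut c_void,
                  len: usize,
              }

              impl CBuffer {
                  fn new(len: usize) -> CBuffer {
                      // In the real binding this would be an aligned tensor allocation.
                      let ptr = unsafe { malloc(len) };
                      assert!(!ptr.is_null());
                      CBuffer { ptr, len }
                  }
              }

              impl Drop for CBuffer {
                  fn drop(&mut self) {
                      // Runs at a statically known point (when the last owner goes out
                      // of scope), so C-land memory is released promptly rather than
                      // whenever a finalizer happens to fire.
                      unsafe { free(self.ptr) };
                  }
              }

              fn main() {
                  for i in 0..4 {
                      let buf = CBuffer::new(1 << 20); // 1 MiB on the C heap
                      println!("iteration {}: {} bytes live in C-land", i, buf.len);
                  } // freed here, every iteration
              }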

              1. 2

                However, since finalizers are not guaranteed to run, you end up managing memory objects with Close() functions.

                Hmm. This seems a bit odd to me. As I understand it, Go code that binds to C libraries tends to use finalizers to free memory allocated by C. Despite the lack of a guarantee around finalizers, I think this has worked well enough in practice. What caused it to not work well in the Tensorflow environment?

                1. 3

                  When doing prediction, you typically allocate large tensors relatively rapidly in succession. Since the wrapping Go objects are very small, the garbage collector kicks in relatively infrequently, while you are filling memory in C-land. There are definitely workarounds to put bounds on memory use, e.g. by using an object pool. But I realized that what I really want is just deterministic destruction ;). But that may be my C++ background.

                  I have rewritten all that code probably around the 1.6-1.7 time frame, so maybe things have improved. Ideally, you’d be able to hint the Go GC about the actual object sizes including C-allocated objects. Some runtimes provide support for tracking C objects. E.g. SICStus Prolog has its own malloc that counts allocations in C-land towards the SICStus heap (SICStus Prolog can raise a recoverable exception when you use up your heap).

                  1. 1

                    Interesting! Thanks for elaborating on that.

              2. 3

                So Python, Swift, Nim, and others all have RC memory management … according to you these are not GC languages?

            2. 5

              One benefit of GC is that the language can be way simpler than a language with manual memory management (either explicitly like in C/C++ or implicitly like in Rust).

              This simplicity then can either be preserved, keeping the language simple, or spent on other worthwhile things that require complexity.

              I agree that Go is bad, Rust is good, but let’s be honest, Rust is approaching a C++-level of complexity very rapidly as it keeps adding features with almost every release.

              1. 1

                you can’t have basic features like the C++ file streams that “close themselves” at the end of the scope, when they are destroyed.

                That is a terrible point. The result of closing the file stream should always be checked and reported or you will have buggy code that can’t handle edge cases.

                1. 0

                  You can turn off garbage collection in Go and manage memory manually, if you want.

                  It’s impractical, but possible.

                  1. 2

                    Is this actually used in any production code? To my knowledge it was meant to be more of a feature for debugging and language developers, rather than a true GC-less option like the one a language like D provides.

                    1. 1

                      Here is a shocking fact: For those of us who write programs in Go, the garbage collector is actually a wanted feature.

                      If you work on something where having a GC is a real problem, use another language.

              1. 3

                That was a bit disappointing. It became very vague just as it was getting interesting.

                1. 2

                  It’s an abstract for a workshop talk of work that is quite early. So, perhaps it shouldn’t have been shared at this point (but on the other hand, we believe in openness…). Regardless, it’s certainly not finished work, and if it sounds somewhat interesting, that’s basically the goal, as it is just a workshop talk!

                  1. 1

                    I saved the paper to check on them later. I found it irritating, though. Like CompSci vaporware.

                  1. 2

                    Possibly relevant is this (old, but accomplishing what you describe; not sure about how costly though): https://lobste.rs/s/kun536/radio_e_mail_west_africa_2002

                    1. 1

                      Just read through this from another post above. Awesome implementation and really good detail; taking notes and will see if I can get my hands on one of these modems (as well as seeing if using it where I am violates any laws)…

                    1. 2

                      This seems very relevant to the thread posted just the other day: https://lobste.rs/s/cpbngl/remote_data_access_is_hard_need_lobste_rs , which is asking for essentially a solution to the same problem, >15yrs later!

                      I wonder if those modems still exist!

                      1. 2

                        Your local second-hand store is probably the best place to begin searching for a modem! I know mine is for sure.

                      1. 2

                        Do people know of any fully abstract compiler? The only one I know of is Microsoft Research’s Fully Abstract Compilation to JavaScript.

                        They wrote a compiler from F* to JavaScript and proved full abstraction in Coq. It is kind of amazing, considering that full abstraction means it is secure and immune from changes to Object.prototype, etc.

                        1. 6

                          Just a minor correction – that paper is for a language called “f*” (a little f, not a big F), which is a much much smaller language (no dependent types, for example) than F*. It’s an unfortunately close name, as it can be confusing!

                          Are you looking for industrial / “large” compilers? The answer to that is no (or, not yet… there didn’t use to be any realistic correct compilers either, so…).

                          As yet, most of the fully abstract compilers that exist are for small languages, in addition to the f*->JS one you mentioned, Devriese et al built one from the simply typed lambda calculus with recursion to the untyped lambda calculus (pdf), and people in my group have built a couple – e.g., from a lambda calculus with recursive types to a language with exceptions (pdf), from a language with information flow properties into F omega (pdf), etc. And of course there are probably others that I don’t know about!

                        1. 5

                          One thing I never understood (after hearing this sequence countless times) is why it was audible. Once the handshake was completed, the speaker was turned off. I guess if things were going wrong, I could kind of figure that out (as it didn’t sound normal), but presumably the modem would know that anyway and would give that feedback (I can’t remember that well, but when I had an external modem, it definitely had lights that let you know what was going on, and when the modems were internal, presumably software could do that).

                          1. 6

                            Well, it let you hear pretty quickly if you had, say, hit the wrong number or if your own line had somebody talking on it.

                            Would not surprise me if the real reason was just a marketing gimmick or similar.

                            1. 5

                              Here’s a StackOverflow answer that makes sense. Corroborates friendlysock’s prediction, too.

                            1. 21

                              Fake news has nothing to do with “sounding journalistic”. This is how one ends up confusing propaganda (fake news) with truth, which is at the heart of a lot of journalism today.

                              The only thing such software will accomplish is further reinforce and convince the gullible that if someone sounds like a journalist then what they have to say must be true.

                              1. 13

                                Yeah, this is an incredibly terrible idea.

                                It’s akin to other (terrible) ideas where people think they can solve (really hard) societal problems with (insert hot technical solution). For example, using algorithms to “certify” that certain algorithms are fair according to either a naive or just plain wrong understanding of what fairness is, like assigning money bail amounts based on an algorithm that is “fair” according to some specification (for example, that it’ll produce the same result whether or not race is an input), ignoring that A. the entire money bail system is terrible, doesn’t work, and shouldn’t be given further credibility, and B. algorithms can easily recreate bias, working around whatever minor corrections are attempted.

                              1. 11

                                you’ll only be disappointed by this if you trusted slack in the first place

                                1. 12

                                  For many of us I assume this isn’t so much of a choice because it’s mandated by our employer.

                                  1. 2

                                    unionize!

                                    1. 7

                                      then the union is on slack too

                                      1. 1

                                        yours is?

                                        1. 7

                                          Slack often is desired by users, sometimes even set up as clandestine shadow IT uncontrolled by corporate sysadmins. If organized labour is made up of these people, it seems logical to assume those people would want Slack there too.

                                          1. 1

                                            it’s logical that they would want some form of communication, if that’s what you mean. but i don’t see where you get to slack being the obvious choice.

                                            1. 1

                                              What makes you think that a union wouldn’t choose Slack?

                                              1. 1

                                                union members are likely to care about freedom

                                          2. 2

                                            Yup.

                                            1. 1

                                              :(

                                        2. 9

                                          Regrettably, in the US, most software professionals are opposed to unionization; and outspokenly supporting them is hazardous to one’s employment. Furthermore, unionization presents a path to removing Slack from the workplace, but certainly does not guarantee it.

                                          1. 2

                                            supporting them is hazardous to one’s employment

                                            it’s illegal to fire someone for organizing a union

                                            1. 3

                                              It’s illegal but it happens all the time.

                                            2. 2

                                              Regrettably, in the US, most software professionals are opposed to unionization

                                              They are?

                                      1. 5

                                        I’m mostly using borg for backups, but still use duplicity where I want the backup source to only be capable of encrypting new backups and not decrypting old ones. Is there another backup system, nicer than duplicity, that allows you to make backups using only the public part of a keypair?

                                        1. 1

                                          Duplicity can also use your public key. Instead of providing a passphrase you can use the --encrypt-key flag to provide your key’s fingerprint.

                                          1. 2

                                            Exactly, that’s why I’m still using it. But is there a nicer alternative?

                                            1. 2

                                              Not sure if it’s nicer, but tarsnap does it and hasn’t been mentioned so far

                                              1. 2

                                                While tarsnap is really cool, the pricing makes it sort of in a different category (i.e., it’s good for backing up important stuff, whereas duplicity, restic, borg, duplicacy, etc, can be affordable to backup everything).

                                        1. 3

                                          I tried Duplicity with GPG but sadly I found it lacking, even for rarely looked at archives. I eventually moved to restic and it works splendidly.

                                          1. 3

                                            I also do backups using restic against a cloud storage (in my case a Ceph cluster), this has two advantages:

                                            1. backups are stored redundantly
                                            2. restic backups against an HTTP endpoint are much faster than over SSH
                                            1. 2

                                              My biggest complaints about restic are the lack of access controls and slow pruning of data. Perhaps those may be fixed one day.

                                              1. 2

                                                What were you missing from duplicity?

                                                1. 5

                                                  Not the OP, but the fact that you can’t delete intermediate incremental backups is pretty bad… Pruning is a pretty key aspect of most backup strategies (as I want daily going back N days, weekly going back N weeks, monthly going back N months, etc). Also, duplicity would run out of memory for me (but restic would too – I eventually settled on the free-to-use-but-not-free-software duplicacy, as I wrote about https://dbp.io/essays/2018-01-01-home-backups.html – some more details about the OOM stuff on the lobsters thread https://lobste.rs/s/pee9vl/cheap_home_backups )

                                                  1. 3

                                                    For one, being able to restore single files without scanning through the archive. The duplicity guys do know about the problems with using tar, but I don’t know when they’ll be able to move away from it.

                                                    1. 3

                                                      Are you sure this is not possible using --file-to-restore?

                                                      1. 2

                                                        I’m not 100% sure, I’m just going by my limited knowledge of the tar format and what my link says:

                                                        Not seek()able inside container: Because tar does not support encryption/compression on the inside of archives, tarballs nowadays are usually post-processed with gzip or similar. However, once compressed, the tar archive becomes opaque, and it is impossible to seek around inside. Thus if you want to extract the last file in a huge .tar.gz or .tar.gpg archive, it is necessary to read through all the prior data in the archive and discard it!

                                                        My guess is that --file-to-restore has to search for the file in the .tar.gz. If you find otherwise, I’d be interested to know!

                                                1. 8

                                                  If you’re interested in that era, some other fun papers (that aren’t as dense / theory heavy as Girard’s original one introducing linear logic) are:

                                                  • “The Linear Abstract Machine” by Yves Lafont (1988)
                                                  • “Lilac, a functional language based on linear logic” by Ian Mackie (1991)
                                                  • “Computational Interpretation of Linear Logic” by Samson Abramsky (date is a little complicated. The journal pub everyone cites is from 1993, but he was presenting the material in 1990, as publications as early as that or 1991 were referencing unpublished drafts of his, and he did conference versions in 1991/1992)

                                                  The most famous, of course, is Wadler’s “Linear Types Can Change The World” (1990), but IMO it’s pretty overrated (all the rest I’ve mentioned, while less known, are more interesting…)

                                                  1. 1

                                                    Thanks!

                                                  1. 0

                                                    I’ve seen this said before, and I can personally endorse it – the best supported linux laptop that I’ve ever used is a mac + vmware fusion (and my last laptop was a dell XPS developer edition that shipped with ubuntu… the wifi was much less reliable, the whole thing made an annoying whine constantly, etc. Prior to that I had various thinkpads going back to an X40, which is probably the only other contender for best supported linux laptop I’ve used. If they had kept the quality at that level, I’d still be buying them…). The idea that flaky external displays and no wifi is considered success is baffling to me!

                                                    1. 2

                                                      The author (@asrp) writes:

                                                      But a (bytecode) interpreted language already has a speed penalty so why not just make that a bit greater and allow simpler internals? I’d be happy with a 100x slowdown (compared to, say C) to get direct AST interpretation in a dynamic language.

                                                      Why? The amount of energy consumed by execution time for any common language is astounding, if you can improve average performance for a language like Python or Ruby by 10%, you are talking about massive energy savings (and there are lots of other related benefits). Compilers have been around for a long time for a reason – you can speed up code a lot by doing transformations ahead of time (you can also work at higher levels of abstraction, rule out unsafe patterns, etc). Are you opposed to any compilation, or just ones that transform between very different ASTs? What about if they just eliminate forms? (i.e., desugaring) What about arithmetic simplification? Inlining? Should DSLs no longer be implemented (they are little compilers)?

                                                      Having reference interpreters with the simplest possible implementation makes sense as specification, but ruling out having any faster implementation for real execution seems… bizarre.

                                                      1. 4

                                                        This should probably have a date on it – the last update for the code was 2005; I’m not sure about the paper, but it looks like it formed part of Toshiyuki Maeda’s thesis (2006), and the article was written about in 2008 here: http://lambda-the-ultimate.org/node/2717

                                                        1. 1

                                                          I didn’t have a date on it probably cuz I found it on mobile. Thanks for the dates. I’ll put 2006.

                                                        1. 4

                                                          I find debate around this weird. This code is protected by a license, just like the Linux kernel is protected by a license (and the Linux kernel has 20 year old bits still protected by that license today). Anyone who would be angry with a company for violating the GPL should applaud sending the disc back. Agreeing with a license is different from honouring it, but we should honour others’ licenses just as we expect the open source code we love and use to be honoured.

                                                          1. 6

                                                            It’s actually not hard to understand. If you support the GPL because it is a pragmatic way to, within our legal system, push for more software to be open source, then it is not inconsistent at all to also want people to violate proprietary licenses in order to make more software open source. You have a goal (people should be able to read the source code of software they use) and you use whatever means you have available to make that happen.

                                                            The idea that this is hypocritical (not saying you are necessarily saying this, but it’s a common argument) is based on a particular liberal political philosophy (liberal as in the individualist, equal application of rules, not its use to mean further to the left on the political spectrum). Certain people take that political philosophy as somehow a ground truth, and then claim people are hypocritical if they don’t fit their beliefs into it.

                                                          1. 4

                                                            What kind of industrial jobs have you found that are centered on formal methods?

                                                            I was under the impression that that it would be a small part of any software development job.

                                                            For example, I read the article about Amazon using TLA+. I’m pretty sure they didn’t hire a bunch of people with formal methods experience. It sounds more like the senior engineers tried it, decided it was a good idea, and then taught/spread formal methods within the existing dev team.

                                                            Does anyone here use formal methods in their day job? If I had to guess, I would probably guess that less than 1% or even zero people reading this have applied formal methods to a piece of code that ships (I certainly haven’t). It seems like it’s more in the research phase, whereas you are talking about getting an “adjacent role at a company”. I could be wrong though.

                                                            http://lamport.azurewebsites.net/tla/amazon.html

                                                            1. 5

                                                              I have! It was a really powerful tool in shipping code, but just that- a tool. It definitely wasn’t what I was hired to do, and in fact it wasn’t even expected at all: I just wedged it in and found it worked.

                                                              To my understanding, most of the people whose job is specifically “use formal methods” are mostly in either hardware or academia.

                                                              1. 7

                                                                To my understanding, most of the people whose job is specifically “use formal methods” are mostly in either hardware or academia.

                                                                That’s probably most of them, especially hardware if you want an industry job. Besides hardware companies using formal methods, there’s a little ecosystem of middleware companies selling the formal-methods tools, mainly to hardware companies. I’ve run across a handful elsewhere though. Financial companies occasionally advertise specifically formal-methods jobs (example). I’ve also seen ads on occasion from aerospace companies (no link handy). There’s Microsoft Research as well, which might even employ more people working on formal methods full-time than any single university does, but MSR is kind of an honorary member of academia.

                                                                1. 3

                                                                  There’s Microsoft Research as well, which might even employ more people working on formal methods full-time than any single university does

                                                                  Maaaaaaaaaaaybe MIT is comparable, but that’s the only candidate I can think of. Outside of academia you also have INRIA, which is a French national research agency that produces a lot of stuff in formal methods too. Coincidentally enough, MSR and INRIA run a joint lab, which IIRC is where all the work in TLA+ happens.

                                                                  1. 5

                                                                    Calling INRIA outside academia is… at least, a slight bit misleading, while technically true. People get PhDs at INRIA (technically they are issued by some other university, but their advisor and research is done at INRIA), and people who are at INRIA sometimes lecture at regular universities (I’m not actually sure how common this is, but talked to someone just a few months ago who was talking about doing this).

                                                                    1. 1

                                                                      Did not know this. Thanks for the correction!

                                                                2. 1

                                                                  Awesome! How did you learn about formal methods – through school or self-taught?

                                                                  This kind of confirms what I was thinking. If you want to use formal methods at a software company, it’s unlikely you’ll be “hired in” for that. It’s more like you’re already a developer on a team, and you have to take the initiative to use it and teach your coworkers it and sell management on its utility.

                                                                3. 4

                                                                  Data61 down here in Australia are often hiring Proof Engineers:

                                                                  https://ts.data61.csiro.au/jobs/proof-engineers2015

                                                                  1. 4

                                                                    In industry I have used Agda to prove code correct instead of testing it, because the tooling made it quicker to do than using property tests in Haskell.

                                                                    1. 4

                                                                      Whaaaaat!? You may be first person I’ve seen say it was quicker to prove code correct than use testing approaches. Makes me wonder if the code you were working on was a lot of straight-forward math in a functional style or something. You did mention Haskell. In any case, that sounds like it’s worth a write-up and submission to Lobsters for being practical, efficient, and about Agda in industry.

                                                                  2. 3

                                                                    What kind of industrial jobs have you found that are centered on formal methods?… It seems like it’s more in the research phase, whereas you are talking about getting an “adjacent role at a company”.

                                                                    As you suggest, not many :). Mostly hardware companies as others have pointed out. I was including industrial research without being explicit, so Microsoft Research and some other places, e.g. Galois count in my book. I think, in addition to the other suggestions here, I will start going through conference and journal publications to find some more leads.

                                                                    1. 2

                                                                      One easy trick is just going through the formal-methods submissions in the publications and looking at their university or corporate affiliation. I’ve also submitted quite a few tagged with formal methods here that you can find with the Lobsters search. You can also look at tool vendors, such as the company behind Atelier-B, to see who their customers are. ProB is another, with Z and TLA+ support. For instance, ProofPower led me to QinetiQ. AbsInt owns CompCert. Rosu’s group is doing Runtime Verification Inc. Smartcard vendors like Infineon or Gemalto use them a lot for higher-security stuff. Kestrel Institute is an old one that many outsourced to.

                                                                      Just looking at the tools, papers, and reference sections takes you a lot of places. It helps to know the jargon that pops up a lot, too. Examples are Common Criteria EAL7 or DO-178B (recently DO-178C), since formal methods are often sold as a tool to aid certification. Or just search for “formal methods industry.”

                                                                  1. 6

                                                                    If you want to validate email addresses in a basic way, look for an @ and a . after it. That’ll catch a lot of mistakes.

                                                                    If you want to validate them in a more thorough way, do that, then split on the @, and then do a DNS lookup on what comes after. If there are MX records, then you know that you can send mail to that. Whether what comes before the @ is deliverable will be impossible to say (only the mail server can decide that), but you’ve done absolutely the best you possibly can.

                                                                    Anything else just seems silly.
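
                                                                      Something like this, leaving the MX half to whatever DNS resolver library you already have (Rust, just as a sketch):

                                                                      // The basic check described above: a non-empty local part, an '@',
                                                                      // and a '.' somewhere in the domain part.
                                                                      fn looks_like_email(addr: &str) -> Option<&str> {
                                                                          let at = addr.find('@')?;
                                                                          let (local, domain) = (&addr[..at], &addr[at + 1..]);
                                                                          if !local.is_empty() && domain.contains('.') {
                                                                              Some(domain) // hand this part to the MX lookup
                                                                          } else {
                                                                              None
                                                                          }
                                                                      }

                                                                      fn main() {
                                                                          for addr in ["user@example.com", "no-at-sign", "user@localhost"] {
                                                                              match looks_like_email(addr) {
                                                                                  Some(domain) => println!("{}: plausible, check MX for {}", addr, domain),
                                                                                  None => println!("{}: rejected", addr),
                                                                              }
                                                                          }
                                                                      }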

                                                                    1. 3

                                                                      Has anybody else seen issues with Duplicity running out of memory? I’ve been backing up ~100GB without issues on a machine with 8GB RAM.

                                                                      1. 3

                                                                        Just a note – your scenario is nearly two orders of magnitude off of what I was observing (100GB vs 600GB data, and 8GB vs 1GB ram), and I didn’t see the error at first (did a full backup, and several incremental additions before it ran into trouble). If it’s linear (for software that’s been around so long, I would be surprised if it were worse than that), then for you to run into the same problem you would need an archive that is about 4.8TB.

                                                                        1. 1

                                                                          Thanks!

                                                                          Do you happen to remember what block size you used? The default of 25MB is way too small, and I believe signatures for all blocks are required to reside in memory.

                                                                          1. 2

                                                                            I just looked at the config file I had (for the duply frontend), and there wasn’t anything for block size, so presumably it was set to the default? That could explain it, though I think there is a little more going on, unless signatures for all individual files also are in memory (as if it were just one per 25MB block, 600GB is only about 25,000 blocks, and each signature would have to take up about 36K of memory for it to use up the 900+MB that I was seeing).

                                                                            1. 1

                                                                              I just downloaded the duply script from http://duply.net, and it does look like the default is 25MB. One correction to my comment: the terminology is “volume” (--volsize) not “block”.

                                                                              You’re right that it would have to be saving more metadata per volume for my hypothesis to bear out.

                                                                          2. 1

                                                                            Can you elaborate on what operations were running out of memory?

                                                                            1. 2

                                                                              The actual operation was it unpacking the signatures file from a previous backup (and it retried it I think 4 times, each time running for a while, gradually using up all available memory, before giving up). I think I was just trying to make a new incremental backup. I had made one full backup and several incremental backups, and had just added a bunch of new files and was trying to make a new incremental backup.

                                                                              1. 1

                                                                                Was it on a new machine or anything like that? I’m wondering if I should retry a backup after blowing away my signature cache.

                                                                                Thanks a lot for answering these questions! A potential issue with my backups is extremely worrying.

                                                                                1. 2

                                                                                  Nope, on the same machine, but it had been wiped and reinstalled at least once (so it’s possible that library versions had changed, and perhaps the memory efficiency of some of them got slightly worse). It’s pretty confusing, because previous incremental backups had worked. The upside with duplicity is that in a catastrophe, you pretty much don’t even need duplicity itself to restore (tar and gpg and a bunch of tedium should do it). :)

                                                                        1. 13

                                                                          Thanks for writing this. We see a lot of people write posts saying they moved to DO, but I don’t often see the opposite. The things you wrote speak exactly to our strengths, and value proposition. :)

                                                                          1. 5

                                                                            I spent a good 5 years of my career writing PaaS applications (on AppEngine, although I’ve used Heroku before) and the cost of structuring your app to ‘fit’ is paid back a hundredfold in operations and maintenance ease. Gonna post the 12-factor app URL since I haven’t read it in a while. That document has held up extremely well over time.

                                                                            edit looks like it was posted 4 years ago so I’ll just leave it there.

                                                                            1. 4

                                                                              the cost of structuring your app to ‘fit’ is paid back a hundredfold in operations and maintenance ease.

                                                                              Unless it’s a legacy enterprise app and fit would be better expressed as shoehorning without a major architecture re-creation. I have a fun job.

                                                                              1. 2

                                                                                :(

                                                                            2. 1

                                                                              I just recently moved an application that had been running on Digital Ocean for 3+years (not actually sure how long, I moved it to DO soon after they started) to Heroku and it was such a relief.

                                                                              If you want an actual server with all the complexity that that has (and sometimes, I do – like to be able to tunnel through it, run random commands, serve random files, etc), then DO is great, but if you just want an application, it brings so much baggage that is so unnecessary, especially for the type of low traffic apps that are served by single server deployments that Digital Ocean supports (I’ve also had pretty terrible experiences with DO ops in the past – primarily in the original NYC1 data center, which I’ve heard is the worst, but, I’m happy to not be dealing with that anymore).

                                                                            1. 1

                                                                              So, could you do this from the other end with a scheme-like syntax for Haskell itself?

                                                                              You’d have to implement the macro language from scratch I guess.

                                                                              1. 12

                                                                                You would lose a lot of expressive power if you just tried to re-use the GHC typechecker. I’ve spent a lot of time explaining why this is the case in the /r/haskell thread for this blog post, so I will not re-explain here. Instead, I’ll reproduce part of that discussion.

                                                                                Okay, I’ve been sort of avoiding getting into this particular discussion because it’s really complicated, but it seems like a point of confusion, so let me try and clear a couple things up.

                                                                                First of all, GHC is not a single thing, obviously. It has a lot of different pieces to it: it has a parser, a typechecker, a desugarer, an optimizer (itself composed of many different pieces), and a set of backends. When you say “reuse GHC”, you have to be specific about which pieces of GHC you are talking about. Obviously, if you just reuse all of it, then you just have Haskell, so presumably you mean one of two things: reusing the typechecker or reusing the optimizer and backends. It’s possible you also mean both of those things, in which case the compilation strategy would basically just be “compile to Haskell, then run the whole compilation pipeline”.

                                                                                However, let me make this perfectly clear: based on the way Hackett works, Hackett cannot reuse the GHC typechecker. The typechecking algorithms are fundamentally incompatible. If you are advising reusing GHC’s typechecker implementation, then the answer is “no, it can’t be done, no buts, full stop”. Why? Well, again, it’s the thing I keep referencing and quoting; Hackett requires typechecking to be interleaved with macroexpansion, but GHC’s typechecking algorithm is a whole-program analysis. These are incompatible ideas.

                                                                                GHC’s current typechecking algorithm is obviously wildly different from classic Hindley-Milner, but it keeps the general technique of generating a big bag of constraints and solving them at appropriate times (generally just before generalization). This technique has some really good properties, but it also has some bad ones. The good properties are that it provides fantastic type inference for basically all programs, and it does not impose any particular program order since it is such a global transformation. However, the downsides are that error messages can be frustratingly nonlocal and that it requires a full-program traversal to know the types of anything at all.

                                                                                For Haskell, this isn’t so bad. But what does it mean for macros? Well, keep in mind that a macro system wants all sorts of useful things, like the ability to inspect the type of some binding in order to direct its expansion. You can see this yourself in a highly limited form with Template Haskell, which has the reify and reifyModule. Of course, Template Haskell is not designed to be nearly as expressive as a macro system, but it still imposes severe constraints on the typechecker! From the section of the GHC Users Guide on Template Haskell:

                                                                                Top-level declaration splices break up a source file into declaration groups. A declaration group is the group of declarations created by a top-level declaration splice, plus those following it, down to but not including the next top-level declaration splice. N.B. only top-level splices delimit declaration groups, not expression splices. The first declaration group in a module includes all top-level definitions down to but not including the first top-level declaration splice.

                                                                                Each declaration group is mutually recursive only within the group. Declaration groups can refer to definitions within previous groups, but not later ones.

                                                                                Accordingly, the type environment seen by reify includes all the top-level declarations up to the end of the immediately preceding declaration group, but no more.

                                                                                Unlike normal declaration splices, declaration quasiquoters do not cause a break. These quasiquoters are expanded before the rest of the declaration group is processed, and the declarations they generate are merged into the surrounding declaration group. Consequently, the type environment seen by reify from a declaration quasiquoter will not include anything from the quasiquoter’s declaration group.

                                                                                These are serious restrictions, and they stem directly from the fact that GHC’s typechecking algorithm is this sort of whole-program transformation. In Hackett, every definition is a macro, and macro use is likely to be liberal. This restriction would be far too severe. To combat this, Hackett uses a fundamentally different, bidirectional typechecking algorithm, very similar to the one that PureScript uses, which allows the implementation of a Haskell-style type system without sacrificing modularity and incremental typechecking.

                                                                                My implementation of this type system has been remarkably successful given the novelty of the implementation and the amount of time I have spent on it, not least in part due to the availability of the PureScript source code, which has already solved a number of these problems. I don’t think that there’s reason to suggest that getting a large set of useful features will be difficult to achieve in a timely manner. The key point, though, is that the easy solution of “just call into GHC!” is a non-starter, and it is a dead end just for the reasons I mentioned above (and I haven’t even mentioned all the myriad problems with error reporting and inspection that sort of technique would create).

                                                                                Okay, so using GHC’s typechecker is out. What about reusing the optimizer and compiler? Well, this is actually something I do want to do! As far as I know, this should be completely feasible. It’s a lot more work than just compiling to the Racket VM for now, though, since the Racket VM is designed to be easy to compile to. In general, I want to support multiple backends—probably at least Racket, GHC, and JavaScript—but that is a big increase in work and complexity. Building for the Racket ecosystem to start gives me a trivial implementation with acceptable speed, an easy ecosystem of existing code to leverage, a host of useful built-in abstractions for building language interoperation, a fully-featured IDE that automatically integrates with my programming language, and an extensible documentation tool that can be used to write beautiful docs to make my new programming language accessible. Building a new language on the Racket platform is an obvious choice from a runtime/tooling point of view, it’s only the typechecker that is a lot of work.

                                                                                So, to summarize: reusing the typechecker is impossible, reusing the compiler optimizer/backend is feasible but extra work. If you have any additional suggestions for how I could take advantage of GHC, I’d love to hear them! But hopefully this explains why the simplest-looking route is not a viable choice for this project.

                                                                                — me, on /r/haskell


                                                                                Here’s some more context about what that additional expressive power actually is, from another part of the thread:

                                                                                I’m particularly interested in the metaprogramming aspect. At which point are macros run? In particular, can I get access to type info in a macro? That would allow implementing things like idiom brackets as a macro which would be pretty cool.

                                                                                — cocreature, on /r/haskell

                                                                                This is a great question, and it’s absolutely key to the goal of Hackett. Hackett macros are run at compile-time, obviously, but importantly, they are interleaved with typechecking. In fact, it would probably be more accurate to say that typechecking is subsumed by macroexpansion, since it’s the macros themselves that are actually doing the typechecking. This technique is described in more detail in the Type Systems as Macros paper that Hackett is based on.

                                                                                This means that yes, Hackett macros have access to type information. However, the answer is really a little trickier than that: since the Haskell type system is relatively complex but does not require significant type annotation, sometimes types may not be known yet by the time a macro is run. For example, consider typechecking the following expression:

                                                                                (f (some-macro (g x)))
                                                                                

                                                                                Imagine that f and g both have polymorphic types. In this case, we don’t actually know what type g should be instantiated to until some-macro is expanded, since it can arbitrarily change the expression it is provided—and it can even ignore it entirely. Therefore, the inferred type of (g x) is likely to include unsolved type variables.

                                                                                In many cases, this is totally okay! If you know the general shape of the expected type, you can often just introduce some new type variables with the appropriate type equality relationships, and the typechecker will happily try to solve them when it becomes relevant. Additionally, many expressions have an “expected type” that can be deduced from user-provided type annotations. In some situations, this is obvious, like this:

                                                                                (def x : Integer
                                                                                  (my-macro (f y)))
                                                                                

                                                                                In this case, my-macro has access to the expected type information, so it can make decisions based on the expectation that the result expression must be an Integer. However, this information can also be useful in other situations, too. For example, consider the following slightly more complicated program:

                                                                                (def f : (forall [a] (SomeClass a) => {(Foo a) -> a}) ...)
                                                                                
                                                                                (def x : Integer
                                                                                  (f (my-macro (g y))))
                                                                                

                                                                                In this case, we don’t directly know what the expected type should be just by observing the type annotation on x, since there is a level of application in between. However, we can deduce that, since the result must be an Integer and f is a function from (Foo a) to a, then the expected type of the result of my-macro must be (Foo Integer). This is a deduction that the typechecker already performs, and while it doesn’t work for all situations, it works for many of them.
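
                                                                                For readers who find Haskell syntax easier to follow, the same deduction rendered in (hypothetical) Haskell looks roughly like this:

                                                                                class SomeClass a
                                                                                instance SomeClass Integer

                                                                                data Foo a = Foo a

                                                                                f :: SomeClass a => Foo a -> a
                                                                                f (Foo a) = a

                                                                                x :: Integer
                                                                                x = f expansion        -- x must be an Integer and f maps (Foo a) to a,
                                                                                  where                -- so this use site alone forces whatever the macro
                                                                                    expansion = Foo 42 -- produces here to have type (Foo Integer)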

                                                                                However, sometimes you really need to know exactly what the type is, and you don’t want to burden users with additional type annotations. Typeclass elaboration is a good example of this, since we need to know the fully solved type of some expression before we can pick an instance. In order to solve this problem, we make a promise to the typechecker that our macro’s expansion has a particular type (possibly in terms of existing unsolved type variables), and the typechecker continues with that information. Once it’s finished typechecking, it returns to expand the macro, providing it a fully solved type environment. This is not currently implemented in a general way, but I think it can be, and I think many macros probably fit this situation.

                                                                                In general, this is not a perfectly solvable problem. If a macro can expand into totally arbitrary code, the typechecker cannot proceed without expanding the macro and typechecking its result. However, if we make some restrictions—for example, by weakening what information the macro can obtain or by restricting the type of a macro’s expansion—we can create macros that can implement many different things while still being quite seamless to the user. I think implementing idiom brackets should not only be possible, but it should probably be a good test of whether or not the implementation is really as powerful as I want it to be.
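
                                                                                As a point of reference, idiom brackets are shorthand for applicative application, so an idiom-bracket macro would need to produce expansions like the sketch below (the bracket syntax in the comments is the hypothetical part; only the right-hand sides are real Haskell):

                                                                                -- An idiom bracket such as (| f a b |) would expand to  f <$> a <*> b.
                                                                                example :: Maybe Int
                                                                                example = (+) <$> Just 1 <*> Just 2   -- what (| (+) (Just 1) (Just 2) |) would mean
                                                                                -- example == Just 3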

                                                                                For a little bit more discussion along these lines, see this section of a previous blog post.

                                                                                — me, on /r/haskell

                                                                                1. 1

                                                                                  Right, so the short version is: yes, but a naive implementation would be impoverished by comparison, because in Hackett macro expansion is tightly integrated with type checking, which means that macros have access to type information at expansion time, enabling richer macro definitions.

                                                                                  If you rewrote the Haskell frontend to do that, then you’d have to re-write the type checker along the way in order to end up with something that looked a lot like the existing Hackett compiler.

                                                                                  I guess you’d also have to deal with all the questions about how the macro expansion would integrate with the endless extensions to the Haskell type system. Not a small task!

                                                                                  I’ll look forward to seeing more of Hackett in the future!

                                                                                2. 2

                                                                                  In her prior posts on the subject, she talks about the tools the Racket ecosystem provides for creating programming languages. So I’m guessing that if she ever does implement it in Haskell (for instance, to make it easier to import Haskell libraries), she’ll have to wait until she’s gathered a few more helping hands.

                                                                                  1. 4

                                                                                    (The author is not a he – as per her twitter, https://twitter.com/lexi_lambda, she identifies as she/her).

                                                                                    1. 2

                                                                                      My bad, I’ll correct it!

                                                                                  2. 1

                                                                                    Haskell has Template Haskell, which is its version of macros/metaprogramming, so it might not necessarily need to be done entirely from scratch.

                                                                                    1. 1

                                                                                      Sure, but Template Haskell’s syntax is legendarily awkward compared to the equivalent macros in Lisp / Scheme. I dread to think what implementing a macro language in Template Haskell would look like.
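
                                                                                      For a sense of that overhead, here is a minimal sketch (with arbitrary names) of what generating even a single trivial binding looks like in Template Haskell:

                                                                                      {-# LANGUAGE TemplateHaskell #-}
                                                                                      module THExample where
                                                                                      import Language.Haskell.TH

                                                                                      -- Generates the declaration  answer = 42  when spliced in as $(mkAnswer).
                                                                                      mkAnswer :: Q [Dec]
                                                                                      mkAnswer = do
                                                                                        body <- [| 42 |]
                                                                                        pure [ValD (VarP (mkName "answer")) (NormalB body) []]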

                                                                                      But maybe I’m overthinking it :)