Threads for athas

    1. 29

      There is a “GOTO” keyword, but instead of something as prosaic as affecting the internal control flow of the program, its argument is in fact an IP address, and denotes that execution of the program immediately moves from the current machine to the indicated machine.

      1. 18

        This of course is mirrored by its sister keyword COMEFROM which, thanks to the distributed nature of GOTO, can redirect any currently executing code on a remote server to be halted and resumed locally.

      2. 3

        Ah, this is a feature in General Magic’s programming language Telescript.

    2. 5

      Haskell is a unique vehicle for bringing modern programming language theory (PLT) to a production-quality programming language

      What bothers me about this quote is the latter part. What makes Haskell a production-quality language? There is only a single compiler. There are so many different dialects that different team members will write different code, hampering later understanding for anyone joining the club. On top of that, debugging and deployment tools are lacking. In addition, upgrading the language and libraries puts one in the funny situation where not everything works because something down the chain has not been updated. Finally, support across different platforms varies so much in quality that it becomes a nightmare of fighting GHC.

      1. 10

        This is an odd comment. I’ve been working almost exclusively as a full-time Haskell developer for the last 3 years, using the language as the main workhorse on multiple profitable products. I find it funny that you’re bothered by the notion that Haskell is production-quality.

        There are so many different dialects that different team members will write different code hampering later understanding for anyone joining the club.

        My experience has been the exact opposite. My latest project involved many Haskellers, many of whom never met each other, and there have been periods in the project when no Haskellers were left. That means the lore of the program was carried purely through the code, and yet this project has seen constant improvement without a single rewrite. I was able to push my first features and bug fixes into production within my first week, with virtually zero introduction to the codebase. I understand where your “dialects” prejudice is coming from, but it doesn’t apply to Haskell for some reason. Maybe the extremely strong type system coupled with purity makes it so that it’s always easy to understand what your predecessor meant.

        On top of that, debugging and deployment tools are lacking.

        Haskell has a debugger, though I’ve never bothered to try it, because functional programming (code is split into small independent functions) + a good REPL means you don’t need to step through your code to understand it. Well, even “stepping through code” has the sequential mentality built into it, but I digress. From this point of view, OOP languages aren’t production quality because you don’t have a good REPL experience with them, even when the REPL is good as is the case in Python, because OOP and its twisted notion of encapsulation is hostile to REPLs.

        I don’t know what you mean by a deployment tool? The compiler compiles a binary and you deploy it using whatever…

        upgrading language and libraries gets one in a funny situation that not everything works due to something down the chain not being updated

        Never experienced that; upgrading to newer dependencies has always been a breeze. You get 2-3 stupid type errors and you fix them quickly and it just works.

        1. 3

          Never experienced that; upgrading to newer dependencies has always been a breeze. You get 2-3 stupid type errors and you fix them quickly and it just works.

          I think you must have gotten lucky. For example, these days I am fighting regex-base, which is an old and unmaintained package that lies at the root of the fairly popular regex-* package hierarchy, and which has become broken in GHC 8.8 due to the MonadFail change. There are lots of cases like this for every new GHC release. It usually takes a month or two of frustrating poking to shake it all out. For GHC 8.8 specifically I also remember that Happy (the parser generator) and vector-binary-instances (some other foundational utility package) needed code modifications.

          1. 3

            Why do you want to upgrade to GHC 8.8 before it hits a stackage lts though? Before there’s a stackage lts, I regard new GHC releases to be previews for library authors to catch up.

            My comment applies to the case where you give a new GHC release a decent amount of time to settle before upgrading.

            1. 5

              How do you think GHC 8.8 becomes ready for Stackage? It’s due to library users going through all their dependencies and fixing all these small things. The Haskell ecosystem has a relatively large amount of important packages with relatively absentee maintainers (sometimes because they are busy with other Haskell things), and it can take quite a while to chase all of them down when new GHC releases come out.

              Just because you are not the one feeling the pain doesn’t mean someone else isn’t paying the cost. While I do use Stackage, I usually have to go through a bunch of busywork to get my dependencies re-added after every GHC release. (I’m actually so exhausted by it that this time I think I will stick with just getting the fixes onto Hackage and then hope someone else deals with Stackage, and maybe just abandon Stackage entirely.)

              1. 4

                While this is definitely true, I’m not sure how it impacts “production readiness”. Many languages face serious issues when they make backwards-breaking changes; some just outlaw them. Most companies which use those languages in production thus follow these updates only conservatively, or not at all.

                From a business production-readiness POV, the risk is either (a) older versions of GHC will eventually become completely broken due to loss of some critical infra and my strategy of holding back will bite hard or (b) new growth in the community is hampered so completely by some change that I can’t participate in upgrades when I want to pay down the internal cost.

                For a lot of “production” languages, only (a) matters because people just won’t upgrade after a while. I think Haskell is largely safe for (a) because it maintains old packages (is idempotent). The work you do is deeply appreciated and enables both (b) and the general health of the community under new development (non-stagnation).

              2. 1

                I’m sorry, I was so focused on the “production-quality” point that I didn’t even consider that you were talking from a library-maintainer/author standpoint. I really appreciate all the effort people put into making stackage lts’es into what they are: the dream package-manager experience, in my opinion. I think the ergonomics of maintaining a Haskell package are also very important, but I don’t think they affect the standing of the language in application development as long as Stackage snapshots are so well maintained.

        2. 2

          Maybe we have different experiences. I was dealing with codebases written by different teams, and each was a separate island with its own idioms and use of pragmas. Onboarding security researchers without prior knowledge of Haskell was a nightmare, because the learning surface just kept expanding in front of their eyes.

          A REPL doesn’t mean it’s production-ready. Production-ready (at least for me) means having the tools to deploy to a large-scale cluster, to inspect the running machine, and to introspect the running system for all necessary metrics (the last one would mostly apply to languages running on a VM).

          1. 2

            Onboarding security researchers

            I imagine this could be a problem. Learning Haskell is a significant effort.

            the learning surface just kept expanding in front of their eyes

            I wonder why the domain experts had to understand all of Haskell? I don’t know the circumstances, but you often try to expose a DSL to domain experts, or at least make them responsible for a part of the codebase that’s mostly written in an EDSL, so they don’t have to know what monads are.

            Production ready (at least for me) means having the tools to deploy to a large scale cluster …

            I think I’m missing something: what’s wrong with just producing a binary, building a Docker image around it, deploying it as such, and getting all the features of that ecosystem?

      2. 7

        If you use the nix-style install and build commands available with cabal 2 and later, libraries aren’t installed globally, and the early problems with upgrading globally installed libraries go away. I suggest “cabal new-build” and “cabal new-install” for all projects.

        Similarly, ghcup is a good way to have multiple versions of the GHC compiler installed at the same time, and to easily switch among them.

        1. 8

          I suggest “cabal new-build” and “cabal new-install” for all projects.

          These are the default as of Cabal 3 which was released a few weeks ago.

      3. 4

        Nothing against Haskell, even though I don’t agree with the “unique” part, but upgrades are really handled much better in the OCaml ecosystem, and that’s one of the things keeping me there. It’s dead simple to keep multiple compilers on the same machine, import/export installed package lists, and test the latest versions and experimental flavours without breaking your main development setup. You can vendor libraries, too.

        1. 2

          These days I use ghcup ( https://github.com/haskell/ghcup ) to install and switch among multiple versions of the GHC compiler. It’s very much a copy of rustup. At the moment it supports ten different versions, though I only have two versions installed.

          What’s the OCaml tool that does the things you describe above?

          1. 3

            As @sanxiyn says, it’s opam, the standard package manager.

          2. 1

            Presumably parent is referring to opam and opam switch.

        2. 1

          OPAM is good, but Stack IMO is better. Compiler installations in Stack are implicit: the choice of compiler depends on your stack.yaml, and you can have multiple stack.yamls in a project to test with different compilers and package sets.

      4. 4

        No company actually cares about their programming languages having multiple compilers in 2019.

        1. 4

          There will only ever be a single compiler for Scala. Nobody will be able to successfully recreate its behaviour.

        2. 0

          Monoculture is such a great concept, doing wonders around the world. Just look at bananas.

          1. 3

            Monoculture existential risk is real. It’s also tiny. It’s the annoying low prevalence, high priority style risk.

            So, yes, we’re all fucked if GHC/bananas goes off the deep end and dies somehow, so we institute other kinds of management techniques to mitigate that risk. For GHC, we have steering committees and community-led development, which both seek to limit the rate at which bad decision-making can kill the project.

            1. 1

              Yes, and it’s a good thing there is a well-structured steering committee. But my comment was in jest, as one of the great things about multiple compilers is that they focus on different aspects, bringing a lot of good to the ecosystem. Or, at least, companies could aim for something that is of interest to them. I constantly look at the Java ecosystem, and for better or worse, there are some pretty good runtimes there.

          2. 1

            Because compilers are exactly like horticultural diversity, of course.

    3. 14

      Because paying lawyers is better than actually fixing your site using basic well-established tech that has been around for decades 🙄

      1. 0

        Every large company is going to be paying the lawyers regardless.

        1. 2

          Yeah, how often is the Domino’s legal department going to have the opportunity of going to the Supreme Court? They might be doing this for the novelty of it all!

        2. 2

          Well, perhaps they would need fewer lawyers if they were not pursuing cases like this.

    4. 2

      The idea of a language specifically targeting GPUs is interesting. One thing I’d mention here is that such a language actually would not have to be only vector-based.

      A project I’ve been interested in for a bit is Hank Dietz’s MOG; this translates general-purpose parallel code (MIMD, multiple instruction, multiple data) to the GPU with at most a modest slowdown (roughly 1/6 speed, with vectorized instructions running at nearly full speed).

      See: http://aggregate.org/MOG/

      1. 3

        GPUs are quite a bit more flexible in their control flow than traditional SIMD machines (NVIDIA calls this SIMT), so I think it’s quite clear that you could have each thread do quite different work. The problem is that this is going to be very inefficient, and I don’t think a x6 slowdown is the worst it can get. Worst case warp/wavefront divergence is a x32 slowdown on NVIDIA and a x64 slowdown on AMD (or maybe x16; I find the ISA documentation unclear). Further, GPUs depend crucially on certain memory access patterns (basically, to exploit the full memory bandwidth, neighbouring threads must access neighbouring memory addresses in the same clock cycle). If you get this wrong, you’ll typically face a x8 slowdown.

        Then there’s a number of auxiliary issues: GPUs have very little memory, and if you have 60k threads going, that’s not a lot of memory for each (60k threads is a decent rule of thumb to ensure that latency can be hidden, and if the MOG techniques are used it looks like there’ll be a lot of latency to hide). With MIMD simulation, you probably can’t estimate in advance how much memory each thread will require, so you need to do dynamic memory management, likely via atomics, which seems guaranteed to be a sequentialising factor (but I don’t think anyone has even bothered trying to do fine-grained dynamic allocation on a GPU).

        Ultimately, you can definitely make it work, but I don’t think there will be much point to using a GPU anymore. I also don’t think the issue is working with vectors, or data-parallel programming in general. As long as the semantics are sequential, that seems to be what humans need. Lots of code exists that is essentially data-parallel in a way roughly suitable for GPU execution - just look at Matlab, R, Julia, or Numpy. (Of course, these have lots of other issues that make general GPU execution impractical, but the core programming model is suitable.)

        1. 1

          Thank you for the reply.

          I assume you have more experience in making things work on a GPU than I do. Still, I’d mention that the MOG project tries to eliminate warp divergence by using a byte-code interpreter. The code is a tight loop of conditional actions. Of course, considerations of memory access remain. I think the main method involved devoting a bit of main memory to each thread.

          I believe Dietz is aiming to allow traditional supercomputer applications like weather and multibody gravity simulations to run on a GPU. One stumbling block is that people who buy supercomputers work for large institutions and aren’t necessarily that interested in saving every last dime.

    5. 1

      My main question: is it pronounced “futt hark”, “footh ark”, or “futh ark”?

      1. 1

        Etymologically, foo-thark, with th pronounced as in the. But fut-ark is also common.

    6. 2

      This is nice. The best Makefiles are nearly empty and make heavy use of templates and implicit rules. I would make a couple small changes:

      1. I’m not sure why the target that generates dependency Makefile fragments renames the generated file. This should work:

        %.d: %.c Makefile
        	$(CPP) $(CPPFLAGS) -M -MM -E -o "$@" "$<"

      2. You might want to prevent generating Makefile fragments for the clean goal. A conditional include can help:

        ifneq ($(MAKECMDGOALS),clean)
        -include $(DEPS)
        endif

      3. Remaking target objects if the Makefile changes can be simply:

        $(OBJS): Makefile

      1. 3

        While I also do use templates and implicit rules when convenient (your example is certainly one of these), my experience is that Makefiles are best when they try not to be clever, and simply define straightforward from->to rules with no room for subtlety. As an example, make treats some of the files produced through chains of implicit rules as temporary, and will delete them automatically. In some cases, I have found this will cause spurious rebuilds. There is some strangely named variable you can set to avoid this deletion, but I’d rather such implicit behaviour be opt-in than opt-out.

        Sometimes a little duplication is better than a little magic.

        1. 3

          Yes, the special target .PRECIOUS can be used to mark intermediate files that should be kept. Cf. https://www.gnu.org/software/make/manual/make.html#index-_002ePRECIOUS-intermediate-files

          My recommendation for anyone who wants to learn to effectively use make: Read the manual. All of it. Keep it handy when writing your Makefile.

          People have already done the hard work of getting it to work right under most circumstances. I don’t consider it clever to stand on their shoulders.

    7. 2

      I 100% sympathize from the perspective of a scientist… But most of computer and program design since the 1950s has been computer engineering, which includes the uncertain art of choosing tradeoffs between perfect science and ugly practical needs. This case is no different, even when we have billions of transistors at our command.

      More specifically, what this article discusses is a trade-off in GPU computation overhead vs. aggregate performance. This is an optimization problem. The tradeoffs that make sense now are not the ones that made sense ten years ago, and will not be the ones that make sense ten years from now when the balance of CPU computation speed vs memory bandwidth and GPU computation speed vs CPU<->GPU transfer speed is different.

      So what it sounds like, without being critical, is that the compiler writer needs to step back from writing compilers, consider this problem as a more abstract balance of trade-offs, and consider their goals to see where they fall in the spectrum of options. Then go back to writing compilers with that goal in mind.

      1. 2

        You can always change the compiler as hardware changes. That’s the point of a compiler - that you can put local, hardware-specific information in it, and then change it as hardware changes, without changing the code that uses the compiler.

    8. 13

      I’m upvoting this mostly in the hope that HPC programmers will comment on it. I’m quite curious about how HPC programmers actually see the world, but they appear quite elusive online, or maybe I just can’t find their meeting places. I only ever meet them at conferences and such.

      1. 15

        I have a masters degree in CS with a focus in HPC. This article is mostly correct for commodity HPC. The lack of fault tolerance primitives in MPI was a pain, meaning you’d start a job and hope for no serious errors, checkpointing as often as possible, and then if something went wrong (hardware, net, etc) you’d have to restart. HPC for me was molecular dynamics simulations and things like that, the control of MPI was needed if you were going to run your systems on large supercomputer setups like the USG has. Still that would often require porting and compiler fun to make things work.

        I wouldn’t say HPC is dying, it’s just diffusing from “hard physics” (mostly rote floating point vectorized calculations) into the worlds of bio and data science that have different needs and are often just as much about data processing as anything. Physics like astronomy and particle physics have been dealing with scads of data already and have their own 20-30 year old data formats and processes.

        The article is correct, the sort of big 1000 core simulation groups are limited to maybe 10-25 research groups worldwide if that (in the multi-disciplinary world of materials science) and in my time in grad school I met most of the big names. That’s not a market of people, that’s a niche user group with their own needs and they can do what they want with the primitives available. I don’t know much about large scale simulation (ie that isn’t ‘embarrassingly parallel’ and requires near lock step execution across tons of machines) in other fields like civil, molecular bio, etc but I’m sure their user bases are small as well.

        In the end the needs of a handful users won’t sway the direction of the market. See for instance the bet on the Cell processor at BlueGene/L not making sony/toshiba/ibm pursue the design (even though it’s influenced CPU/GPU designs to this day). There’s your ramble. :)

        1. 2

          I’m here to agree with this. HPC traditionalists are largely struggling to achieve performance in worlds like deep learning, where their tools and architectures are designed for the wrong problem (e.g. Lustre is great for random access to files, not so great for AI applications where you do a whole lot more reading than writing).

          Meanwhile, cloudy novelty fans struggle to achieve performance in areas where traditional HPC both performs well and has been optimised over decades. I remember a fluid simulation demo, though not the domain, where some Apache-stack people wanted to show off how “performant” Apache-stack was. The MPI code was done before the MapReduce thing had finished launching.

    9. 4

      We really have three options open to us:

      And yet none of these options include what most good C libraries do, which is let the programmer worry about allocation.

      1. 6

        That’s not really a good fit for a high-level language, nor if you want to expose functionality that may need to do allocation internally. I do think that the module approach (where the programmer specifies the representation) is morally close.

      2. 4

        Wait, why do we want programmers to worry about allocation? Isn’t that prone to error and therefore best automated?

        1. 3

          Because the programmer theoretically knows more about their performance requirements and memory system than the library writers. There are many easy examples of this.

          1. 5

            Theoretically, yes. In practice, it is an enormous source of bugs.

            1. 3

              In practice, all programming languages are enormous sources of bugs. :)

              But here, from game development, are some reasons not to rely on library allocation routines:

              • Being able to audit allocations and deallocations
              • Knowing that, at level load, slab allocating a bunch of memory, nooping frees, and rejiggering everything at level transition is Good Enough(tm) and will save CPU cycles
              • Having a frame time budget (same as you’d see in a soft real-time system) where GCing or even coalescing free lists takes too long
              • Knowing that some library (say, std::vector) is going to be doing lots of little tiny allocations/deallocations and that an arena allocator is more suited to that workload.

              Like, sure, as a dev I don’t like debugging these things when they go wrong, but I like even less having to rewrite a whole library because it doesn’t manage its memory the same way I do.

              This is also why good libraries let the user specify file access routines.
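
              To make the second and fourth bullets concrete, here is a minimal, hypothetical sketch (not from any particular engine or allocator library) of the arena/slab pattern: grab one block at level load, hand out pieces by bumping a pointer, treat individual frees as no-ops, and reclaim everything at once at level transition.

                #include <stddef.h>
                #include <stdlib.h>

                /* One big slab obtained at level load; individual frees are no-ops. */
                typedef struct {
                    unsigned char *base;
                    size_t         capacity;
                    size_t         used;   /* also doubles as a high-water mark for auditing */
                } Arena;

                int arena_init(Arena *a, size_t capacity) {
                    a->base = malloc(capacity);            /* one allocation for the whole level */
                    a->capacity = capacity;
                    a->used = 0;
                    return a->base != NULL;
                }

                void *arena_alloc(Arena *a, size_t size) {
                    size = (size + 15u) & ~(size_t)15u;    /* keep 16-byte alignment */
                    if (a->used + size > a->capacity)
                        return NULL;                       /* out of arena space */
                    void *p = a->base + a->used;
                    a->used += size;
                    return p;
                }

                /* "Freeing" an individual object is intentionally a no-op. */
                void arena_free(Arena *a, void *p) { (void)a; (void)p; }

                /* At level transition, everything is reclaimed in one step. */
                void arena_reset(Arena *a)   { a->used = 0; }
                void arena_destroy(Arena *a) { free(a->base); a->base = NULL; }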

        2. 3

          It’s not the allocation that’s error-prone, it’s the deallocation.

          1. 6

            And not even the deallocation at time of writing. The problems show up ten years later with a ninja patch that works and passes tests but fails the allocation in some crazy way. “We just need this buffer over here for later….”
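
              For concreteness, a hypothetical C sketch of that failure mode: the original code allocates and frees a buffer locally, and the later “ninja patch” stashes a pointer to it “for later”. It compiles, it may well pass the tests, and it is still a use-after-free.

                #include <stdlib.h>
                #include <string.h>

                static char *saved_for_later;             /* added years later by the "ninja patch" */

                void handle_message(const char *msg) {
                    char *buf = malloc(strlen(msg) + 1);  /* original code: allocate... */
                    if (!buf)
                        return;
                    strcpy(buf, msg);

                    saved_for_later = buf;                /* patch: "we just need this buffer over here for later" */

                    free(buf);                            /* ...original code: and free before returning */
                    /* saved_for_later now dangles; any later read of it is undefined behaviour. */
                }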

          2. 3

            How would a library take control of deallocations without also taking control of the allocations, too?

            1. 4

              As I understand, a library does not allocate and does not deallocate. All users are expected to BYOB (Bring Your Own Buffer).
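
              For illustration, a small C sketch of the convention (the names are made up, but the shape follows snprintf): the caller owns the buffer, and the library only writes into it and reports how much space it actually needed, so the caller can detect truncation and retry with a larger buffer of its own choosing.

                #include <stddef.h>
                #include <stdio.h>

                /* Library side: never allocates. Writes into the caller's buffer and returns
                   the number of bytes needed (excluding the terminator). */
                size_t format_greeting(char *out, size_t out_len, const char *name) {
                    int needed = snprintf(out, out_len, "hello, %s", name);
                    return needed < 0 ? 0 : (size_t)needed;
                }

                /* Caller side: brings its own buffer, whether from the stack, an arena, or malloc. */
                void caller(void) {
                    char buf[64];
                    size_t needed = format_greeting(buf, sizeof buf, "athas");
                    if (needed >= sizeof buf) {
                        /* too small: allocate needed + 1 bytes however we like and call again */
                    }
                }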

              1. 2

                In which case, it really doesn’t matter (in this context) whether it’s allocation or deallocation that’s the hard part. The library is leaving both up to the application anyway.

      3. 3

        Yeah, we saw what that’s like with MPI. Those bad experiences led to languages like Chapel, X10, ParaSail, and Futhark. Turns out many app developers would rather describe their problem or a high-level solution instead of micromanage the machine.

    10. 2

      I thought the punchline was macros, but alas it was map and reduce, which still rely on compiler magic. If it were macros, then the programmer could decide on the threshold themselves.

      1. 3

        There is nothing that prevents a programmer from providing a module that implements map and reduce with some threshold mechanism. It’s as flexible as macros in that regard.

        1. 1

          So I can reflect over the structure and count elements to decide how many can be inlined before recursing?

          1. 5

            Yes. The only thing that is missing from the vector package is that there is no dynamic value exposing the size of the vector, so you’d have to roll your own. However, you’d have to actually produce code that performs the branch dynamically, and then depend on the compiler doing constant-folding to remove the branch (but this is pretty much guaranteed to work).

            It’s certainly not fully as powerful as Lisp-style macros, but good enough for this purpose.

    11. 3

      Gods, for a moment I saw REMY.DAT;2 and thought I’d been writing articles about VMS while I thought I was sleeping.

      VMS is the best OS in history, and nothing has quite matched it. And I say that even after learning to do assembly on a VAX machine. Which was horrible, but that’s not VMS’s fault.

      1. 2

        Haha, another Remy here. Shell accounts are available on DECUS if you want to play around again

      2. 1

        What makes OpenVMS better than its competitors (mostly Unix I guess)? From the article, it seems fascinatingly different, but ultimately it just looks more complex in terms of feature count.

        1. 5

          Admittedly, it’s a bit of hyperbole. But there are a lot of things VMS did before Unix. When I used VMS, it was on a mildly large cluster. Clustering at that scale just wasn’t a thing in Unixland at the time. The filesystem is itself interesting, and the inherent versioning doesn’t make things all that much more complex.

          But the biggun is its binary formats. VMS had a “common language environment” which specified how languages manage the stack, registers, etc., and it meant that you could call libraries written in one language from any other language. Straight interop across languages. COBOL to C. C to FORTRAN. FORTRAN into your hand-coded assembly module.

        2. 3

          As a newbie, I notice more consistency. DCL (shell) options and syntax are the same for every program; no need to remember whether it’s -h, --help, /?, etc. Clustering is easy, consistent, and scales. Applications don’t have to be cluster-aware and you are not fighting against the cluster (as compared to Linux with keepalived, corosync, and some database or software).

        3. 3

          I’ll add, on the clustering, that it got pretty bulletproof over time, with clusters running for many years, even 17 claimed for one. Some of its features included:

          1. The ability to run nodes with different ISA’s for CPU upgrades

          2. A distributed lock protocol that others copied later for their clustering.

          3. Deadlock detection built into that.

          There was also the spawn vs. fork debate. The UNIX crowd went with fork for its simplicity. VMS’s spawn could do extra stuff such as CPU/RAM metering and customizing security privileges. The Linux ecosystem eventually adopted a pile of modifications and extensions to do that sort of thing for clouds. Way less consistent than VMS, though, with ramifications for reliability and security.

          EDIT: In this submission, I have a few more alternative OS’s that had advantages over UNIX. I think it still can’t touch the LISP machines on their mix of productivity, consistency, maintenance, and reliability. The Smalltalk machines at PARC had similar benefits. Those two are in a league of their own decades later.

    12. 6

      I’ve never understood why Scroll Lock doesn’t do something sensible by default on Unix systems. For example, locking the terminal so further command output does not cause further scrolling. This is probably what I would a priori assume was the purpose of Scroll Lock (related to what Ctrl-s does).

      1. 1

        I think this is troublesome when you have more than one terminal, but could be fun to cook something up.

      2. 1

        Looked into this. xterm seems to have pretty intelligent handling: it locks scrolling while active, turns on the LED, etc.

    13. 20

      Impressive! I particularly like the smoke effects that come from fire. Are you doing some fluid dynamics to get that behaviour?

      1. 21

        yep, the fluid simulation also runs completely on the GPU, which is one reason it’s so smooth. I adapted this excellent implementation for my needs: https://github.com/PavelDoGreat/WebGL-Fluid-Simulation

    14. 4

      I don’t understand why the author compares an Amiga (model not mentioned) with a laptop and remarks:

      How long do you think you could keep a modern laptop working? Four or five years? Maybe?

      When in the next paragraph it states:

      While the system has been in service all these years, it hasn’t always been a smooth ride. The monitor, mouse, and keyboard have all broken at one time or another.

      So it is not that surprising: one could easily keep a desktop computer operational for all these years, if we disregard any philosophical paradox about how much of the original machine remains.

      1. 2

        Laptops are built with compromises to permit portability. I would not be surprised if they are not as long-lasting as machines built with fewer such compromises. There are also stories of ancient DOS machines still running, so I’m not sure the Amiga was anything special in that regard. Unfortunately, I doubt anyone has done proper studies on the long-term durability of 80s microcomputers!