1. 11

    Python is oriented towards productivity

    I think this is not invalid, but if you are looking for productive languages in 2021 you could do much better than python. The place where python does still have a competitive edge, of course, is hiring. But then again, you may rue the easiness of hiring because there are quite a few footguns, and being easy to hire increases the likelihood that you wind up getting someone who is amateurish (even though they are ‘seniors’), or, if you get a junior, there aren’t guardrails around those footguns.

    1. 7

      I think this is not invalid, but if you are looking for productive languages in 2021 you could do much better than python.

      This seems incredibly subjective to me. It’s great that you feel that way, but when you word it this way you make it sound like an absolute which it most assuredly is not.

      1. 6

        If you don’t agree that Python is optimizing for developer speed (and I certainly don’t) then the whole article falls apart.

        You say hiring might be easier, but I decline anything from recruiters if it’s Python. I’m so done trying to make sense of code bases with no typing. Mypy helps, but real static types it is not.

        As far as I’ve seen, Python is picked because it’s the language everyone on the team knows.

        1. 4

          Python productivity

          There are two types of programmer productivity, addressing different problems:

          1. I don’t know much about the underlying tech. Can I get this script running in 30 minutes?
          2. I don’t know much about the problem domain. Can I change this multi-million line behemoth without breaking everything and getting fired?

          Python excels at (1) and fails miserably at (2). Yes, I’m aware of mypy and its attempts at solving the second problem. It’s not there yet.

          1. 2

            think this is not invalid, but if you are looking for productive languages in 2021 you could do much better than python

            Such as? And why?

            1. 4

              I’m a plain old C/asm (or Lisp) guy, but it seems to me that if you’re thinking about using Python then modern JavaScript (e.g. TypeScript if you like type annotations, as I do) running in Node.js does pretty much everything Python does, but fifty times faster if you’re actually writing algorithms rather than just gluing together other people’s C libraries. There’s a similar extensive library of modules ready to use. There’s a slightly different set of footguns, but are they worse? Probably not.

              1. 5

                GP was claiming that there were probably languages “much better than python” out there, not “roughly comparable” ones.

                But your specific claim is not even true. Node.js doesn’t do everything Python does, at all. It doesn’t do sync IO, nor does it have support for multithreading or multiprocessing. It does allow you to start your application in multiple processes, but doesn’t offer you a way to control them like you can in C, for example. This is a huge deal. You have no way to control the basic scalability of your application. It will queue up all IO calls until it exhausts the resources. And it has no other concept of concurrency than essentially running everything in parallel. It’s a memory leak by design.

                It has an old-fashioned and less ergonomic syntax with many more corner cases and quirks than Python. And to my knowledge, it has no well-established GPU library (or probably none at all).

                The point is that Python is versatile. Node.js is not even primarily a programming language implementation. It is a single-threaded event machine that ships with a JavaScript API, provided by an implementation extracted from a browser. It still puzzles me that people don’t find this weird. A library is the main piece and the programming language comes as an addition.

                1. 1

                  No two things that are distinct do all the same things as each other. You choose the features that matter to you.

                  “It doesn’t do sync io nor does it have support for multi threading”

                  That’s a contradiction. If you’re programming in plain old JS then it doesn’t do sync IO because while you’re waiting it goes off and does things for other threads. But if you’re programming in Iced CoffeeScript or TypeScript then you can write your program as if the IO were synchronous, without explicitly writing the underlying callback yourself – exactly the same as in C, where when you do sync IO your thread library (or the OS) saves all your registers, loads someone else’s registers, and runs another thread until your IO is done.

                  1. 1

                    If you’re programming in plain old JS then it doesn’t do sync io because while you’re waiting it goes off and does things for other threads.

                    What makes you think that is how it works? It is not. Browser JavaScript and Node.js are single-threaded. In fact, the whole reason Node.js was created was to provide a build of a single-threaded JS implementation with async IO. Everything is performed in the same thread. Perform an operation that takes perceptible time and your whole application freezes. Your browser will become unresponsive. Put a for loop with a few million iterations inside a callback and see what happens.

                    Node.js is a standalone build of V8, which is Chrome’s JavaScript engine. Early Node.js builds for Windows did execute IO by relying on worker threads, but that is not the point, as you don’t have access to them. It was a workaround not intended for production usage, but rather a way for people to use their Windows machines for development. In the end you only have access to one thread and you have to do everything there, by design. You have to trust it to use whatever resources it needs to complete IO (essentially starving the machine’s resources if you push it) and have no API to control what gets done simultaneously or when to wait for what. Notice that this is a perfectly acceptable value proposition for a script that is executed in the context of a webpage. But it is absurd in simple cases such as writing to a text file line by line, sequentially. You have no way of doing it without letting Node.js open as many IO descriptors as quickly as possible. Sort of like fork-bombing your machine. Node.js did include synchronous IO, but they deprecated it because people put it inside callbacks and flooded GitHub with tickets claiming their application would ‘randomly’ freeze.

                    To this day, I still haven’t heard the reason why one would choose to write a server application or even a command line utility using this IO model. From what I gathered, the rationale includes people not being familiar with multithreading and synchronisation APIs and creating race conditions.

                    I am not sure what you are referring to when you mention TypeScript. Last time I checked, it was simply a compiler targeting JavaScript, with no runtime functionality of its own. But that was a few years ago; I don’t know about the present day. Iced CoffeeScript does provide alternative IO APIs, though.

          1. 1

            I have a few ex-corporate 4 MB SE/30s with thinnet Ethernet cards in storage. I haven’t tried to start them in 15 years, probably. In the late 90s I loaned one for a while (with a quick and dirty FileMaker database app) to a friend with a retail store whose Windows PC had just died.

            The thing that absolutely blew me away in 1989 when the SE/30 came out was that it supported 128 MB RAM (with large SIMMs not yet available) while the RAM you could buy one with was 1 MB or 4 MB.

            The first computer I paid my own money for was a Mac IIcx, which is the same machine as an SE/30 except for being in a box with expansion slots and no built-in monitor. The prices were pretty much the same. I bought it as a 0/0 – 0 RAM and 0 disk – and put 3rd-party RAM and disk in it. I used that until I bought a PowerMac 6100AV.

            I thought it was pretty cool you could completely disassemble a IIcx, IIci, or Quadra 700 by removing only one screw which secured the hard drive & floppy drive bracket to the chassis bottom. Everything else just clipped into place.

            1. 2

              This is pretty amazing! The CPU core, admittedly without register file, takes less logic than many of my simple cores that perform much more mundane tasks.

              The presented applications, i.e. deeply embedded CPUs, are also intriguing. A difference from most other deeply embedded processors is that RISC-V is a pretty generous architecture compared to e.g. an 8051 or a TMS1000, with many wide registers. I am not sure, though, that the benefits hold up once you look at the processor paired with memory, I/O and program storage.

              Nevertheless, a really cool project.

              1. 2

                Yes; to put a number to it, the first linked introductory video claims to implement the core on a Xilinx Artix-7 in 130 LUTs plus 206 flip-flops, plus memory (but the register file can be stored in the same block RAM holding code and data memory).

                1. 2

                  Thanks for pointing out that the code/data memory and the register file can be colocated. This is certainly attractive to keep the whole system resource usage down.

                  I just realized that the external register file, especially if done with block RAM, lends itself to implementing hardware-supported threads or something similar to hyperthreading. By changing a base pointer into the RAM, the CPU can have multiple register files, with one of them active at any given time.

                  1. 2

                    Many of the applications for a CPU like this don’t need any state outside of the CPU registers – especially as RISC-V lets you do multiple levels of subroutine call without touching RAM if you manually allocate different registers and a different return address register for each function (which means programming in asm, not C). A lot of 8051 / PIC / AVR parts have been sold without any RAM (or with RAM == memory-mapped registers).

              1. 12

                I don’t understand the musl decision to have such small stacks. FreeBSD uses 1 MiB * sizeof(void*) as the default stack size (I think thread stacks are smaller by default, but they’re controllable via the pthread APIs). On 32-bit platforms, a 4 MiB stack for every thread can be a bit too much overhead. On a real 32-bit system (as opposed to a 32-bit userspace on a 64-bit kernel) you typically have only 2 GiB of virtual address space and so a 4 MiB stack per thread would limit you to at most 512 threads (assuming TLS is allocated in the stack and you have no other consumers of memory such as code or the heap), which is a bit constrained. On a modern CPU with a 48-bit address space, you have space for 2^15 8 MiB stacks, which is plenty (if you have anything like that many kernel threads, you have other problems).

                On a system that can allocate physical memory on demand, there’s no reason to not reserve quite a lot of virtual address space for stacks.

                1. 6

                  I think you’ve partially answered your own question: IIRC the original impetus for musl libc was frustration with other “lightweight” libcs at the time, which were used on smaller machines and embedded contexts where glibc was too heavy to be workable. So it was very much targeted at environments where big stacks would, in fact, be a problem.

                  Perhaps you could argue that on other platforms, where this is not an issue, the default stack size should be different. I think this is defensible, but a counterargument is that if something must fail on one platform, making it fail on all platforms is a good way to shake out portability problems.

                  1. 4

                    It seems that some of the design decisions in musl are aimed at minimizing virtual memory requirements, which makes some sense on 32-bit archs but much less on 64-bit archs.

                    1. 1

                      I wonder what happened in the 30 years since 16 MB of RAM and 1 GB of drive space seemed limitless (when just six years before that machine, my first computer had 16 KB of RAM and cassette storage) …

                  2. 5

                    On architectures with lots of callee-save registers, stack space requirements also increase. For the 32-bit PowerPC JavaScript JIT in TenFourFox, because IonMonkey likes non-volatile registers, we have to save nearly all of them as part of every generated function prologue. This really bloats stack frames and means a fair bit of recursion can run you out of stack space in a hurry. It ran with a 1 GB stack space for a very long time, much more than the default on OS X. Sure, we could use fewer registers, or use more volatile ones, but then it’s still spilling them to the stack, with the added performance issues that entails, besides having fewer registers in flight.

                    1. 2

                      That just sounds like bad register allocation.

                      If you didn’t have enough callee-save registers then you’d just have to allocate the values you want preserved over function calls in the stack frame in the first place. So it makes no difference to the space used. (If they’re used more than once then it’s faster to have them in registers than on the stack, but that’s a different question.)

                    2. 3

                      Did you mean “On a modern CPU with a 48-bit address space, you have space for 2^25 8 MiB stacks”? 2^25 threads * 2^23 bytes = 2^48 bytes.

                      1. 3

                        Fwiw if I wanted to start hundreds of threads in a 32 bit process I’d call pthread_attr_setstacksize() to lower the limit.

                        Also I’d have to ask the obvious question: why so many native threads in the first place? If they’re all blocking on the network then maybe I should use green threads. If they’re all on CPU maybe I should use a work queue. If they’re all blocking on local disk access (which is the one thing I know of where having lots more threads than CPUs helps) then maybe I could use io_uring.
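
                        On the first point, a minimal sketch of how that looks with the portable pthread attribute API; the 64 KiB figure and the worker body are just illustrative choices, not recommendations:

                            #include <pthread.h>
                            #include <stdio.h>

                            static void *worker(void *arg) {
                                (void)arg;                      /* pretend to do some network-bound work */
                                return NULL;
                            }

                            int main(void) {
                                pthread_attr_t attr;
                                pthread_t tid;

                                pthread_attr_init(&attr);
                                /* 64 KiB instead of the platform default (must be >= PTHREAD_STACK_MIN) */
                                pthread_attr_setstacksize(&attr, 64 * 1024);

                                if (pthread_create(&tid, &attr, worker, NULL) != 0)
                                    perror("pthread_create");
                                else
                                    pthread_join(tid, NULL);

                                pthread_attr_destroy(&attr);
                                return 0;
                            }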

                      1. 3

                        The fact that it works at all is amazing. However, 6502 is a really tough target for compiled languages. Even something as basic as having a standard function calling convention is expensive.

                        1. 3

                          GEOS has a pretty interesting calling convention for some of its functions (e.g. used at https://github.com/mist64/geowrite/blob/main/geoWrite-1.s#L82): Given that there’s normally no concurrency, and little recursive code, arguments can be stored directly in code:

                          jsr function
                          .byte arg1
                          .byte arg2
                          

                          function then picks apart the return address to get at the arguments, then moves it forward before returning to skip over the data. A recursive function (where the same call site might be re-entered before leaving, with different arguments) would have to build a trampoline on a stack or something like that:

                          lda #argcnt
                          jsr trampoline
                          .word function
                          .byte arg1
                          ...
                          .byte argcnt
                          

                          where trampoline creates jsr function, a copy of the arguments + rts on the stack, messes with the return address to skip the arguments block, then jumps to that newly created contraption. But I’d rather just avoid recursive functions :-)

                          1. 1

                            Needing self-modifying code to deal with function calls reminds me of the PDP-8, which didn’t even have a stack - you had to modify code to put your return address in.

                            1. 1

                              Are those the actual arguments, with self-modifying code used to get non-constant data there? Or are the various .byte values the addresses where the arguments can be found, in zero page?

                              That’s pretty compact at the call site, but a lot of work in the called function to access the arguments. It would be ok for big functions that are expensive anyway, but on 6502 you probably (for code compactness) want to call a function even for something like adding two 32 bit (or 16 bit) integers.

                              e.g. to add a number at address 30-31 into a variable at address 24-25 you’d have at the caller …

                                  jsr add16
                                  .byte 24
                                  .byte 30
                              

                              … and at the called function …

                              add16:
                                  pla            ; return address low byte (pushed last by jsr)
                                  sta ARGP
                                  pla            ; return address high byte
                                  sta ARGP+1
                                  clc
                                  lda ARGP
                                  adc #2         ; step the return address past the two argument bytes
                                  tax
                                  lda ARGP+1
                                  adc #0
                                  pha            ; re-push high byte first...
                                  txa
                                  pha            ; ...then low byte, so rts returns past the arguments
                                  ldy #1
                                  lda (ARGP),y   ; first argument byte: destination address
                                  tax
                                  iny
                                  lda (ARGP),y   ; second argument byte: source address
                                  tay

                              add16_q:
                                  clc
                                  lda $0000,y    ; source low byte (zero page, via absolute,Y)
                                  adc $00,x      ; add destination low byte
                                  sta $00,x
                                  lda $0001,y    ; source high byte
                                  adc $01,x      ; add destination high byte, with carry
                                  sta $01,x
                                  rts
                              

                              So the stuff between add16 and add16_q is 28 bytes of code and 54 clock cycles. The stuff in add16_q is 16 bytes of code and 32 clock cycles. The call to add16 is 5 bytes of code and 6 clock cycles.

                              It’s possible to replace everything between add16 and add16_q with a jsr to a subroutine called, perhaps, getArgsXY. That will save a lot of code (because it will be used in many such subroutines) but add even more clock cycles – 12 for the JSR/RTS plus more code to pop/save/load/push the 2nd return address on the stack (26 cycles?).

                              But there’s another way! And this is something I’ve used myself in the past.

                              Keep add16_q and change the calling code to…

                                  ldx #24
                                  ldy #30
                                  jsr add16_q
                              

                              That’s 7 bytes of code instead of 5 (bad), and 10 clock cycles instead of 6 – but you get to entirely skip the 54 clock cycles of code at add16 (maybe 90 cycles if you call a getArgsXY subroutine instead).

                              You may quite often be able to omit the load immediate of X or Y because one or the other might be the same as the previous call, reducing the calling sequence to 5 bytes.

                              If there’s some way to make add16 more efficient I’d be interested to know, but I’m not seeing it.

                              Maybe you could get rid of all the PLA/PHA and use TSX;STX usp;LDX #1;STX usp+1 to duplicate the stack pointer in a 16-bit pointer in Zero Page, grab the return address using LDA instead of PLA, and increment the return address directly on the stack. It’s probably not much better, if at all.

                              1. 1

                                These calling conventions are provided for some functions only, and mostly the expensive ones. From the way it’s implemented for BitmapUp, without looking too closely at the macros, it seems they store the return address at a known address and index through that.

                                GEOS has pretty complex functions and normally uses virtual registers in the zero page, so I guess this is more an optimization for constant calls: no need to have endless lists of lda #value; sta $02; ... in your code. As GEOS then copies the inline values into the virtual registers and just calls the regular function, the only advantage of the format is compactness.

                            2. 2

                              Likewise, I’m very impressed it works. Aside from you correctly pointing out how weak stack operations are on the 6502, however, it doesn’t generate even vaguely idiomatic 6502 assembly. That clear-screen extract was horrible.

                              1. 2

                                The 6502 is best used treating zero page as a lot of registers with the same kind of calling convention as modern RISC (and x86_64) use: some number of registers that are used for passing arguments and return values and for temporary calculations inside a function (and so that leaf functions don’t have to save anything), plus a certain number of registers that are preserved over function calls and you have to save and restore them if you want to use them. The rest of zero page can be used for globals, the same as .sdata referenced from a Global Pointer register on machines such as RISC-V or Itanium.

                                If you do that then the only stack accesses needed are push and pop of a set of registers. If you generate the code appropriately then you only have to know to save N registers on function entry and restore the same N and then return on function exit. You can use a small set of special subroutines for that, saving code size. RISC-V does exactly the same thing with the -msave-restore option to gcc or clang.

                                Of course for larger programs you’ll want to implement your own stack (using two zero page locations as the stack pointer) for the saved registers. 256 bytes should be enough for just the function return addresses.

                                1. 1

                                  But I wonder how much of the zero page you can use without stepping on the locations reserved for ROM routines, particularly on the Apple II. It’s been almost three decades since I’ve done any serious programming on the Apple II, but didn’t its ROM reserve some zero-page locations for allowing redirection of ROM I/O routines? If I were programming for that platform today, I’d still want to use those routines, so that, for example, the Textalker screen reader (used in conjunction with the Echo II card) would work. My guess is that similar considerations would apply on the C64.

                                  1. 1

                                    The monitor doesn’t use a lot. AppleSoft uses a lot more, but that’s ok because it initialises what it needs on entry.

                                    https://pbs.twimg.com/media/E_xJ5oWUYAAUo3a?format=jpg&name=4096x4096

                                    Seems a shame now to have defaced the manual, but in my defence I did it 40 years ago.

                                  2. 1

                                    Now that I’ve looked into the implementation I see they’re doing something like this, but using only 4 zero-page bytes as caller-saved registers. This is nowhere near enough!

                                    Even 32 bit ARM uses 4 registers, which should probably translate to 8 bytes on 6502 (four pointers or 16 bit integers).

                                    x86_64, which has the same number of registers as arm32, uses six argument registers. RISC-V uses 8 argument registers, plus another 7 “temporary” registers which a called function is free to overwrite. PowerPC uses 8 argument registers.

                                    6502 effectively has 128 16-bit registers (the size of pointers or int). There is no reason why you shouldn’t be at least as generous with argument and temporary registers as the RISC ISAs that have 32 registers.

                                    I’d suggest maybe 16 bytes for caller-save (arguments), 16 bytes for temporaries, 32 bytes for callee-save. That leaves 192 bytes for globals (2 bytes of which will be the software stack pointer).
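
                                    As a concrete picture of that split, here is how the zero-page assignments might be written down in C-style constants; the specific addresses are illustrative assumptions, not an existing convention:

                                        /* Hypothetical zero-page "register file" layout (illustrative only) */
                                        #define ZP_ARG     0x00   /* 0x00-0x0F: 16 bytes of caller-saved argument registers   */
                                        #define ZP_TMP     0x10   /* 0x10-0x1F: 16 bytes of temporaries a callee may clobber  */
                                        #define ZP_SAVE    0x20   /* 0x20-0x3F: 32 bytes of callee-saved registers            */
                                        #define ZP_GLOBALS 0x40   /* 0x40-0xFD: statically allocated globals                  */
                                        #define ZP_SP      0xFE   /* 0xFE-0xFF: 16-bit software stack pointer                 */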

                                    1. 1

                                      Where are you going to save them? In the 256 BYTE stack the 6502 has? Even if the stack wasn’t limited, you still only have at most 65,536 bytes of memory to work with.

                                      1. 1

                                        Would be cool to see if this stuff were built to expect bank switching hardware.

                                        1. 1

                                          I quote myself:

                                          Of course for larger programs you’ll want to implement your own stack (using two zero page locations as the stack pointer) for the saved registers. 256 bytes should be enough for just the function return addresses.

                                          64k of total memory is of course a fundamental limitation of the 6502, so is irrelevant to what details of code generation and calling convention you use. Other than that you want as compact code as possible, of course.

                                  1. 2

                                    I think it’s the wrong question.

                                    A beginner should be taught a language that is simple enough they can learn essentially all of the language itself (or at least a useful subset) and exactly what the constructs mean as quickly and easily as possible, so that they can then spend their time learning how to program. Ideally you’d like to be able to teach the language itself in an hour.

                                    You also want a language that you can write arbitrarily large and complex programs in without bashing your head against the wall.

                                    Scheme is good. It’s very easy to teach just a handful of constructs which can be combined together in any way and it will always work as expected. It’s easy to describe the semantics using a substitution model.

                                    Some real machine/assembly languages are simple enough to do this.

                                    • 6502 or 6800 – 6800 is simpler but more annoying and is basically dead at this point, while the 6502 community is thriving and you can buy brand new 65C02s for under $10.

                                    • AVR. Tons of real hardware available cheap.

                                    • RISC-V RV32I. Less hardware but some of it is very cheap (e.g. Longan Nano board $5 including a small LCD display). Lots of software support, lots of simulators including GUI ones.

                                    • Maybe a simple flavour of 32 bit ARM. Thumb suffers from arbitrary restrictions and has toooo many instruction formats (as does RISC-V C extension) while original ARMv2 - ARMv4 has very few instruction formats (which is good) but the presence of conditional execution in every instruction and a shifted/rotated operand in most of them complicates matters right from the start. Obviously there are a huge number of boards available.

                                    • MIPS. Popular in the past, but zero reason to use it now there’s RISC-V with the same good features but none of the bad features.

                                    The difficulty with machine/assembly languages is that while it’s easy to understand exactly what any given instruction will do, it’s difficult to understand how to put them together to achieve something useful. The 8 bit ones are worse for this than the 32 bit ones both because it’s hard to deal with 16 bit or 32 bit data and because the shortage of registers (except AVR) forces you to learn about RAM early. Managing register allocation is one of the worst points.

                                    FORTH is easy to explain, but hard to use. Postscript might be better – the model for defining named variables and functions is easier, and you can draw pretty things with it. Logo and Smalltalk fall into this space too.

                                    BASIC meets the “can teach it in an hour” standard but … ugh. You hit the complexity wall in your own program code very quickly.

                                    Unix shell might not be a bad place to start. Arithmetic is annoying, but pipes are a powerful construct and it’s all very useful stuff to know anyway.

                                    Writing custom filters to fit into pipes is a good motivating example to use your assembly language (or other) code for. On Linux you can use binfmt_misc with qemu or another emulator to transparently run ARM or MIPS or RISC-V or other machine code as if it was native code (just five or so times more slowly, which is often unnoticeable).

                                    1. 3

                                      Back then the HiFive Unleashed was $999 and the Microsemi expansion board was $1999.

                                      Today, the HiFive Unmatched has better CPUs (the dual-issue U74 instead of the single-issue U54), double the RAM (16 GB instead of 8 GB), a PCIe slot (typically used for a video card), two M.2 sockets (typically one for NVMe SSD, one for WIFI if you don’t want to use Ethernet), four USBs. And it comes in a standard Mini-ITX form factor and uses an ATX power supply.

                                      And it’s $665 (for motherboard with CPU and RAM) not $2998 for two linked boards.

                                      https://www.mouser.com/ProductDetail/SiFive/HF105-000?qs=zW32dvEIR3vHEV%2FPYYkdMA%3D%3D

                                      https://www.youtube.com/watch?v=PD3fkHkCAnw

                                      Performance is not blazing – it’s about like a 2nd gen Raspberry Pi 3 CPU-wise but with better I/O and capability because of the ability to use a real video card, SSD, and all the extra RAM.

                                      Boards with the same or higher capability (at least CPU-wise) should be available much cheaper in the coming year. BeagleBoard had a project using the same CPU cores which they expected to sell for $119 with 4 GB RAM and $149 with 8 GB RAM. They distributed 300 beta boards (which work fine – I have one) for free but then some months later cancelled the project. They say they’ll announce a new design after New Year.

                                      There is also the Allwinner Nezha evaluation board for $99. It’s got a single core 1 GHz CPU with 1 GB RAM. Companies such as Sipeed and Pine64 have pre-announced intentions to make boards using the same SoC for $10 to $20. That would make good competition for the Raspberry Pi Zero, which it has similar performance to.

                                      1. 2

                                        Nit: MIPS may be a shadow of its former self, but it’s not gone. PowerPC (at least in terms of Power ISA) is definitely not gone; POWER10 such as it is just came out. However, I do agree that hardware innovation, at least in terms of CPU design, is not progressing anywhere near like it used to. For all the ballyhoo of RISC-V, it has not magically translated into amazing microarchitectures because the people capable of amazing microarchitectures are working for Intel, AMD, Apple and ARM licensees.

                                        1. 1

                                          According to several recent news reports a good number of those people have recently moved to startups making high performance RISC-V CPUs. It will of course take two or three years before the resulting designs arrive.

                                          Speculation this week in the Apple press is that loss of CPU designers is already the cause of the unusually small improvements in the CPUs in the new iPhones.

                                          1. 1

                                            And those reports are?

                                        1. 3

                                          No comments here yet, but I’ll just point out that in fact this has nothing to do with the ISA, but only with the standard privileged architecture, and in particular the possibility of an infinite loop of traps if the trap handler base address is not yet properly initialized when an interrupt comes in.

                                          The simple fix, as they suggest themselves, is either to initialize the trap handler vector to point to a known-good address (e.g. a simple or dummy handler in on-chip ROM) on power-on, or to detect that the trap handler address is the same as the faulting address and halt.
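
                                          A sketch of the first option as it might look in machine-mode startup code, in C with inline assembly; the handler name is illustrative, and the only real requirement is that it is a valid, properly aligned address from reset onwards:

                                              /* Point mtvec at a known-good dummy handler before anything can trap,
                                                 so an uninitialised vector can never turn into an infinite trap loop. */
                                              extern void dummy_trap_handler(void);   /* illustrative: e.g. a tight-loop stub in on-chip ROM */

                                              static inline void install_dummy_trap_vector(void) {
                                                  unsigned long base = (unsigned long)&dummy_trap_handler;
                                                  /* low two bits zero = direct (non-vectored) trap mode */
                                                  __asm__ volatile ("csrw mtvec, %0" : : "r"(base));
                                              }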

                                          1. 1

                                            Incredible. The simplest unit test checking that the naive version and the optimised version produce the same output would have caught that.

                                            1. 2

                                              I ran out of time after 30 numbers.

                                              Does it tell you if you get something wrong?

                                              Update: ah, yeah, you lose immediately.

                                              1. 1

                                                I don’t get the problem with cooperative context switches.

                                                Cooperative context switches look like a function call, therefore if your code is written according to the standard ABI none of a0-a7 and t0-t6 need to be saved and the context switch code can use/clobber as many of them as it needs.

                                                If anything, they should be set to 0 or 0xDEADBEEF or something just before returning to the new process.

                                                1. 1

                                                  Yeah, I think my problem with context switches was I did things the wrong way around. I wrote a Rust function to do it, then tried to use inline asm inside the function to juggle the registers, but the Rust function was using some of the registers already so things got complicated. Should have just written a repr(C) function that takes a couple of register structures or such as arguments and does the switching for you.

                                                  1. 1

                                                    A context switch is fundamentally doing things behind the compiler’s back. It should be written in assembly language.

                                                    Often you can get away with using setjmp() and longjmp(). But those are written in assembly language – just not by you. And to create a new thread you need to fiddle with saved registers inside the jmp_buf, which is supposed to be an opaque structure, so that’s undocumented and non-portable. You might as well define your own thread structure and thread switch function anyway.
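
                                                    For what it’s worth, the POSIX ucontext API (obsolescent in the standard, but still shipped by glibc and others) gives the same save-and-restore shape without poking inside jmp_buf; a minimal sketch, with an illustrative stack size and task body:

                                                        #define _XOPEN_SOURCE 700   /* expose the ucontext functions on strict-POSIX builds */
                                                        #include <stdio.h>
                                                        #include <ucontext.h>

                                                        static ucontext_t main_ctx, task_ctx;
                                                        static char task_stack[64 * 1024];      /* illustrative stack size */

                                                        static void task(void) {
                                                            puts("task: first slice");
                                                            swapcontext(&task_ctx, &main_ctx);  /* cooperative yield back to main */
                                                            puts("task: second slice");
                                                        }                                       /* returning follows uc_link back to main */

                                                        int main(void) {
                                                            getcontext(&task_ctx);
                                                            task_ctx.uc_stack.ss_sp   = task_stack;
                                                            task_ctx.uc_stack.ss_size = sizeof task_stack;
                                                            task_ctx.uc_link          = &main_ctx;
                                                            makecontext(&task_ctx, task, 0);

                                                            swapcontext(&main_ctx, &task_ctx);  /* run task until it yields */
                                                            puts("main: got control back");
                                                            swapcontext(&main_ctx, &task_ctx);  /* resume task until it finishes */
                                                            puts("main: done");
                                                            return 0;
                                                        }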

                                                1. 13

                                                  Nuclear take: I think it’s interesting so many “computer engineering/enthusiast” types (for lack of a better term) tended to gravitate towards DEC systems when their design is full of bonkers mistakes no EE should repeat: PDP-10’s recursive indirect addressing, PDP-11’s segmentation and PC in memory (ok, DSPs do this, but that’s an acceptable optimization for a DSP, not a general-purpose CPU), the absurd CISCiness of VAX, etc. (Alpha was pretty reasonable.) I say this as someone who likes VMS.

                                                  I think 360/370 is much better designed, and the influence in modern CPUs design is more obvious (lots of GPRs, clean instruction formats, pipelining, virtualization, etc.). Plus they had the also influential ACS/Stretch to draw from. I can’t say the same for many DEC designs. It’s amusing Unix types are so obsessed with VAX when Unix would feel far more at home on 370.

                                                  1. 5

                                                    I suspect a variety of factors are to blame:

                                                    IBM in the ’70s and ’80s had the reputation that Microsoft had in the ’90s and 2000s and Google, Amazon, and Facebook are competing for now: the evil empire monopolist that the rest of the industry stands against. There’s a story around the founding of Sun that they got a visit a few months in from the IBM legal department inviting them to sign a patent cross-licensing agreement and showing six patents that Sun might be infringing. Scott McNealy sat them down and demonstrated prior art for some and that Sun wasn’t infringing any of the ones that might be valid. The IBM lawyers weren’t fazed by this and said ‘you might not be infringing these, would you like us to find some that you are?’ Sun signed the cross-licensing agreement. This kind of thing is why IBM’s legal department was referred to as the Nazgul. To add to this, IBM was famously business-facing. They required programmers to wear suits and ties. The hacker ‘uniform’ of jeans and t-shirts was a push-back against companies like IBM in general and IBM in particular, and hacker culture in general was part of a counter-culture rebellion where IBM was the archetype of the mainstream against which they were rebelling.

                                                    The DEC machines were so closely linked to the development of UNIX. IBM’s biggest contribution with the 360 was the idea that software written for one computer could run on another. This meant that their customers were able to build up a large amount of legacy software by the ’80s so IBM had no incentive to encourage people to write new systems software for their machines: quite the reverse, they wanted you locked in. DEC encouraged this kind of experimentation. Universities may have had an IBM mainframe for the admin department to use but the computer science departments and research groups bought DEC (and other small-vendor) machines to tinker with.

                                                    Multics was developed for the GE-645, which had all manner of interesting features (including a segmentation model that allowed a single-level store and no distinction between shared libraries and processes); Unics was written for the tiny PDP in the corner and it grew with that line.

                                                    There were a lot of other big-iron systems suffered from the rise of UNIX. I’m particularly sad about the Burroughs Large Systems architecture. The B5000 was released at almost the same time as the 360 and had an OS written in a high-level language (Algol-60), with hardware-assisted garbage collection, and provided a fully memory-safe (and mostly type-safe) environment with hardware enforcement. Most modern language VMs (JVM, CLR, and so on) are attempts to emulate something close to the B5000 on a computer that exposes an abstract machine that is basically a virtualised PDP-11. I wish CPU vendors would get the hint: if the first thing people do when they get your CPU is use it to emulate one with a completely different abstract machine, you’ve done something wrong.

                                                    Oh, and before you criticise the VAX for being too CISCy (and, yes, evaluate polynomial probably doesn’t need to be a single instruction), remember that the descendants of the 360 have instructions for converting strings between EBCDIC and unicode.

                                                    1. 2

                                                      I think you exaggerate about IBM. There is a general 1:1 table-based translate which can do EBCDIC to ASCII or Unicode, and there are different instructions for converting between the different Unicode flavours. It can’t do it in one instruction, that I know of.

                                                      But anyway, those and VAX POLY aren’t the problem. You can happily use microcode or just trap and emulate and no one will care.

                                                      The problem with the VAX is that the extremely common ADDL3 instruction (to name just one) can vary in length from 4 to 19 bytes and cause half a dozen memory references / cache misses / page faults.

                                                      x86, for all its ugliness, never uses more than one memory address per instruction for common instructions e.g. code generated from C. Same for S/360. Both have string instructions, but those are not a big deal, and relatively uncommon.

                                                    2. 3

                                                      That’s an interesting observation.

                                                      I think there would be a lot to learn from comparing the two engineering cultures. I would specifically include the management style and the kind of money each company was dealing with. When IBM was developing ground-breaking products like the Stretch and the Selectric typewriters, half of the company’s income came from incredibly lucrative military contracts.

                                                      The kinds of pressures on an engineering team, and the corner-cutting and cost-cutting they may resort to, are dramatically different when they are awash with money.

                                                      1. 3

                                                        To elaborate, I feel DEC influenced product segments more than engineering. The PDP-8 and then the PDP-11 redefined minicomputers, but the PDP-8’s influence was short-lived, and the PDP-11’s influence… would have been better not felt at all (i.e. x86).

                                                    1. 2

                                                      Wow. So many subpages. And yet it’s hard to find what the abbreviation even means.

                                                      1. 3

                                                        https://en.wikipedia.org/wiki/International_Conference_on_Functional_Programming

                                                        The contest does not require use of FP languages, however.

                                                        1. 1

                                                          It looks to me that this task technically doesn’t require you to write a program at all.

                                                          1. 1

                                                            Thank you!

                                                        1. 5

                                                          Ahhh, that takes me back. I used to put together a team to enter this contest back in the early to mid-2000s, using the Dylan programming language. We did ok, claiming 2nd place in 2001 (markup language optimisation), the Judge’s Prize in 2003 (automated driving around a race track in minimum time), and both 2nd place and the Judge’s Prize in 2005 (cops & robbers on a map of Chicago).

                                                          I haven’t looked at it in a decade.

                                                          Looks like a fun task this year.

                                                          It reminds me of the 2003 task in that there is not really any requirement to write a program to generate the poses – or more practically, you might use a human to direct a program, with the program enforcing the constraints and calculating the score, and maybe perturbing the human’s input.

                                                          1. 4

                                                            In addition to those mentioned, the TI MSP430 and Hitachi SuperH are clearly heavily PDP-11 influenced, with hacks (primarily reducing the number of addressing modes, especially on the dst) to extend them from 8 to 16 registers while sticking to a 2-address format in a 16 bit instruction. The SuperH also extends the PDP11 to 32 bits.

                                                            1. 3

                                                              The SuperH felt more like the 68000 to me. But the 68000 was heavily influenced by the PDP-11. So here we are ;-)

                                                              1. 4

                                                                I can’t see that.

                                                                The defining characteristic of the 68k/ColdFire compared to the PDP-11 is that it keeps the 3 bits per operand for register and 3 bits for addressing mode, but some instructions/operands imply a reference to a data register and some to an address register. This is how they double the register set from 8 to 16.

                                                                In the SuperH, every instruction that has an explicit register operand can use all 16 registers for that operand. They get the encoding space for this by confining arithmetic to simple register-to-register (like a RISC), and putting all the indirect, autoincrement etc addressing modes only on the MOV instruction.

                                                                68k and SuperH depart from the PDP11 starting point in quite different directions.

                                                            1. 38

                                                              Well, since I’ve not seen anyone here post it yet, I’ll say it:

                                                              I use the Gmail web application for all my email sending and receiving, both for work and for my personal email, and I really like it.

                                                              I used to self-host email but through a mixture of inexperience configuring things and my domain getting flagged on spam lists due to me running an open link shortener on it, too many of my emails went to spam and I gave up and pay Google to host my email now.

                                                              Anyway, I’m very happy with it. It runs very well, is accessible from any device with no installation, and has some really nice features that aren’t really available anywhere else. Their predictive text feature while composing emails is something I’ve come to really like. Their spam filtering is as perfect as I can imagine it getting. The auto-sorting feature for Primary, Promotions, and Updates is really nice; it works quite well too considering how much variety there is in email. The site is very performant as well (although I do have high end hardware) and I never wait for things to sync or load.

                                                              When I self-hosted e-mail I used an open source web-based mail client called Roundcube (which really is incomparable to Gmail in terms of features and ease-of-use, but certainly did the job). I also used Thunderbird for a while. However I found that for me, there really is no advantage whatsoever over just using a web app, even Roundcube.

                                                              I totally understand that not everyone can use (or wants to use) Gmail for their email, and I’m not even trying to promote it. Maybe the story is different for people that make heavy use of things like git-by-email, are in huge mailing lists for discussion and questions, or stuff like that. However, I really do believe that an e-mail client is one of the use-cases where a web application can really show its advantages.

                                                              1. 3

                                                                Me too. The web client on desktop/laptop, and gmail app on mobile. It’s really great.

                                                                I have my own domain name. I don’t get Google to host it – I have my own mail server for receiving, but for the last 10 or 15 years it simply immediately forwards everything to my gmail account. When I send mail from gmail it sends as my personal domain.

                                                                If gmail ever becomes intolerable for some reason I can instantly change the forwarding to go somewhere else – or even (save me!) run my own full featured email server again.

                                                                1. 2

                                                                    I can’t find any alternative that has anything comparable to, and as customizable as, Gmail’s priority inbox. Spark comes close with their smart inbox categories but it isn’t customizable.

                                                                  I have my unread messages on top, then my drafts, then the rest of my mail.

                                                                  1. 1

                                                                    I use the Gmail web application for all my email sending and receiving, both for work and for my personal email, and I really like it.

                                                                    Same. Email isn’t something I need or actually even want duplicated on any of my devices. The value is in its always-available ubiquity and search-ability, invariant of any forethought or planning on my part. A web app is the best kind of app for this.

                                                                  1. 3

                                                                    This is very cool for those looking to try out and learn a little BCPL for historical reasons, but I don’t think the language has any actual viable use besides its historical notability. Especially as this is touted as the “young persons” guide to BCPL, it’s strange that there’s no mention of why you’d want to learn BCPL, or what it offers for new/young coders that other languages don’t. The only reason given is that its simplicity makes it easy to pick up and learn, which is also true of other scripting languages like Python[1] or Lua (more so, in my opinion). Seems like a very strange choice. Am I missing something?

                                                                    [1]: Obviously Python isn’t simple in its design or implementation, but it’s simple in that it’s easy for anyone to pick up and understand, for the most part.

                                                                    1. 3

                                                                      Seems like a very strange choice. Am I missing something?

                                                                      Martin Richards, the author, created BCPL in the 60s. That might be enough to explain why he believes that “BCPL is particularly easy to learn and is thus a good choice as a first programming language”.

                                                                      Edit: don’t get me wrong, it’s great if he still teaches programming to young people using the language he invented! Many of us have started with some dialect of BASIC, which isn’t exactly better.

                                                                      1. 2

                                                                        Yup, that’s what I missed, lol. Thanks, that makes a lot more sense.

                                                                      2. 2

                                                                        Yeah, the fact that this assumes a tabula rasa makes it a little awkward to read. I’m curious about BCPL due to its influence, but I’m having to skip past a lot of both Linux tutorial and platform-specific setup instructions to get to the meat of this.

                                                                        1. 2

                                                                            BCPL obviously isn’t a scripting language. It’s proto-C. It’s a kind of high-level assembly language, even more so than C. For example there are no structs, only arrays with constant offsets (for which you can use MANIFEST declarations). It’s designed to not have or need a linker – which means the programmer has to manage global storage themselves. The only thing that is not a very good fit for modern machines is that it implicitly assumes word addressing, not byte addressing. But that’s not a huge problem. Like early Pascal it assumes that characters are stored in full machine words for processing and supplies special functions to pack and unpack them for more compact storage.
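
                                                                            That records-as-word-vectors style translates almost mechanically into C; a rough sketch of the idiom, with made-up field names standing in for what MANIFEST constants would declare in BCPL:

                                                                                #include <stdint.h>

                                                                                /* BCPL-style "record": a vector of words indexed by manifest constants
                                                                                   (the C analogue of MANIFEST declarations; names are illustrative). */
                                                                                enum { NODE_LINK, NODE_VALUE, NODE_SIZE };   /* field offsets 0 and 1; record is 2 words */

                                                                                typedef intptr_t word;                        /* BCPL has a single word-sized type */

                                                                                static word *make_node(word *heap, long *hp, word *next, word value) {
                                                                                    word *node = heap + *hp;                  /* bump-allocate from a word vector */
                                                                                    *hp += NODE_SIZE;
                                                                                    node[NODE_LINK]  = (word)next;            /* pointers and integers share one type */
                                                                                    node[NODE_VALUE] = value;
                                                                                    return node;
                                                                                }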

                                                                        1. 8

                                                                          In experiments several years ago on EC2 I found that LLVM build times using ninja scale linearly out to around 48 to 64 CPUs. But that doesn’t get you anywhere near 90 seconds. From memory, about 4 minutes might have been the fastest I managed.

                                                                          Maybe they’ve improved the build system.

                                                                            Edit: they’re using much more stripped-down builds than I was: -O0, only x86 code generation, and other things.

                                                                          1. 2

                                                                            Are there still 6502 / Z80 / 68K chips being made? I haven’t heard of any modern-day hardware based on them.

                                                                            I do know the venerable 8051 is still popular for very-low-end embedded use cases, but even back in the day it was described as extremely awkward to program, so I’d be surprised if hobbyists used it! (But what do I know, I found the 6502 nearly impossible back in my teens, so I never got into assembly on my Apple II. My friend’s Z80-based system was easier, I thought.)

                                                                            1. 3

                                                                               Yes. 6502 chips are still being made/sold by Western Design Center (https://www.westerndesigncenter.com/). Z80s are still around in the Zilog Z80, Z180 and eZ80 line. Freescale (descendant of Motorola) produces variants of the 68000/68020 as embedded products (or did until recently). There were also dozens of second sources and derivatives of these processors, and someone might still be selling those; I haven’t checked in a while.

                                                                              There are also specialty shops like Innovasic that recreate/reverse engineer old processors like this in FPGA/ASICs for long term support purposes. They aren’t cheap.

                                                                              I still use 8051s for hobby stuff. They’re weird but there’s, what, 50 years of tooling and experience to work with.

                                                                              1. 4

                                                                                Freescale (descendant of Motorola)

                                                                                Who are now part of NXP :)

                                                                                1. 1

                                                                                  My dad, who worked for Zilog in their heyday circa 1980, would be happy to hear that.

                                                                                  (He later worked for Xilinx, so he’d be happy about all the hobbyists using FPGAs too!)

                                                                                2. 1

                                                                                  A Z80 variant is still being made, IIRC, as is the 6502 (and 65816). There are of course plenty of FPGA implementations.

                                                                                  1. 1

                                                                                    You can buy 17 MHz (?) 65C02 chips for $8, brand new. They’re still being made. There’s a very interesting new board being made using them running at 8 MHz: https://www.commanderx16.com/

                                                                                    8051 is not bad if most of your variables fit into 8 registers and everything fits into 256 bytes. Talking to more memory than that is pretty annoying

                                                                                    z80 and 6502 each have their own annoying quirks. With cunning you can hand-write some pretty tight stuff on either (in completely different ways) but both are awful for compiling C (or Pascal back then) to.