    The fact that it works at all is amazing. However, the 6502 is a really tough target for compiled languages. Even something as basic as a standard function calling convention is expensive.

      Likewise, I’m very impressed it works. However, aside from the weak stack operations on the 6502, which you correctly point out, it doesn’t generate even vaguely idiomatic 6502 assembly. That clear-screen extract was horrible.

        The 6502 is best used by treating zero page as a large bank of registers, with the same kind of calling convention that modern RISC (and x86_64) use: some number of registers used for passing arguments and return values and for temporary calculations inside a function (so that leaf functions don’t have to save anything), plus a certain number of registers that are preserved across function calls, which you have to save and restore if you want to use them. The rest of zero page can be used for globals, the same as .sdata referenced from a Global Pointer register on machines such as RISC-V or Itanium.

        If you do that, then the only stack accesses needed are pushing and popping a set of registers. If you generate the code appropriately, then each function only has to save N registers on entry, and restore the same N and return on exit. You can use a small set of shared subroutines for that, saving code size. RISC-V does exactly the same thing with the -msave-restore option to gcc or clang.

        Of course for larger programs you’ll want to implement your own stack (using two zero page locations as the stack pointer) for the saved registers. 256 bytes should be enough for just the function return addresses.
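
        A rough Python model of that save/restore scheme, as a sketch only: the callee-saved bytes are assumed to start at $20, the software stack pointer is assumed to live in the zero-page pair $FE/$FF, and the stack is assumed to grow down from $C000. All three addresses, and the helper names save_regs/restore_regs, are made up for illustration; the two helpers play the role of the small shared prologue/epilogue subroutines.

```python
# Sketch only: the addresses and helper names below are hypothetical.
CALLEE_SAVED = 0x20   # first callee-saved zero-page "register" byte
SSP = 0xFE            # software stack pointer in zero page ($FE low, $FF high)
STACK_TOP = 0xC000    # empty-stack value; the stack grows downward from here


class Machine:
    def __init__(self):
        self.mem = bytearray(65536)
        self.set_ssp(STACK_TOP)

    def ssp(self):
        return self.mem[SSP] | (self.mem[SSP + 1] << 8)

    def set_ssp(self, value):
        self.mem[SSP] = value & 0xFF
        self.mem[SSP + 1] = (value >> 8) & 0xFF

    def save_regs(self, n):
        """Shared prologue: push the first n callee-saved bytes."""
        sp = self.ssp() - n
        self.mem[sp:sp + n] = self.mem[CALLEE_SAVED:CALLEE_SAVED + n]
        self.set_ssp(sp)

    def restore_regs(self, n):
        """Shared epilogue: pop the same n bytes back into zero page."""
        sp = self.ssp()
        self.mem[CALLEE_SAVED:CALLEE_SAVED + n] = self.mem[sp:sp + n]
        self.set_ssp(sp + n)


def uses_four_saved_regs(m):
    """A function body that clobbers 4 callee-saved bytes."""
    m.save_regs(4)
    for i in range(4):
        m.mem[CALLEE_SAVED + i] = 0xAA
    m.restore_regs(4)
```

        Because every function saves a contiguous run starting at the same address, one pair of routines parameterised only by N serves every function.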

          But I wonder how much of the zero page you can use without stepping on the locations reserved for ROM routines, particularly on the Apple II. It’s been almost three decades since I’ve done any serious programming on the Apple II, but didn’t its ROM reserve some zero-page locations to allow redirection of ROM I/O routines? If I were programming for that platform today, I’d still want to use those routines so that, for example, the Textalker screen reader (used in conjunction with the Echo II card) would work. My guess is that similar considerations would apply on the C64.

            The monitor doesn’t use a lot. AppleSoft uses a lot more, but that’s ok because it initialises what it needs on entry.

            https://pbs.twimg.com/media/E_xJ5oWUYAAUo3a?format=jpg&name=4096x4096

            Seems a shame now to have defaced the manual, but in my defence I did it 40 years ago.

            Now I’ve looked into the implementation I see they’re doing something like this, but using only 4 zero page bytes as caller-saved registers. This is nowhere near enough!

            Even 32-bit ARM uses 4 argument registers, which should probably translate to 8 bytes on the 6502 (four pointers or 16-bit integers).

            x86_64, which has the same number of registers as arm32, uses six argument registers. RISC-V uses 8 argument registers, plus another 7 “temporary” registers which a called function is free to overwrite. PowerPC uses 8 argument registers.

            6502 effectively has 128 16-bit registers (the size of pointers or int). There is no reason why you shouldn’t be at least as generous with argument and temporary registers as the RISC ISAs that have 32 registers.

            I’d suggest maybe 16 bytes for caller-save (arguments), 16 bytes for temporaries, 32 bytes for callee-save. That leaves 192 bytes for globals (2 bytes of which will be the software stack pointer).
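
            A short Python sketch that checks the arithmetic of that split. Placing the regions in this order starting at $00 is an assumption; on real machines a few low zero-page bytes may already be spoken for (the 6510 I/O port at $00/$01 on the C64, for example).

```python
# Region sizes from the suggestion above; the order and base address are assumed.
REGIONS = [
    ("arguments (caller-saved)", 16),
    ("temporaries", 16),
    ("callee-saved", 32),
    ("globals (incl. 2-byte software stack pointer)", 192),
]


def layout(regions, base=0x00, size=256):
    """Assign consecutive zero-page ranges and check they fill the page exactly."""
    table = {}
    addr = base
    for name, length in regions:
        table[name] = (addr, addr + length - 1)
        addr += length
    if addr - base != size:
        raise ValueError(f"regions use {addr - base} bytes, not {size}")
    return table


for name, (lo, hi) in layout(REGIONS).items():
    print(f"${lo:02X}-${hi:02X}  {name}: {(hi - lo + 1) // 2} 16-bit registers")
```

            This prints $00-$0F for the arguments, $10-$1F for the temporaries, $20-$3F for the callee-saved registers and $40-$FF for the globals, i.e. 8 + 8 + 16 of the 128 possible 16-bit registers, with 96 left for globals.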

              Where are you going to save them? In the 6502’s 256-byte stack? Even if the stack weren’t limited, you’d still only have at most 65,536 bytes of memory to work with.

                Would be cool to see if this stuff were built to expect bank switching hardware.

                  I quote myself:

                  Of course for larger programs you’ll want to implement your own stack (using two zero page locations as the stack pointer) for the saved registers. 256 bytes should be enough for just the function return addresses.

                  64k of total memory is of course a fundamental limitation of the 6502, and so is irrelevant to the details of code generation and calling convention you choose. Other than that, you want code that is as compact as possible, of course.

              GEOS has a pretty interesting calling convention for some of its functions (e.g. used at https://github.com/mist64/geowrite/blob/main/geoWrite-1.s#L82): Given that there’s normally no concurrency, and little recursive code, arguments can be stored directly in code:

                  jsr function
                  .byte arg1
                  .byte arg2

              function then picks apart the return address to get at the arguments, then moves it forward before returning to skip over the data. A recursive function (where the same call site might be re-entered before leaving, with different arguments) would have to build a trampoline on a stack or something like that:

                  lda #argcnt
                  jsr trampoline
                  .word function
                  .byte arg1
                  ...
                  .byte argcnt

              where trampoline builds jsr function, a copy of the arguments, and an rts on the stack, adjusts the return address to skip the arguments block, then jumps to that newly created contraption. But I’d rather just avoid recursive functions :-)
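
              The return-address arithmetic behind the inline-argument convention is easy to get wrong, so here is a small Python model of it. It relies on two facts about the 6502: JSR pushes the address of its own last byte (high byte first), and RTS pulls an address (low byte first) and resumes one byte past it. StackModel and take_inline_args are names made up for this sketch, which models only the stack, not a full CPU.

```python
class StackModel:
    """Just enough of the 6502 to model JSR/RTS and the page-1 stack."""

    def __init__(self):
        self.mem = bytearray(65536)
        self.sp = 0xFF              # hardware stack occupies $0100-$01FF

    def push(self, value):
        self.mem[0x100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x100 + self.sp]

    def jsr(self, pc):
        """JSR at pc pushes pc + 2 (its own last byte), high byte first."""
        ret = pc + 2
        self.push(ret >> 8)
        self.push(ret)

    def rts(self):
        """RTS pulls low, then high, and resumes one byte past that address."""
        lo = self.pull()
        hi = self.pull()
        return (((hi << 8) | lo) + 1) & 0xFFFF


def take_inline_args(cpu, count):
    """Callee side: read `count` inline bytes, then move the stacked
    return address past them so RTS lands on the next instruction."""
    lo = cpu.pull()
    hi = cpu.pull()
    ret = (hi << 8) | lo                         # the JSR's last byte
    args = [cpu.mem[ret + 1 + i] for i in range(count)]
    ret += count
    cpu.push(ret >> 8)                           # high byte first, like JSR
    cpu.push(ret)
    return args
```

              With a JSR at $0800 followed by .byte 24 and .byte 30, the callee sees [24, 30] and RTS resumes at $0805, the first byte after the data. Pushing the two halves in the opposite order would send RTS somewhere else entirely.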

                Needing self-modifying code to deal with function calls reminds me of the PDP-8, which didn’t even have a stack: the JMS instruction wrote your return address into the first word of the subroutine’s own code.
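
                For comparison, a tiny Python model of the PDP-8 mechanism: JMS stores the return address in the first word of the subroutine and starts executing at the word after it, and the subroutine returns with an indirect JMP through that word. The class is a sketch that models only those two instructions, enough to show why a recursive call loses the outer return address.

```python
class PDP8:
    """Models only JMS and the indirect-JMP return; addresses are 12 bits."""

    def __init__(self):
        self.mem = [0] * 4096   # 4K words of core
        self.pc = 0

    def jms(self, target):
        # JMS: write the return address into the subroutine's own first word
        self.mem[target] = (self.pc + 1) & 0o7777
        self.pc = (target + 1) & 0o7777

    def jmp_indirect(self, target):
        # JMP I target: return through the word JMS wrote
        self.pc = self.mem[target]
```

                Calling a subroutine at 0200 from 0100 stores 0101 at 0200; a nested call to the same subroutine from 0210 overwrites that word with 0211, so both returns end up at 0211 and the outer caller’s return address is gone.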

                  Are those the actual arguments, with self-modifying code used to get non-constant data there? Or are the .byte values the zero-page addresses of the arguments?

                  That’s pretty compact at the call site, but a lot of work in the called function to access the arguments. It would be ok for big functions that are expensive anyway, but on the 6502 you probably want (for code compactness) to call a function even for something like adding two 32-bit (or 16-bit) integers.

                  e.g. to add a number at address 30-31 into a variable at address 24-25 you’d have at the caller …

                      jsr add16
                      .byte 24
                      .byte 30

                  … and at the called function …

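                  add16:
                      pla             ; low byte of the stacked return address
                      sta ARGP
                      tax
                      pla             ; high byte
                      sta ARGP+1
                      tay
                      clc
                      txa
                      adc #2          ; skip the two inline argument bytes
                      tax
                      tya
                      adc #0
                      pha             ; RTS pulls the low byte first, so push high first
                      txa
                      pha
                      ldy #1          ; stacked address is the JSR's last byte; args follow it
                      lda (ARGP),y
                      tax             ; X = destination zero-page address
                      iny
                      lda (ARGP),y
                      tay             ; Y = source address

                  add16_q:
                      clc
                      lda $0000,y
                      adc $00,x
                      sta $00,x
                      lda $0001,y
                      adc $01,x
                      sta $01,x
                      rts

                  So the stuff between add16 and add16_q is 28 bytes of code and 56 clock cycles (one more for each argument read that crosses a page boundary). The stuff in add16_q is 16 bytes of code and 32 clock cycles, including the RTS. The call to add16 is 5 bytes of code and 6 clock cycles.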

                  It’s possible to replace everything between add16 and add16_q with a jsr to a subroutine called, perhaps, getArgsXY. That will save a lot of code (because it will be used in many such subroutines) but add even more clock cycles: 12 for the JSR/RTS, plus more code to pop/save/load/push the second return address on the stack (26 cycles?).

                  But there’s another way! And this is something I’ve used myself in the past.

                  Keep add16_q and change the calling code to…

                      ldx #24
                      ldy #30
                      jsr add16_q

                  That’s 7 bytes of code instead of 5 (bad), and 10 clock cycles instead of 6, but you get to skip the 50-plus clock cycles of code at add16 entirely (maybe 90 cycles if you call a getArgsXY subroutine instead).

                  You may quite often be able to omit the load immediate of X or Y because one or the other might be the same as the previous call, reducing the calling sequence to 5 bytes.

                  If there’s some way to make add16 more efficient I’d be interested to know, but I’m not seeing it.

                  Maybe you could get rid of all the PLA/PHA and use TSX; STX usp; LDX #1; STX usp+1 to copy the stack pointer into a 16-bit pointer in zero page, grab the return address using LDA instead of PLA, and increment the return address directly on the stack. It’s probably not much better, if at all.
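
                  A short Python model of what add16_q computes, as a behavioural sketch rather than a cycle-accurate emulation. Following the listing, X selects the destination pair in zero page (zero page,X addressing wraps within page zero) and Y selects the source pair through absolute,Y addressing (which does not wrap).

```python
def add16_q(mem, x, y):
    """mem: 64 KiB bytearray. Adds the little-endian word at y, y+1 into the
    word at zero-page x, x+1 and returns the final carry (the 6502 C flag)."""
    carry = 0                        # the CLC at the top of add16_q
    for i in range(2):               # low byte first, then high byte
        dest = (x + i) & 0xFF        # zero page,X wraps within page zero
        total = mem[dest] + mem[y + i] + carry   # absolute,Y: no wrap
        mem[dest] = total & 0xFF
        carry = total >> 8
    return carry
```

                  Calling add16_q(mem, 24, 30) corresponds to the ldx #24 / ldy #30 / jsr add16_q sequence: with $01FF at 24-25 and $0001 at 30-31, the word at 24-25 becomes $0200 and the carry ends up clear.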

                    These calling conventions are provided for some functions only, and mostly the expensive ones. From the way it’s implemented for BitmapUp, without looking too closely at the macros, it seems they store the return address at a known address and index through that.

                    GEOS has pretty complex functions and normally uses virtual registers in the zero page, so I guess this is more an optimization for calls with constant arguments: no need for endless lists of lda #value; sta $02; ... in your code. Since GEOS then copies the inline data into the virtual registers and just calls the regular function, the only advantage of the format is compactness.
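
                    A rough Python sketch of the behaviour described: inline constant bytes are copied into the zero-page virtual registers, then the ordinary register-based routine runs. The base address $02 for r0 follows the sta $02 example above and is otherwise an assumption; call_inline and read_r0 are hypothetical names, not GEOS routines.

```python
VREG_BASE = 0x02   # assumed location of virtual register r0 (r1 at $04, ...)


def call_inline(zp, inline_bytes, regular_entry):
    """Copy inline parameter bytes into r0, r1, ... then call the normal routine."""
    for offset, value in enumerate(inline_bytes):
        zp[VREG_BASE + offset] = value
    return regular_entry(zp)


def read_r0(zp):
    """Stand-in for a regular routine that takes a 16-bit argument in r0."""
    return zp[VREG_BASE] | (zp[VREG_BASE + 1] << 8)
```

                    call_inline(bytearray(256), bytes([0x34, 0x12]), read_r0) gives the same result (0x1234) as lda #$34 / sta $02 / lda #$12 / sta $03 followed by a normal call, which is why the only gain is compactness at the call site.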

                I’m hoping I won’t need to make an urgent flight for my dad. Admitted with COVID. He’s immunocompromised.

                  Oh, that’s not good. Here’s hoping everything turns out for the best.

                  There is no downvoting and hasn’t been for years.

                  Flagging things as already posted is for catching reposts like slightly different query params on a URL (for example, every WordPress post will have a couple of viable permalinks even before adding ?something=nonsense), hot-take responses to news, or thin rewrites that get called news.

                  Maybe you can suggest a better term for the flag to capture this?

                    It still nails people’s karma, though. It feels like bad faith is being assumed.

                    It’s a flag used to signal to the mods that this can be removed as a duplicate.

                    Merging is for different URIs discussing a common hot topic. AFAIK you can’t merge 2 submissions with the exact same URL (which ‘already posted’ targets).

                      But you can’t even post another submission with the same URL (it gives you an error message).

                        AIUI only if the older submission with the same URL was done recently.

                      Nit: MIPS may be a shadow of its former self, but it’s not gone. PowerPC (at least in terms of Power ISA) is definitely not gone; POWER10, such as it is, just came out. However, I do agree that hardware innovation, at least in terms of CPU design, is not progressing anywhere near as fast as it used to. For all the ballyhoo around RISC-V, it has not magically translated into amazing microarchitectures, because the people capable of amazing microarchitectures are working for Intel, AMD, Apple and ARM licensees.

                        According to several recent news reports a good number of those people have recently moved to startups making high performance RISC-V CPUs. It will of course take two or three years before the resulting designs arrive.

                        Speculation this week in the Apple press is that the loss of CPU designers is already the cause of the unusually small improvements in the CPUs in the new iPhones.

                          And those reports are?

                        There was at least one that had a soft-40 column display, if memory serves. There were several on the C64 with soft-80 displays like Pocket Writer (GEOS and its proportional fonts notwithstanding).

                          Wow, I can’t believe that Shugart intentionally sent defective drives to Wozniak in a ploy to get Apple to buy their more expensive drives. From an Apple II history link I found:

                          When representatives from Apple returned to Shugart to place orders for more of the SA390 drives to sell under the Apple brand, one of Shugart’s engineers admitted to a deception. The prototype drives that had been provided to Apple had actually come from a pile of bad SA400 drives. They had expected that Apple’s engineers would be unable to make the drives work, and out of frustration would have come back and purchased the more expensive SA400 drives.

                          Either way, I love this series of talks. The sight of the 1541 collecting dust in my closet fills me with determination!

                            I think it was Bob Russell who once remarked that the 1541 was the best computer Commodore ever made.

                            Seeing the SHOGO screenshot hits me straight in the feels. It’s been a long time since I had the demo running as a kid.

                              “According to intelligence, eh? Then I’ve got nothing to worry about!” - Sanjuro

                              Does this get you anything over extracting a rootfs tarball to the directory of your choice or, in the case of Debian, running debootstrap for the target arch and directory?

                                It’s a squashfs you mount directly (in this case), so it’s smaller than those would be.

                                  Ah. Nice. I’d forgotten about squashfs compression. Probably only a minor advantage on ZFS or Btrfs volumes with compression enabled, but useful elsewhere. I suppose having read-only mounting enforced is another plus for reliability.

                                I’ve run my own mailserver for about 8 years, for personal email and a few businesses that I run (about 12 people use my server daily). I’ve had surprisingly few problems sending mail directly to recipient servers without using a relay. However in the last year, some mail servers started blocking the entire IP ranges of my VPS provider (DigitalOcean). So now instead of sending all email through a relay, I just use a relay for messages destined for those servers. All others still get sent directly.

                                I surveyed the following:

                                • SendGrid - all emails were delivered to junk mail
                                • sendinblue.com, mailjet.com - All emails got an unsubscribe header which makes it look like bulk email
                                • mailgun.com - This one worked well! It is technically not free, but so far I’ve been under the threshold for them to not bother actually charging my credit card
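
                                The split delivery described above (relay only for the domains that block the VPS range, direct delivery otherwise) boils down to a per-domain next-hop lookup; in Postfix, for example, that is the job of a transport table. A minimal Python sketch of the decision, with placeholder domain and relay names:

```python
# Placeholder values: substitute the domains that reject your IP range
# and the relay you actually use.
RELAY_ONLY_DOMAINS = {"blocks-my-vps.example", "also-strict.example"}
RELAY_HOST = "smtp.relay.example"


def next_hop(recipient):
    """Return the relay for listed domains, or None to deliver directly via MX."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    return RELAY_HOST if domain in RELAY_ONLY_DOMAINS else None
```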

                                  Wow, great survey. I might look into Mailgun. I don’t mind paying (in fact, I’d rather be paying).

                                  I realize RCEs are bad regardless of the device’s function, but a baby monitor seems particularly egregious. The root shell in particular …

                                    I recently started running my own email server for receiving email, and sending email through Sendgrid. Sendgrid has a free plan that allows for 100 emails per day. Here’s the related Lobsters post in case you are interested: https://lobste.rs/s/s10jr0/running_my_own_email_server

                                      How good a sender reputation does Sendgrid have? Have you had any instances where mail bounced or didn’t get delivered?

                                        I haven’t had any problems. Sendgrid has a fairly good sender reputation as far as I know. It’s one of the larger email handling companies.

                                          In my experience if you’re on free tier, you get an IP with a bad reputation. You have to pay to get a good IP.

                                        I’m nervous about things like SendGrid. If I send a mail from my server to yours, then I can read it and so can anyone with access to your server. If I send through sendgrid, they’re able to see the plain text of every email that I send. I find it quite hard to believe that they’d offer this as a free service if they weren’t data mining that.

                                          They’re offering a free service because it costs them very little and is useful for getting people to buy their commercial offering. Test for free is an excellent method.

                                          As for security:

                                          • if you’re sending to a mailing list, you weren’t going to encrypt

                                          • if you’re sending private email, you need to encrypt the payload end-to-end

                                          • if you’re sending really secret messages, you want to avoid traffic analysis, too, so email is not for you

                                          The concerning thing about SendGrid et al is that they continue to devalue individual email servers and make it easier for the NSA or agency of your choice to do mass surveillance.

                                            Yes, I also consider E-mail “wide open,” so deliverability matters more to me than security, even though both are nice.

                                            I just started using SendGrid, and I’m a little wary, but I think they offer the service as a loss leader, more than a source of revenue. They did send a nag email every day for a couple of weeks after I signed up, but that seems to have stopped.

                                            That being said, I’ve always considered email insecure and open to being read by anyone, so them being able to see the plain text is not really a concern for me.

                                          I’ve been running our own email hosting (personal) for only a decade, but haven’t run into any such deliverability problems. If one is not sending bulk/unsolicited email, follows good SPF/DKIM/DMARC practices, and is not hosting any kind of malware or malware operations, then that problem is mostly non-existent. The only time one may have trouble is when one ends up participating in backscatter, which can easily be avoided by performing anti-spam checks at connection time, instead of delaying them until later, and therefore never bouncing email back. Only if one receives tonnes of email per minute may this strategy start becoming a bottleneck. Even LKML subscriptions by a couple of our users cause us no problem.

                                          OTOH, I would invest in a backup MX that can hold your email while your primary MX is down for maintenance, but I guess you’re already aware of this.
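
                                          The difference is when the verdict is delivered. A hedged Python sketch of deciding while the SMTP session is still open: the rejection goes back to the connected client, rather than as a later bounce to a possibly forged envelope sender, which is what produces backscatter. The mailbox list, the spam score and the threshold are placeholders.

```python
LOCAL_USERS = {"alice@example.org", "bob@example.org"}   # placeholder mailboxes


def smtp_reply(recipient, spam_score, threshold=5.0):
    """Reply given during the session, instead of accepting (250) and
    bouncing later to an envelope sender that may be forged."""
    if recipient.lower() not in LOCAL_USERS:
        return "550 5.1.1 Recipient address rejected"
    if spam_score >= threshold:
        return "550 5.7.1 Message rejected as spam"
    return "250 2.0.0 Ok: queued"
```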

                                            Yup, invested in a small slice for a backup MX years ago. It’s paid off several times. However, I have had deliverability problems with T-Mobile and Microsoft which required some cracking heads with their support staff (they simply had banned entire netblocks), so it certainly happens.

                                              I probably don’t send enough email to Microsoft or T-Mobile to notice whether my netblock is blocked on their side. Or maybe I’m just lucky with the hosting I use, which is Hetzner.

                                            Juggling the new gig and trying to get the POWER9 Wasm implementation in Firefox over the finish line. (Currently stalled due to a memory corruption problem in GC.)

                                              Thanks for it being Mojave compatible!

                                                I used my .plan with a cron job for automated news and weather circa 1993. Wish I could find where that script went (awk and csh); I was pretty proud of it.

                                                  Fascinating!

                                                  How did fork work in Venix originally on the 8086 without an MMU? Did it just make literal copies of segments using something like memcpy? Did fork have different semantics?

                                                    Pretty sure it literally just copied the process image.

                                                      For it to have “real fork semantics” I guess it would have to, but that seems like it would be super slow without paging/COW, especially given the speed of the hardware.

                                                      I guess that’s how it was in the earliest Unices too, so… authentic, I guess.

                                                        Slow is relative, of course. One thing to note is due to segment/process limitations, you’d be moving a few 10s of kilobytes of memory, not MB/GB. Impressive what they did with 64k+64k I+D.

                                                    I have NetBSD on my 164LX, but Tru64 seems “more appropriate.”

                                                      I’d love to have some actual Alpha hardware to play with. Shipping to NL is impossible or way too much cost sadly…

                                                      Does your box look like this one: https://www.vogons.org/viewtopic.php?t=54566 ?

                                                        At the start of the millennium I picked up an AlphaStation that was being decommissioned by a customer and installed OpenBSD on it. It was kinda fun. The machine made a noise not unlike a jet engine when powered on.

                                                          Yes, that’s the board, though I put it in a very sexy Nanoxia black case and custom printed Alpha stickers for it. It’s my “pimp box.” Goes well with the fire-engine red SGI Fuel next to it.

                                                        And just a dash of Scanimate in a few of these.

                                                        The CBS/FOX video clip brings back memories.