1. 9

    Oh hey, I’m the author of the post! Happy to chat about any aspects of it. This project is a spiritual successor to an earlier project reverse engineering a gaming DRM system, so if you enjoy this post you might enjoy that older one too.

    1. 2

      I love these kinds of blog posts. Thank you for the link to the previous project.

    1. 1

      I’ve got a Digilent Nexys board, which has a Xilinx FPGA - I’ve had some lovely times playing around with “pure” FPGA designs, but I’ve also played around with MicroBlaze, Xilinx’s soft-core processor, with the goal of running Linux on it.

      I also got a few MikroE boards - a PIC v7 and an AVR v7 - for when you need to play around with the smaller 8- or 16-bitters.

      On top of that I have a stack of TI boards:

      • LAUNCHPAD for MSP430s, and a bunch of different MSP430s
      • LAUNCHPADXL for the C2000 (not a fan of those, but that’s a different story)
      • Bluetooth development boards for Stellaris

      In the same drawer, there’s also an old Keil development board with an ARM7 processor, and a few mbed boards too, also ARM.

      I should probably get around to throwing away or giving away the boards I don’t use anymore.

      1. 19

        I enjoy the ESP32 boards quite a bit. They are very cheap, they come with WiFi and BT built in, and you can write either C or Arduino code for them.

        There are also the older ESP8266 boards, but they lack some of the cool power management features the newer ESP32 boards have.

        1. 3

          They also come with a pretty nice SDK – it’s usually the bane of the smaller boards that the SDKs are terrible, or that you end up having to write a whole lotta drivers yourself, which isn’t what I find fun ;-)

          1. 1

            I agree, the ESP-IDF SDK is surprisingly pleasant. But there are some other embedded SDKs that seem nice too, like Zephyr and Mbed OS, which support a lot of popular ARM-based boards. (Though I haven’t actually used them.)

          2. 3

            I have a bunch of the ESP8266s flashed with Tasmota, on breadboards with a Dallas temperature sensor for monitoring stuff around the house.

            One of them has a relay which controls my garage door in parallel with the garage door switch at the wall. I have a few that are embedded in “smart outlets” bought from Amazon, which I flashed with Tasmota over-the-air so as to avoid closed-source firmware.

            All of these are connected to my Hubitat for automation and pushing temps to InfluxDB. Here is a snapshot of one of my Grafana dashboards: https://snapshot.raintank.io/dashboard/snapshot/jzZznlmjEWaatSNw1AbN2znAPcC4N8wE

            1. 2

              Hubitat rules so hard.

              1. 1

                I bought the C5 shortly after it was released and they have done a great job improving it over time; the web UI is a lot more responsive than it used to be, and there has been a steady flow of sane features while still keeping it rather simple/focused. I like it a lot!

                1. 2

                  My only real issue is that geofencing with the mobile app doesn’t work consistently. Everything else is extremely reliable.

                  1. 2

                    Agreed, I forgot about geofencing… My current workaround is the OwnTracks phone app with the OwnTracks Presence app for Hubitat: https://github.com/bdwilson/hubitat/tree/master/OwnTracks-Presence – the downside is that battery consumption on my phone is higher. I also use a Wi-Fi-based presence app, since my phone has a static IP on my LAN.

            2. 2

              For the embedded category I really like the Nordic nRF52 chips. Well supported in Rust, nice bunch of peripherals onboard, debugging with OpenOCD+gdb works well, official docs are nice. Supports BLE, 802.15.4, ESB for radio. No Wi-Fi but that only makes everything lighter.

              1. 1

                That’s what I’ve started with, and they are fun for sure. I went with the Adafruit Huzzah32 boards, for a nice combination of features and low-enough price. (Might use something else if making a bunch of something.)

                Next up, using uLisp on the one I’ve got on order.

                1. 1

                  Alternative: WFI32

                1. 8

                  In the Arduino world, everything is done in C++, a language which is almost never used on 8-bit microcontrollers outside of this setting because it adds significant complexity to the toolchain and overhead to the compiled code.

                  I don’t buy this. C++ is C with extra features available on the principle that you only pay for what you use. (The exception [sic] being exceptions, which you pay for unless you disable them, which a lot of projects do.)

                  The main feature is classes, and those are pretty damn useful; they’re about the only C++ feature Arduino exposes. There is zero overhead to using classes unless you start also using virtual methods.
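
                  To make that concrete, here’s a minimal sketch (my example, not from the post): a class with only non-virtual methods carries no vtable and no per-object overhead, so calls can inline just like plain C functions.

                  ```cpp
                  #include <cstdint>

                  // No virtual methods: no vtable pointer, sizeof(Led) == 1,
                  // and led.toggle() can compile to the same code as a free function.
                  class Led {
                  public:
                      explicit Led(uint8_t pin) : pin_(pin) {}
                      void toggle() { /* write the GPIO register for pin_ */ }
                  private:
                      uint8_t pin_;
                  };

                  // Adding 'virtual' is what buys you a vtable pointer per object
                  // and an indirect call per invocation -- you pay only if you use it.
                  class Toggleable {
                  public:
                      virtual void toggle() = 0;
                      virtual ~Toggleable() = default;
                  };
                  ```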

                  The C++ library classes will most definitely bloat your code — templates are known for that — but again, you don’t have to use any of them.

                  (Aside: can someone explain why anyone’s still using 8-bit MCUs? There are so many dirt-cheap and low-power 32-bit SoCs now; what advantage do the old 8-bit ones still have?)

                  1. 9

                    (Aside: can someone explain why anyone’s still using 8-bit MCUs? There are so many dirt-cheap and low-power 32-bit SoCs now; what advantage do the old 8-bit ones still have?)

                    They’re significantly cheaper and easier to design with (and thus less demanding in terms of layout, power supply parameters, fabrication and so on). All of these are extremely significant factors for consumer products, where margins are extremely small and fabrication batches are large.

                    Edit: as for C++, I’m with the post’s author here – I’ve seen it used on 8-bit MCUs maybe two or three times in the last 15 years, and I could never understand why it was used. If you’re going to use C++ without any of the ++ features except for classes, and even then you still have to be careful not to do whatever you shouldn’t do with classes in C++ this year, you might as well use C.

                    1. 3
                      • RAII is a huge help in ensuring cleanup of resources, like freeing memory.
                      • Utilities like unique_ptr help prevent memory errors.
                      • References (&) aren’t a cure-all for null-pointer bugs, but they do help.
                      • The organizational and naming benefits of classes, function overloading and default parameters are significant IMO: stream->close() vs having to remember IOWriteStreamClose(stream, true, kDefaultIOWriteStreamCloseMode).
                      • As @david_chisnall says, templates can be used (carefully!) to produce super optimized type-safe abstractions, and to move some work to compile-time.
                      • Something I only recently learned is that for (auto &x : collection) even works with plain C arrays, saving you from having to figure out the size of the array in more-or-less fragile ways (see the sketch after this list).
                      • Forward references to functions work inside class declarations.

                      I could probably keep coming up with benefits for another hour if I tried. Any time I’m forced to write in C it’s like being given those blunt scissors they use in kindergarten.
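
                      A quick illustrative sketch of the unique_ptr and range-for points (mine, not from any particular project):

                      ```cpp
                      #include <cstdio>
                      #include <memory>

                      int main() {
                          // RAII via unique_ptr: freed automatically on every path out of scope.
                          auto buf = std::make_unique<int[]>(64);
                          buf[0] = 42;

                          // Range-for over a plain C array: the compiler knows the bound,
                          // so there's no sizeof(arr)/sizeof(arr[0]) dance to get wrong.
                          int samples[] = {3, 1, 4, 1, 5};
                          int sum = 0;
                          for (auto s : samples) sum += s;

                          std::printf("sum=%d buf[0]=%d\n", sum, buf[0]);
                          return 0;
                      }
                      ```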

                      1. 2

                        The memory safety/RAII arguments are excellent generic arguments, but there are extremely few scenarios in which embedded firmware running on an 8-bit MCU would be allocating memory in the first place, let alone freeing it! At this level RAII is usually done by allocating everything statically and releasing resources by catching fire – and not for performance reasons (edit: to be clear, I’ve worked on several projects where no code that malloc-ed memory would pass the linter, let alone get to a code review – where it definitely wouldn’t have passed). Consequently, you also rarely have to figure out the size of an array in “more-or-less fragile ways”, and it’s pretty hard to pass null pointers, too.

                        The organisational and naming benefits of classes & co. are definitely a good non-generic argument and I’ve definitely seen a lot of embedded code that could benefit from that. However, they also hinge primarily on programmer discipline. Someone who ends up with IOWriteStreamClose(stream, true, kDefaultIOWriteStreamCloseMode) rather than stream_close(stream) is unlikely to end up with stream->close(), either. Also, code that generic is pretty uncommon per se. The kind of code that runs in 8-16 KB of ROM and 1-2 KB of RAM is rarely so general-purpose as to need an abstraction like an IOWriteStream.

                        1. 2

                          I agree that you don’t often allocate memory in a low-end MCU, but RAII is about resources, not just memory. For example, I wrote some C++ code for controlling an LED strip from a Cortex M0 and used RAII to send the start and stop messages, so by construction there was no way for me to send a start message and not send an end message in the same scope.
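
                          I don’t have that code anymore; a minimal sketch of the idea might look like this, assuming an APA102-style strip and a hypothetical spi_write() provided by the board support code:

                          ```cpp
                          #include <cstddef>
                          #include <cstdint>

                          // Hypothetical SPI helper; on real hardware this would
                          // push bytes to the SPI peripheral.
                          void spi_write(const uint8_t *data, size_t len) { (void)data; (void)len; }

                          // RAII guard: the constructor sends the start frame and the
                          // destructor sends the end frame when the guard leaves scope,
                          // so by construction a start can't be sent without a matching end.
                          class LedFrame {
                          public:
                              LedFrame() {
                                  static const uint8_t start[4] = {0x00, 0x00, 0x00, 0x00};
                                  spi_write(start, sizeof start);
                              }
                              ~LedFrame() {
                                  static const uint8_t end[4] = {0xFF, 0xFF, 0xFF, 0xFF};
                                  spi_write(end, sizeof end);
                              }
                              LedFrame(const LedFrame &) = delete;
                              LedFrame &operator=(const LedFrame &) = delete;
                          };

                          void show_red(int count) {
                              LedFrame frame;  // start frame sent here
                              for (int i = 0; i < count; ++i) {
                                  const uint8_t led[4] = {0xE1, 0x00, 0x00, 0xFF};  // brightness, B, G, R
                                  spi_write(led, sizeof led);
                              }
                          }  // end frame sent here, no matter how the scope exits
                          ```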

                          1. 1

                            That’s one of the neater things that C++ allows for and I liked it a lot back in my C++ fanboy days (and it’s one of the reasons why I didn’t get why C++ wasn’t more popular for these things 15+ years ago, too). I realise this is more in “personal preferences” land so I hope this doesn’t come across as obtuse (I’ve redrafted this comment 3 times to make sure it doesn’t but you never know…)

                            In my experience, and speaking many years after C++11 happened and I’m no longer as enthusiastic about it, using language features to manage hardware contexts is awesome right up until it’s not. For example, enforcing things like timing constraints in your destructors, so that they do the right thing when they’re automatically called at the end of the current scope no matter what happens inside the scope, is pretty hairy (e.g. some ADC needs to get the “sleep” command at least 50 µs after the last command, unless that command was a one-shot conversion because it ignores commands while it converts, in which case you have to wait for a successful conversion, or a conversion timeout (in which case you have to clear the conversion flag manually) before sending a new command). This is just one example but there are many other pitfalls (communication over bus multiplexers, finalisation that has to be coordinated across several hardware peripherals etc.)
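
                            For what it’s worth, here’s roughly what that ADC example turns into (my sketch; all names, timings and behaviours are invented). Note how little of the destructor is “release” and how much is reconstructing device state:

                            ```cpp
                            #include <cstdint>

                            namespace bsp {  // hypothetical board-support stubs
                            void wait_us(unsigned) { /* busy-wait */ }
                            bool conversion_done() { return true; /* poll status register */ }
                            void clear_conversion_flag() { /* write the flag register */ }
                            void send_sleep_command() { /* bus write */ }
                            }

                            class AdcSession {
                            public:
                                void start_one_shot() { converting_ = true; /* bus write */ }
                                ~AdcSession() {
                                    if (converting_) {
                                        // Commands are ignored mid-conversion: wait for
                                        // completion or a timeout, then clear the flag by hand.
                                        for (int i = 0; i < 1000 && !bsp::conversion_done(); ++i)
                                            bsp::wait_us(50);
                                        if (!bsp::conversion_done())
                                            bsp::clear_conversion_flag();
                                    }
                                    bsp::wait_us(50);  // >= 50 us after the last command
                                    bsp::send_sleep_command();
                                }
                            private:
                                bool converting_ = false;
                            };

                            int main() {
                                AdcSession adc;
                                adc.start_one_shot();
                            }  // destructor runs here and has to do the right thing
                            ```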

                            As soon as you meet hardware that wasn’t designed so that it’s easy to code against in this particular fashion, there’s often a bigger chance that you’ll screw up code that’s supposed to implicitly do the right thing in case you forget to “release” resources correctly than that you’ll forget to release the resources in the first place. Your destructors end up being 10% releasing resources and 90% examining internal state to figure out how to release them – even though you already “know” everything about that in the scope at the end of which the destructor is implicitly called. It’s bug-prone code that’s difficult to review and test, which is supposed to protect you against things that are quite easily caught both at review and during testing.

                            Also, even when it’s well-intentioned, “implicit behaviour” (as in code that does more things than the statements in the scope you’re examining tell you it does) of any kind is really unpleasant to deal with. It’s hard to review and compare against data sheets/application notes/reference manuals, logic analyser outputs and so on.

                            FWIW, I don’t think this is a language failure as in “C++ sucks”. I’ve long come to my senses and I think it does, but I don’t know of any language that easily gets these things right. General-purpose programming languages are built to coordinate instruction execution on a CPU; I don’t know of any language that allows you to say “call the code in this destructor 50 µs after the scope is destroyed”.

                    2. 7

                      While you can of course put a 32-bit SoC on everything, in many cases 8-bitters are simpler to integrate into a hardware design. A very practical point is that many 8-bitters are still available in DIP packages, which makes assembly of smaller runs easier.

                      1. 5

                        Aside: can someone explain why anyone’s still using 8-bit MCUs? There are so many dirt-cheap and low-power 32-bit SoCs now; what advantage do the old 8-bit ones still have?

                        They’re cheaper still, and lower power. 30 cents each isn’t an unreasonable price.

                        1. 3

                          You can get Cortex M0 MCUs for about a dollar, so the price difference isn’t huge. Depending on how many units you’re going to produce, it might be insignificant.

                          It’s probably a question of what you’re used to, but at least for me, working with a 32-bit device is a lot easier and quicker. Those development hours saved pay for the fancier MCUs, at least until the number of produced units gets large. Fortunately most of our products are in the thousands of units…

                          1. 9

                            A 3x increase in price is huge if you’re buying lots of them for some product you’re making.

                            1. 4

                              Sure, but how many people buying in bulk are using an Arduino (the original point of comparison)?

                              1. 2

                                I mean, the example they gave was prototyping for a product…

                            2. 6

                              If you’re making a million devices (imagine a phone charger sold at every gas station, corner store, and pharmacy in the civilized world), that $700k could’ve bought a lot of engineer hours, and the extra power consumption adds up with that many devices too.

                            3. 2

                              The license fee for a Cortex M0 is 1¢ per device. The core’s area is about the size of a pad on a cheap process, so the combined cost of licensing and fabrication is pretty much as close as you can get to the minimum cost of producing any IC.

                              1. 1

                                The license fee for a Cortex M0 is 1¢ per device.

                                This (ARM licensing cost) is an interesting datapoint I have been trying to get for a while. What’s your source?

                                1. 2

                                  A quick look at the Arm web site tells me I’m out of date. This was from Arm’s press release at the launch of the Cortex M0.

                                  1. 1

                                    Damn. Figures.

                              2. 1

                                Could you name a couple of “good” 8-bit MCUs? I realized it’s been a while since I looked at them, and it would be interesting to compare my preferred choices to what the 8-bit world has to offer.

                              3. 2

                                you only pay for what you use

                                Unfortunately many Arduino libraries do use these features – often at significant cost.

                                1. 2

                                  I’ve not used Arduino, but I’ve played with C++ for embedded development on a Cortex M0 board with 16 KiB of RAM and had no problem producing binaries that used less than half of this. If you’re writing C++ for an embedded system, the biggest benefits are being able to use templates that provide type-safe abstractions but are all inlined at compile time and end up giving tiny amounts of code. Even outside of the embedded space, we use C++ templates extensively in snmalloc, yet in spite of being highly generic code and using multiple classes to provide the malloc implementation, the fast path compiles down to around 15 x86 instructions.
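
                                  For flavour, a made-up example of the kind of thing meant here (not snmalloc code): the pin number is a template parameter, so the abstraction is type-checked at compile time and the call collapses to a couple of instructions.

                                  ```cpp
                                  #include <cstdint>

                                  // Invented register address for illustration; not a real chip's memory map.
                                  constexpr uintptr_t GPIO_OUT_SET = 0x50000518;

                                  // Type-safe, zero-cost GPIO abstraction: PIN is a compile-time constant,
                                  // so set() inlines to a single store with no stored state or dispatch.
                                  template <unsigned PIN>
                                  struct Gpio {
                                      static_assert(PIN < 32, "single 32-bit port assumed");
                                      static void set() {
                                          *reinterpret_cast<volatile uint32_t *>(GPIO_OUT_SET) = (1u << PIN);
                                      }
                                  };

                                  int main() {
                                      Gpio<17>::set();   // roughly: load address, store (1 << 17)
                                      // Gpio<40>::set(); // rejected at compile time by the static_assert
                                  }
                                  ```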

                                1. 8

                                  I woke up first thing Christmas morning to find my software on this list. Best Christmas present ever.

                                  1. 2

                                      I just wanna say thank you for making yori - it was, and still is, one of the first things I install on Windows servers when I’m administering them.

                                  1. 1

                                        Going to play some more Talos Principle and Half-Life: Alyx, and tidy my apartment some more, so I’m ready to do some small fixes left over from when I moved in.

                                    1. 2

                                      Frogs are people

                                    1. 5

                                            I’ve used almost all the tools on this list when I was developing for Windows, and I know there are a few Windows developers around here, so I figured it would be useful to share.

                                      1. 1

                                              Oh, that’s a good one for the toolbox. I had a rather big refactoring for the product I’m working on and this could definitely have helped. Next time :-D

                                        1. 2

                                                I’m going to go hiking alone in new places and cook some great food :-)

                                                Then I can curl up on the couch with a clear conscience!

                                          1. 5

                                                  One point: ARM instructions tend to be fixed-width (like UTF-32), whereas x86 instructions vary in size (like UTF-8). I always loved that.

                                                  I’m intrigued by the Apple Silicon chip, but I can’t give you any one reason it should perform as well as it does, except maybe smaller process size / higher transistor count. I am also curious how well Rosetta 2 can JIT x86 to native instructions.

                                            1. 10

                                              “Thumb-2 is a variable-length ISA. x86 is a bonkers-length ISA.” :)

                                              1. 1

                                                The x86 is relatively mild compared to the VAX architecture. The x86 is capped at 15 bytes per instruction, while the VAX has several instructions that exceed that (and there’s one that, in theory, could use all of memory).

                                                1. 2

                                                        If you really want to split your brain, look up the EPIC architecture on the 64-bit Itaniums. It was an implementation of VLIW (Very Long Instruction Word). In VLIW, you pass one huge instruction that tells each individual functional unit what to do (essentially moving scheduling to the compiler). I think EPIC bundled these in groups of three… been a while since I read up on it.

                                                  1. 6

                                                          Interestingly, by one definition of RISC, this kind of thing makes Itanium a RISC machine: the compiler is expected to work out dependencies, which functional units to use, etc., which was one of the foundational concepts of RISC in the beginning. At some point RISC came to mean just “fewer instructions”, “fixed-length instructions”, and “no operations directly with memory”.

                                                          Honestly, I believe it is the last of those that most universally distinguishes CISC from RISC at this point.

                                                    1. 3

                                                      Raymond Chen also wrote a series about the Itanium.

                                                      https://devblogs.microsoft.com/oldnewthing/20150727-00/?p=90821

                                                      It explains a bit of the architecture behind it.

                                                    2. 1

                                                      My (limited) understanding is that it’s not the instruction size as much as the fact that x86(-64) has piles of prefixes, weird special cases and outright ambiguous encodings. A more hardwarily inclined friend of mine once described the instruction decoding process to me as “you can never tell where an instruction boundary actually is, so just read a byte, try to figure out if you have a valid instruction, and if you don’t then read another byte and repeat”. Dunno if VAX is that pathological or not, but I’d expect most things that are actually designed rather than accreted to be better.

                                                      1. 1

                                                        The VAX is “read byte, decode, read more if you have to”, but then, most architectures which don’t have fixed sized instructions are like that. The VAX is actually quite nice—each opcode is 1 byte, each operand is 1 to 6 bytes in size, up to 6 operands (most instructions take two operands). Every instruction supports all addressing modes (with the exception of destinations not accepting immediate mode for obvious reasons). The one instruction that can potentially take “all of memory” is the CASE instruction, which, yes, implements a jump table.

                                                  2. 6

                                                    fixed-width instructions (like UTF-32)

                                                    Off-topic tangent from a former i18n engineer, which in no way disagrees with your comment: UTF-32 is indeed a fixed-width encoding of Unicode code points, but sadly that leads some people to believe that it is a fixed-width encoding of characters, which it isn’t: a single character can be represented by a variable-length sequence of code points.
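
                                                    A small illustration (mine): both spellings below are the same user-perceived character, but different numbers of UTF-32 code points.

                                                    ```cpp
                                                    #include <cstdio>
                                                    #include <string>

                                                    int main() {
                                                        // One user-perceived character, two valid encodings:
                                                        std::u32string precomposed = U"\u00E9";   // 'é' as one code point
                                                        std::u32string decomposed  = U"e\u0301";  // 'e' + combining acute

                                                        // UTF-32 is fixed-width per code point, not per character:
                                                        std::printf("precomposed: %zu code point(s)\n", precomposed.size()); // 1
                                                        std::printf("decomposed:  %zu code point(s)\n", decomposed.size());  // 2
                                                        return 0;
                                                    }
                                                    ```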

                                                    1. 10

                                                      V̸̝̕ȅ̵̮r̷̨͆y̴̕ t̸̑ru̶̗͑ẹ̵̊.

                                                    2. 6

                                                      I can’t give you any one reason it should perform as well as it does, except maybe smaller process size / higher transistor count.

                                                      One big thing: Apple packs an incredible amount of L1D/L1I and L2 cache into their ARM CPUs. Modern x86 CPUs also have beefy caches, but Apple takes it to the next level. For comparison: the current Ryzen family has 32KB L1I and L1D caches for each core; Apple’s M1 has 192KB of L1I and 128KB of L1D. Each Ryzen core also gets 512KB of L2; Apple’s M1 has 12MB of L2 shared across the 4 “performance” cores and another 4MB shared across the 4 “efficiency” cores.

                                                      1. 7

                                                        How can Apple afford these massive caches while other vendors can’t?

                                                        1. 3

                                                          I’m not an expert, but here are some thoughts on what might be going on. In short, with a virtually-indexed L1 cache the index bits have to fit inside the page offset, so the 4 KB minimum page size on x86 puts an upper limit on the number of cache rows (sets) you can have; Apple’s 16 KB pages raise that limit considerably.

                                                          The calculation at the end is not right and I’d like to know exactly why. I’m pretty sure the A12 chip has 4-way associativity. Maybe the cache lookups are always aligned to 32 bits, which is something I didn’t take into account.
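
                                                          For what it’s worth, the back-of-the-envelope version of that limit (my sketch; the 8-way associativity is an assumption, but with it the published sizes line up):

                                                          ```cpp
                                                          // VIPT constraint: L1 size <= associativity * page size, because
                                                          // the set index bits must fit inside the (untranslated) page offset.
                                                          constexpr unsigned kKiB = 1024;

                                                          constexpr unsigned max_vipt_l1(unsigned ways, unsigned page_bytes) {
                                                              return ways * page_bytes;
                                                          }

                                                          // x86, 4 KB pages: 8 ways caps L1 at 32 KB (matches Ryzen's L1D).
                                                          static_assert(max_vipt_l1(8, 4 * kKiB) == 32 * kKiB, "x86 limit");

                                                          // Apple, 16 KB pages: 8 ways allows 128 KB (matches M1's L1D).
                                                          static_assert(max_vipt_l1(8, 16 * kKiB) == 128 * kKiB, "Apple limit");

                                                          int main() {}
                                                          ```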

                                                        2. 3

                                                          For comparison: the current Ryzen family has 32KB L1I and L1D caches for each core; Apple’s M1 has 192KB of L1I and 128KB of L1D. Each Ryzen core also gets 512KB of L2; Apple’s M1 has 12MB of L2 shared across the 4 “performance” cores

                                                          This is somewhat incomplete. The 512KiB L2 on Ryzen is per core. Ryzen CPUs also have L3 cache that is shared by cores. E.g. the Ryzen 3700X has 16MiB L3 cache per core complex (32 MiB in total) and the 3900X has 64MiB in total (also 16MiB per core complex).

                                                          1. 2

                                                            How does the speed of L1 on the M1 compare to the speed of L1 on the Ryzen? Are they on par?

                                                        1. 1

                                                              Having our yearly get-together for my old group of friends from high school, so we’ll go out for dinner and then back to mine for a few beers afterwards.

                                                              I also need to put up the blinds I bought, and tomorrow my uni friends are dropping by for ranting and boardgames.

                                                          1. 3

                                                                Puppet and Chef are great for more complex systems - managing whole companies from servers to switches and everything in between. But for your use case, I agree that it sounds like either an externally managed container platform (EKS, GCP, DO, …) or 1-n virtual machine(s) on your favourite provider is the way to go (the number depends on expected traffic and uptime constraints; do you want zero-downtime deployments?).

                                                                For a small number of different machines, a simple Ansible playbook (with or without Docker or Podman) seems like the right solution.

                                                            (source/context: I do SysOps with bare metal as well as cloud providers for a living)

                                                            1. 2

                                                                  Yes, this seems like the way to go forward. I thankfully only have a single dependency, so I don’t think I need Docker.

                                                            1. 2

                                                              If you can package the app into a (docker or other) container I think this makes testing and deployment on Jenkins & elsewhere a much simpler issue - you just need to install Docker and run the containers.

                                                                    Once you’ve got a working container it should just be a case of installing Docker on a server and running the app as a systemd unit. For this bit the tools you mentioned are all suitable – I’d also like to plug my own pyinfra if you know/prefer Python; Ansible is also pretty quick to get started with.

                                                              Hope this helps a bit! Happy to answer any further questions :)

                                                              1. 1

                                                                Yes, I figured it would be nice to put it all into a Docker container.

                                                                How do I then get the systemd unit file into the system - that’s sort of the missing piece for me right now.

                                                                1. 3

                                                                  Some people suggested using Ansible. You can manage containers on your host with the docker_container module (https://docs.ansible.com/ansible/latest/modules/docker_container_module.html). You don’t need to use systemd to manage your containers as the docker daemon is essentially a systemd for containers.

                                                                  1. 2

                                                                    Getting systemd to start/stop Docker containers is a bit of a pain. I think the easiest way to go is to just start your container with --restart unless-stopped (or docker update --restart unless-stopped $containername if you’ve already created a container).

                                                                1. 8

                                                                            Traveling to Denmark to spend the week in a remote house near the sea.

                                                                            The pandemic prevented us from taking any kind of vacation, so I’m really looking forward to it.

                                                                  1. 2

                                                                    I did that about a month ago and it was great. Enjoy!

                                                                    1. 2

                                                                      this sounds wonderful!

                                                                      1. 1

                                                                                  Which part of Denmark? My wife is Danish, so I know the area she comes from (Central Jylland) well.

                                                                        1. 1

                                                                          Rømø :)

                                                                          1. 1

                                                                                      I’ve been there. It’s a very nice place. Let’s hope you get good weather. Have fun!

                                                                        2. 1

                                                                                    If you don’t mind sharing, how did you find the place? Are those listed on Airbnb, or…?

                                                                          1. 2

                                                                            You can find rentals on https://www.dansommer.dk/

                                                                                      It seems to combine all the different rental services in one search.

                                                                        1. 1

                                                                          I’ve interpreted it as “projecting”. A bit like /u/formerly_a_trickster but without any lights involved ;-)

                                                                          1. 2

                                                                            Emptying even more boxes, then seeing two different groups of friends, one for socials and the other for boardgames.

                                                                            1. 3

                                                                                            Painting even more! And going to a double 30th birthday party.

                                                                              1. 3

                                                                                              Been road trippin’; boardgames in a bit, and then it’s time for watching Formula 1 and editing photos tomorrow :-)

                                                                                1. 2

                                                                                                Going to paint a bit tomorrow and visit a friend on Sunday.

                                                                                  1. 4

                                                                                    Playing loads of computer and board games.