Threads for IcePic

    1. 3

      This is a great post, would love to see one where they dig into the assembly listings.

      In C, when you declare a function without marking it “static”, it must be accessible to other compilation units: the full symbol has to exist and be fully usable, with parameters passed on the stack and all that.

      FWIW, Link Time Optimization (-flto) can greatly help with this by allowing inlining across compilation units.

      I expect 6502-gcc doesn’t have architecture support for LTO yet, but maybe some GCC wizard will implement it…
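
      A minimal illustration of the linkage point (function names invented here): a static function is private to its translation unit, so the compiler may inline every call and drop the standalone copy, while an externally visible function must still be emitted in full. -flto extends that inlining freedom across .c files at link time:

      ```c
      #include <assert.h>

      /* Externally visible: must exist as a real symbol with the full
         calling convention, since other compilation units may call it.
         Only -flto lets the toolchain inline it across files. */
      int add_visible(int a, int b) { return a + b; }

      /* Private to this file: the compiler is free to inline every
         call site and never emit a standalone copy. */
      static int add_private(int a, int b) { return a + b; }

      int main(void) {
          assert(add_visible(2, 3) == 5);
          assert(add_private(2, 3) == 5);
          return 0;
      }
      ```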

      1. 3

        I’m surprised anyone bothers with C on the 6502. A 256 byte stack, three 8-bit registers (only one of which can be used for math) and very limited index addressing modes make for a very poor C experience. And as far as LTO goes, you only have a total of 64K RAM. In most cases, you don’t want to inline code.

        1. 5

          FORTH seems a better choice than C on the 6502, since it’s so damn compact. I used to use FigFORTH on my Apple ][ back in high school once I outgrew BASIC.

        2. 4

          C on low-level platforms also makes sense as a kind of glue language. So what you’ll commonly find is that large portions of a game or demo are in fact written in asm, with a bit of C sprinkled on top for tying the various bits and pieces together. It just saves a lot of work for those boring but necessary things where performance doesn’t really matter. cc65 is especially useful in this respect, since it comes with a very good assembler.

          Anyway, great find, thanks for posting!

          1. 1

            I actually like being able to type an int as a 16-bit entity and then just have the compiler fix the additions and subtractions for me (sprite X positions spring to mind). Even if the code ends up exactly as a simple Add16() macro in asm would, I just don’t have to write the macro every time. If inlining or linking pure asm is easy, then you can well use C for the non-critical stuff, and asm for your interrupts and so on.
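
            As a rough sketch of what that buys you (the helper name is invented, int16_t standing in for the 6502’s 16-bit int): the add crossing the 8-bit boundary is exactly what the compiler’s generated low-byte/high-byte add-with-carry sequence handles for you:

            ```c
            #include <assert.h>
            #include <stdint.h>

            /* Hypothetical sprite-movement helper. The compiler lowers
               the + to the usual two-byte add-with-carry pair (what an
               Add16 macro would do in asm), so we never write it by hand. */
            int16_t move_right(int16_t x, int16_t dx) {
                return x + dx;
            }

            int main(void) {
                /* crossing 255 is exactly where plain 8-bit math breaks */
                assert(move_right(250, 10) == 260);
                return 0;
            }
            ```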

        3. 2

          That’s fair. I was also a bit amazed to see that there are not one but multiple C compilers available for 6502! Which is why I’d like to see the assembly listings and see what kind of different “tricks” these compilers are doing (possibly none I guess, but the benchmarks suggest some big differences in their code generation approach and/or possible optimisations).

          as far as LTO goes, you only have a total of 64K RAM. In most cases, you don’t want to inline code.

          If you only have limited program storage then inlining single use functions can help enormously! Reduced stack usage, dead code elimination, constant subexpression elimination, etc.

          As an example, I’ve enabled LTO to shrink a ~20KB ARM Thumb-2 firmware (-Os) down to fit into a part with 16KB of flash storage. (This is mostly due to vendor library abstraction layers that create a lot of nested function calls, but it shows there are potential big size reductions.)
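
          To make the single-use case concrete, a small sketch (names invented): once the only call site is inlined, the optimizer can constant-fold the argument and discard the untaken branch as dead code:

          ```c
          #include <assert.h>

          /* Called from exactly one place, so inlining it duplicates
             nothing. After inlining, `mode` is the constant 0 and the
             second return is dead code the compiler can eliminate. */
          static int scale(int value, int mode) {
              if (mode == 0)
                  return value * 2;
              return value * 3;   /* dead once the only caller passes 0 */
          }

          int main(void) {
              assert(scale(21, 0) == 42);
              return 0;
          }
          ```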

        4. 2

          A lot, if not most, of the code still being written for the 6502 today is in the form of retrogames, which are a… surprisingly active market for software that’s written for fossil hardware :). That makes code reuse worthwhile, and many developers and game studios tend to shy away from writing everything in ASM and Forth. There are a bunch of reasonably good libraries (see e.g. neslib), written in pretty well-optimized assembly. In lots of games, C is to these libraries as Lua is to a modern game engine, more or less. Plus ROM space is pretty cheap. You can go a long way writing the boilerplate in C and the tight loops in ASM.

          (Sauce: running wc on my current NES project says about 21K lines and I’m about 2/3rds of the way through :P. It would’ve probably been way better if I’d done it in ASM but it would’ve taken a lot longer, especially since it’s, you know, a game, where lots of things have been written two or even three times until they were juuuust right. You can’t quite waterfall your way through this.)

    2. 9

      pass in on $int_if proto { udp, tcp } from any to any port domain rdr-to [pihole_ip] port domain

      1. 5

        I wish Linux had PF. iptables feels so clunky by comparison.

        1. 4

          Apparently PF is a bit of a hack, but it’s quite nice to use. iptables is…terrible. When I used a Linux firewall for my home network I used shorewall to not have to write iptables rules by hand. shorewall is/was amazing!

          netfilter is allegedly more or less lock-less and scales better with more cores, while pf is still essentially single-threaded due to the overall OpenBSD design. Not sure how FreeBSD’s pf behaves; at this point I think they share the name and syntax but the rest is different.

          I’ve tried reading up a bit on nftables recently but I never quite got into it - probably not enough examples available for me. I’m currently going through the VyOS documentation and writing an Ansible playbook to generate the rules needed. I think I like this setup, but have yet to test it even in a VM ^_^

          1. 5

            What do you mean with, “PF is a bit of a hack”? They spent quite a bit of effort on it, and from what I know it’s quite well done. The multicore support isn’t there because it’s just not seen as a priority or something that’s needed. Maybe there’s something I don’t know?

            1. 1

              What do you mean with, “PF is a bit of a hack”?

              I think I got that sentiment from others when I was researching/googling ways to improve my home router/firewall’s performance, and trying to understand the difference between FreeBSD PF and OpenBSD PF.

              I really love using PF and have been using it on and off since OpenBSD 3.3. But,

              The multicore support isn’t there because it’s just not seen as a priority or something that’s needed

              This still amazes me! Who would need any kind of decent performance with a multicore computer?! Multicore has been the norm for 15 years and lots of slow, but multicore CPUs are out there. Even NetBSD has a great performing, lock-less firewall (npf)!

              I know this current situation is by design, simplifies the kernel design, etc. But it’s such a common situation and use case that it amazes me that we are still in this situation. I know that work has been ongoing to make OpenBSD PF more lock-less, though.

              1. 2

                I think if you check where time is spent on a firewall, then the ethernet drivers, tcp stack, route lookups and all those things might be the points where MP support is better spent than the PF ruleset evaluation, at least for relatively trivial pf.confs.

                To me, people seem to notice “my box runs PF and is not pushing 10GE” and shortcut this into “PF must be slow”, as if there weren’t tons of other parts involved in figuring out whether an incoming tagged vlan packet going into a software ethernet bridge and out on a gre tunnel interface should be allowed or not. There are a lot of other layers in there to consider for MP improvements and driver unlocking before you can be sure the thing that slows you down is PF.

                Or, inversely, when/if PF gets declared completely unlocked and MP-safe, the perf situation might just not change as much as the above-mentioned people expect if you don’t improve the rest also.

                1. 1

                  Agreed! There are probably lots of things that could be improved/made MP-safe in OpenBSD (I’ve read that vlan handling could be improved, for example).

                  My experience, using the same hardware with both OpenBSD & Linux, is that OpenBSD simply performs worse than Linux. I’m not even reaching gigabit between vlans, which I’ve done before. This is partly why I’m exploring VyOS configured with ansible.

              2. 1

                What do you mean with, “PF is a bit of a hack”?

                I think I got that sentiment from others when I was researching/googling ways to improve my home router/firewall’s performance, and trying to understand the difference between FreeBSD PF and OpenBSD PF.

                Dunno; I never heard it described as a “hack” before. I know there are some people with strong negative opinions on OpenBSD, partly justified, partly … less so. The design of most non-trivial systems is an exercise in making trade-offs, and OpenBSD (and by extension, pf) makes some fairly non-standard trade-offs in some regards, which benefits some cases and disadvantages others. I feel that a lot of the dislike towards OpenBSD is because people don’t appreciate this and come at it expecting the same set of trade-offs that e.g. Linux made. You’re not going to have a good time hammering in a nail with a power drill either, even though a power drill is obviously much more advanced and useful in many ways than a hammer.

                One of the ways I’ve seen Theo describe OpenBSD in recent years is as an “idea incubator”. It makes a lot of sense to keep OpenBSD/pf simple and “elegant” if your goal – or one of your goals anyway – is to provide a platform for new ideas. This obviously comes with downsides too; you can’t have your cake and eat it too.

                All of the above is pretty much the reason I don’t use OpenBSD myself, by the way. Not because I don’t like it – I think it’s a great system – but just because OpenBSD’s goals don’t align with some things I expect.

        2. 1

          How does nftables compare?

          1. 1

            Not had a ton of experience using it, unfortunately. Looks nicer, though.

          2. 1

            nftables is a lot nicer to use, much more readable than iptables. Haven’t used PF/others yet, but definitely a big step over iptables.

          3. 1

            I use nftables, and find the syntax a lot nicer and easier to write and maintain than iptables.

      2. 2

        A decent solution, at least until these things start using dns-over-https (DoH).

    3. 2

      Docker has become like a “universal server executable” by now. It does not matter if other container-systems can do the same: if you want 100 developers at your company to quickly have the same development environment, tell them to pull the versioned build docker from the local docker registry server, and most of them should be able to do this without any major issues. Want to quickly set up a service somewhere? They are likely to support Docker too. FreeBSD jails are unique to FreeBSD.

      Also, most modern Linux distros now place binaries in just /usr/bin, then add symlinks from /bin,/sbin and /usr/sbin. Why FreeBSD thinks it’s a good idea to place binaries in these six directories: /bin, /sbin, /usr/bin, /usr/sbin, /usr/local/bin and /usr/local/sbin, is beyond me. What does it solve that a modern Linux distro can not?

      I get the separation between “system packages” and “user-installed packages that are also available to other users”, but this should IMO be handled by a package manager, not by letting users place stray files outside of /home. I assume that’s why FreeBSD has an emphasis on being able to reset everything to the “base system”, because /usr/local may easily become a mess?

      1. 4

        Docker has become like a “universal server executable” by now.

        I must admit - and many other people from the FreeBSD ‘world’ admit that too - that the Docker tooling (not the technology behind it) was the thing that got Docker so much traction. IMHO it’s a pity that Jails management was not put into a similar form (maybe a little more thought out). But from what I heard, Docker containers run only on Linux, so that does not make them portable. CentOS to Ubuntu migration does not count :p

        Also, most modern Linux distros now place binaries in just /usr/bin.

        That is the separation between Base System binaries (in the /{bin,sbin} and /usr/{bin,sbin} dirs) and Third Party Packages maintained by pkg(8), located in the /usr/local/{bin,sbin} dirs.

        We all know the differences between bin (user) and sbin (admin) binaries, but in FreeBSD there is also a more UFS-related division. When there was only UFS in the FreeBSD world, /bin and /sbin were available at boot before /usr was mounted - this is the historical (and, in UFS setups, useful) distinction between /{bin,sbin} and /usr/{bin,sbin}. In ZFS setups it does not matter, as all files are on the ZFS pool anyway.

        I get the separation between “system packages” and “user installed packages that are also available to other users” (…)

        Users on FreeBSD are not allowed to install packages, only root can do that.

        I assume that’s why FreeBSD has an emphasis of being able to reset everything to the “base system” because /usr/local may easily become a mess?

        It’s not about ‘mess’ in /usr/local, as the FreeBSD Ports maintainers keep the same logic and order in /usr/ports as in the Base System. It’s about another layer of ‘security’ if you fuck up something. If you break the RPM database in a Linux distribution, you need to reinstall (or need a really heavy repair session). On FreeBSD, when you (for some reason) mess up the packages, you just reset the packages to the ‘zero’ state and the Base System is untouched.

        Hope that helps.

        1. 1

          Thanks for the thoughtful answers! I have considered using FreeBSD on a server or two, since it seems like a very solid and well put together system, while at the same time keeping things minimal and fast.

          I feel like a good package system in combination with ZFS (or another filesystem with snapshots) makes many of the old ideas and solutions obsolete. Why keep anything in /usr/local if a package system can handle (possibly manually built) packages better? For instance, on Arch Linux, creating a custom package is so easy and fast that the threshold is low enough that you never have to install anything directly into /usr/local. And once a user has done that, the threshold is equally low for uploading it to the AUR, so that new users can just use that one.

          Similarly, I’m not sure that the additional layer of ‘security’ of having a Base System covers anything that a good package system (like pacman) in combination with filesystem snapshots can cover. Do you have an example?

          About Docker, AFAIK it can also be used on Windows and macOS, not only Linux. There is some heavy black box lifting going on within those applications, though. :)

          1. 3

            One of the things you want to avoid is some package getting the great idea to install a new compiler, name it “cc” and override the system compiler, or add libs/includes in such a way that makes it super hard to get back to a working system. If some random bsd port adds to /usr/local/lib, you are not prevented from running programs, because system programs link to stuff in /usr/lib, and not “any random lib everywhere I find .so files in”. A well-meaning package system will of course tell you that this new “cc” seems to collide with the packaged cc, if one existed before; but if it didn’t, then the package system might work against you when you try to get back. BSD wants you to be able to use base things without hiccups (which is lovely when things act up), then allow packages to add things on top of that, without risking the base set of files, be it manpages, libs, includes, fsck tools or whatever.

            1. 1

              I could not stress this better. Thanks.

            2. 1

              These seem like theoretical problems to me. Two packages can’t both install cc in /usr/bin on Arch Linux, and that’s the only directory that is in $PATH by default (since the other bin directories are symlinks).

              Isn’t it better to just install everything on the system with a package manager, so that nothing collides and everything is in its place? Then, if the combined results of a package upgrade should ever go wrong, one can just undo it by using filesystem snapshots.

              1. 2

                Indeed, if the package manager is in control of all installed software then it’s not a problem. It’s only relatively recently that this has happened with FreeBSD (Package Base) and the work is still ongoing - before this the base system was distributed purely in tarball format, with the package manager only managing /usr/local.

              2. 1

                It’s quite common on FreeBSD to have both a base-system version with long-term support guarantees in /usr/bin and a newer version from ports in /usr/local/bin. It’s up to you which order you put the two in your PATH, and it’s also up to you whether you install aliases that favour one over the other in specific cases.

                1. 1

                  Why would you want that? If you want to use an experimental version, install that and test that it looks good. Or, replace it with the stable version again if your tests didn’t pass.

                  Or, for example, Firefox and Open Office are available as both stable and fresher versions on Arch Linux, and you can install both at the same time.

                  If you need some special piece of software to have an old and stable environment, you can use a jail, docker or a VM for that.

    4. 2

      C: having tons of ways to interpret (). As declarator grouping in ( *function ), as the container for parameters to a function(p1,p2), as a cast (int)long, or as simple mathematical precedence to calculate 2+(5*3) in the correct order. Somehow the compiler gets it right even though I might not always.
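
      All four readings can sit side by side in a few lines; a small sketch (identifiers invented):

      ```c
      #include <assert.h>

      static int add(int p1, int p2) { return p1 + p2; }

      int main(void) {
          int (*function)(int, int) = add; /* grouping in a declarator: pointer to function */
          int called = function(2, 3);     /* parameter list of a call */
          long wide = 42L;
          int narrowed = (int)wide;        /* cast */
          int grouped = 2 + (5 * 3);       /* plain precedence */
          assert(called == 5 && narrowed == 42 && grouped == 17);
          return 0;
      }
      ```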

    5. 7

      I’m surprised that musl is bigger than glibc. Isn’t size and simplicity like the whole point of musl?

      1. 4

        Yup, I was surprised at it too!

        I’ll try to get the musl image to build statically, as that’s what musl was designed for, and in containers there’s no point in dynamic linking. So that might reduce the size.

        But yeah, weird.

        Taking a glance with dive I see the musl-based binary takes 14MB while the glibc image has a 3.5MB HAProxy binary. The /lib folder takes 7.9MB in glibc and 5.3MB in musl.

        Weird. I’ll look into it over the weekend. Thanks!

        1. 6
          $ ls -lh /lib/ /lib/musl/lib/ /lib/libc.a /lib/musl/lib/libc.a
          -rwxr-xr-x 1 root root 2.1M Apr 17 21:11 /lib/*
          -rw-r--r-- 1 root root 5.2M Apr 17 21:11 /lib/libc.a
          -rw-r--r-- 1 root root 2.5M Apr 16 13:49 /lib/musl/lib/libc.a
          -rwxr-xr-x 1 root root 595K Apr 16 13:49 /lib/musl/lib/*

          Sounds like an issue with compile flags or whatnot.

          1. 1

            The author doesn’t need the .a, do they? I thought the .a were the static libraries. I don’t think I’ve seen them since I did a Linux From Scratch build over 10 years ago.

            1. 2

              I’m not sure how cc -static works to be honest, I just included it to demonstrate that musl is smaller on my system both as a dynamic library and a static one.

              1. 1

                cc -static should look into the .a file (which is an archive of .o files) and pick out only the parts actually needed to build the static binary and then you don’t need the .a file anymore.

      2. 1

        I’ve been looking into it during my free time. I’ve managed to get rid of all shared objects other than libc, whose removal would cause a segfault. Adding CFLAGS="-static" and LDFLAGS="-static" to the make step doesn’t help.

        It does not reduce the binary size, though; right now the image is down to 18.6MB and the binary to 17.2MB (with the other objects statically linked, of course).

        See the changes in this branch.

    6. 3

      sendmsg(2), sendto(2), recvfrom(2) and recvmsg(2) are run without KERNEL_LOCK.

      Does anybody have an idea how this affects the performance of those system calls?

      1. 2

        On the general topic of performance, OpenBSD is usually known to be not as fast as other BSDs or Linux? Can someone who knows more about this say what the reasons are? Is it just programming that isn’t focused on micro optimisation or is there a necessary tradeoff between security and speed?

        1. 3

          It’s not always a tradeoff, but doing it right takes huge effort. Anyone who ran fbsd 5.x with the 3? different thread models and emulated linux threads for mysql perf knows there is lots more to it than just flipping a bit and letting the cores race around at their whim. Obsd seems to try to ‘not screw up’ with the available manpower, and that means being behind. Might also mean fewer silly bugs slip through.

        2. 3

          Not a contributor, but I’ve been lurking in the community for quite a while now, and AFAIK the largest reason OpenBSD performs the way it does is that they don’t aim for performance. So the trade-off is not so much about security as it is about manpower and interest.

          What I can say is that it never performed badly for any workload I aimed it at, besides web browsing - which is getting better and better.

          But I also value those lovely man pages, sane defaults and the overall stable and thus boring interfaces that they keep providing and improving without alienating long-time users, more than I value “performance”, whatever the metric by which you want to measure it.

      2. 1

        According to my best knowledge, KERNEL_LOCK is the so-called big kernel lock. By having these syscalls not take that lock, performance is supposed to increase.

    7. 11

      Note that SMT doesn’t necessarily have a positive effect on performance; it highly depends on the workload. In all likelihood it will actually slow down most workloads if you have a CPU with more than two cores.

      In case you’re wondering, this refers to OpenBSD’s giant-locked kernel. Some parts of this kernel are now unlocked (e.g. network stack) but for some workloads 2 CPUs can be faster than 3 or more due to lock contention.

      1. 1

        Per my understanding, every “physical” CPU can have many cores, and each core can have multiple hardware threads if SMT is supported. So every “hardware thread” is a “logical” CPU. Does the OpenBSD kernel do special operations according to physical CPU, core and hardware thread, or does it just consider “logical” CPUs? Thanks!

        1. 2

          As far as I know the SMT threads were simply exposed as additional CPUs to the scheduler.

          1. 1

            @stsp Thanks for your response!

            If I understand correctly, disabling SMT means cutting the number of “logical” CPUs in half, right? For example, if the server has one CPU with 2 cores, and every core has 2 hardware threads, then in theory the server has 4 “logical” CPUs. Assume my workload has 4 threads, and every thread is independent and computing-intensive (mostly user-space computation, not involving the kernel much via syscalls, network access, etc.). Currently the workload can occupy all 4 “logical” CPUs. But now, if the count of “logical” CPUs is halved, my workload’s 4 threads need to contend for 2 “logical” CPUs. So in this scenario, the workload’s performance should be degraded.

            Is it correct? Thanks in advance!

            1. 3

              At least when HT was new, it also meant the caches would be halved unless you disabled HT in the BIOS. So if your threads are doing different things they might suffer from it.

            2. 1

              As far as I understand, it doesn’t mean that all 4 threads can progress in parallel, it will depend on which unit in the CPU each thread is utilizing.

    8. 4

      I don’t like how it claims the filesystem “lies” about its encoding. It’s like thinking “this Italian file server will never unzip an archive that came from Korea”. The rules (at least on unices) tend to be everything except NUL and slashes, and even if line-feed, ESC or BEL might be hard to generate, they are still valid parts of potential filenames, and pretending they are not will make your program worse. A filesystem that says “nopes” to writing a file not compliant with the current locale setting would be poor indeed.
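
      A sketch of the rule described above (the helper name is made up): a Unix filename component may contain any byte except NUL and the slash, so newline, ESC and BEL are all legal:

      ```c
      #include <assert.h>
      #include <string.h>

      /* Hypothetical validity check for one path component.
         NUL cannot appear inside a C string anyway, so only '/'
         and the empty name need rejecting. */
      static int is_valid_filename(const char *name) {
          if (name[0] == '\0')
              return 0;
          return strchr(name, '/') == NULL;
      }

      int main(void) {
          assert(is_valid_filename("report.txt"));
          assert(is_valid_filename("line\nfeed\x1b\a"));  /* odd, but valid */
          assert(!is_valid_filename("dir/file"));
          assert(!is_valid_filename(""));
          return 0;
      }
      ```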

      1. 1

        Arguably if anyone were lying it would be sys.getfilesystemencoding, which promises to tell you what the filesystem’s encoding is, even when the filesystem makes no claims to even having an encoding. But arguably it’s not lying either, per the documentation:

        Return the name of the encoding used to convert Unicode filenames into system file names, or None if the system default encoding is used. The result value depends on the operating system:

        Note that it only talks about “Unicode filenames into system file names” (aside: don’t miss the “filename” vs “file name” here) but says nothing about going the other way. It can’t.

        Knowing too little Python to be sure where the mistake is, I do not trust the article on the correctness or the necessity of the solution it outlines. I have seen developers in other languages be gravely mistaken about their language’s string model, and this article feels iffy in a way that is reminiscent of such misconceptions to me; however, I’ve just as well seen languages screw up their string model, so if I knew more Python I might well be nodding along.

    9. 4

      Why can’t ads go back to just being images?

      It’s not like their tracking algos are providing much insight. I guess that’s one of those things you can’t say out loud, though.

      1. 1

        Yeah, as someone else wrote, why would I want to get targeted ads for refrigerators one week after I bought one…