Threads for yossarian

  1. 4

    This parse ambiguity is similar to the ambiguity exploited by the recently-discussed rubygems CVE-2022-29176, though the article reports discovering the Python issue independently. The problem in both cases is looking up a package by the concatenated string {name}-{version} instead of by the name and the version separately.

    1. 2

      Yeah, these kinds of parsing ambiguities can be extremely pernicious! As best I can tell no modern Python packaging tools will be confused by the “vexing parse” here, but it’s possible that older tooling could be confused into installing package-someversion==suffix instead of package==someversion-suffix.
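
      To make the vexing parse concrete, here’s a toy sketch (hypothetical package name, nothing from any real tool): splitting the concatenated string on the first hyphen versus the last hyphen yields two different (name, version) readings.

      fn main() {
          // "x-1.0-2": is this name "x", version "1.0-2", or name "x-1.0", version "2"?
          let s = "x-1.0-2";
          let first = s.split_once('-').unwrap(); // ("x", "1.0-2")
          let last = s.rsplit_once('-').unwrap(); // ("x-1.0", "2")
          assert_ne!(first, last); // the concatenated form is genuinely ambiguous
      }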

    1. 2

      This is an awesome series. I hadn’t considered interpreting LLVM bitcode before, but I’m aware sulong does just that, so now you’ve inspired me to give it a shot sometime too.

      1. 2

        Thanks for the kind words! I wasn’t actually aware of sulong; it’ll probably be an excellent reference going forwards :-)

      1. 23

        This isn’t a C problem, it’s an operating system problem.

        In fact, I’d even argue that systems that use this sort of allocation scheme are not in keeping with the C standard, which states that a successful return from malloc (or related functions) yields a pointer that can be used to access objects or arrays of any type up to the specified size. Crashing on a valid return is not a specified error condition.

        1. 8

          This is a pedantic response, but the pedantic answer is important for why this behavior isn’t incompatible with the C standard: malloc doesn’t cause a program to crash when an overcommit is realized. It causes the kernel to abort the program. The latter is perfectly acceptable in the C/C++ memory model.

          Put another way: from C’s perspective, there’s no real sense in which the malloc incorrectly fails. It succeeds, and the system decides not to play along.
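
          A sketch of how that plays out, assuming a Linux-style kernel with its default overcommit heuristics (the size here is purely illustrative):

          fn main() {
              // Reserving ~1 TiB can "succeed" under overcommit, even with far
              // less physical memory available...
              let mut huge: Vec<u8> = Vec::with_capacity(1 << 40);

              // ...and the failure only surfaces later, when touching the pages
              // forces the kernel to back them: the OOM killer aborts the
              // process, and no error ever flows back through the allocator.
              huge.resize(1 << 40, 0xAA);
          }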

        1. 1

          Really cool!!!

          My aversion to databases in personal projects is largely unfounded: turning the DOCUMERICA dataset into a small SQLite DB gave me exactly the kind of query and state control that I needed. No more hacky state files, unlike my other Twitter bots.

          I found exactly the same recently! Databases are cool and I regret not touching them for ten years :)

          A small nitpick:

          created TEXT,                 -- the ISO8301 date for the photo's creation
          

          It’s ISO 8601 :)

          1. 1

            Thanks for the kind words, and for catching that typo! I’ve pushed the fix.

          1. 5

            I’m working on a third post in my LLVM internals series.

            1. 6

              This is my opinion as just another Rust programmer in the community, but: encouraging novices to opt into runtime lifetime management and interior mutability sets them up for failure when they eventually have to interact with the majority of the Rust ecosystem. These techniques are also a performance and abstraction hazard: a novice who leans too heavily on interior mutability will end up designing interfaces that are impedance-mismatched with the rest of the crate ecosystem.

              These techniques are vital to Rust’s ability to express complex memory semantics, but they’re also a barrier to fully understanding the language when you’re a novice. I suspect that the advice in this post would have been a significant impediment to my personal understanding of the language when I was just starting out.
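
              For anyone unfamiliar, a minimal sketch of what “runtime lifetime management” means here: with Rc<RefCell<T>>, the borrow rules don’t go away, they just move from compile time to runtime.

              use std::cell::RefCell;
              use std::rc::Rc;

              fn main() {
                  let shared = Rc::new(RefCell::new(vec![1, 2, 3]));

                  let first = shared.borrow(); // immutable borrow, checked at runtime
                  // This compiles, but panics at runtime with "already borrowed":
                  // shared.borrow_mut().push(4);
                  drop(first);

                  shared.borrow_mut().push(4); // fine once the first borrow is released
                  assert_eq!(shared.borrow().len(), 4);
              }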

              1. 1

                I’m continuing to work on a (partial) reimplementation of LLVM’s IR APIs in Rust: https://github.com/woodruffw/mollusc

                I’ve been blogging about the different interesting bits of writing a bitcode parser from the ground up here: https://blog.yossarian.net/series#llvm-internals

                1. 5

                  This is a well-written post, but I think I disagree with the author w.r.t. this actually being a problem: effects within closures and iterators are subtle things, and I prefer Rust’s approach of requiring users to explicitly ferry them into the surrounding context.

                  It also tends to be only a minor hiccup in practice: collect can be used to flatten a sequence of Result<T>s into Result<Container<T>>, which is the intuitive and intended behavior in almost all circumstances.
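
                  A minimal example of that flattening, with str::parse standing in for any fallible step:

                  fn main() {
                      // Ok(...) only if every element parses; otherwise the first Err wins.
                      let ok: Result<Vec<i32>, _> =
                          ["1", "2", "3"].iter().map(|s| s.parse::<i32>()).collect();
                      assert_eq!(ok, Ok(vec![1, 2, 3]));

                      let err: Result<Vec<i32>, _> =
                          ["1", "nope", "3"].iter().map(|s| s.parse::<i32>()).collect();
                      assert!(err.is_err());
                  }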

                  1. 1

                    Oh cool, I didn’t know about this. Can it do the same with an iterator of Future<T>s into a Future<Container<T>>?

                    Actually thinking about it, I guess not, because there’s a choice to be made about whether to do them all in parallel or in sequence.

                    1. 1

                      I know this thread is long dead, but you actually can do this with join_all.

                      Or you can use FuturesUnordered to pull out results as they complete. Unfortunately it’s a little weird to use: iterating returns a Future to await, which in turn returns Some(_) or None to signal end of iteration. An Iterator needs to return Option, not a Future<Option>.

                      use futures::stream::{FuturesUnordered, StreamExt};

                      let mut results: FuturesUnordered<_> =
                          (0..10).map(|i| async move { i * 2 }).collect(); // any futures will do
                      while let Some(result) = results.next().await {
                          println!("{result}"); // results arrive in completion order
                      }
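
                      And for completeness, a join_all sketch (assuming the futures crate): it turns an iterator of futures into one future of a Vec, polling them concurrently.

                      use futures::future::join_all;

                      async fn demo() {
                          let futs = (0..3).map(|i| async move { i * 2 });
                          // effectively Future<Container<T>>:
                          let results: Vec<i32> = join_all(futs).await;
                          assert_eq!(results, vec![0, 2, 4]);
                      }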
                      
                  1. 8

                    I think there are valid arguments on both sides here, but this post doesn’t seem to be grounded in experience.

                    Practically speaking, users of weird architectures do contribute patches back. Those people eventually become the maintainers. When those people go away, the project drops support for certain architectures. That happened with CPython, e.g. it doesn’t support Mac OS 9 anymore as far as I remember.

                    It’s sort of a self-fulfilling prophecy – if the code is in C, you will get people who try to compile it for unusual platforms. If it’s in Rust, they won’t be able to try.

                    I’m not saying which one is better, just that this post misses the point. If you want to use Rust and close off certain options, that’s fine. Those options might not be important to the project. Someone else can start a different project with the goal of portability to more architectures.

                    Changing languages in the middle of the project is a slightly different case. But that’s why the right to fork exists.

                    1. 25

                      Author here: this post is grounded in a couple of years of experience as a packager, and a couple more years doing compiler engineering (mostly C and C++).

                      Practically speaking, users of weird architectures do contribute patches back. Those people eventually become the maintainers. When those people go away, the project drops support for certain architectures. That happened with CPython, e.g. it doesn’t support Mac OS 9 anymore as far as I remember.

                      This is the “hobbyist” group mentioned in the post. They do a fantastic job getting complex projects working for their purposes, and their work is critically undervalued. But the assumptions that stem from that work are also dangerous and unfounded: that C has any sort of “compiled is correct” contract, and that you can move larger, critical work to novel architectures just by patching bugs as they pop up.

                      1. 6

                        OK I think I see your point now. TBH the post was a little hard to read.

                        Yes, the people contributing back patches often have a “it works on my machine” attitude. And if it starts “working for others”, the expectation of support can arise.

                        And those low quality patches could have security problems and tarnish the reputation of the project.

                        So I would say that there are some projects where having the “weird architectures” off to the side is a good thing, and some where it could be a bad thing. That is valid but I didn’t really get it from the post.


                        I also take issue with the “no such thing as cross platform C”. I would say it’s very hard to write cross platform C, but it definitely exists. sqlite and Lua are pretty good examples from what I can see.

                        After hacking on CPython, I was surprised at how much it diverged from that. There are a lot of #ifdefs in CPython making it largely unportable C.

                        In the ideal world you would have portable C in most files and unportable C in other files. Patches for random architectures should be limited to the latter.

                        In other words, separate computation from I/O. The computation is very portable; I/O tends to be very unportable. Again, sqlite and Lua are good examples – they are parameterized by I/O (and even memory allocators). They don’t hard-code dependencies, so they’re more portable. They use dependency inversion.
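
                        A sketch of that dependency inversion (illustrative names, in Rust for brevity): keep the computation parameterized over an I/O interface, and confine the unportable parts to small adapters.

                        // Portable "computation" core, parameterized by its output:
                        trait Output {
                            fn write_line(&mut self, line: &str);
                        }

                        fn report(records: usize, out: &mut dyn Output) {
                            // Pure computation: no direct dependency on platform I/O.
                            out.write_line(&format!("processed {records} records"));
                        }

                        // Unportable edge: one small adapter per platform.
                        struct Stdout;
                        impl Output for Stdout {
                            fn write_line(&mut self, line: &str) {
                                println!("{line}");
                            }
                        }

                        fn main() {
                            report(42, &mut Stdout);
                        }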

                        1. 10

                          TBH the post was a little hard to read.

                          That’s very fair; I’m not particularly happy with how it came out :-)

                          I also take issue with the “no such thing as cross platform C”. I would say it’s very hard to write cross platform C, but it definitely exists. sqlite and Lua are pretty good examples from what I can see.

                          I’ve heard this argument before, and I think it’s true in one important sense: C has a whole bunch of mechanisms for making it easy to get your code compiling on different platforms. OTOH, to your observation about computation being generally portable: I think this is less true than C programmers generally take for granted. A lot of C is implicitly dependent on memory models that happen to be shared by the overwhelming majority of today’s commercial CPUs; a lot of primitive operations in C are under-specified in the interest of embedded domains.

                          Maybe it’s possible to write truly cross-platform C, but it’s my current suspicion that there’s no way to verify that for any given program (even shining examples of portability like sqlite). But I admit that that’s moving the goalposts a bit :-)

                          1. 12

                            Maybe it’s possible to write truly cross-platform C, but it’s my current suspicion that there’s no way to verify that for any given program (even shining examples of portability like sqlite).

                            I think the argument holds up just fine despite the existence of counterexamples like Sqlite and Lua; basically it means that every attempt to write portable and safe code in C can be interpreted as an assertion that the author (and every future contributor) is as capable and experienced as Dr. Hipp!

                            1. 6

                              A lot of C is implicitly dependent on memory models that happen to be shared by the overwhelming majority of today’s commercial CPUs

                              That’s largely a result of the CPU vendors optimising for C, due to its popularity. Which leads to its popularity. Which…

                              1. 2

                                A lot of C is implicitly dependent on memory models that happen to be shared by the overwhelming majority of today’s commercial CPUs; a lot of primitive operations in C are under-specified in the interest of embedded domains.

                                As the author of a C library, I can confirm that fully portable C is possible (I target the intersection of C99 and C++). It wasn’t always easy, but I managed to root out all undefined and unspecified behaviour. All that is left is one instance of implementation defined behaviour: right shift of negative integers. Which I have decided is not a problem, because I don’t know a single platform in current use that doesn’t propagate the sign bit in this case.
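
                                (For contrast, Rust pins this down: >> on a signed integer is defined as an arithmetic, sign-propagating shift, which is exactly the behaviour assumed above.)

                                fn main() {
                                    assert_eq!(-8i32 >> 1, -4);  // the sign bit propagates...
                                    assert_eq!(-1i32 >> 31, -1); // ...all the way down
                                }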

                                The flip side is that I don’t do any I/O, which prevents me from directly accessing the system’s RNG.

                                Incidentally, I’m a firm believer in the separation of computation and I/O. In practice, I/O makes up a relatively small portion of programs. Clearly separating it from the rest turns the majority of the program into “pure computation”, which (i) can be portable, and (ii) is much easier to test than I/O.

                              2. 5

                                I also take issue with the “no such thing as cross platform C”. I would say it’s very hard to write cross platform C, but it definitely exists. sqlite and Lua are pretty good examples from what I can see.

                                I see this as parallel to “no such thing as memory-safe C”. Sure, cross-platform C exists in theory, but it’s vanishingly rare in practice, and I’d wager even the examples you cite are likely to have niche platform incompatibilities that haven’t been discovered yet.

                                1. 1

                                  I’d wager even the examples you cite are likely to have niche platform incompatibilities that haven’t been discovered yet.

                                  Portability in C is hard, but it is simple: no undefined behaviour, no unspecified behaviour, no implementation defined behaviour. If you do that, and there still are platform incompatibilities, then the platform’s compiler is at fault: it has a bug, fails to implement part of the standard, or simply conforms to the wrong standard (say, C89 where the code was C99).

                                  If we’re confident a given project is free of undefined, unspecified, and implementation defined behaviour, then we can be confident we’ll never discover further niche platform incompatibilities. (Still, achieving such confidence is much harder than it has any right to be.)

                                  1. 3

                                    Portability in C is hard, but it is simple: no undefined behaviour, no unspecified behaviour, no implementation defined behaviour.

                                    That is a very tall order, though. Probably impossibly tall for many (most?) people. I asked how to do this, and I would say the answers were mixed at best. Simple isn’t good enough if it’s so hard that nobody can actually do it.

                            2. 3

                              If it’s in Rust, they won’t be able to try.

                              I think this is the most trenchant point here. If someone wants to maintain a project for their own “weird” architecture, then they need to maintain the toolchain and the project. I’ve been in that position and it sucks. In fact, it’s worse, because they need to maintain the toolchain before they even get to the project.

                              I’m particularly sensitive to this because I’m typing this on ppc64le. We’re lucky that IBM did a lot of the work for us, but corporate interests shift. There’s no Rust compiler for half the systems in this room.

                              1. 2

                                I’m not familiar with these systems. What are they used for? What kind of devices use them? What industries/sectors/etc.?

                                1. 3

                                  PPC is very common in the aerospace and automotive industries. Of course there are also POWER servers running Linux and AIX, but those are a niche compared to the embedded market.

                                  1. 6

                                    Got it. That definitely doesn’t sound like hobbyists working on side projects using mass-market hardware. I think this is what the article was referring to: these corporate users should be willing to pay up to get their platforms supported.

                                    1. 3

                                      So does that mean we should only care about architectures that have corporate backing? Like I say, this isn’t a situation where it’s only a project port that needs maintainers. The OP puts it well that without a toolchain, they can’t even start on it. If Rust is going to replace C, then it should fill the same niche, not the same niche for systems “we like.”

                                      For the record, my projects are all officially side projects; my day job has nothing officially to do with computing.

                                      1. 8

                                        So does that mean we should only care about architectures that have corporate backing?

                                        Yes, it does. Money talks. Open source is not sustainable without money. I can work on a pet project on the side on evenings and weekends only for a relatively short period of time. After that it’s going to go unmaintained until the next person comes along to pick it up. This is going to happen until someone gets a day job working on the project.

                                        If Rust is going to replace C, then it should fill the same niche, not the same niche for systems “we like.”

                                        C has a four-decade head start on Rust; if no one is allowed to use Rust until it has caught up on those four decades of porting and standardization effort, all for the sake of people’s side projects, then that argument is a non-starter.

                                        1. 3

                                          Yes, it does. Money talks.

                                          In such a world there would be no room for hobbyists, unless they work with what other people are using. Breakage of their interests would be meaningless and unimportant. That’s a non-starter too.

                                          But, as you say, you’re unfamiliar with these systems, so as far as you’re concerned they shouldn’t matter, right?

                                          1. 9

                                            In that (this) world, there is room for hobbyists only insofar as they support their own hobbies and don’t demand that open source maintainers keep providing free support for them.

                                      2. 2

                                        OpenWrt runs on TP-Link TL-WDR4900 WiFi Router. This is a PowerPC system. OpenWrt is nearly a definition of hobbyists working on side projects using mass-market hardware.

                                        1. 2

                                          It says on that page that this device was discontinued in 2015. Incidentally, same year Rust reached 1.0.

                                          1. 2

                                            I am not sure what you are trying to argue. The same page shows it to be in OpenWrt 19.07, which is the very latest release of OpenWrt.

                              1. 5

                                One point: ARM tends to use fixed-width instructions (like UTF-32), whereas x86 instructions tend to vary in size (like UTF-8). I always loved that.

                                I’m intrigued by the Apple Silicon chip, but I can’t give you any one reason it should perform as well as it does, except maybe smaller process size / higher transistor count. I’m also curious how well Rosetta 2 can JIT x86 to native instructions.

                                1. 10

                                  “Thumb-2 is a variable-length ISA. x86 is a bonkers-length ISA.” :)

                                  1. 1

                                    The x86 is relatively mild compared to the VAX architecture. The x86 is capped at 15 bytes per instruction, while the VAX has several instructions that exceed that (and there’s one that, in theory, could use all of memory).

                                    1. 2

                                      If you really want to split your brain, look up the EPIC architecture on the 64-bit Itaniums. It was an implementation of VLIW (Very Long Instruction Word). In VLIW, you pass a huge instruction that tells each individual functional unit what to do (essentially moving scheduling to the compiler). I think EPIC batched these in groups of three... it’s been a while since I read up on it.

                                      1. 6

                                        Interestingly, by one definition of RISC, this kind of thing makes Itanium a RISC machine: the compiler is expected to work out dependencies, functional units to use, etc., which was one of the foundational concepts of RISC in the beginning. At some point RISC came to mean just “fewer instructions”, “fixed-length instructions”, and “no operations directly with memory”.

                                        Honestly, I believe it is the last of those that most universally distinguishes CISC and RISC at this point.

                                        1. 3

                                          Raymond Chen also wrote a series about the Itanium.

                                          https://devblogs.microsoft.com/oldnewthing/20150727-00/?p=90821

                                          It explains a bit of the architecture behind it.

                                        2. 1

                                          My (limited) understanding is that it’s not the instruction size as much as the fact that x86(-64) has piles of prefixes, weird special cases and outright ambiguous encodings. A more hardwarily inclined friend of mine once described the instruction decoding process to me as “you can never tell where an instruction boundary actually is, so just read a byte, try to figure out if you have a valid instruction, and if you don’t then read another byte and repeat”. Dunno if VAX is that pathological or not, but I’d expect most things that are actually designed rather than accreted to be better.

                                          1. 1

                                            The VAX is “read byte, decode, read more if you have to”, but then, most architectures which don’t have fixed sized instructions are like that. The VAX is actually quite nice—each opcode is 1 byte, each operand is 1 to 6 bytes in size, up to 6 operands (most instructions take two operands). Every instruction supports all addressing modes (with the exception of destinations not accepting immediate mode for obvious reasons). The one instruction that can potentially take “all of memory” is the CASE instruction, which, yes, implements a jump table.

                                      2. 6

                                        fixed-width instructions (like UTF-32)

                                        Off-topic tangent from a former i18n engineer, which in no way disagrees with your comment: UTF-32 is indeed a fixed-width encoding of Unicode code points but sadly, that leads some people to believe that it is a fixed-width encoding of characters which it isn’t: a single character can be represented by a variable-length sequence of code points.
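
                                        A quick illustration in Rust (using the unicode-segmentation crate): “é” written as e plus a combining acute accent is one user-perceived character but two code points, so even UTF-32 needs two of its fixed-width units for it.

                                        use unicode_segmentation::UnicodeSegmentation;

                                        fn main() {
                                            let s = "e\u{0301}"; // "é" as a combining sequence
                                            assert_eq!(s.chars().count(), 2);         // code points (UTF-32 units)
                                            assert_eq!(s.graphemes(true).count(), 1); // user-perceived characters
                                        }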

                                        1. 10

                                          V̸̝̕ȅ̵̮r̷̨͆y̴̕ t̸̑ru̶̗͑ẹ̵̊.

                                        2. 6

                                          I can’t give you any one reason it should perform as well as it does, except maybe smaller process size / higher transistor count.

                                          One big thing: Apple packs an incredible amount of L1D/L1I and L2 cache into their ARM CPUs. Modern x86 CPUs also have beefy caches, but Apple takes it to the next level. For comparison: the current Ryzen family has 32KB L1I and L1D caches for each core; Apple’s M1 has 192KB of L1I and 128KB of L1D. Each Ryzen core also gets 512KB of L2; Apple’s M1 has 12MB of L2 shared across the 4 “performance” cores and another 4MB shared across the 4 “efficiency” cores.

                                          1. 7

                                            How can Apple afford these massive caches while other vendors can’t?

                                            1. 3

                                              I’m not an expert but here are some thoughts on what might be going on. In short, the 4 KB minimum page size on x86 puts an upper limit on the number of cache rows you can have.

                                              The calculation at the end is not right, and I’d like to know exactly why. I’m pretty sure the A12 chip has 4-way associativity. Maybe the cache lookups are always aligned to 32 bits, which is something I didn’t take into account.
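
                                              My rough understanding of the usual argument, with 8-way associativity assumed for both parts: a VIPT L1 that indexes using only page-offset bits is capped at page size times associativity.

                                              fn main() {
                                                  let ways = 8; // assumed associativity for both L1 data caches
                                                  assert_eq!(4 * 1024 * ways, 32 * 1024);   // 4 KiB x86 pages -> 32 KiB L1
                                                  assert_eq!(16 * 1024 * ways, 128 * 1024); // 16 KiB Apple pages -> 128 KiB L1
                                              }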

                                            2. 3

                                              For comparison: the current Ryzen family has 32KB L1I and L1D caches for each core; Apple’s M1 has 192KB of L1I and 128KB of L1D. Each Ryzen core also gets 512KB of L2; Apple’s M1 has 12MB of L2 shared across the 4 “performance” cores

                                              This is somewhat incomplete. The 512KiB L2 on Ryzen is per core. Ryzen CPUs also have L3 cache that is shared by cores. E.g. the Ryzen 3700X has 16MiB L3 cache per core complex (32 MiB in total) and the 3900X has 64MiB in total (also 16MiB per core complex).

                                              1. 2

                                                How does the speed of L1 on the M1 compare to the speed of L1 on the Ryzen? Are they on par?

                                            1. 1

                                              One small correction: _N is not a prefix.

                                              1. 1

                                                Indeed it isn’t. Thanks, fixing now.

                                              1. 1

                                                Like every ISA, x86 (and AMD64) have multiple ways to encode the semantics of a particular (higher level, conceptual) operation.

                                                While true, “like every ISA” irked me, as CISC architectures (and particularly x86 and amd64) have excessive ways to achieve the same thing, whereas sane (RISC) architectures minimize this by only adding ISA complexity with strong justification.

                                                1. 9

                                                  This is true even on RISC, though, as long as there are general-purpose registers; the compiler can build a side channel by permuting the used GPRs in every procedure prologue and epilogue, with a minimum of one bit per procedure when there are two GPRs. (Specifically, for n GPRs used in a procedure, we should be able to encode about log2(n!) bits in the side channel.)
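
                                                  Back-of-the-envelope, just to make the capacity concrete:

                                                  fn main() {
                                                      // n registers give n! orderings, i.e. about log2(n!) encodable bits
                                                      let log2_factorial = |n: u32| (1..=n).map(|k| f64::from(k).log2()).sum::<f64>();
                                                      assert_eq!(log2_factorial(2) as u32, 1); // two GPRs -> one bit, as above
                                                      println!("8 GPRs -> about {:.1} bits", log2_factorial(8)); // ~15.3
                                                  }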

                                                  That said, it would be interesting to consider The Mill, which will have a belt of GPRs instead of slots, as not having this flexibility. Instead, The Mill would require a procedure to be entirely rescheduled in order to change the numbering of GPRs.

                                                  1. 2

                                                    Absolutely! The x86 family is particularly guilty of this, and its strange operand encoding is why the technique discussed in the post works.

                                                    The “like every ISA” is there for completeness, and because it’s important to the compiler-based technique that the post mentions but doesn’t use.
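
                                                    A concrete instance of that freedom (opcode bytes from the x86 manual; my ModRM arithmetic is worth double-checking): mov eax, ebx can be encoded in either direction.

                                                    fn main() {
                                                        let a: [u8; 2] = [0x89, 0xD8]; // MOV r/m32, r32 form of `mov eax, ebx`
                                                        let b: [u8; 2] = [0x8B, 0xC3]; // MOV r32, r/m32 form of the same thing
                                                        assert_ne!(a, b); // semantically identical, so the choice is a free bit
                                                    }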

                                                  1. 2

                                                    This is really neat. Thanks for putting this together. I might be using this in the near future. In your opinion, are there any pain points around documentation or other limiting factors?

                                                    1. 2

                                                      Thanks for the kind words!

                                                      The PyO3 docs are excellent, overall. If I had to pick something to complain about: #[pyproto] is slightly underdocumented, causing the hiccup I ran into with the “in” (__contains__) protocol. But that’s minor compared to how painless the experience was as a whole.
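
                                                      For reference, roughly what that protocol looked like; this is from memory of the old #[pyproto] API (since replaced by plain #[pymethods]), so treat the details as approximate:

                                                      use pyo3::prelude::*;
                                                      use pyo3::PySequenceProtocol;

                                                      #[pyclass]
                                                      struct Words {
                                                          words: Vec<String>,
                                                      }

                                                      #[pyproto]
                                                      impl PySequenceProtocol for Words {
                                                          // backs Python's `"foo" in words`
                                                          fn __contains__(&self, word: String) -> bool {
                                                              self.words.contains(&word)
                                                          }
                                                      }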