Threads for lcapaldo

  1. 2

    I don’t understand the motivation for the ScopedTypeVariable example, can you just not elide the type annotation for ys altogether in this case?

    1. 1

      You could, but there’s some more advanced cases where a type signature will be required (often involving higher rank types).

      There’s also cases where having the function signature visible helps with code readability a lot.

      1. 1

        I mean it’s not that I don’t believe there are good uses for it, it’s just that this example doesn’t make any of them obvious to me.

    1. 2

      If everything is still an fd I’m not sure what the difference is between actually implementing this approach and acting as if this approach were implemented. Nothing is stopping you from using pread/pwrite on files (obviously) and just never calling lseek. An error from read on a file is not much different than an error from lseek on a pipe or socket, or named pipe, not to mention platform specific things like timerfds and what not.

      Also unless you also remove dup altogether you just shift the problem to when you duplicate the post-streamified fd. Even if lseek is gone reads on the two fds will interfere with the current position in the same way.
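To make the dup point concrete, here is a minimal POSIX sketch. The function name and the temp-file path are made up for illustration. It only shows the current behavior: a descriptor and its dup share one file position, so a read through either one moves both, whether or not anyone calls lseek to reposition.

```cpp
#include <cassert>
#include <stdlib.h>
#include <unistd.h>

// Returns true if reading through one descriptor moves the position seen
// through its dup()ed twin. Name and temp path are illustrative only.
bool dup_shares_offset() {
    char path[] = "/tmp/dup_demo_XXXXXX";
    int fd = mkstemp(path);                 // creates and opens a fresh temp file
    if (fd < 0) return false;
    unlink(path);                           // file goes away once all fds close

    bool ok = write(fd, "abcdef", 6) == 6   // position is now 6
           && lseek(fd, 0, SEEK_SET) == 0;  // rewind to the start

    int twin = dup(fd);                     // second fd, same open file description
    ok = ok && twin >= 0;

    char buf[3];
    ok = ok && read(fd, buf, 3) == 3;       // consumes "abc" through fd

    // Nothing repositioned `twin`, yet querying it reports offset 3.
    ok = ok && lseek(twin, 0, SEEK_CUR) == 3;

    if (twin >= 0) close(twin);
    close(fd);
    return ok;
}
```

Removing lseek from the API would not change the outcome here, since the read alone is what moves the shared position.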

      I could see this working if fds and sds (“stream descriptors”) were different types but I think the existence of fifos means open can’t return just fds (non-streamified descriptors).

      1. 2

        You can avoid calling lseek yourself but if you dup a descriptor and hand it off to a library or another process you can’t control whether or not it calls lseek on its descriptor. I guess if it decides to do that you’d still be fine as long as you only used pread/pwrite and never did anything that read the file position.

        I’m not entirely clear on the author’s proposal but it sounds like the idea is that if you dupped a “streaming view” of a file then the duplicated copy would have its own independent file position? Or maybe dup on a “streaming view” works the same way that things do now (with a shared file position) but if that bothered you then you could choose to not call dup on the streaming view. Instead you’d create a brand new streaming view from the same underlying file. Then each streaming view would have its own position and you could hand one to other code without worrying about the effects on your own streaming view.

        Of course none of this solves the issue of what to do if you have a real stream (not a streaming view of a file) like a pipe. If you dup it then a read from any copy will consume the data and that data won’t be available on other copies of the descriptor. Maybe this is simply defined as the expected behavior and thus as OK?

        Named pipes (FIFOs) would complicate things. But this article seems like it is proposing an alternative OS design that is not POSIX but is instead “POSIX-like”. In this alternative world we could say that named pipes are not supported. Or that they have to be opened with a special open_named_pipe syscall. Or that the file descriptor returned by calling open on a named pipe is a special stub descriptor. Attempting to call pread/pwrite on the stub will fail. The only way to do anything useful with the stub would be to create a streaming view from it and then call read/write on that streaming view. This is admittedly kind of ugly but that’s the price for maintaining the ability to open named pipes with open.

        There are probably other complications. How do you handle writes to files open for O_APPEND? Does pwrite write to the file at the requested offset or does it ignore that offset and write at the end? If it does write at the requested offset, how can you atomically append some data to the file? You can’t ask for the current size and then write at that offset because the file size might change between the first call and the second.

        What do you do about select and poll and friends? Do these take streaming views instead of file descriptors now?

        Overall I don’t hate the idea. If we were going to put this in object-oriented terms then the current system would have pread and pwrite methods on the file descriptor interface. But some classes that implement that interface (like pipes, sockets, etc.) don’t support those operations so they just fail at runtime if you try to call those methods. Usually this is a sign that you’ve got your interfaces designed poorly. The most obvious fix for this type of thing would be to split things up into two (or more) interfaces and have each class implement only the methods that make sense for that particular class, and maybe create some adapter classes to help out. That seems to be what’s being proposed here, with the role of the adapter class being played by the “streaming view”. The most significant difference that I can see is that constructing new wrapper objects would normally be considered fairly cheap but constructing the streaming view would require an extra syscall which could be expensive enough that people would want to avoid it.

        I wonder if it would be possible to emulate the streaming view in userspace in some place like the C library. That would get the kernel entirely out of the business of maintaining file offsets and leave them up to the process to track. The C library would be allowed to call read and write on objects like pipes and sockets but for real files it would only be allowed to call pread and pwrite. If the user code creates a streaming view of a file and tries to call read on it then the C library would have to translate that request to an equivalent pread call and then update the file position stored in the streaming view. Doing this for any POSIX environment would probably be somewhere between difficult and impossible but maybe one can imagine an OS design where it could be made to work.
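A rough sketch of what emulating the streaming view in userspace might look like on a POSIX system. StreamView and views_are_independent are hypothetical names, not part of any C library. The sketch assumes a seekable regular file and ignores writes, EINTR handling, and thread safety.

```cpp
#include <cassert>
#include <cstring>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical userspace "streaming view": the position lives in this object,
// not in the kernel's open file description.
class StreamView {
public:
    explicit StreamView(int fd) : fd_(fd) {}

    // Behaves like read(2) but is built on pread(2), so the kernel's file
    // position is never consulted or modified.
    ssize_t read(void* buf, size_t count) {
        ssize_t n = ::pread(fd_, buf, count, pos_);
        if (n > 0) pos_ += n;   // advance only this view's private position
        return n;
    }

    off_t position() const { return pos_; }

private:
    int fd_;
    off_t pos_ = 0;
};

// Two views over one descriptor read the same bytes without interfering.
bool views_are_independent() {
    char path[] = "/tmp/stream_view_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return false;
    unlink(path);
    bool ok = pwrite(fd, "abcdef", 6, 0) == 6;

    StreamView a(fd), b(fd);
    char buf[3];
    ok = ok && a.read(buf, 3) == 3 && std::memcmp(buf, "abc", 3) == 0;
    ok = ok && b.read(buf, 3) == 3 && std::memcmp(buf, "abc", 3) == 0;  // b unaffected by a
    ok = ok && a.read(buf, 3) == 3 && std::memcmp(buf, "def", 3) == 0;
    ok = ok && lseek(fd, 0, SEEK_CUR) == 0;   // kernel position never moved
    close(fd);
    return ok;
}
```

Handing a second StreamView to other code plays the role of creating a brand new streaming view from the same underlying file: each one carries its own position.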

        1. 1

          My point isn’t that “this isn’t necessary because discipline”, it’s “the amount that this helps doesn’t reduce the discipline required in any significant way.” Everything is still read(Object, … ), pread(Object, …), ioctl(Object, …) etc. Removing lseek doesn’t stop two processes or threads from interfering with each other with read and its implicit seeks on a pipe, socket or streamed file.

        1. 2

          It’s a C function and often C++ avoids adding extra qualifications to these.

          There’s another variant of this that I’ve used. If you have a template class where the template parameter is the size of a char array then you can return the length via a static method or constexpr field. You can use a deduction guide to generate the right instantiations given a string literal. This is probably overkill for this use, but you need such a thing if you want to use strings as template arguments in C++20 (string literals are pointers and so can’t be used, but a class containing a char array can) and if you have one anyway (I wish the standard library provided one, it’s under 10 lines of code, it’s just annoying to have to add it to every project) then you can use it in this context. This version has the benefit that you get a compile failure if you can’t deduce the length at compile time, whereas the constexpr call will fall back to run-time evaluation.
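One way such a class might look, as a sketch rather than the commenter's actual code. FixedString, Named and hello_length are hypothetical names. The first part needs C++17. The template-argument part is guarded by a feature-test macro because it needs C++20 support for class-type template parameters.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: a string whose length is part of its type.
template <std::size_t N>
struct FixedString {
    char data[N] = {};

    // Only binds to a char array of known size (such as a string literal);
    // passing a plain `const char*` is a compile error.
    constexpr FixedString(const char (&s)[N]) {
        for (std::size_t i = 0; i < N; ++i) data[i] = s[i];
    }

    // Length without the trailing NUL, usable in constant expressions.
    static constexpr std::size_t length() { return N - 1; }
};

// Deduction guide: FixedString("abc") becomes FixedString<4>.
template <std::size_t N>
FixedString(const char (&)[N]) -> FixedString<N>;

static_assert(FixedString("hello").length() == 5,
              "length is a compile-time constant");

#if defined(__cpp_nontype_template_args) && __cpp_nontype_template_args >= 201911L
// C++20 only: the class can be a template argument, which a bare string
// literal cannot.
template <FixedString S>
struct Named {
    static constexpr std::size_t name_length = S.length();
};
static_assert(Named<"abc">::name_length == 3, "literal carried into the type");
#endif

constexpr std::size_t hello_length() { return FixedString("hello").length(); }
```

Because the constructor takes an array reference, a length that is not known at compile time simply fails to compile instead of silently becoming a run-time call.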

          1. 1

            Because strlen is from the C library, and C doesn’t have constexpr.

            1. 1

              This is not an answer. __cplusplus exists for a reason.

              1. 2

                I’m not sure what you’re getting at. “__cplusplus” doesn’t exist in C, and so it can’t help at all.

                It’s clunky, but that’s how it is.

                1. 1

                  There is such thing as #ifdef. __cplusplus is not defined in C and defined in C++, so you can conditionally declare strlen to be constexpr only in C++ and not in C.

                  1. 1

                    __cplusplus is not defined in C

                    But “strlen” is defined in C, and that’s why it can’t be changed. The C++ standards body can’t change C any more than it can change Rust, Posix, or glibc.

                    1. 2

                      Sure, but strlen can be unchanged in C and can be constexpr in C++. That doesn’t involve any change to C standard.

                      1. 1

                        They can change std::strlen, though, and this kind of difference isn’t unprecedented, std::isinf (C++) is a set of overloaded functions, whereas isinf (C) is a macro.
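A sketch of the #ifdef idea from this subthread, using a made-up function so nothing here claims to be how any real libc declares strlen. HYP_CONSTEXPR and hyp_strlen are hypothetical names. One caveat is that a constexpr function needs a visible definition to be used in a constant expression, so a bare declaration in a header would not be enough.

```cpp
#include <stddef.h>

#ifdef __cplusplus
#include <cassert>
// Hypothetical macro: `constexpr` when compiled as C++ ...
#define HYP_CONSTEXPR constexpr
#else
// ... and nothing at all when compiled as C.
#define HYP_CONSTEXPR
#endif

// Hypothetical stand-in for strlen, written so the same header text is valid
// C and valid C++ (C++14 or later, because of the loop).
HYP_CONSTEXPR static inline size_t hyp_strlen(const char* s) {
    size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

#ifdef __cplusplus
// In C++ the call can be folded at compile time ...
static_assert(hyp_strlen("hello") == 5, "computed by the compiler");

// ... and the same function still works on strings only known at run time.
inline size_t runtime_length(const char* s) { return hyp_strlen(s); }
#endif
```

For standard C++ code, std::char_traits<char>::length has been constexpr since C++17 and covers the same need without touching strlen.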

            1. 1

              Can one claim that the three finger salute holds up better against XScreensaver and this bug? Does the fact that Windows is natively graphical play to its advantage?

              1. 2

                I don’t know if it holds up better, but the screensaver is on a separate desktop from the default desktop, which is in turn on a separate desktop from the login screen. These desktops are different than X11 style virtual desktops, and are a sort of security boundary. The screensaver simply crashing will not lead to the desktop being switched IIUC, and calling SwitchDesktop is guarded against from those secured desktops.

                1. 1

                  This seems similar to having the lock screen be on a different VT, say directly in the login manager – I think gdm should work like that these days…

              1. 7

                There is no standard way to iterate over a sequence of values in Go.

                Go was publicly announced in November 2009, and version 1.0 was released in March 2012.

                I can’t help but feel like something went badly wrong here but what do I know.

                1. 20

                  These types of comments really drive me up a wall. It feels like what you are saying is “this is a common feature in other languages, the people behind Go (vague bad thing) since they didn’t add the feature too” which is just not sound reasoning.

                  In order to form a judgement about how bad something is, we should consider the consequences of it. The “normalness” of a behavior is an OK pointer, but that’s all it is.

                  Maybe you can argue that the consequences have been grave and thus this is a grave failure, but that doesn’t seem true to me.

                  1. 26

                    I can’t argue that any of the things about Go’s design that people have been unproductively grouchy about in comments sections for the past decade have had grave consequences for any given Go adopter or for the widespread adoption of Go. Having written one short program in Go for my own purposes, the lack of a proper idiomatic iteration construct (no map, no iterating for loop, no list comprehension, no yielding to a block, just the apparently-hacky for-range) was flummoxing. Go’s designers are entitled to their priorities but idk I feel like I’m entitled to make fun of those priorities a little bit, especially because they get paid more than I do.

                    1. 14

                      IMO there is a solid iteration idiom, and they hit on it early in the stdlib, although the fact that they managed to do it several different ways afterwards is a disappointment. It’s the

                      for iter.Next() {
                          item := iter.Val()
                          // ...
                      }

                      one. You can do pretty much anything with it, it doesn’t impose any burden on the caller, it doesn’t make the caller ugly, and you can implement it on top of pretty much anything else. With generics you could even codify it as an interface (parameterized on the return type of Val).

                      None of which is to say that I oppose this proposal; it looks pretty nice to me. But in 7+ years of writing Go professionally, the lack of an iterator syntax hasn’t been a major thorn in my side, or even a substantial annoyance.
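The Next/Val shape is not tied to Go, so here is a sketch of it codified as an interface parameterized on the value type, written in C++ rather than Go. Iter, VectorIter and sum are hypothetical names, and VectorIter is just one possible implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical interface: the Next/Val idiom, parameterized on the value type.
template <typename T>
struct Iter {
    virtual ~Iter() = default;
    virtual bool Next() = 0;     // advance; false once the sequence is exhausted
    virtual T Val() const = 0;   // current item; valid only after Next() returned true
};

// One possible implementation, backed by a vector.
template <typename T>
class VectorIter final : public Iter<T> {
public:
    explicit VectorIter(std::vector<T> items) : items_(std::move(items)) {}

    bool Next() override {
        if (next_ >= items_.size()) return false;
        current_ = next_++;
        return true;
    }

    T Val() const override { return items_[current_]; }

private:
    std::vector<T> items_;
    std::size_t next_ = 0;
    std::size_t current_ = 0;
};

// Caller side: the loop from the comment, written against the interface only.
int sum(Iter<int>& iter) {
    int total = 0;
    while (iter.Next()) {
        total += iter.Val();
    }
    return total;
}
```

The caller never learns what sits behind the interface, which is the "implement it on top of pretty much anything else" property mentioned above.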

                  2. 7

                    I’m pretty sure Zig has no concept of iterators either


                    I saw in the latest blog that the Zig for loop is being changed, but AFAIK there will still be no iterators. You basically do what you do in Go – write your own set of methods on a struct.

                    So it seems like Zig will be a ~2025 language without iterators

                    I would say languages have different priorities, and it’s harder than you think. Or you could just make vague statements about people doing bad things for no reason

                    (edit: this bug seems to confirm what I thought

                    1. 3

                      Zig deliberately has no interfaces or traits whatsoever. You put the methods in the struct and they get called and it will compile if and only if the types work out after comptime propagation. I might be wrong but as far as I understand “iterators” in the language will be a bit of syntax sugar relying on a documented (but informal) interface, and Zig will very much have iterators exactly like, say, Julia or JavaScript or Python have iterators (except those languages check if things work out at runtime instead of compile time).

                      On the other hand the major selling point of Go is having interfaces enforced by the compiler. But a fast iteration interface needs to be a generic interface so that wasn’t really possible until recently…

                      Hopefully it all works out on both fronts.

                      1. 4

                        Eh I don’t see what you’re saying. My point is that, as of now, Go and Zig are the same as far as iterators.

                        As of ~2025, Go might have iterators, and Zig probably won’t. Go is thinking about adding some concept of iterators to the language to enforce consistency.

                        Python’s for loop and list comprehensions understand iterators; I don’t think the same is true of Zig.

                        If you know otherwise, please provide a link to the docs.

                    2. 4

                      The dominating quality of Go’s development over the past decade has been the most extreme caution when it came to adding features. You can’t have performant, extensible iteration without some kind of generics and they were stuck in place on that issue out of fear of C++ compile times until finally a couple of years ago.

                      1. 8

                        You can’t have performant, extensible iteration without some kind of generics

                        It’s even stronger than that: if you do want to map’n’filter, you need a boatload of machinery inside the compiler to make that fast, in addition to significant amount of machinery to make it expressible at all.

                        Rust’s signature for map is roughly

                        trait Iterator {
                          type Item;
                          fn map<B, F>(self, f: F) -> Map<Self, F>
                          where
                            F: FnMut(Self::Item) -> B;
                        }

                        That is, .map returns a struct, called Map, which is parameterized by the type of the original iterator Self, as well as the type of unnameable closure F. Meditating on this single example for a long time explains half of why Rust looks the way it does.

                      2. 3

                        Go’s developers focused on higher priority concerns, such as pretty great performance, a pretty great (though basic) type system, an awesome runtime and compilation model, and fantastic tooling. Go’s feature set (including the features it elided) made developers really, really productive compared with other languages.

                        While there are a few use cases that weren’t feasible without generics, the absence of generics made for some really interesting and compelling properties–like “everyone writes their code the same way (and thus any developer can jump into any other project and be immediately productive)” and “code is very concrete; people don’t usually try to make things overly abstract” which aren’t present in other languages. It wasn’t actually as obvious that generics were the right choice as Go’s critics claim (whose analyses flatly pretended as though there were no disadvantages to generics).

                        The net upside to generics (including iterators) was relatively small, so it makes sense that the decision was deferred.

                        1. 4

                          Go is a Google language. If a proposal helps or is of benefit to Google, it’ll be added. If it’s bad for Google, it will be ignored. If it’s neutral, then the only concern Google has is how well does it externalize training costs for Google.

                          1. 10

                            Google doesn’t really figure in at this level of discussion. The Plan 9 guys who made Go are the relevant actors for this. They were skeptical of generics, so it wasn’t a priority for Go 1.0. With no generics, a generic iterator protocol doesn’t make any sense, so that wasn’t in Go 1.0 either. Now Go has generics as of Feb. 2022, so there is a discussion about the best way to do an iterator protocol. This is the second proposal, which builds off of ideas from the first discussion and some ideas that had been in the issues tracker before that. It’s not really more complicated than that.

                            1. 4

                              You’re obviously right that the decision-making process is entirely about Google’s desires, but I’d hesitate to assume that it’s necessarily utilitarian. Google does a lot of self-sabotage.

                            2. 1

                              There is no standard way to iterate over a sequence of values in Standard ML, which is from circa 1970s/80s depending on who you ask, and is widely considered one of the most elegant of language designs. Something went badly wrong here or…?

                              1. 1

                                After having to deal with Rust iteration for a bit and missing out on Python…. I think the decent explanation here is that in more dynamic languages with stuff like coroutines it’s pretty easy to come up with a nice iterator protocol, but in more serious things it’s harder to come up with one that is both flexible enough for the “right” use cases without being very hard to use.

                                Like C++ has iterators right? And they do the job but they’re kind of miserable to use (or at least were 5+ years back, I’m sure things are better now).

                                Combine that with the perennial generics thing meaning container classes aren’t a thing and “stuff things into arrays and use indices” feels like a pretty OK solution for a long time.

                                1. 2

                                  I think C++ iterators are uniquely awkward in their design, and it’s not an inherent design problem or any sort of static typing limitation.

                                  C++ iterators are based around emulating pointer arithmetic with operator overloading, with a state awkwardly split between two objects. There’s no reason to do it this way other than homage to C and a former lack of for loop syntax sugar.

                                  And C++ iterators aren’t merely tasked with iterating over a set once from start to finish, but double as a general-purpose description of a collection, which needlessly makes both roles harder.

                                  1. 2

                                    There’s no reason to do it this way other than homage to C and a former lack of for loop syntax sugar.

                                    I think this is a little unfair; the primary reason to do it this way is so that code, especially templates, works on pointers or iterators, e.g. being able to have a single implementation of something like std::find work for list iterators or pointers. It’s not a “homage” so much as a source-level interoperability consideration.
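A small illustration of that interoperability: the same std::find template is instantiated once with raw pointers and once with std::list iterators. The function name is made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>

// Returns true if std::find locates the value 4 both in a plain array
// (iterating with raw pointers) and in a std::list (iterating with its
// bidirectional iterators).
bool find_works_on_both() {
    int raw[] = {3, 1, 4, 1, 5};
    std::list<int> lst(std::begin(raw), std::end(raw));

    // Pointers satisfy the iterator requirements, so this just works.
    const int* p = std::find(std::begin(raw), std::end(raw), 4);

    // Same algorithm, completely different iterator type.
    auto it = std::find(lst.begin(), lst.end(), 4);

    return p == raw + 2
        && it != lst.end()
        && *it == 4
        && std::distance(lst.begin(), it) == 2;
}
```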

                                    1. 1

                                      OK, “homage” is a poor way of phrasing it. But it’s still an “interoperability consideration” with pointer arithmetic and C’s way of doing things, rather than a ground-up iterator design. The messy end result is not because iteration is such a hard problem, but because preserving C legacy is messy.

                                      1. 1

                                        Right it’s not inherent to “designing iterators in statically typed language.” Go doesn’t have a different language it’s trying to be incrementally adoptable from.

                                2. -6

                                  but what do I know.

                                  Not much.

                                1. 1

                                  Yeah… Serenity is fundamentally a C++ project and it shows. The C standard library itself uses C++’s runtime type information in order to link properly. It’s unfortunate, but it works, so it’s fine.

                                  I would love to hear more details about this. I’ve never seen RTTI used in linking before!

                                  1. 2

                                    I probably misspoke there and meant “needs C++ RTTI symbols in order to link”.

                                    1. 1

                                      I’m still not sure what this means. What type_info structures does it need to exist in linked code?

                                      1. 2

                                        I had to dig it up, but here’s the exact original error:

                                        ld.lld: error: undefined symbol: vtable for __cxxabiv1::__class_type_info
                                        >>> referenced by spawn.cpp
                                        >>>               spawn.cpp.o:(typeinfo for AK::Function<int ()>::CallableWrapperBase) in archive /bitplane/Serenity/Build/i686/Root/usr/lib/libc.a
                                        >>> the vtable symbol may be undefined because the class is missing its key function (see
                                        ld.lld: error: undefined symbol: vtable for __cxxabiv1::__si_class_type_info
                                        >>> referenced by spawn.cpp
                                        >>>               spawn.cpp.o:(typeinfo for AK::Function<int ()>::CallableWrapper<posix_spawn_file_actions_addchdir::'lambda'()>) in archive /bitplane/Serenity/Build/i686/Root/usr/lib/libc.a
                                        >>> referenced by spawn.cpp
                                        >>>               spawn.cpp.o:(typeinfo for AK::Function<int ()>::CallableWrapper<posix_spawn_file_actions_addfchdir::'lambda'()>) in archive /bitplane/Serenity/Build/i686/Root/usr/lib/libc.a
                                        >>> referenced by spawn.cpp
                                        >>>               spawn.cpp.o:(typeinfo for AK::Function<int ()>::CallableWrapper<posix_spawn_file_actions_addclose::'lambda'()>) in archive /bitplane/Serenity/Build/i686/Root/usr/lib/libc.a
                                        >>> referenced 2 more times
                                        >>> the vtable symbol may be undefined because the class is missing its key function (see
                                        1. 2

                                          Okay, so it looks like their libc needs to be linked to a C++ runtime library (libsupc++, libcxxrt, libc++abi)? If you’re static linking, you need to add this explicitly because *NIX static libraries aren’t really libraries, they’re just archives of .o files. That doesn’t mean that it requires RTTI for linking, it just means that it depends on a C++ runtime. I’m a bit surprised that they enable RTTI in libc, I would generally expect libc code to be compiled with -fno-rtti -fno-exceptions, but it is useful to have C++ thread-safe statics in libc, so you do want at least the __cxa_guard_* functions from the C++ runtime.

                                          1. 1

                                            It’s pretty all in on C++ afaict, lambdas and everything. Not that those require RTTI (I don’t think), but I wouldn’t be surprised by an internal use of exceptions or RTTI.

                                            1. 1

                                              SerenityOS does not use exceptions, but it does make use of RTTI via its use of AK::Function (similar to std::function) within LibC.

                                              1. 2

                                                What does that use RTTI for? Most implementations of std::function (all of the ones that I’ve read, but I haven’t read all of them) work fine without RTTI. They use a templated constructor that wraps the statically typed lambda in a class with a virtual invoke function that calls the lambda’s call operator and either embeds the lambda (via a move or copy constructor) in the object or a separate heap allocation.
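A stripped-down sketch of that structure, using the heap-allocation variant. TinyFunction, CallableBase and CallableWrapper are hypothetical names. This is neither AK::Function nor any real std::function implementation, and it leaves out copying, empty state and small-buffer storage.

```cpp
#include <cassert>
#include <memory>
#include <utility>

template <typename Signature>
class TinyFunction;   // only the R(Args...) specialization below is defined

template <typename R, typename... Args>
class TinyFunction<R(Args...)> {
    // Type-erased interface: one virtual call, no typeid or dynamic_cast.
    struct CallableBase {
        virtual ~CallableBase() = default;
        virtual R invoke(Args... args) = 0;
    };

    // Wraps the statically typed callable (for example a lambda).
    template <typename F>
    struct CallableWrapper final : CallableBase {
        F f;
        explicit CallableWrapper(F fn) : f(std::move(fn)) {}
        R invoke(Args... args) override {
            return f(std::forward<Args>(args)...);
        }
    };

    std::unique_ptr<CallableBase> impl_;

public:
    // Templated constructor: moves the callable into a heap-allocated wrapper.
    template <typename F>
    TinyFunction(F f)
        : impl_(std::make_unique<CallableWrapper<F>>(std::move(f))) {}

    R operator()(Args... args) {
        return impl_->invoke(std::forward<Args>(args)...);
    }
};

int call_through_wrapper() {
    int base = 10;
    TinyFunction<int(int)> add_base([base](int x) { return base + x; });
    return add_base(5);
}
```

Nothing here asks for type information at run time. With RTTI left enabled, though, a compiler following the Itanium ABI still emits typeinfo objects for polymorphic classes like CallableBase, because each vtable carries a pointer to one. That may be all the linker error quoted earlier is showing, and building with -fno-rtti would be expected to drop those references.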

                                                The only things that use RTTI in C++ are exceptions (which dynamically map the thrown type to one of the caught types), dynamic_cast and a dynamic typeid statement. If you don’t use exceptions, then that just leaves dynamic_cast and typeid.

                                                Most modern C++ codebases avoid dynamic_cast because it’s very slow and you can get better and faster code with an explicit virtual cast method for the classes that actually need it. The only place where this is difficult is diamond inheritance (moving from one branch to another) and dynamic cast is very slow there (and it’s usually a bad idea).
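A minimal sketch of the explicit virtual cast method, with hypothetical Shape, Circle and Square classes. It assumes single inheritance, where the pattern is straightforward.

```cpp
#include <cassert>

struct Circle;  // forward declaration so Shape can name it

struct Shape {
    virtual ~Shape() = default;
    // Explicit "cast" hook: only the class that really is a Circle overrides it.
    virtual Circle* as_circle() { return nullptr; }
};

struct Circle : Shape {
    double radius = 1.0;
    Circle* as_circle() override { return this; }
};

struct Square : Shape {
    double side = 2.0;
};

// One virtual call and no type_info lookup, so it also builds with -fno-rtti.
double radius_or_zero(Shape& s) {
    Circle* c = s.as_circle();
    return c ? c->radius : 0.0;
}
```

Only the classes that actually need a downcast pay for a hook, and the cost is a single vtable slot per target type.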

                                                There are also problems with typeid. It returns a std::type_info object, which has a name method that returns a const char*. The contents of this string are implementation defined (though it must be unique), but the Itanium ABI specifies that it is the mangled type encoding. This means that you end up with some very large strings embedded in binaries. You often see 20% of the total binary size of a C++ library made up of type info strings, which is the main reason that you’d want to disable them. Personally, I’d love to see an ABI that replaced them with 64-bit integers formed from a cryptographic hash of the mangled name and emitted a map from integer value to string in a separate section that could be stripped in release builds.

                                            2. 1

                                              You’re right, I couldn’t explain myself very well.

                                    1. 1

                                      starttls: Some(true)

                                      Why an option? What’s the difference between None and Some(false)?

                                      1. 3

                                        It is to represent an optional field. This way you can recognize what is mandatory from what is not. It is also useful for the serde crate, which considers Option<T> fields as optional, so they can be omitted during (de)serialization. For example, the CLI based on this lib stores the configuration in a TOML file, and the Option<bool> means that you can just omit the field (whereas bool would force the user to put it in his config file).

                                        1. 1

                                          Thanks, I had a feeling it might be related to something like a config file.

                                        2. 2

                                          I guess because a simple boolean can’t express disabled vs opportunistic vs required? An enum might have been the cleaner choice, as option is often confusing, but maybe I am missing something :)

                                          1. 1

                                            An option is already an enum, isn’t it?

                                            1. 1

                                              To be clear I am asking specifically about starttls, as the code afaict simply uses unwrap_or_default, the default of a bool is false, so it’s not like None is representing some alternative contextual default. A bool seems strictly powerful enough here. Compare to tls which wants to default to true, and the consuming code uses unwrap_or.

                                              I would probably make tls the enum if it were me, and remove starttls altogether but I admit to having a hard time with the names, maybe initiallytls, starttls, insecure

                                              1. 2

                                                An enum could be more accurate indeed, to avoid confusion between options. I’ll keep it in mind, thanks for the idea ;)

                                          1. 10

                                            Not that I’m saying MMOs need blockchains; but it turns out Proof-of-Work ledger technology had a use case here!

                                            Um, so where does PoW come in? The argument for blockchains here is reasonable, but MMOs have a single trusted entity to oversee the blockchain - the company that makes them. Proof-of-Work would only be useful if you let other people write to your item blockchain too, but why the hell would you do that?

                                            It’s worth noting that the author is the founder of an NFT company, whose site seems to make similar logical leaps to justify their product. I also might just be missing something obvious.

                                            1. 5

                                              Yeah, a byzantine-fault-tolerant consensus mechanism is overkill for the (non-)problem of ensuring uniqueness in a central database. Though I’ve heard two interesting opinions on putting game assets on the blockchain, neither of which is related to dupe prevention:

                                              • Having game assets on the blockchain allows “true ownership” in the sense that you, a player, can’t “lose” them if the game shuts down or if you get banned.
                                              • Having game assets on the blockchain lets one trade it for other blockchain assets, and ultimately real-world money. The game owner can automatically earn a cut from each transaction by designing the asset’s smart contract.

                                              I cannot, however, think of a single reason you’d want to own an asset in a game you can no longer play. And I know that MMOs already explicitly disapprove of third party trading of game assets for real money, so they don’t get hit by international banking laws. I think you’re spot on – perhaps the author, being a NFT founder, is inclined to see these problems as nails for a certain kind of hammer.

                                              1. 2

                                                What about the use case of an open source MMO that has players on independently operated instances, but ensures fair item distribution to players on different servers with some kind of cryptographic protocol, made distributed by a proof-of-work blockchain?

                                                1. 3

                                                  Congrats, you have come up with the only use for a blockchain that I’ve ever heard that sounds like a good fit for an actual problem.

                                                  And it still won’t make the coin holders any money.

                                                  1. 2

                                                    And it still won’t make the coin holders any money.

                                                    Making coin holders money is never a goal of a serious project :P

                                                    1. 3

                                                      I disagree, making the initial investors money out of the gullibility of later investors has been the goal of pretty much every cryptocurrency project.

                                                      1. 1

                                                        I think I’d classify the various currencies as taking themselves seriously.

                                                        1. 2

                                                          They’re not intended to make anyone money by themselves though. Sometimes they do, because whatever can’t stop markets, but it’s not the use case.

                                                      2. 2

                                                        Coin holders? Money? Not of my concern!

                                                      3. 1

                                                        Wouldn’t the servers still need some kind of way to distribute items and such to players for e.g. completing quests? How would you ensure that they only give items to the players that deserve it, without putting the entire game state on the blockchain?

                                                        1. 1

                                                          Servers to clients are easy. Servers among themselves is harder if you don’t trust all operators.

                                                          1. 2

                                                            I think @dzwdz is considering the following kind of attack:

                                                            1. Client A wants a load of stuff.
                                                            2. The user of client A creates a server.
                                                            3. The server creates 1,000 made-up clients.
                                                            4. The server creates items for these 1,000 clients and records them on the blockchain. This meets the consortium policy because it’s a small number of items per client.
                                                            5. The server provides all of those items to client A.

                                                            You could possibly use a shared ledger to track everyone’s inventory, then use it for auditing to observe that the server had been doing malicious things and roll back a load of transactions.

                                                            This case seems like a much better fit for confidential computing though: require every server operator to provide a remote attestation quote before they can join and then you know that they’re all running a valid build of the software. If you want to allow custom per-server behaviour then provide an interface for untrusted plugins that enforces some policy at the boundary.

                                                      4. 2

                                                        He defends this idea on HN, I’ve linked to what I felt was the most compelling version of the argument, which still isn’t super compelling, IMO, which is that sometimes “other people” are internal bad actors.

                                                      1. 4

                                                        Careful with spaces in filenames! There’s a -print0 argument to find and a corresponding argument to xargs that takes care of this issue. Though it still makes problems if your filenames contain a literal 0 byte. But then you’re screwed anyway. Fortunately, Unix file names cannot contain null bytes.

                                                        % find src -iname "*.ts" \
                                                          | LC_ALL=C sort \
                                                          | xargs md5sum \
                                                          | md5sum \
                                                          | cut -d" " -f 1
                                                        md5sum: src/this: No such file or directory
                                                        md5sum: is: No such file or directory
                                                        md5sum: a: No such file or directory
                                                        md5sum: test.ts: No such file or directory
                                                        % ls src
                                                        'this is a test.ts'

                                                        From the man page of find(1):

                                                        -X Permit find to be safely used in conjunction with xargs(1). If a file name contains any of the delimiting characters used by xargs(1), a diagnostic message is displayed on standard error, and the file is skipped. The delimiting characters include single (“ ’ ”) and double (“ “ ”) quotes, backslash (“\”), space, tab and newline characters.

                                                        However, you may wish to consider the -print0 primary in conjunction with “xargs -0” as an effective alternative.
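
                                                        A null-delimited version of the pipeline above might look like the following sketch. It assumes GNU find, sort (for -z), xargs and md5sum; hash_sources is a hypothetical wrapper name, not something from the original post.

                                                        ```shell
                                                        # Hash every *.ts file under a directory, then hash the sorted list of
                                                        # hashes. NUL delimiters keep names with spaces or quotes intact.
                                                        hash_sources() {
                                                          # $1: directory to scan
                                                          find "$1" -iname "*.ts" -print0 \
                                                            | LC_ALL=C sort -z \
                                                            | xargs -0 md5sum \
                                                            | md5sum \
                                                            | cut -d" " -f 1
                                                        }
                                                        ```

                                                        With the src directory from the transcript above, hash_sources src should print a single digest rather than the “No such file or directory” errors.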

                                                        1. 4

                                                          Though it still makes problems if your filenames contain a literal 0 byte. But then you’re screwed anyway.

                                                          This is as “impossible” as a file name with a literal ‘/‘ in it.

                                                          1. 2

                                                            Ah, that’s a relief. Thanks!

                                                          2. 3

                                                            To be fair, if you have files with white spaces IN YOUR SOURCE CODE you deserve any pain you get =P

                                                            1. 1

                                                              Thanks, I’ll fix that when I’m back at a computer!

                                                            1. 17

                                                              People have been trying to make Unix-y desktop environments look like Mac OS for just about forever – I recall some truly hideous ones in the 2000s – but I’ve never seen one get close enough to even achieve an uncanny-valley effect. Looking at the screenshots, this is yet another one that’s not even close.

                                                              And in general the idea of “finesse of macOS” on a *nix desktop environment is contradictory. You can’t have both the “freedom of (insert your favorite OS here)” and the level of enforced conformity of interface design and other niceties that makes macOS actually nice to use. There simply are too many developers out there building too many applications which will refuse to adopt the necessary conventions, and the project almost certainly doesn’t have the resources to fork and maintain all of them.

                                                              1. 9

                                                                I am not so sure.

                                                                The trouble is that people focus on trivial cosmetics rather than structure & function. Themes, for example: those are just not important.

                                                                Ubuntu’s Unity desktop got a lot of the OS X user-level functionality. A dock/launcher that combined open windows with app launchers, which had indicators to show not only that a window was open (Mac OS X style) but how many windows were open (a gain over OS X), and had a standardised mechanism for opening additional empty windows (another gain over OS X).

                                                                But people didn’t register that, because it was on the left. Apple’s is at the bottom by default (although you can move it, but many people seem not to know that.) NeXT’s was at the right, but that’s partly because NeXT’s scrollbars were on the left.

                                                                Cosmetics obscured the functional similarity.

                                                                Unity showed that a global menu bar on Linux was doable, and somehow, I don’t know how, Unity did it for Gtk apps and for Qt apps and for apps using other frameworks or toolkits. Install new software from other desktops and it acquired a global menu bar. Still works for Google Chrome today. Works for the Waterfox browser today, but not for Firefox.

                                                                On top of that, Unity allowed you to use that menu with Windows shortcut keystrokes, and Unity’s dock accepted Windows’ Quick Launch bar keystrokes. But that went largely unnoticed too, because all the point-and-drool merchants don’t know that there are standard Windows keystrokes or how to use them.

                                                                Other aspects can be done as well: e.g. Mac OS X’s .app bundles are implemented in GNUstep, with the same filenames, the same structure, the same config files, everything. And simpler than that, AppImages provide much the same functionality. So does GoboLinux.

                                                                This stuff can be implemented on FOSS xNix: the proof is that it’s already been done.

                                                                But nobody’s brought the pieces together in one place.

                                                                1. 8

                                                                  There are a few things that are harder:

                                                                    Making the same keyboard shortcuts work everywhere. Even little things like the navigation within a text field can behave differently between toolkits, though the big distros have configured at least GTK and Qt to behave the same way. Beyond that, on macOS, command-, will open preferences in any application that has preferences. Command-shift-v pastes and matches style in every application that has rich text. There are a load of shortcuts that are the same everywhere. Apple achieved this by having Xcode (and, before that, Interface Builder) populate the default menu bar with all of these shortcuts preconfigured. This is much harder to do with a diverse tooling ecosystem.

                                                                  Drag and drop works everywhere. It’s somewhat embarrassing that drag-and-drop works better between MS Office apps on macOS than Windows. There are a couple of things that make this work well on macOS:

                                                                  • The drag-and-drop protocol has a good content-negotiation model. The drag source provides a list of types that it can provide. The drop target picks one. The drag source then provides the data. This means that there’s no delay on drag start (you just need to provide a list of types, which is trivial, the Windows and Wayland models both encourage you to provide the drag data up front). It also means that you can do complex transcoding things on drop, where users are more tolerant of delay.
                                                                  • There’s a built-in type for providing a file promise, where I can offer a file but not actually need to write it to the filesystem until the drop operation completes. This means, for example, I can drag a file from a attachment and it doesn’t need to do the base64 decode and write the file until the drop happens.

                                                                  There are some things that could be improved, but this model has worked pretty well since 1988. It’s significantly augmented by the fact that there’s a file icon in the title bar (newer macOS has hidden this a bit) that means that any document window has a drag source for the file. I can, for example, open a PDF in Preview and then drag from the title bar onto’s icon to create a new file with that PDF file as an attachment.

                                                                  The global menu bar mostly works with Qt and GTK applications but it requires applications to have separate app and document abstractions, which a lot of things that aim for a Windows-like model lack. Closing the last window in a macOS app doesn’t quit the app, it leaves the menu bar visible and so you can close one document and then create a new one without the app quitting out from under you. The Windows-equivalent flow requires you to create the new doc and then close the old one, which I find jarring.

                                                                  The macOS model works particularly well on XNU because of the sudden termination mechanism that originated on iOS. Processes can explicitly park themselves in a state where they’re able to be killed (with the equivalent of kill -9) at any point. They are moved out of this state when input is available on a file descriptor that they’re sleeping on. This means that apps can sit in the background with no windows open and be killed if there’s memory pressure.

                                                                  Sudden termination requires a bunch of cooperation between different bits of the stack. For example, the display server needs to own the buffers containing the current window contents so that you can kill an app without the user seeing the windows go away. When the user selects the window, then you need the window server to be able to relaunch the app and give it a mechanism to reattach to the windows. It also needs the window server to buffer input for things that were killed and provide it on relaunch. It also needs the frameworks to support saving state outside of the process.

                                                                  Apple has done a lot of work to make sure that everything properly supports this kind of state preservation across app restarts. My favourite example is the terminal, which will restore all windows in the same positions and a UUID in an environment variable. When my Mac reboots, all of my remote SSH sessions are automatically restored by a quick check in my .profile to see if I have a file corresponding to the session UUID and, if so, reading it and reestablishing the remote SSH session.
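
                                                                    A minimal sketch of that .profile check, assuming Terminal exports the session UUID as TERM_SESSION_ID and that each saved session is a one-line file holding the remote host name. The ~/.ssh-sessions directory and the saved_host_for_session helper are hypothetical names chosen for illustration:

                                                                    ```shell
                                                                    # Hypothetical directory holding one file per terminal session UUID.
                                                                    SESSION_DIR="${SESSION_DIR:-$HOME/.ssh-sessions}"

                                                                    # Print the host saved for the given session UUID; fail if none is saved.
                                                                    saved_host_for_session() {
                                                                      session_file="$SESSION_DIR/$1"
                                                                      [ -f "$session_file" ] || return 1
                                                                      cat "$session_file"
                                                                    }

                                                                    # In .profile this could be wired up as (left commented out here):
                                                                    # if host=$(saved_host_for_session "${TERM_SESSION_ID:-}"); then
                                                                    #   ssh "$host"
                                                                    # fi
                                                                    ```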

                                                                  1. 7

                                                                    I prefer not to engage with comments that resort to phrases like “point-and-drool”.

                                                                    But I’ll point out that you’re not really contradicting here – Unity was a significant amount of work that basically required an entity with the funding level of Ubuntu to pull off even briefly, and in general Ubuntu’s efforts to make the desktop experience more “polished” and easier have been met with outright hostility. Heck, even just GNOME tends to get a lot of hate for this: when they try to unify, simplify, and establish clear conventions, people attack them for “dumbing down”, for being “control freaks”, for their “my way or the highway” approach, for taking away freedom from developers/users/etc., and that ought to be the clearest possible evidence for my claim about the general contradiction between wanting the “finesse” of macOS with the “freedom” of a Free/open-source stack.

                                                                    1. 1

                                                                      I prefer not to engage with comments that resort to phrases like “point-and-drool”.

                                                                      [Surprised] What’s wrong with it? It is fairly mildly condemnatory, IMHO. FWIW, I am not an American and I do not generally aim to conform to American cultural norms. If this is particularly offensive, it’s news to me.

                                                                      I completely agree that Unity was a big project and a lot of work, which was under-appreciated.

                                                                        But for me, a big difference is that Unity attempted to keep as many UI conventions as it could, to accommodate multiple UI methods: mainly keyboard-driven and mainly pointing-device-driven; Windows-like conventions (e.g. window and menu manipulation with the standard Windows keystrokes) and Mac-like conventions (e.g. a global menu bar, a dock, etc.).

                                                                      GNOME, to me, says: “No, we’re not doing that. We don’t care if you like it. We don’t care if you use it. We don’t, and therefore, it’s not needed. You don’t need menu bars, or title bars, or desktop icons, or a choice of sidebar in your filer, or any use for that big panel across the top of the screen. All that stuff is legacy junk. Your phone doesn’t have it, and you use that, therefore, it’s baggage, and we are taking all that away, so get used to it, move on, and stop complaining.”

                                                                  2. 2

                                                                    People have been trying to make Unix-y desktop environments look like Mac OS

                                                                    Starting with Apple. :-). macOS is Unix.

                                                                    And in general the idea of “finesse of macOS” on a *nix desktop environment is contradictory.

                                                                    Considering the above, that seems … unlikely. Or maybe macOS is a contradiction unto itself?

                                                                    NeXTstep arguably had even higher standards.

                                                                    It’s funny, because I use the old “stable but ugly server os vs. pretty+easy-to-use but crashy client OS”-dichotomy as an example of things we used to believe were iron laws of nature, but that turned out to be completely accidental distinctions that were swept away by progress.

                                                                    My phone is running Unix, and so is just about everybody’s. I think my watch is running Unix as well.

                                                                    and the level of enforced conformity of interface design

                                                                    Considering how crappy the built in apps are these days compared to 3rd party apps, and how little they conform to any rules or guidelines, I think that’s at best a highly debatable claim.

                                                                    1. 7

                                                                      A lot of the “finesse” of macOS actually isn’t in superficial appearance, though, it’s in systemwide conventions that are essentially impossible to enforce in a useful way without also bringing in the level of platform control that’s alleged to be evil when Apple does it.

                                                                      1. 4

                                                                        Right, there’s a lot of subtle things you’d have to enforce within the ecosystem to make it work even close - just the notion of NSApplication and friends is alien to a world where it’s assumed a process has to have a window to exist in the GUI.

                                                                      2. 2

                                                                        People have been trying to make Unix-y desktop environments look like Mac OS

                                                                        Starting with Apple. :-). macOS is Unix.

                                                                        I think if you read “unix-y desktop environments” where “unix-y” modifies “desktop environment” as opposed to as “unixes that have a desktop environment” (e.g. as “X11 DEs/WMs such as GNOME/KDE/Enlightenment”) this observation is more compelling. A common theme of most phones, most (all?) watches, NextStep and Mac OS is that they are very much not running “a unix-y desktop environment.”

                                                                      3. 2

                                                                        If I want a Mac, I’ll buy a Mac. What’d be more interesting is trying to build something better. Despite Gnome 3’s flaws, I do appreciate it at least tried to present a desktop model other than “Mac OS” or “Windows 98”.

                                                                        1. 1

                                                                          Yes, I have to give you that.

                                                                          I personally hate the model it proposes, but you make a good point: at least it’s not the same old same old.

                                                                          Oddly, Pop OS pleases me more, because it goes further. Pop OS, to me, says:

                                                                          “OK, so, GNOME took away your ability to manage windows on the desktop and it expects you to just use one virtual desktop per app. Window management is for simps; use it like a phone, all apps full screen all the time. We don’t think that’s enough, so, we’re going to keep GNOME but slap a tiling window manager on top, because we respect that you might not have a vast desktop or want one app per screen, so let us help you by automating that stuff… without sacrificing a desktop environment and switching to some kind of alien keyboard-driving tiling window-manager thing that takes away everything you know.”

                                                                        2. 1

                                                                          Isn’t SerenityOS an example of finesse of macos with some level of open source freedom?

                                                                          It depends how you define freedom, but they seem to have a lot of different people working on entirely different things that are still built on a shared vision, seemingly quite well. It is still early, but I wouldn’t give up on the idea altogether.

                                                                          I understand what you are saying, but I think it’s an interesting compromise that’s surprisingly effective (from an outside perspective).

                                                                        1. 8

                                                                          I’ve been talking a lot about make lately, it seems, and I kinda feel sometimes like I’m a “Make developer” as much as some folks call themselves “Java developer” or “Go developer” these days because I’ve spent so much time making my codebases easier to use by using Make as the interface for both one-off tasks without dependencies or file generation (“phony” in Make parlance) as well as for the textbook use with output files and dependent input files.

                                                                          Just on Lobsters alone:

                                                                          1. 3

                                                                            At my last job I worked with the developer of the Procfile format:

                                                                            I asked him a few times to explain what Procfiles were for that couldn’t be done equally well with existing tooling using Makefiles and phony targets.

                                                                            Never got a straight answer. And now we’re stuck with yet another Filefile entry:

                                                                            1. 2

                                                                              :) I find phony targets to be not as simple as a Procfile (which I’ve accidentally memorized). Makefile I think fits nicely with make just as Procfile fits nicely with ps if you can imagine that heroku is starting a process for you. But it’s a valid point, I’m not arguing. They do similar things in different ways. Most of Heroku’s “ahead of its time”-ness was on the massive effort to detect your app. I kind of accepted touching an empty Procfile while I was amazed at what Heroku was doing near launch.

                                                                              The Filefile thing is funny and I starred that repo way back when. It’s odd to see them all collected up even if some of those tools aren’t as recognizable anymore. It’s rare for a project to have all those things in the root and they’d be broken up by a similar troupe of cargo.lock, yarn.lock, package-lock.json, Gemfile.lock, poetry.lock.

                                                                              1. 2

                                                                                I asked him a few times to explain what Procfiles were for that couldn’t be done equally well with existing tooling using Makefiles and phony targets

                                                                                1. Have a target named web and a web process type. You’d at least need 2 separate Makefiles then.

                                                                                2. As far as I can tell you’d have to implement a Makefile parser to implement the scaling UI. There might be a way to get make(1) to tell you the phony targets but if so it is not obvious to me how. That seems awfully complicated for the use case.
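
                                                                                To give a sense of what that parsing involves, here is a deliberately naive sketch that lists targets declared on .PHONY lines. It assumes plain “.PHONY: a b” declarations with literal names and ignores includes, variables, and line continuations, which is where a real parser would get complicated. list_phony_targets is a hypothetical helper:

                                                                                ```shell
                                                                                list_phony_targets() {
                                                                                  # $1: path to a Makefile. Prints one phony target name per line.
                                                                                  awk '/^\.PHONY:/ { for (i = 2; i <= NF; i++) print $i }' "$1"
                                                                                }
                                                                                ```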

                                                                                1. 1

                                                                                  There might be a way to get make(1) to tell you the phony targets but if so it is not obvious to me how.

                                                                                  You would use the same mechanism that bash uses to determine tab-completions.

                                                                                  Have a target named web and a web process type. You’d at least need 2 separate Makefiles then.

                                                                                  Seems like using a sledgehammer to squash a fly if you ask me.

                                                                                  1. 1

                                                                                    You would use the same mechanism that bash uses to determine tab-completions.

                                                                                    I wasn’t aware that only completed phony targets?

                                                                                    Seems like using a sledgehammer to squash a fly if you ask me.

                                                                                    Makefiles have power in excess of what Procfiles are used for. There are no dependencies, no patterns, no inclusion, etc.

                                                                                    A well known directory (call it “procs”, or hell “bin”) with executables in it would be a vastly simpler replacement for Procfiles than a Makefile.
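
                                                                                    As a rough sketch of that alternative, discovering process types would reduce to listing the executables in the directory. The procs directory name comes from the suggestion above; list_process_types is a hypothetical helper:

                                                                                    ```shell
                                                                                    list_process_types() {
                                                                                      # $1: directory of executables, e.g. "procs". Prints one name per line.
                                                                                      for f in "$1"/*; do
                                                                                        if [ -f "$f" ] && [ -x "$f" ]; then
                                                                                          basename "$f"
                                                                                        fi
                                                                                      done
                                                                                    }
                                                                                    ```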

                                                                                    1. 1

                                                                                      A well known directory (call it “procs”, or hell “bin”) with executables in it would be a vastly simpler replacement

                                                                                      TBH I first asked him why make the Procfile over a bin directory and never got a straight answer for that either.

                                                                                2. 1

                                                                                  I thought Procfile was more for running a number of services together. I also thought it came from Foreman, but perhaps I’m wrong about that?


                                                                                3. 2

                                                                                  I wonder if you’ve considered shell for those tasks, and if so what the pros/cons you see are?

                                                                                  If you have a bunch of shell commands you want to put in a single file, I call that the “Taskfile” pattern, and funnily enough you can use either make or shell for “Task files”.


                                                                                  In shell, I just use a function for each task:

                                                                                  build() {
                                                                                    ./ --quiet build_ext --inplace      
                                                                                  test() {
                                                                                     for t in *; do

                                                                                  or you can also use a case statement.
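
                                                                                  A self-contained sketch of the pattern, using the case-statement variant for dispatch. The task bodies are placeholders, and run_task and check are hypothetical names (check stands in for test so the sketch does not shadow the shell’s test builtin):

                                                                                  ```shell
                                                                                  build() {
                                                                                    echo "build: compiling"      # placeholder for the real build command
                                                                                  }

                                                                                  check() {
                                                                                    echo "check: running tests"  # placeholder for the real test loop
                                                                                  }

                                                                                  # Dispatch to the task named by the first argument; unknown names get a
                                                                                  # usage message instead of the "command not found" a bare "$@" would give.
                                                                                  run_task() {
                                                                                    case "${1:-}" in
                                                                                      build|check) "$@" ;;
                                                                                      *) echo "usage: $0 {build|check}" >&2; return 2 ;;
                                                                                    esac
                                                                                  }

                                                                                  # In a real task file the last line would be:  run_task "$@"
                                                                                  ```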

                                                                                  Here are some pros and cons of make vs. shell I see:

                                                                                  • With make, on most distros you get shell autocompletion of targets. I have a bash completion script to do the same thing with shell functions, but most people don’t have it.
                                                                                  • Make automatically dispatches to the “verb” – with shell you need a bit of extra code. (I use "$@" but it doesn’t provide great error messages.)
                                                                                  • In shell you have to remember “unofficial strict mode” to get good error handling. I think Make has similar problems, although maybe the default is slightly safer.

                                                                                  Make downsides:

                                                                                  • Syntax conflicts because Make embeds shell:
                                                                                    • $(VAR) conflicts with shell variables, so for shell variables you need something like $$my_shell_var
                                                                                    • How do you write a for loop? You have to put an extra \ at the end of every line in a Makefile, which is very ugly
                                                                                  • You have to remember to make everything .PHONY; otherwise it’s a subtle bug

                                                                                  Any others? I prefer shell but I can see why people choose Make. Really it is the language “mashup” that really bothers me :-/

                                                                                  BTW I take Task files to an extreme and Oil has over 10,000 lines of shell automation


                                                                                  e.g. to generate all the tests, benchmarks, and metrics here:

                                                                                  Related story from 5 months ago:

                                                                                  Other comment: – OK another issue is listing the “tasks”, which is related to the autocompletion issue. Make doesn’t have this by default either

                                                                                  1. 1

                                                                                    Make is a tool for managing a directed acyclic graph of commands. So I’m not sure why you’d compare it to bash. Make is a wrapper for bash lines that defines the relationships between your bash code.

                                                                                    1. 1

                                                                                      I understand that theory, but that’s not what the OP is talking about using it for. Look at one comment he linked:


                                                                                      Those are six .PHONY verbs, not nouns. Even build is a verb.

                                                                                      So they’re using make as a “Task runner” (verbs), not as a build system (to “demand” nouns). (FWIW Oil’s Ninja build has hundreds of nouns, mainly for tests and several build variants: ASAN, UBSAN, coverage, etc.)

                                                                                      Make isn’t great as a build system for all the reasons discussed here and many other places:

                                                                                      As mentioned, I wrote 3 Makefiles from scratch starting from 2017 and concluded it was a big mistake (and I’m still maintaining them).

                                                                                      1. One was rewriting Python’s build system from scratch (which is still being used by Oil today, but needs to go away)
                                                                                      2. Apache-style Log analysis (since I don’t use Google Analytics or any hosted service). Requires dynamic/globbed rules.
                                                                                      3. Building the blog (which is surprisingly speed sensitive). Requires dynamic/globbed rules.

                                                                                      For those use cases (and I think most), Python/Ninja is way better, and similar to Bazel, but much lighter. The sandboxing like Landlock Make would be great though – that is a real problem.

                                                                                      I think you forked Make and made it a good build system for your use cases, but that doesn’t mean it’s good in general :)

                                                                                      1. 1

                                                                                        Where is your Python Makefile? If your effort to write a Makefile for Python didn’t work out, that doesn’t make it Make’s fault. There were probably just some things you failed to consider. I wrote a Makefile for Python about a year ago. If I build Cosmopolitan, then rm -rf o//third_party/python and then time make -j16 o//third_party/python, it takes 17 seconds to compile Python and run the majority of its tests. The build is sandboxed and pledged. It doesn’t do things like have multiple outputs. We removed all the code that does things like communicate with the Internet while tests are running.

                                                                                        1. 1

                                                                                          It starts here:

                                                                                          It definitely works, but doesn’t do all the stuff I want.

                                                                                          Where Make falls down is having any kind of abstraction or reuse. That seems to show in your lengthy ~4500 line Makefile. (If it works for you, great, but I wouldn’t want to maintain the repetition!)

                                                                                          1. For the Python makefile, I want to build oil.ovm, opy.ovm, hello.ovm, i.e. three different apps. I’m using the % pattern for that. If I want to add a second dimension like ASAN/UBSAN/coverage, in my experience that was difficult and fragile

                                                                                          2. For the log analysis, I want to dynamically create rules for YYYY-MM-DD-accesslog.tar.gz.

                                                                                          3. For the blog, I want to dynamically make rules for blog/*/*/*.md, and more.

                                                                                          That pattern interacts poorly with all the other features of Make, including deps files with gcc -M, build variants, etc.

                                                                                          In contrast, it’s trivial with a script generating Ninja.

                                                                                          But I don’t think even those use cases are necessary to justify my opinion; there are dozens of critiques of Make that are 10 years old and based on lots of experience. And comments like this right in the thread:


                                                                                          I have looked quite deeply into Make, and used it on a variety of problems, so I doubt it will change my mind, e.g.


                                                                                          Remember when I say “make is a bad language”, this is coming from the person who spent years reimplementing most of bash and more :) i.e. I don’t really have any problem with “bad” or “weird” or “string-ish” languages, at least if they have decent / salvageable semantics. And I don’t think Make does for MANY common build problems.

                                                                                          The sandbox/pledge stuff you added to Make is very cool, and I would like something like that, and hopefully will get time to experiment with it.

                                                                                          1. 1

                                                                                            For the Python makefile, I want to build oil.ovm, opy.ovm, hello.ovm, i.e. three different apps. I’m using the % pattern for that. If I want to add a second dimension like ASAN/UBSAN/coverage, in my experience that was difficult and fragile

                                                                                            Consider using o/$(MODE)/%.o: %.c pattern rules. Then you can say make MODE=asan.

                                                                                            Remember when I say “make is a bad language”, this is coming from the person who spent years reimplementing most of bash and more :)

                                                                                            I don’t doubt you’re an expert on shells. Being good at shells is a different skillset from directed acyclic graphs.

                                                                                            That seems to show in your lengthy ~4500 line Makefile. (If it works for you, great, but I wouldn’t want to maintain the repetition!

                                                                                            If by repetition you mean my makefile code is unfancy and lacks clever abstractions, then I’ll take it as a compliment. Python has 468,243 lines of code. I’m surprised it only took me 4k lines of build code to have a fast parallelized build for it that compiles and runs tests in 15 seconds. Even the Python devs haven’t figured that out yet, since their build takes more like 15 minutes. I believe having fast build times with hermeticity guarantees is more important than language features like rich globbing, which can make things go slow.

                                                                                1. 10

                                                                                  Not universally applicable, but another advantage of print based debugging is that it pushes the system towards being more observable and thus helps with debugging production issues. If you can’t debug it from the logs during development, what hope do you have of debugging a prod issue?

                                                                                  Or, looking at it from the other end, if you have to put in enough logging to debug prod issues anyway, then the need for a debugger largely disappears as a result.

                                                                                  1. 13

                                                                                    A solid majority of the software I’ve been responsible for at my day job has been some kind of web server process that is breaking for a specific user in prod. There is no way for me to use a debugger to diagnose the problem if I first hear about it in the context of error alarms reporting that 500 responses are getting served to live customers. Print debugging is the only tool I have.

                                                                                    1. 3

                                                                                      There’s a key advantage to trace-based debugging: it can capture flows, rather than point in time. In debugging clang, for example, I generally see a crash when walking the AST that is caused by an error in the code that created that AST node. By the time that the debugger attaches to the crash, all of the state that was associated with the incorrect code is gone, other than the output. The debugger can let me introspect a load of ephemeral state associated with the victim code, but that isn’t useful.

                                                                                      Print-based debugging is the simplest case of trace-based debugging, where the traces are unstructured and ad-hoc. Structured logging is better, but requires a bit more infrastructure.

                                                                                      The biggest value of tools like valgrind is not that they catch memory-safety bugs, it’s that they automatically correlate these bugs with temporally distant elements. When I have a use-after-free bug in a program that I run with valgrind, it tells me where the use is but it also tells me when the object was allocated and when it was freed. These two extra pieces of information make it incredibly easy to fix the bug in comparison to having only the location of the use of the dangling pointer.

                                                                                      1. 2

                                                                                        Why not connect the debugger to prod?

                                                                                        1. 6

                                                                                          Depends on the type of project I guess, but in my corner of the industry, that would be rather complicated due to the deployment and security model. Logs are much easier.

                                                                                          1. 4

                                                                                            Can you guarantee the application won’t halt for everyone while you’re taking your time single-stepping through it?

                                                                                            1. 3

                                                                                              I will preface this by saying it is always dangerous to perturb something by using atypical mechanisms (debugging APIs, e.g. ptrace, but also in many cases runtime flags that turn on rarely used instrumentation).

                                                                                              That said, don’t single step. Even outside of production, threads or a GUI tend to make that awkward. Instead take advantage of programmable breakpoints and/or memory dumps. In the simplest form you can do printf debugging but with printfs that didn’t occur to you before the process was started.

                                                                                            2. 3

                                                                                              Most operations are subject to timeouts that are immediately triggered by a debugger breakpoint. Which means every debugged operation basically immediately fails. Debuggers don’t really work in distributed systems, unfortunately.

                                                                                          1. 2

                                                                                            I really wish there was a SQL database that could provide a queryable log alongside the other tables. That feature would enable so much goodness and largely obviate the need for soft deletes in the use cases I most often see soft deletes used for.

                                                                                            1. 2

                                                                                              MS SQL has temporal tables, not sure if that covers what you were thinking of.

                                                                                            1. 5

                                                                                              A real problem with string_view is that by design it isn’t memory safe: you can return one from a function, store one in a struct, or whatever and it will silently carry a pointer to its backing store, whether or not the backing store is kept alive.

                                                                                              Personally I think it’s a foot gun of an API given how easy it is to use it in a way that results in UAF bugs.
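
                                                                                              To make the failure mode concrete, here is a minimal sketch (the function names are mine, purely illustrative). The commented-out function returns a view into a local string, which compiles but dangles; the other two show patterns that stay valid.

```cpp
#include <string>
#include <string_view>

// Dangerous: `copy` is destroyed when the function returns, so the returned
// view would point at freed storage. Left commented out because using the
// result is undefined behavior.
// std::string_view first_word_dangling(const std::string& line) {
//     std::string copy = line;
//     return std::string_view(copy).substr(0, copy.find(' '));
// }

// Fine as long as the caller keeps the original buffer alive: the returned
// view refers to the same storage the argument referred to.
std::string_view first_word(std::string_view line) {
    // find() returns npos when there is no space; substr(0, npos) is the whole view.
    return line.substr(0, line.find(' '));
}

// Safe to store: the result owns its own copy of the characters.
std::string first_word_owned(std::string_view line) {
    return std::string(first_word(line));
}
```

                                                                                              Nothing in the signature of first_word tells the caller that the result borrows from the argument, which is the part I consider the foot gun.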

                                                                                              1. 4

                                                                                                This feels slightly unfair to call out string_view, iterators have all the same problems, do you consider them a foot gun of an api too? It is a foot gun, but it’s not any more of a foot gun than char * and maybe only slightly more of a foot gun than string&.

                                                                                                And it at least has the advantage of consistently being a non-owning reference to the backing store, something that could go either way with char *.

                                                                                                1. 3

                                                                                                  In the context of C and C++, string_view is an improvement over alternatives. However, Rust has shown the value of statically guaranteeing no dangling pointers. Rust’s &str (the equivalent of C++‘s string_view) can’t point to a string which is no longer valid to reference. That’s pretty great!

                                                                                                  1. 2

                                                                                                    Sure. And if the original post had said something like “it’s too bad C++ doesn’t allow string_view to be memory safe by design.” I probably wouldn’t have made an objection. This is a “real problem” with vast swaths of C++ apis though, it is not notable or exceptional that string_view suffers from it.

                                                                                                    1. 2

                                                                                                      Sure, in the context of C++, it’s not notable that string_view has this problem, but in the broader context of this set of competing languages (C, C++, Rust), it’s worth noting when a new API is offering less than state of the art guarantees.

                                                                                                      1. 1

                                                                                                        But by focusing on the api the implication is the specific api is flawed as opposed to the language. No new C++ api will ever be “state of the art” without a language change. We’re not talking about adding C++’s string_view to Rust where it would make sense to criticize the api for not being memory safe, where the option to do better exists.

                                                                                                    2. 2

                                                                                                      I’m curious about how this works. If in Rust, I have a reference counted object A, that points to a reference-counted object B, that has a string field, and I use &str to take a reference to a substring of that string field. I then pass A and the substring to a function, which follows the reference from A to B and truncates the string, how does Rust enforce the bounds checks? If I completely invalidate the range that the substring covers, what happens when I try to use it?

                                                                                                      I believe the answer is that the Rust borrow checker would prevent objects reachable from A from being mutated during the function that has the substring reference and so I can’t express the above idiom at all. In C++, there is no such restriction and you get all of the power, flexibility, and the footguns that accompany this.

                                                                                                      1. 2

                                                                                                        I think (and could local Rust experts please correct me if I am wrong; @matklad, @kornel) Rust will not let you both have a mut reference from A to B and an &str to substring in B since that would mean you have two references and one of them is mut. You could have a non-mut reference from A to B and an &str substring to B but that means you wouldn’t be able to modify B in your function. But I could be completely wrong.

                                                                                                        1. 3

                                                                                                          Yeah, this is mostly correct, with the asterisk that a) what Rust really tracks is aliasing, rather than mutability, and it is possible to use interior mutability to mutate aliased data and b) you can use unsafe and raw pointers to gain all power, flexibility, and even more footguns.

                                                                                                          Couple of examples here:


                                                                                                    3. 3

                                                                                                      I consider string_view to be a new API introduced well into the period of knowing that dangling pointers are bad. I couple that with it behaving as a value object rather than iterators that have pointer semantics in my belief that string_view is bad.

                                                                                                      Iterators date to the earliest days of C++ and have a well-understood pile of issues that make them not memory safe (the for(:) using iterators removes a lot of safety checks and introduces vulnerabilities into previously safe code with indices).

                                                                                                      1. 2

                                                                                                        Modern C++ APIs are not designed to eliminate memory safety bugs, they are designed to permit coding styles that eliminate pointer safety bugs. Combined with the lifetime annotation on accessors, a compiler can warn if you have locally created a string view that outlives the object from which it was created. A coding convention that says not to capture string views (pass them down the stack only) then gives you most of the lifetime safety that you need.

                                                                                                        The corner case is if the object that the string view references is reachable from another object and the called code mutates it in such a way that it invalidates the string view. For string views derived from something like std::string, this could be avoided by having the string view reference the string object and its bounds but that adds performance overhead and would reduce adoption and it cannot work with string views derived from C strings: One of the big benefits that you get from string views is the ability to provide safe implementations of APIs that are exposed to C and take C string arguments and can also be used from C++ with string objects.

                                                                                                        1. 1

                                                                                                          But new APIs should be designed such that they make it very easy - to the extent I’ve seen multiple blogposts and mailing list comments on the memory unsafety of string_view when used in a way that looks reasonable. If a new object is introduced that is trivially unsafe if moved, copied, or otherwise stored, then that object should not support copy or move.

                                                                                                          One of the more common errors in llvm is storing the semantically equivalent StringRef, which often works fine due to other objects keeping buffers live long enough. Except when they don’t, and then you get crashes or worse not crashes but whatever happens to be present at the time.

                                                                                                          It is not reasonable to add new APIs to C++ where code that looks correct is not memory safe.

                                                                                                          1. 2

                                                                                                            But new APIs should be designed such that they make it very easy - to the extent I’ve seen multiple blogposts and mailing list comments on the memory unsafety of string_view when used in a way that looks reasonable. If a new object is introduced that is trivially unsafe if moved, copied, or otherwise stored, then that object should not support copy or move.

                                                                                                            If you remove the copy constructor in string_view then you eliminate its only valid use: passing down value type down the stack to other functions for temporary delegation. The problem is that C++ doesn’t differentiate between copy and capture, so there is no way of implementing a thing that can be passed down but cannot be captured. You could force it to be passed by reference, but then you’re adding an extra indirection for a two-word object and that will hurt performance.

                                                                                                            The main reason that string_view exists is to have an explicitly non-owning type that replaces passing a char* and a size_t in a way that makes it easy for one of them to be lost (for other code to assume that the char* is null terminated, for example).
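
                                                                                                            A small sketch of what that replacement looks like in practice. count_char is a made-up helper, not from any library; the point is that one signature accepts a std::string, a string literal, or an explicit pointer plus length, and the callee never assumes null termination.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: counts occurrences of `c` in `text`.
// The pointer and the length travel together inside the string_view,
// so the callee cannot lose the size or rely on a trailing '\0'.
std::size_t count_char(std::string_view text, char c) {
    return static_cast<std::size_t>(std::count(text.begin(), text.end(), c));
}
```

                                                                                                            Callers can pass a std::string or a literal directly, or wrap a raw buffer as std::string_view(ptr, len) when the data is not null terminated.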

                                                                                                            As you say, llvm’s StringRef is analogous and any code that captures a StringRef is a bad code smell and gets caught in auditing. Prior to StringRef, code either defensively allocated std::strings all over the place and hurt performance or decided that performance was important and captured const char*s. StringRef has been a huge improvement over the prior state.

                                                                                                            If you have a proposal for a performant, memory-safe, non-owning string reference type that could be implemented in C++ (even with some modest language additions, but without completely redesigning the language) then I’d be very interested to hear it because it’s something that I’d happily adopt in a bunch of codebases.

                                                                                                            1. 1

                                                                                                              No, if you remove copy you force pass by reference.

                                                                                                              That has the benefit of meaning you can’t make a string_view outlast its lexical context, and you restrict it to what it ostensibly is: an abstraction API over the many variations on what a string is. In exchange for having to put an ampersand in you remove - or at least significantly reduce the footgun-ness of the API and potentially improve perf+code size as you can pass a pointer/reference in a single register :D

                                                                                                              1. 1

                                                                                                                No, if you remove copy you force pass by reference.

                                                                                                                But you also allow capture by reference for lambdas, for example, so you’re still not memory safe.

                                                                                                                That has the benefit of meaning you can’t make a string_view outlast its lexical context

                                                                                                                True, but that just moves the problem. You can trivially make the reference outlive its lexical context, for example by lambda capture, by using std::forward_as_tuple to capture a parameter pack, and so on. Worse, the things that capture now look a lot more like things that are safe to capture and so it’s harder to spot in code review.
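
                                                                                                                A minimal sketch of the lambda case, assuming a hypothetical non-copyable wrapper that I’m calling pinned_view (nothing like it exists in the standard library). Deleting copy and move does not stop a reference to it from escaping:

```cpp
#include <cstddef>
#include <functional>
#include <string_view>
#include <type_traits>

// Hypothetical view type with copy (and therefore move) disabled.
class pinned_view {
public:
    explicit pinned_view(std::string_view sv) : sv_(sv) {}
    pinned_view(const pinned_view&) = delete;
    pinned_view& operator=(const pinned_view&) = delete;

    std::size_t size() const { return sv_.size(); }

private:
    std::string_view sv_;
};

static_assert(!std::is_copy_constructible_v<pinned_view>);
static_assert(!std::is_move_constructible_v<pinned_view>);

// Compiles without complaint: the closure stores a reference to `v`, so it
// is only valid while the caller's pinned_view object is still alive.
std::function<std::size_t()> make_size_getter(const pinned_view& v) {
    return [&v] { return v.size(); };
}
```

                                                                                                                The returned closure can be stored anywhere, and nothing in the type system ties its lifetime to the pinned_view it refers to.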

                                                                                                                You also introduce a load of more subtle lifetime issues. By passing std::string_view by value, you can construct them in arguments from things like const char * or std::string references. This makes it very easy to gradually transition an API from one that takes const std::string& and const char * to something uniform. If std::string_view had to be passed by reference then creating a temporary std::string_view and passing it would be very dangerous because the lifetime of the temporary could very easily be shorter than the use, even in the common case where the underlying string had a longer lifetime. Auditing code for this would be much harder than spotting misuse of std::string_view.

                                                                                                                If you want a safe string view, then it needs to be integrated with string memory management and prevent any mutation of a string that has outstanding string views. That would be impossible to integrate with APIs that exposed const char* parameters and so would not have the incremental migration benefits of the current design.

                                                                                                        2. 1

                                                                                                          So is your argument that string_view just shouldn’t exist? “by design” implies to me you think there is an alternative design for string_view that would be memory safe.

                                                                                                          I couple that with it behaving as a value object rather than iterators that have pointer semantics in my belief that string_view is bad.

                                                                                                          string_view is tantamount to a pair of iterators, I’m not sure how that makes it a value object and iterators have pointer semantics.

                                                                                                          1. 1

                                                                                                            My argument is that the existence of bad APIs in the past should not be used as a justification to add new unsafe APIs.

                                                                                                            The fact that the implementation is/may be two pointers is irrelevant, it could also be a copy of the data, it could be a pointer and a length, it could be an unordered_map from size_t to char, the implementation is entirely moot.

                                                                                                            The string_view API behaves like a value type - it looks (in use) to be a memory safe value type like std::string. The iterator API is via the standard pointer operators *, ->, etc. All things that display very clearly that they are semantically only pointers to data, nothing else.

                                                                                                            What matters is how a type behaves, not your thoughts about how it should be implemented.

                                                                                                            Your argument is “C++ is filled with things that are unnecessarily unsafe because of backwards compatibility, therefore it is ok to add new unsafe things”, which is not compelling to me.

                                                                                                            1. 2

                                                                                                              Your argument is “C++ is filled with things that are unnecessarily unsafe because of backwards compatibility, therefore it is ok to add new unsafe things”, which is not compelling to me.

                                                                                                              My argument is it is impossible to add a memory safe string_view to C++.

                                                                                                              These things aren’t “unnecessarily” unsafe, they’re intrinsically unsafe.

                                                                                                              The argument from a memory safety standpoint is “just don’t use C++”, not “don’t add things to C++”. Adding string_view does not make C++ less safe.

                                                                                                              string_view can’t be a copy of the data because then it ceases to be a view, you would just use string then.

                                                                                                              1. 1

                                                                                                                You could make string_view not movable or copyable, and the problem is solved.

                                                                                                                1. 1

                                                                                                                  It is? How do you implement string_view::substr? I suppose it could be substr(pos, n, LambdaTakingConstRefStringView f) -> decltype(f(declval<const string_view&>()))


                                                                                                                  auto s = strdup("ok");
                                                                                                                  string_view v(s);
                                                                                                                  free(s);

                                                                                                                  is still obviously a UAF even with a non-copyable non-movable string_view.

                                                                                                                  Though I suppose to solve that problem we could again use the same continuation trick, and not make the char* ctor public:

                                                                                                                  [](const string_view& v) 
                                                                                                                  { f(v) });
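For what it's worth, the continuation idea above could be sketched roughly like this. `pinned_view`, `with`, and `to_string` are hypothetical names invented for the sketch, not anything in the standard library, and it is not airtight: a callback can still smuggle out a raw pointer to the view.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical type for illustration only; nothing like it exists in the standard library.
class pinned_view {
public:
    pinned_view(const pinned_view&) = delete;             // not copyable
    pinned_view& operator=(const pinned_view&) = delete;  // (and so not movable either)

    // The only way in: borrow from an owner for exactly as long as f runs.
    template <typename F>
    static auto with(const std::string& owner, F&& f) {
        const pinned_view v(owner.data(), owner.size());
        return f(v);
    }

    // Continuation-style substr: the sub-view exists only while f runs.
    template <typename F>
    auto substr(std::size_t pos, std::size_t n, F&& f) const {
        assert(pos <= size_);
        const std::size_t rest = size_ - pos;
        const pinned_view sub(data_ + pos, n < rest ? n : rest);
        return f(sub);
    }

    std::size_t size() const { return size_; }
    std::string to_string() const { return std::string(data_, size_); }  // owning copy

private:
    pinned_view(const char* data, std::size_t size) : data_(data), size_(size) {}
    const char* data_;
    std::size_t size_;
};
```

Because the return type is deduced as plain `auto`, a callback that tries to return the view itself fails to compile; only owning values such as `std::string` can leave the scope.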
                                                                                                                  1. 1

                                                                                                                    Yes, you can explicitly free the backing store, but there is a world of difference between safety in the face of explicitly releasing the backing buffer and safety in ordinary-looking code. What you’re suggesting is not that far from shared_ptr<std::yolo> ptr = …; ptr->~yolo();

                                                                                                                    The specific issue I have with string_view is the ease with which it makes UAFs possible, without apparently doing anything sus.

                                                                                                                    I think I said elsewhere, but LLVM has llvm::StringRef which has the same API semantics, and it is a frequent source of memory bugs - only sometimes caught during review. It has the same issues: it often appears to work due to incidental lifetime of the backing store, or the general “releasing memory doesn’t magically make the memory cease to exist” behavior that makes manual memory management so fun :)

                                                                                                                    1. 1

                                                                                                                      Sigh, why does lobsters occasionally hold off multiple days before it says “there are replies to your compelling commentary!” and make me seem obsessive.

                                                                                                      2. 2

                                                                                                        isn’t memory safe

                                                                                                        This criticism totally depends on what you use instead! If the alternative is a pointer or reference, they obviously have exactly the same problem – they can outlive the memory:
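A minimal sketch of that comparison (the function names are made up for illustration). The dangling variants are left commented out, since actually using their results would be undefined behavior:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// All three handles borrow from `owner`; none of them keeps the bytes alive.
std::size_t borrow_while_owner_lives() {
    std::string owner = "temporary";
    const std::string& ref = owner;   // reference
    const char* ptr = owner.c_str();  // pointer
    std::string_view view = owner;    // string_view
    assert(view == ref && view == ptr);
    return ref.size() + std::string_view(ptr).size() + view.size();
}  // owner dies here; any handle that escaped this scope would now dangle

// Each of these compiles, and each hands back a handle to a destroyed local:
// const std::string& bad_ref()  { std::string s = "x"; return s; }
// const char*        bad_ptr()  { std::string s = "x"; return s.c_str(); }
// std::string_view   bad_view() { std::string s = "x"; return s; }
```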


                                                                                                        I’ve heard this criticism before, but I don’t think it’s relevant with the right mindset: It isn’t a string replacement! As a slice data type, it’s just a fancy reference. It exists to replace references and pointers, not owned strings.

                                                                                                        As the article mentions, a good use is function arguments. The reason is that the caller only needs to make the memory outlive the call. This may seem like a problem if the function needs to store the string, but that’s not a (valid) use case, as then, it needs an owned string, and should therefore take an owned string.

                                                                                                        It is as a unifying replacement for other pointer-ish interfaces that it really shines:


                                                                                                        void foo(const std::string&);
                                                                                                        void foo(const char* c_str);
                                                                                                        void foo(const char* ptr, size_t len);
                                                                                                        void multifoo( // clang-format force line break
                                                                                                            const char* a_ptr, size_t a_len, // clang-format force line break
                                                                                                            const char* b_ptr, size_t b_len, // clang-format force line break
                                                                                                            const char* c_ptr, size_t c_len);


                                                                                                        void foo(std::string_view);
                                                                                                        void multifoo(std::string_view a, std::string_view b, std::string_view c);
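To make that concrete, here is a small sketch in which one `std::string_view` signature serves all three of the older call styles. The body of `foo` (counting vowels) and the helper `call_foo_three_ways` are invented for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical body for foo: count the vowels in whatever text was borrowed.
std::size_t foo(std::string_view text) {
    constexpr std::string_view vowels = "aeiou";
    std::size_t count = 0;
    for (char c : text) {
        if (vowels.find(c) != std::string_view::npos) ++count;
    }
    return count;
}

// One signature serves all three of the older call styles.
void call_foo_three_ways() {
    const std::string owned = "banana";
    const char* c_str = "banana";
    assert(foo(owned) == 3);                       // was foo(const std::string&)
    assert(foo(c_str) == 3);                       // was foo(const char*)
    assert(foo(std::string_view(c_str, 3)) == 1);  // was foo(const char*, size_t)
}
```

None of the calls copies the characters; the caller only has to keep the memory alive for the duration of the call.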
                                                                                                      1. 3

                                                                                                        I have doubts about this. He says he threw something together in half an hour that moved some colors around and called it done, and was blown away when other people did lots of cool things. And he saw the code and those cool things were done really messily in code. His takeaway is that you can do cool things or things with clean code.

                                                                                                        I think there are two different problems here.

                                                                                                        First, the author was mentally unengaged and threw together something quickly and then felt embarrassed, and so came up with a dichotomy that, “oh, well, because I did something with clean code, I couldn’t think about how to make what I was doing cool.” I’m reminded of Alan Kay recounting E. Power Biggs’s advice to “Play it grand.”

                                                                                                        Second, the other kids were producing interesting work, but there was no feedback cycle of looking at it with them and saying, “Okay, you can do this more easily here like this. What kind of cool stuff could you do if you had that additional capability?” Kind of the atelier model in art and architecture.

                                                                                                        Yes, producing ambitious work that stretches your technique is going to lead you to technical issues which you then have to resolve. That’s the point of the ambitious work. On the other side, when a company hires a senior engineer they’re usually looking for someone who has done that ambitious work and for whom the company’s problems are not particularly ambitious technically.

                                                                                                        1. 2

                                                                                                          I have doubts about this. He says he threw something together in half an hour that moved some colors around and called it done, and was blown away when other people did lots of cool things. And he saw the code and those cool things were done really messily in code. His takeaway is that you can do cool things or things with clean code.

                                                                                                          Yeah, that’s the big difference. How long did it take them? If he spent the same amount of time as they did, would his demo have been even better?

                                                                                                          1. 1

                                                                                                            And also, what would happen if a product manager decided that the skull should be a little taller, slightly to the left and the blood dripping animation should be 30% slower. Oh, and please insert an additional blue skull over here while you’re at it, but that one should wiggle a little. Surely, making small changes like this should take a fraction of the time it took to build the thing from scratch…

                                                                                                            1. 1

                                                                                                              unless what you really want to be is a software engineer, and not the designer of an experience.

                                                                                                              I think one of the points is that “product manager” is the programmer. The whole thing starts from the idea that they learned to program to make the skull.

                                                                                                              1. 1

                                                                                                                I have nothing but respect for that programmer. They wanted to do something and they’ve achieved it through what they knew/could learn. My objection is aimed at the idea that we should still write code like them as professional programmers.

                                                                                                                1. 1

                                                                                                                  If you’re a skull experience creator you can hire some programmers to execute your vision. Those folks are professional programmers. On the other hand if you as the skull experience creator happen to do the programming as well, the concerns/considerations about adding a “blue skull” etc change. You’re intimately familiar with the requirements and the cost of a requirements change. The product manager never surprises the programmer because they are them.

                                                                                                                  My objection is aimed at the idea that we should still write code like them as professional programmers

                                                                                                                  I don’t think this is in conflict with what I quoted. Again if somebody else is deciding where the skull goes you’re clearly not the designer of the experience.

                                                                                                        1. 18

                                                                                                          every written language ever goes from top to bottom, not the reverse. Some go L to R, some go R to L, some do both (boustrophedon) but they all go top to bottom.

                                                                                                          Well that’s not true.

                                                                                                          1. 3

                                                                                                            Note Tagbanwa is traditionally written in vertical columns running from bottom to top and from left to right, however it is read from left to right in horizontal lines.


                                                                                                            1. 2

                                                                                                              [Author here]

                                                                                                              Fascinating. Never heard of those before: thank you!

                                                                                                              I tell you what, though, I don’t fancy trying to learn them. I am a left-hander who lives in a very right-hand dominated world and when I was a child it caused me major difficulty learning to write in the direction righties do.

                                                                                                              But you do more or less have to pick a direction, and I suspect that it’s no coincidence that L-to-R scripts dominate a mostly-right-handed world. I tried to learn to write boustrophedon, just for fun, since when I was under 10, R-to-L was much easier for me. I could do it, but it’s not easy, and it causes issues with numbers and so on. You really need at least a convention for which direction to start in, and I think at the end of the day, staying going in that direction works and simplifies matters.

                                                                                                            1. 7

                                                                                                              To elaborate on what carlmjohnson said, this program is not useful as a program. To be useful it needs to chain to another process, be exposed as a library, or similar. As it is, this program will run, remove the variables from its environment and then exit. This will not affect the parent process.
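A quick way to convince yourself, sketched in C++ with POSIX calls rather than the Go of the original program. The function name and the variable name passed to it are placeholders:

```cpp
#include <cassert>
#include <stdlib.h>    // setenv, unsetenv, getenv (POSIX)
#include <sys/wait.h>  // waitpid
#include <unistd.h>    // fork, _exit

// Returns true if `name` is still set in this process after a child
// process has removed it from its own copy of the environment.
bool parent_keeps_variable(const char* name) {
    assert(name != nullptr);
    if (setenv(name, "secret", 1) != 0) return false;

    const pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {
        // Child: this only edits the child's copy of the environment.
        unsetenv(name);
        _exit(getenv(name) == nullptr ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    const bool child_unset_it = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    return child_unset_it && getenv(name) != nullptr;
}
```

The child sees the variable disappear, while the parent still has it after the child exits. That is why the tool would have to exec the next process itself to have any effect.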

                                                                                                              1. 1

                                                                                                                Even worse: It doesn’t even unset them. It just exits.

                                                                                                                1. 1

                                                                                                                  Yeah that’s new since my comment.

                                                                                                                  1. 1

                                                                                                                    That is not worse. That is a better separation of concerns IMO. Probably the author thought unsetting them in its process would propagate to the parent environment. The readme is still wrong. But the repo subtitle is correct. It exits on the presence of any of them.

                                                                                                                    But as other people stated, in essence this is just a list of variable names and would probably be more useful as such than wrapping it in a go program.

                                                                                                                  2. 1

                                                                                                                    It’s meant more as a list of known environment variables that are used for storing API tokens, prompted by recent incidents such as the Python ctx package, which attempted to steal authentication tokens from the user’s environment. Unfortunately the description is a little bit off.

                                                                                                                  1. 3

                                                                                                                    Crash-tolerant software is indeed a virtue, but crash-only software often tends to encourage programming practices that make applications fiendishly difficult to model. If every error is a terminal error then you kind of opt out of deterministic control flow.

                                                                                                                    1. 1

                                                                                                                      crash-only software often tends to encourage programming practices that make applications fiendishly difficult to model.

                                                                                                                      Can you expand on that a bit?

                                                                                                                      If every error is a terminal error then you kind of opt out of deterministic control flow.

                                                                                                                      Well, there’s a bit more depth to it than that. For example, within the lifecycle of an individual request, experiencing a transient error (such as a timeout) might be fatal to that request, but not to the web server as a whole. Or, for example, if your message queue consumer loses its connection, then you’d usually only restart the consumer, rather than the process as a whole.

                                                                                                                      1. 1

                                                                                                                        Is that what crash only means? All errors are crashes? My understanding was more that there was no happy/sad path when you terminated, a normal exit is indistinguishable from an abrupt one due to a crash, so any recovery happens in the startup path (and is constantly exercised).

                                                                                                                        1. 6

                                                                                                                          Going by Wikipedia definition:

                                                                                                                          Crash-only software refers to computer programs that handle failures by simply restarting, without attempting any sophisticated recovery.

                                                                                                                          The argument usually is: sophisticated (precise) error recovery is hard, and if you’re attempting it, you already have a potentially-broken program in an inconsistent state. Throwing it all away and starting from a blank state again is easier, well-tested, and therefore more robust.

                                                                                                                          Take for example an out-of-memory error: if an allocation fails, you can either carefully backtrack the current operation and report the problem to the caller, or just abort() the whole program.

                                                                                                                          I generally agree with the approach, but it’s not a silver bullet. Crash-only systems are prone to DoS-ing themselves. A persistent bad input can put the whole system in a crash loop, instead of being carefully skipped over. Even when everything technically works (and makes incremental progress as it should), the restart may be expensive/disruptive (e.g. loses caches or performs extra work on startup) and the system can still fail due to insufficient throughput.

                                                                                                                          1. 1

                                                                                                                            In a crash-only service, if an incoming request encounters a database invariant violation, does the entire process die? Or if the database connection fails, are all in-flight requests abandoned?

                                                                                                                            Designing software so that it can start up and recover from a wide variety of prior states is a good idea in general, but it’s structurally impossible to write a program that can crash at any point during its execution and reliably leave underlying resources in a recoverable state. Any nontrivial DB transaction commit, for example, is a multi-stage operation. Same with network operations.

                                                                                                                            More generally, it’s definitely a good idea to design the layer above the individual process to be resilient, but you can’t just assert that error handling isn’t a concern of the process. The process, the program, is what human beings need to mentally model and understand. That requires deterministic control flow.

                                                                                                                            1. 2

                                                                                                                              but you can’t just assert that error handling isn’t a concern of the process.

                                                                                                                              I agree that is a bad idea, and I would almost say objectively so. Which is why I don’t think it is actually what the idea of “crash-only” is trying to convey.

                                                                                                                              Again, my understanding was that crash-only software wasn’t “we crash on all errors/we don’t care about errors”, but rather “we don’t have shutdown code/the shutdown code isn’t where all the invariants are enforced”. It’s more about not having atexit handlers than not having catch blocks, if you will. All program terminations are crashes; not all errors are crashes. If you have no shutdown code, you do have to put that code somewhere else (periodic autosaves, say), which means that when you do crash you’re much more likely to be close to a recent good state.
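A rough sketch of that reading, assuming a POSIX filesystem where rename replaces the target atomically. `checkpoint` and `recover` are made-up names, the single counter stands in for real state, and the sketch skips fsync, so it says nothing about power loss:

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Autosave: write a temp file, then rename it over the previous checkpoint.
// A crash mid-write leaves the old checkpoint untouched.
bool checkpoint(const std::string& path, long counter) {
    assert(!path.empty());
    const std::string tmp = path + ".tmp";
    {
        std::ofstream out(tmp, std::ios::trunc);
        out << counter << '\n';
        out.flush();
        if (!out) return false;
    }  // file closed here, before the rename
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}

// The startup path is the recovery path: a missing or unreadable
// checkpoint simply means a fresh start.
long recover(const std::string& path) {
    std::ifstream in(path);
    long counter = 0;
    return (in >> counter) ? counter : 0;
}
```

There is no shutdown hook anywhere: the program calls `recover` every time it starts and `checkpoint` periodically while it runs, so a kill -9 and a clean exit look the same on the next start.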

                                                                                                                              1. 1

                                                                                                                                I may misunderstand what “crash-only” means. I take “crash” to mean “terminate the operating system process unceremoniously and without unwinding call stacks”, and I understand “only” not to mean all conceivable errors but definitely more than is strictly necessary.

                                                                                                                              2. 1

                                                                                                                                if an incoming request encounters a database invariant violation, does the entire process die?

                                                                                                                                The original paper “Crash Only Software” talks about it in terms of individual components that can perform micro-reboots, and references Erlang heavily.

                                                                                                                                So for an example, you’d want to use tools such as transactions so you can pretend that a multi-step operation is a single one. Alternatively, make everything idempotent, and retry a lot.

                                                                                                                                structurally impossible to write a program that can crash at any point during its execution and reliably leave underlying resources in a recoverable state.

                                                                                                                                I’m reasonably sure that this is what database software should be designed to do. Unless I’m somehow misunderstanding you.
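As a toy illustration of the "make everything idempotent, and retry a lot" option: the `Ledger` class and the request IDs are invented for the sketch, and a real system would persist the applied-ID set and the balance together in one transaction.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical ledger: applying the same request ID twice has no extra effect,
// so a caller that crashed mid-request can simply retry after restarting.
class Ledger {
public:
    // Returns true if the deposit was applied now, false if it was a duplicate.
    bool deposit(const std::string& request_id, long amount) {
        assert(amount > 0);
        if (!applied_.insert(request_id).second) return false;  // already applied
        balance_ += amount;
        return true;
    }

    long balance() const { return balance_; }

private:
    std::unordered_set<std::string> applied_;
    long balance_ = 0;
};
```

The caller never needs to know whether its previous attempt got through before the crash; it just sends the same request ID again.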

                                                                                                                                1. 1

                                                                                                                                  There is no OS-level operation against e.g. disks, networks, etc. which can be interrupted at any arbitrary point, and can be reliably recovered-from. You can do your best to minimize the damage of that interruption – and maybe this is what crash-only is gunning for – but you can’t solve the problem. Every transition between layers of abstractions, between system models, represents an exchange of information over time. No matter how you design your protocol, no matter how tiny the “commit” or “sync” signal is, if your operation spans more than one layer of abstraction, and it’s possible for that operation to be interrupted, you can’t avoid the possibility of ending up in an invalid state. That’s fine! My point is only that systems at each layer of abstraction should not only do their best to recover from wonky initial state, but should also do their best to avoid making that wonky state in the first place. If you encounter an error, you should deal with it responsibly, to the best of your ability. That’s it.

                                                                                                                          1. 1

                                                                                                                            I use Vimwiki, especially the diary feature, I don’t bother to version control it (I have backups of course).

                                                                                                                            1. 1

                                                                                                                              Same. I don’t need to sync it to other machines and do regular scheduled borg backups including my vimwiki folder.

                                                                                                                            1. 2

                                                                                                                              Each server socket needs two file descriptors

                                                                                                                              Is this always the case? I made an iocp/uring based http2 server, and it would be cool to claim it can handle 1 million concurrent connections.

                                                                                                                              Also, even without uring, why isn’t one socket per connection enough? They allow for both reading and writing.

                                                                                                                              1. 2

                                                                                                                                More of the quote is:

                                                                                                                                Each server socket needs two file descriptors:

                                                                                                                                A buffer for sending

                                                                                                                                A buffer for receiving

                                                                                                                                You don’t need two file descriptors (unless you want to count the listening fd, which I don’t think they meant). It seems like they were conflating the descriptor and the underlying socket, maybe by analogy with pipes, which do need 2 descriptors if you want bi-directional communication.
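A small POSIX sketch of the difference, using socketpair as a stand-in for an accepted connection. The helper names are made up, and the failing write on the read end of the pipe is what Linux does; some other systems implement pipes bidirectionally:

```cpp
#include <cassert>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Writes msg into fd `from` and reads it back out of fd `to`.
static std::string transfer(int from, int to, const std::string& msg) {
    assert(!msg.empty());
    if (write(from, msg.data(), msg.size()) != static_cast<ssize_t>(msg.size())) return "";
    std::string buf(msg.size(), '\0');
    const ssize_t n = read(to, &buf[0], buf.size());
    return n < 0 ? "" : buf.substr(0, static_cast<std::string::size_type>(n));
}

// Each end of a connected socket is ONE descriptor that both reads and writes.
bool socket_is_bidirectional() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return false;
    const bool ok = transfer(sv[0], sv[1], "ping") == "ping" &&
                    transfer(sv[1], sv[0], "pong") == "pong";
    close(sv[0]);
    close(sv[1]);
    return ok;
}

// A pipe is one-way: fds[0] reads, fds[1] writes, so two-way traffic
// needs a second pipe (two more descriptors).
bool pipe_is_one_way() {
    int fds[2];
    if (pipe(fds) != 0) return false;
    const bool forward = transfer(fds[1], fds[0], "ping") == "ping";
    const bool backward_fails = write(fds[0], "x", 1) == -1;  // EBADF on Linux
    close(fds[0]);
    close(fds[1]);
    return forward && backward_fails;
}
```

The send and receive buffers live in the kernel behind that single socket descriptor; they are not separate fds.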

                                                                                                                                1. 1

                                                                                                                                  That’s my bad. I mis-remembered. For some reason i thought there are two file descriptors. After reviewing the docs, I realized I’m wrong. Both methods are only returning one file descriptor.

                                                                                                                                  From beej’s networking guide:

                                                                                                                                  accept() returns the newly connected socket descriptor, or -1 on error, with errno set appropriately.

                                                                                                                                  The new socket descriptor to be used in subsequent calls, or -1 on error (and errno will be set accordingly).

                                                                                                                                2. 2

                                                                                                                                  There was a Slashdot article over 10 years ago about WhatsApp handling over a million connections on a single machine with FreeBSD / Erlang. Given that io_uring should be more efficient and computers are a lot faster, I would be pretty shocked if you couldn’t handle that many. I’d expect RAM for TLS protocol state to be your limiting factor, though if those connections are actually doing anything then network bandwidth might start to be (WhatsApp connections were mostly idle). With 40 GigE shared across a million connections that’s only 40 Kb/s (about 5 KB/s) per connection, and at that speed NUMA issues and the LLC sizes start to be important concerns (Netflix was saturating 40 GigE with TLS a few years back; the difficult thing was DMAing data into L3, encrypting it, and DMAing the encrypted data out again before the next DMA from disk started evicting things from L3 and slowed everything to a crawl).