1. 4

    I’m about to release an image viewer for SKIF files that runs on 32-bit Mac OS (at least as far back as System 6).

    SKIF (Simple Kinetic Image Format) is a format I designed for rasters that are not still images, but are continually updated, like a framebuffer. It’s how the Advanced Mac Substitute back end communicates with the host UI front end, and also what my screenshot tools create, so having a general-purpose viewer for it is convenient.

    1. 2

      I’m revising the image format used for the shared memory framebuffer in Advanced Mac Substitute. It’s now called SKIF (Simple Kinetic Image Format). The new version specifies color pixel formats more flexibly.

      This involves updating the programs that create image files to use the new format (without merging those changes yet) and updating the programs that view images to handle both formats, with extensive testing to make sure I haven’t broken anything in the process.

      1. 3

        I have some bad news for you.

        First of all, the code

        _lock.lock();
        if (head[nb] == NULL)
            head[nb] = new atomic_ring_element<Z>(zero);
        _lock.unlock();
        

        is not exception-safe (though this can be easily fixed).
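
        One easy fix (a minimal sketch, assuming _lock is a standard mutex type) is to let a scope guard do the unlocking:

        #include <mutex>

        {
            // The guard unlocks automatically, even if `new` or the element
            // constructor throws.
            std::lock_guard<std::mutex> guard(_lock);

            if (head[nb] == NULL)
                head[nb] = new atomic_ring_element<Z>(zero);
        }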

        More problematic is that C++’s std::atomic<T> isn’t necessarily lock-free:

        1. 2

          It’s not really news that atomics aren’t necessarily lock-free; it just means there are some restrictions on what types it makes sense to use them for. Scalars that fit into a CPU word and pointers, basically. By using pointers you can manage larger, more complex objects. (Though I don’t believe atomic unique_ptrs are lock-free, so you need to do some manual memory management.)
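
          A quick way to check on your own platform (a minimal sketch; results vary by architecture and compiler):

          #include <atomic>
          #include <cstdio>

          struct Big { char data[256]; };  // far too large for a single atomic instruction

          int main()
          {
              std::atomic<int>  i{0};        // word-sized scalar
              std::atomic<Big*> p{nullptr};  // pointer to a large object
              std::atomic<Big>  b{};         // the large object itself

              std::printf("int:  %d\n", (int) i.is_lock_free());  // typically 1
              std::printf("Big*: %d\n", (int) p.is_lock_free());  // typically 1
              std::printf("Big:  %d\n", (int) b.is_lock_free());  // typically 0 (locked)
          }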

        1. 1

          I know about JS; it can inject things into a copied command. However, I wanted to see if there is something more to this story. Unfortunately, the paywall (accountwall?) is too much for me. Maybe someone could summarize it here?

          1. 4

            You don’t even need JS. CSS is enough to create an innocent-looking command line which, when copied, actually includes a malicious command as well.

            https://nakedsecurity.sophos.com/2016/05/26/why-you-cant-trust-things-you-cut-and-paste-from-web-pages/

            1. 1

              Copying text on a website can be exploited extremely easily with JavaScript. JavaScript can react to the press of “copy” or the key combination and write something into the clipboard on its own, completely independent of the text we actually wanted to copy. This can cause us to paste commands into our terminal that we did not want.

              Even bigger is the problem that, depending on the command we paste, we don’t even have to confirm the execution by pressing enter. If the command contains a newline \n, it will be executed immediately when we paste it into the terminal.

            1. 2

              Actually, even small writes to a pipe are not atomic once the pipe buffer fills up. The normal thing that happens there is that the write partially completes and then blocks, and the process is put to sleep. The latter parts of that write call only happen when a reader has read from the pipe to make room. If there are >1 writers to the pipe, then they can be awoken in any order, in which case their writes are interleaved. This is avoidable with O_NONBLOCK, of course, but can be a real gotcha.

              As a concrete example, this is not reliable unless you change it to -P1 (defeating the purpose: a real-time speedup of the very slow file program):

              find . -print |
                xargs -P4 file -n -F:XxX: |
                grep ":XxX: .*script" |
                sed -e 's/:XxX: .*$//'
              

              To make it fail you want to save the output and run this in a tree with a boatload of scripts or just change “script” to “.” to match everything. Then check that output against names in the tree. It may take multiple trials and your -P might need to be about as large as the number of CPU core threads you have, depending upon how busy/idle your system is.

              Anyway, the point is this fails even though every individual write(2) call by the file children of xargs is well below the “atomicity size limit” “guarantee” due to the sleeping/scheduling pattern noted above the example pipeline. (At least they’re well below if you are in a normal file tree where path + separator + file type is a reasonably short string.)

              1. 6

                Actually, even small writes to a pipe are not atomic once the pipe buffer fills up.

                That’s incorrect. According to POSIX: “Write requests of {PIPE_BUF} bytes or less shall not be interleaved with data from other processes doing writes on the same pipe.” The entire small buffer is written in one go, blocking the process first if necessary.

                https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html
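
                So a writer that emits each record with a single write() of at most {PIPE_BUF} bytes can rely on it landing in one piece (a minimal sketch, with no error handling):

                #include <assert.h>
                #include <limits.h>
                #include <string.h>
                #include <unistd.h>

                /* POSIX: writes of {PIPE_BUF} bytes or less to a pipe are not
                   interleaved with data from other writers. */
                void emit_record(int fd, const char* line)
                {
                    size_t len = strlen(line);

                    assert(len <= PIPE_BUF);
                    write(fd, line, len);  /* may block first, but lands whole */
                }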

                1. 3

                  Oops. You are right. I stand corrected. Apologies!

                  That pipeline (and a simpler one with just grep 'XxX.*XxX') does interleave, but that comes from stdio buffering. The pipelines work fine with stdbuf -oL file. I should have been more careful about concluding something about the OS.

                  Reading the source, it turns out that file -n|--no-buffer only does unbuffered output when also using --files-from. The file man page (in combination with an strace test to a tty with line buffering) fooled me by saying the option was “only useful” (not “only active”) with --files-from.

              1. 2

                It never occurred to me to desire write() calls to be atomic with respect to simultaneous read() calls. Mainly what I think of (and care about) is multiple threads or processes writing to the same file, and those are indeed atomic if it’s a regular file or if the amount written is small enough.

                1. 15

                  The author presents code equivalent to char* p = malloc(8); char* q = p + 4; *q; and argues that it’s not useful to null-check q — because if malloc() returns NULL, then q would be 0x4, not 0x0 (thus passing the null check).

                  However, this is begging the question — q is only able to have an invalid but non-null value after neglecting to null-check p in the first place.
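
                    In code form (a sketch of that argument):

                    char* p = (char*) malloc(8);

                    if (p == NULL)    /* check at the allocation site... */
                        return NULL;

                    char* q = p + 4;  /* ...and q can no longer be an invalid 0x4 */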

                  I don’t find the article convincing. Just because memset() needs to be usable with any mapped address (which might include 0x0) doesn’t mean that we shouldn’t check for NULL ever.

                  1. 10

                    While his example of passing 0x4 might be bogus, the overall message isn’t. He’s not saying “never check for NULL” (because his sample function dup_to_upper() does check for NULL when it calls malloc()), but the immediate reflex to check the input pointer for NULL won’t really help when there are a large number of invalid addresses that aren’t NULL that could be passed in.

                    The point he made was better made in the book Writing Solid Code. That book changed how I code C, and now, instead of:

                    char *foo(char *s)
                    {
                      if (s == NULL)
                        return NULL;
                      ...
                    }
                    

                    I write:

                    char *foo(char *s)
                    {
                      assert(s != NULL);
                      ...
                    }
                    

                    In my mind, unless I have a very good reason, passing in NULL to a function is a bug. Checking for NULL and returning an error is quite possibly hiding a real bug. Why are you passing NULL? How did that NULL get there in the first place? Are you not checking for NULL elsewhere?

                    1. 2

                      The use of assert() here goes against sanitary program behaviour. Returning an error (while printing out useful information) is preferable (for example, in Koios I used an enum for errors, with null represented as KERR_NULL) because the program can then act on it and save state; even if some of that state might be non-useful, saving as much as is possible is just generally good behaviour. It also allows you to do more with that: maybe you want to recheck assumptions and reload data, or apply some other kind of error-avoidance/compensation.

                      How would you like it, as a user, if Firefox or Libreoffice, or other programs in which people tend to commonly work, just up and failed for (to the user) no observable reason?

                      I don’t see how this is good guidance for anything but the simplest of C programs.

                      edit: I forgot about NDEBUG.

                      But that in effect makes the check useless for anything but (isolated) testing builds.

                      Checking for NULL and returning an error is quite possibly hiding a real bug.

                      I really don’t see how. Reporting an error and trying to compensate, either by soft-crashing after saving state or by using other data sources (and blacklisting the bad ones), is not ‘hiding’ a bug. It’s the bare minimum any program should do to adapt to real-world cases and scenarios.

                      1. 4

                        The use of assert() here goes against sanitary program behaviour. Returning an error (while printing out useful information) is preferable (for example, in Koios I used an enum for errors, with null represented as KERR_NULL) because the program can then act on it and save state; even if some of that state might be non-useful, saving as much as is possible is just generally good behaviour. It also allows you to do more with that: maybe you want to recheck assumptions and reload data, or apply some other kind of error-avoidance/compensation.

                        Assertions are intended for catching bugs, not recoverable errors. A failed assertion means an unexpected or unrecoverable error has occurred and it isn’t safe to continue program execution.

                        How would you like it, as a user, if Firefox or Libreoffice, or other programs in which people tend to commonly work, just up and failed for (to the user) no observable reason?

                        Many large systems including Firefox, the Linux kernel, LLVM, and others use a combination of assertions and error recovery.

                        Assertion usage is one of the rules in NASA’s The Power of Ten – Rules for Developing Safety Critical Code.

                        1. Rule: The assertion density of the code should average to a minimum of two assertions per function. Assertions are used to check for anomalous conditions that should never happen in real-life executions.
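
                        One common way to combine the two, following the foo() example above (a sketch):

                        char *foo(char *s)
                        {
                            assert(s != NULL);  /* debug builds: stop at the bug immediately */

                            if (s == NULL)      /* release builds (NDEBUG): fail gracefully */
                                return NULL;

                            ...
                        }
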
                        1. 1

                          A failed assertion means an unexpected or unrecoverable error has occurred and it isn’t safe to continue program execution.

                          A null pointer error doesn’t automagically invalidate all of the state that the user put into the program expecting to get it out again.

                          Many large systems including Firefox, the Linux kernel, LLVM, and others use a combination of assertions and error recovery.

                          Of course.

                          Assertion usage is one of the rules in NASA’s The Power of Ten – Rules for Developing Safety Critical Code. […]

                          Right, but that’s not the usage of the above null check, which advocates for never saving state on null, ever.

                        2. 2

                          If the documentation says “this function requires a valid pointer”, why would I bother checking for NULL and returning an error? It’s a bug if NULL is passed in. The assert() is there just to make sure. I just checked, and when I passed NULL to memset(), it crashed. I’m not going to blame memset() for this, as the semantics of the function don’t make sense if NULL is passed in for its pointer argument.

                        3. 2

                          Out of interest, why assert() instead of using the ptr and allowing the SEGV signal handler and/or core give you similar info?

                          1. 9

                            Some reasons off the top of my head:

                            • To catch the null pointer as soon as possible. Otherwise, it may be propagated further and cause a segfault much farther away, which is harder to debug. Much more so if a null/corrupt pointer is stored in some data structure instead of just passed to callees, since the stack trace is no longer useful in that case.
                            • To document the expectations of the function to future developers.
                            • You can redefine ASSERT() in debug builds to fail unit tests if they are running, which is more dev-friendly than crashing your test system. (See the sketch below.)
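
                            On that last point, the redefinition can be as simple as this (a minimal sketch; test_fail() is a hypothetical hook into your test harness):

                            #ifdef UNIT_TESTS
                            extern void test_fail(const char* expr, const char* file, int line);  /* hypothetical hook */
                            #define ASSERT(e) ((e) ? (void) 0 : test_fail(#e, __FILE__, __LINE__))
                            #else
                            #include <assert.h>
                            #define ASSERT(e) assert(e)
                            #endif
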
                            1. 1

                              Beside future developers, they’re also useful for documenting expectations for static analysis tools. The Clang static analyzer, for example, takes them into account.

                              1. 1

                                The “find the NULL as soon as possible” argument makes the most sense to me. I guess I was thinking that dereferencing the pointer straight away would provide this, but I agree we may do non-dangerous things with it (like storing it somewhere) before we deref it.

                                Thank you.

                              2. 1

                                Some systems are configured to allow mapping memory at NULL, which would open up a potential NULL ptr deref vulnerability wherein arbitrary data was stuffed at 0x0.

                              3. 2

                                I love using assert. It’s simple and concise. In a project I wrote to integrate with Pushover, I use assertions at the beginning of any exported function that takes pointers as arguments.

                                Sample code:

                                EXPORTED_SYM
                                bool
                                pushover_set_uri(pushover_ctx_t *ctx, const char *uri)
                                {
                                
                                	assert(ctx != NULL);
                                	assert(uri != NULL);
                                	assert(ctx->psh_uri == NULL);
                                
                                	ctx->psh_uri = strdup(uri);
                                	return (ctx->psh_uri != NULL);
                                }
                                

                                Also: I almost always use calloc over malloc, so that I know the allocation is in a known state. This also helps prevent infoleaks for structs, including compiler-introduced padding between fields. Using calloc does provide a little perf hit, but I prefer defensive coding techniques over performance.
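
                                For instance (a sketch; struct record is hypothetical, and the cast keeps it valid as C++):

                                #include <stdlib.h>

                                struct record
                                {
                                    char tag;    /* the compiler may insert padding after this field */
                                    long value;
                                };

                                struct record *
                                make_record(void)
                                {
                                    /* With malloc(), the fields and the padding would hold stale
                                       heap data, which can leak if the whole struct is later
                                       written to a file or socket.  calloc() zeroes everything. */
                                    return (struct record *) calloc(1, sizeof (struct record));
                                }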

                                1. 1

                                  The one issue with using calloc() is that a NULL pointer on a system does not have to be all-bits-zero (per the C standard; POSIX does require a NULL pointer to be all zeros). Yes, in source code, a literal 0 in a pointer context is NULL, but internally it’s converted to whatever the system deems a NULL address.

                                  I’ve used a custom malloc() that would fill memory with a particular value carefully selected per architecture. For the x86, it would fill the memory with 0xCC. As a pointer, it will probably crash. As a signed integer, it’s a sizable negative number. As an unsigned integer, it’s a large number. And if executed, it’s the INT3 instruction, aka breakpoint. For the Motorola 68000 series, 0xA1 is a good choice, for all the above reasons, plus it can cause a misaligned read access for 16- or 32-bit quantities if used as an address. I forget what value I used for MIPS, but it was chosen, again, for the same reasons.
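
                                  Such a poisoning allocator is tiny (a sketch; debug_malloc() is a hypothetical name, with 0xCC chosen for x86 as described):

                                  #include <stdlib.h>
                                  #include <string.h>

                                  void *
                                  debug_malloc(size_t n)
                                  {
                                      void *p = malloc(n);

                                      if (p != NULL)
                                          memset(p, 0xCC, n);  /* poison: crashes as a pointer, INT3 if executed */

                                      return p;
                                  }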

                              4. 6

                                I just want to salute you for using “begging the question” to describe circular reasoning. I don’t often see that original meaning in the wild.

                                1. 2

                                  Assuming NULL is 0x0 is wrong. Of course. The article’s entire argument falls down right there.

                                  1. 2

                                    On what architectures is NULL not 0x0? I can’t seem to think of any off-hand.

                                    1. 3

                                      Multics has it be -1, AFAIK. It’s probably not 0x0 on anything with non-integer pointers either, like x86 segmentation.

                                      1. 1

                                        Here’s a good page on exactly that question: http://c-faq.com/null/machexamp.html

                                  1. 8

                                    An alternative to youtube-dl is Jamie Zawinski’s youtubedown: https://www.jwz.org/hacks/youtubedown

                                    1. 4

                                      I’m working on a set of data recovery and management tools for Firefox profiles. So far, I have a largely working JSON library in lightweight C++.

                                      My initial attempt in Varyx (my own programming language) is able to recover data from a 6.5 MB decompressed sessionstore file, but, due to certain poorly scaling inefficiencies, takes too long to be practical. The new JSON library will also serve as a prototype for a redesign of Varyx’s object system.

                                      1. 1

                                        Check https://github.com/nst/JSONTestSuite; it is the most comprehensive, but there might be gaps. There is also a fuzzer that Google runs on open-source projects, where you can find small but interesting test cases.

                                      1. 49

                                        I wonder why you consider unordered maps a failing rather than a reasonable design decision: You want an ordered set of hash keys? Order that list yourself or use a library implementation that does. I like the opinionated decision that Go made from the start, and eventually Perl arrived at, to intentionally produce unpredictable key ordering so naive programmers do not rely on an assumption that isn’t true (and probably incurs overhead).
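
                                        To put “order that list yourself” concretely (a quick C++ sketch):

                                        #include <algorithm>
                                        #include <cstdio>
                                        #include <string>
                                        #include <unordered_map>
                                        #include <vector>

                                        int main()
                                        {
                                            std::unordered_map<std::string, int> m = { {"z", 1}, {"x", 12}, {"b", 42} };

                                            // Iteration order is unspecified, so collect and sort the keys yourself.
                                            std::vector<std::string> keys;

                                            for (const auto& kv : m)
                                                keys.push_back(kv.first);

                                            std::sort(keys.begin(), keys.end());

                                            for (const auto& k : keys)
                                                std::printf("%s %d\n", k.c_str(), m[k]);
                                        }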

                                        1. 13

                                          perlsec says that Perl uses unpredictable key ordering for resistance to algorithmic complexity attacks.

                                          1. 12

                                            JFYI, Rust also took the same approach: hashing is deliberately, randomly different between program runs. The motivation was to prevent HashDoS attacks.

                                            https://doc.rust-lang.org/std/collections/struct.HashMap.html

                                            1. 6

                                              Python has different hashing per process invocation for security purposes. They just keep a separate insertion-ordered array for stable iteration order.

                                            2. 11

                                              Came here to post exactly that. Ordered maps are significantly more expensive than unordered ones, because hash tables. Making the more expensive collection the default is a bad idea, especially when the benefit is so small — I can only think of a few cases where it’s been needed.

                                              This is a pet peeve of mine because I’ve run into several issues where some existing system or other decides to assign meaning to the order of keys in a JSON object — despite the standard saying they’re unordered — causing serious problems with processing such JSON in a language whose maps are unordered. (Several examples of this come from CouchDB, such as the way it used to use a JSON object to describe the following multipart MIME bodies.)

                                              1. 7

                                                Though one would think “adding the ordering constraint makes it more expensive”, Python landed here because a better dict implementation gave insertion ordering for free.

                                                Now, sure, maybe a billion years down the line we’ll find some other dict management strategy that is better, but Python is pretty mature and the dict change seemed to align well with a lot of stuff.

                                                So Python was faced with either:

                                                • just exposing the key ordering on the standard object, or
                                                • keeping around this separate object (OrderedDict), despite dict now being able to fulfill its requirements, for what amount to philosophical reasons.

                                                I think pragmatism won out here. And now you don’t have to tell beginners “remember, insertion order on dictionaries isn’t kept!” You can just say that it is (or have beginners assume it, correctly if for the wrong reasons).

                                                1. 6

                                                  Ordered maps are significantly more expensive than unordered ones, because hash tables.

                                                  “Because hash tables” what? :-) A dict in Python was a hash table when it was unordered, and remained a hash table when it suddenly became ordered as a side-effect of them wanting to share the keys (which saves a lot of space, given that instances of the same class would all have the same keys). Here’s a concise explanation I wrote recently: https://softwaremaniacs.org/blog/2020/02/05/dicts-ordered/en/

                                                  As for “significantly more expensive”, I have no idea what you’re talking about!

                                                  1. 2

                                                    As for “significantly more expensive”, I have no idea what you’re talking about!

                                                    • It adds a separate data structure for the key/value list (or you could say it adds a separate entry-index list.)
                                                    • It adds a memory indirection to every hash table probe.
                                                    • When a key is removed, you have to mess with the entry list. Either you remove the entry, which requires sliding the other entries down and renumbering all the indices in the table; or you turn the entry into a tombstone that can never be reused (in which case you eventually need to GC the tombstones and renumber when there are too many.)

                                                    I’m sure this design is a win for Python, because Python (like JS and Ruby) uses dictionaries to store objects, so it happens to have a ton of dictionaries with identical sets of keys, from which keys are very rarely removed. So this actually saves a significant amount of memory. Without that special circumstance, I doubt the overhead comes close to paying for itself.

                                                    1. 1

                                                      At least it’s more expensive in memory: you need to keep an additional list structure. I don’t really get the point of ordered maps, but they’re probably fine for Python.

                                                      1. 7

                                                        From what I remember, Python’s dicts got smaller in memory, not larger, when they added the ordering feature.

                                                        It’s worth looking at the notes at the top of https://github.com/python/cpython/blob/v3.8.5/Objects/dictobject.c - the authors went to a lot of trouble to explain their work.

                                                        In the old unordered implementation, each bucket was an “entry” struct (hash, key, value).

                                                        In the new ordered implementation, they have a dense array of (hash, key, value) entries, one per value which is in the hash table. The hash buckets are integer indexes into the entries array, and they change the size of the ints depending on how many entries the dict has - 8 bit ints when there are very few entries, growing to 16, 32 or 64 bits as the number of entries goes up.

                                                        The hash table always has more buckets than it has entries because its load factor can never be 1. In CPython 2 and 3 the load factor is generally between 1/3 and 2/3 because they resize at a 2/3 load factor. So the number of buckets is going to be between 1.5 and 3x the number of entries. Having more int8_t or int16_ts in memory in order to have fewer PyObject*s in memory is a net win.

                                                        The above applies mainly to the combined dict layout. The combined dict layout is the one in which the dict contains both keys & values. The other layout is the “split” layout, where 1 copy of the keys is shared between N dicts, each of which just has a simple flat array of pointers to values. The split layout saves way more memory in the case that it’s designed for, which is storing the fields for instances of classes (taking advantage of the fact that constructors will typically assign exactly the same fields in exactly the same order for every object of a given class).
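
                                                        In outline, the combined layout looks something like this (a simplified sketch; the real CPython structs differ, and the index width varies as described above):

                                                        #include <stddef.h>
                                                        #include <stdint.h>

                                                        struct entry
                                                        {
                                                            size_t hash;   /* cached hash of the key */
                                                            void*  key;
                                                            void*  value;  /* omitted per-entry in the split layout */
                                                        };

                                                        struct dict
                                                        {
                                                            int8_t*       indices;  /* sparse, open-addressed: slot -> index into entries */
                                                            struct entry* entries;  /* dense array, in insertion order */
                                                            size_t        used;     /* number of live entries */
                                                            size_t        nslots;   /* size of the index table */
                                                        };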

                                                        1. 1

                                                          For a normal dictionary it won’t apply, right? If the array of (hash, key, value) behaves like a stack, it still consumes more than the number of entries, because you don’t want each insertion to re-allocate. The exception for object fields is a good one, but it’s only because Python is a dictionary-based language; in general an unordered hashmap is smaller than an ordered one simply because there’s literally less information to store.

                                                          1. 2

                                                            Both the entries & buckets arrays grow multiplicatively.

                                                            https://mail.python.org/pipermail/python-dev/2012-December/123028.html shows the arithmetic. From what I remember, the new combined layout was designed primarily to make dicts use less memory. The fact that it also makes preserving ordering feasible was a happy side benefit that Python devs chose to take advantage of.

                                                            in general an unordered hashmap is smaller than an ordered one simply because there’s literally less information to store.

                                                            That would only necessarily be the case if you were using a succinct data structure, which no hash table is. They’re a time/space tradeoff, using more space than necessary in order to save time.

                                                        2. 2

                                                          More expensive in memory often turns out to be more expensive in CPU as well. Most if not all fast hash maps I’m aware of use some form of open addressing to store the data in a flat array. The only way to maintain ordering in that case would be to bloat each element with pointers and iterate in a much less cache-friendly manner. Your CPU is going to end up spinning, waiting for data. For Python (and I assume Ruby), everything is already a PyObject*, so the overhead is much lower than it would be in a more value-oriented language.

                                                          1. 1

                                                            and iterate in a much less cache-friendly manner

                                                            Iterating the entries of a python3 dict is not cache-unfriendly in the way iterating a linked list normally is. Think “arraylist” not “linked list”.

                                                            Keys are stored in a flat array of PyDictKeyEntry structs, each of which is a (hash, key, value) triple. In a split layout dict, the value field is omitted.

                                                            In a split layout dict, the values are stored in a flat array of PyObject*s. In a combined layout dict, the values are stored in the flat array of PyDictKeyEntry structs.

                                                            The implementation for iterating both the keys & values at the same time is at https://github.com/python/cpython/blob/v3.8.5/Objects/dictobject.c#L3715 - functions for iterating only-keys or only-values are just above it.

                                                            1. 1

                                                              To be clear, I was referring to a hypothetical open addressing hash table design. Python squares that circle by using open addressing. Since it can inline the PyObject* in the key table, you aren’t paying for an extra memory indirection. In a value-oriented language (both those without GC and languages like Go), that extra memory indirection would be an unacceptable cost on every lookup.

                                                              1. 1

                                                                I think the same design with the flat entries array & separate indexes into it could work without the PyObject* indirection. If keys & values are both fixed size, use exactly the same design with a fixed-size entry struct. If not, now the integers in the bucket list would be byte offsets to the starts of entries, rather than indices of entries (so you’d have to use the wider sizes sooner). And each entry would need to have one or two length fields too, depending on whether either the keys or values might happen to be fixed-size.

                                                                1. 1

                                                                  It would work in the sense that you can build a functional hash table like that, but that table would still be slower than an open addressing table due to memory indirection. You’re still paying an extra memory indirection on every lookup. In an open addressing table, you have a single memory indirection when looking up the bucket that contains the key. In the ordered table you outlined, you have a memory indirection to look up an integer and a second memory indirection to look up the actual key with that integer.

                                                                  1. 1

                                                                    AFAIK the definition of “open addressing” is that the collision resolution mechanism is based on probing other buckets rather than following a linked list or something - not that there isn’t a memory lookup to get the entry.

                                                                    I’m not aware of anyone in python complaining about a big regression to dict lookup time when 3.6 came out (offhand I see some references to microbenchmarks showing 3% perf differences). The interpreter is pretty slow so maybe it’s just getting buried, but apparently that 1 extra indirection isn’t blowing everything up.

                                                                    1. 1

                                                                      Nowhere did I mention values. I’m specifically explaining the number of memory indirections before being able to compare the key. I absolutely believe that Python was able to maintain performance even with ordering. I’m simply trying to explain why that isn’t possible in general.

                                                                      1. 1

                                                                        Is that much worse than the indirection to the values? I get the impression it’s pretty rare to have to probe very many different buckets before finding the right one or running out.

                                                                        Separate thought: you could steal, say, 2 to 4 bits from the bucket integers and put the top bits of the hash in them. Then, when probing, you can often reject a given bucket and move on to the next one without having to check the entries.
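
                                                                        Something like this, perhaps (a sketch: a 16-bit bucket holding a 12-bit entry index plus the top 4 bits of the hash):

                                                                        #include <stdint.h>

                                                                        static uint16_t make_bucket(uint64_t hash, unsigned index)
                                                                        {
                                                                            /* Top 4 bits tag the bucket; the low 12 bits index the entries array. */
                                                                            return (uint16_t) (((hash >> 60) << 12) | (index & 0x0fff));
                                                                        }

                                                                        static int bucket_may_match(uint16_t bucket, uint64_t hash)
                                                                        {
                                                                            /* Cheap reject: compare hash tags before touching the entries. */
                                                                            return (bucket >> 12) == (uint16_t) (hash >> 60);
                                                                        }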

                                                                        1. 2

                                                                          First, there are a number of use cases where you don’t use the values, notably hash sets, but also checking whether a key is in a map. If you store the values next to the keys in the ordered array, that has all the same memory trade-offs as having the keys and values in a single array (i.e. like Abseil flat_hash_map), except that the ordered version has more memory indirection and makes deletes substantially more expensive.

                                                      2. 3

                                                        This is a pet peeve of mine because I’ve run into several issues where some existing system or other decides to assign meaning to the order of keys in a JSON object

                                                        You just gave me flashbacks… OAS and the idiotic way it uses the JSON field order to organise the endpoints in the UI. Makes it extremely hard to work on OAS specifications programmatically as you have to fight your libraries all the way.

                                                        To anyone who’s considering assigning meaning to JSON field order, please switch profession, you weren’t meant to be a programmer…

                                                      3. 6

                                                        Perl hashes (maps) have never been ordered as far as I know. I think the feature is from AWK.

                                                        I don’t believe it’s a conscious decision to have unordered hashes to “keep newbies on their toes”. It’s simply more (machine) efficient not to have to order internally.

                                                      Edit: I mostly reacted to the statement

                                                        I like the opinionated decision that Go made from the start, and eventually Perl arrived at

                                                        (my emphasis). Perl has had unordered hashes since the 1980s, while Go was released in 2009.

                                                          1. 4

                                                            I think the quote should be amended to read

                                                            You use a hash for everything in Perl

                                                            ;)

                                                            I had a coworker whose Perl code used the basic data structure of hashes of hashes of hashes … ad infinitum and by Ghod I’m approaching that level myself.

                                                          2. 3

                                                          awk doesn’t preserve order, but you can choose from built-in ordering features (these might be gawk-specific):

                                                            $ awk 'BEGIN{a["z"]=1; a["x"]=12; a["b"]=42; for(i in a) print i, a[i]}'
                                                            x 12
                                                            z 1
                                                            b 42
                                                            
                                                            $ # index sorted in ascending order as strings
                                                            $ awk 'BEGIN{PROCINFO["sorted_in"] = "@ind_str_asc";
                                                                   a["z"]=1; a["x"]=12; a["b"]=42; for(i in a) print i, a[i]}'
                                                            b 42
                                                            x 12
                                                            z 1
                                                            
                                                            $ # value sorted in ascending order as numbers
                                                            $ awk 'BEGIN{PROCINFO["sorted_in"] = "@val_num_asc";
                                                                   a["z"]=1; a["x"]=12; a["b"]=42; for(i in a) print i, a[i]}'
                                                            z 1
                                                            x 12
                                                            b 42
                                                            
                                                            1. 4

                                                              Thanks for expanding on AWK!

                                                              This is the equivalent Perl code

                                                              use v5.10;    # enables 'say'
                                                              my %hash = ( b => 42, x => 12, z => 1 );
                                                              say "    ==> dump the hash";
                                                              foreach my $key ( keys %hash ) {
                                                                  say "    $key $hash{$key}";
                                                              }
                                                              say "    ==> order by key";
                                                              foreach my $key ( sort { $a cmp $b } keys %hash ) {
                                                                  # we use 'cmp' here because the keys are strings
                                                                  say "    $key $hash{$key}";
                                                              }
                                                              say "    ==> order by value";
                                                              foreach my $key ( sort { $hash{$a} <=> $hash{$b} } keys %hash ) {
                                                                  # we use '<=>' because the values are numbers
                                                                  say "    $key $hash{$key}";
                                                              }
                                                              

                                                              Output:

                                                              ==> dump the hash
                                                              z 1
                                                              b 42
                                                              x 12
                                                              ==> order by key
                                                              b 42
                                                              x 12
                                                              z 1
                                                              ==> order by value
                                                              z 1
                                                              x 12
                                                              b 42
                                                              

                                                              Because the result of the function keys %hash is a list, we can apply all sorts of fancy sorting to it, for example, sorting by value and then by key on a tie.

                                                              say "    ==> add a new key with the same value as an existing one";
                                                              $hash{y}=12;
                                                              foreach my $key (sort { $hash{$a} <=> $hash{$b} || $a cmp $b } keys %hash) {
                                                                  say "    $key $hash{$key}";
                                                              }
                                                              
                                                              z 1
                                                              x 12
                                                              y 12
                                                              b 42
                                                              
                                                            2. 2

                                                              Perl hashes (maps) have never been ordered as far as I know. I think the feature is from AWK.

                                                              Maybe I am mistaken. I stopped following Perl circa 2013. I recall that the ordering was an implementation detail and platform-specific but consistent and predictable. So of course, people relied on that and the Perl community said, “No, THAT’S WRONG!” (probably tchrist on something else… but why not reuse a good opener?) but it wasn’t actually fixed for a while.

                                                              1. 4

                                                                tchrist on something else: “You are wicked and wrong to have broken inside and peeked at the implementation and then relied upon it.”

                                                                1. 1

                                                                  Thanks for expanding. I think I remember something like that. In any case, it’s possible that for some small number of keys, the return order would be deterministic (like if the keys were simply one-character strings) and beginners, without internalizing the documentation, observed this and started to rely on a behavior that broke down in other cases.

                                                                  Quoting from the link that @dbremner posted:

                                                                  Perl has never guaranteed any ordering of the hash keys, and the ordering has already changed several times during the lifetime of Perl 5. Also, the ordering of hash keys has always been, and continues to be, affected by the insertion order and the history of changes made to the hash over its lifetime.

                                                                  (Emphasis in original).

                                                              2. 6

                                                                It’s ergonomics, much like the namedtuple collection mentioned in the same breath. The change being referred to removed the extra step of importing OrderedDict from the collections library when you cast a namedtuple into a dict. If that dict shouldn’t be ordered, there’s probably also no reason for namedtuple to exist. Collections also has other silly-but-useful beauties like defaultdict.

                                                                The choices about many such things in Python seem absurd when you sit down as an engineer to architect an application.

                                                                 When you’re doing something like data science, the percentage of code you write that will never be run again dramatically outweighs even code a software engineer would refer to as prototype code. There are 100x or more the circumstances in which you’d type several dozen keyboard characters, run something best characterized as a “code cell”, then delete it because you were wrong (or start a new cell that doesn’t necessarily follow the previous cell in execution order). It’s an activity with requirements halfway between an interactive shell and an executable file.

                                                                When 90% of your work is loading arbitrary or novel data inputs and poking at them to see if they have any life, nothing matters more about your tool than your ability to churn through this iteration cycle quickly.

                                                                Over the past 10 years, the percentage of Python used directly as a human tool (not a language to build human tools) has dramatically shifted the audience for language improvements. Maybe away from what is appropriate for software development, maybe not.

                                                                I write Python for a profession, and there is no application I would engineer in Python instead of Go. But I also think any professional Go developer who just spent the day e.g. unmarshalling json can appreciate there are other activities we all do besides engineering.

                                                                Despite its many other failings, there’s no other tool I’d reach for before Python when some new thing comes at me and I say, “now what’s THIS bullshit.” Quirks like stuffing values into a data structure and getting them back in an intuitive way is part of this charm.

                                                                To put it another way with less fanfare: if you have data in a map that’s less useful to you because that map is ordered, that’s probably already data that shouldn’t be in a Python map. This remains true if you’re already in the middle of writing Python code when this happens (we have non-Python data structures in Python).

                                                                1. 1

                                                                  … But I also think any professional Go developer who just spent the day e.g. unmarshalling json can appreciate there are other activities we all do besides engineering.

                                                                   (emphasis mine) Exploratory programming in Go involving JSON (“nominally curly braced or bracketed UTF8 blobs”) is awful. (I also felt this a while back with D’s std.json library and got into the habit of using other libraries; C++ JSON libraries are excellent in terms of the programmer interface.) If anyone thinks unordered maps are not ergonomic, they’ll faint when they deal with JSON.

                                                                2. 3

                                                                  unordered maps a failing

                                                                  From my experience, having maps preserve insertion order is so much more convenient that it “deserves” to be the default. Additional “evidence” to that is Ruby and Python switching to do exactly that.

                                                                  1. 6

                                                                    I know preserving order is good for job security because I’ve written this genuine line of code for a real project:

                                                                    FIELDS = list({v: k for k, v in list(FIELD_MAP.items())[::-1]}.values())[::-1]
                                                                    

                                                                    But other than that, I can’t think of a time when explicitly using OrderedDict felt like an inconvenience, and there are two obvious benefits: it doesn’t constrain the implementation of dict, and it tells the reader you’re going to do something that cares about the order.

                                                                    1. 2

                                                                      OrderedDict

                                                                      …unordered?

                                                                      1. 1

                                                                        I feel like I’m perhaps missing something. But I meant OrderedDict—as in: in the unusual event that I need ordering, it doesn’t bother me to explicitly ask for it.

                                                                        1. 1

                                                                          I was confused by your comment.

                                                                          I can’t think of a time when explicitly using OrderedDict felt like an inconvenience…

                                                                          “I can’t think of a time when I wanted to use an ordered dict and I felt inconvenienced that the default dict was not ordered.”

                                                                          …it doesn’t constrain the implementation of dict…

                                                                          “Because ordering is requested explicitly (as opposed to if dict was ordered by default) the implementation of dict is not constrained.”

                                                                          …and it tells the reader you’re going to do something that cares about the order.

                                                                          This part is fine but is confusing if one interpreted the previous clauses of your comment in a different way.

                                                                    2. 2

                                                                        Use a list, seriously; arrays’ raison d’être is to provide you with a collection of ordered items.

                                                                      1. 1

                                                                          I can think of exactly two kinds of thing where I cared. One was implementing LRU caches in coding challenges.

                                                                        The other was a dirty hack. I was reinventing protobuf (but for json, with simpler syntax, and better APIs for the target languages), and the code gen was done by defining a python class with the appropriate members, and later looping over their contents. I used metaclass magic to replace the default method dict with an ordered one, then iterated the members of all classes in the namespaces to do code gen:

                                                                          class Event(metaclass=msg):
                                                                              member = int
                                                                              other = str
                                                                          ...
                                                                          for c in scrape_classes():
                                                                              for m in c.members():
                                                                                  lang.gen()
                                                                        

                                                                        For most other things, I don’t think I wanted insertion order – it’s either been key order or value order.

                                                                        Where are you using them often enough that it matters?

                                                                    1. 3

                                                                      We tried colour as a syntactic construct (colorForth), and most Forthers didn’t get it or didn’t like it.

                                                                      1. 2

                                                                        There are a lot of big reasons against it, such as accessibility for the color-blind; the fact that people prefer different backgrounds (light vs dark mode) which totally messes up the contrast of foreground colors; and the higher cost of color printing.

                                                                        1. 1

                                                                          colour-blindness

                                                                          Solved with italics, font weight, underlining etc. Not a real problem.

                                                                          different backgrounds

                                                                          The two options: deal with it (not a real problem), or invert the colours.

                                                                          colour printing

                                                                          Solved the same way as with colour-blindness.

                                                                          1. 4

                                                                            If you’re only using two or three colors, I guess you could substitute font styles. But you run out quickly after that, or they become very hard to distinguish. (Some fonts like Univers have five or more weights, but telling Demibold from Bold at a glance isn’t easy. And I wouldn’t want to have to read Ultralight or Black text all day.)

                                                                            Dealing with an inverted background is harder than you think. As one example, yellow text has high contrast in dark mode; but against white it’s almost unreadable. Yeah, it could be inverted, but then whenever you read on StackOverflow about “if this is yellow that means it’s an instance variable” you have to mentally change that to “purple”, which would be a drag.

                                                                        2. 1

                                                                          Using color as syntax raises the question of how the color information is stored in the source file, i.e. what’s its syntax?

                                                                          1. 1

                                                                            With colorForth you had to use the colorForth editor or write your own IDE. It’s why most colorForth inspired Forths just prefix words with a character, instead.

                                                                          1. 5

                                                                            I believe our very own @enkiv is involved in some way.

                                                                            1. 7

                                                                              I was tangentially involved, myself — a few years ago, I did some work for Ted implementing ZigZag.

                                                                              1. 2

                                                                                Awesome! Are your impressions of Ted congruent with the ones from the old Wired article?

                                                                              2. 5

                                                                                I am. I worked on the version covered in this article, and also on the previous release. Since 2014, another (totally independent) web-based version has been released.

                                                                            1. 17
                                                                              There once was a language named COBOL
                                                                              Whose usage had started to snowball
                                                                              With its courses adjourned
                                                                              We just recently learned
                                                                              Of a shortage exceedingly global
                                                                              
                                                                              1. 2

                                                                                 Crap!! You stole ;-) the limerick I was about to post, although I’d only made up the first two lines so far…

                                                                              1. 2

                                                                                I think it’s time to introduce a VM-based compatibility layer to macOS, so they can break things without worrying about breaking legacy apps. By “legacy” here I mean apps that are 10 or 20 years old, like this one. Of course, that would slow down execution speed, but that wouldn’t be much of a problem for legacy apps. It would be even nicer if old apps could look and work like they did in the Classic environment.

                                                                                A VM-based compatibility approach would make macOS more attractive in the business market.

                                                                                1. 5

                                                                                  Apple introduced a 68K emulator into PowerPC Mac OS; the Blue Box (“Classic”) and Carbon-native CFM support into Mac OS X; and PPC emulation (Rosetta) into Mac OS X on x86. These compatibility layers worked well while they were supported, but that support didn’t last.

                                                                                  The future of legacy Mac applications is 3rd-party emulation.

                                                                                  If by “legacy” you’re open to 30-year-old applications, there’s this project: https://www.v68k.org/advanced-mac-substitute/ It now has a Quartz-based front end that runs in Catalina.

                                                                                  1. 1

                                                                                    Wow!

                                                                                  2. 2

                                                                                    Something like Windows on Windows 64 (WOW64). So I guess it would be MOM64 :)

                                                                                  1. 5

                                                                                    This is kinda crazy impressive. I’m surprised he was able to get it working.

                                                                                    1. 5

                                                                                      A couple years ago I took the released source code of John Calhoun’s game Glypha III and ported it forward to PPC / Carbon / OS X / Mach-O / x86: https://github.com/jjuran/glypha3-fork

                                                                                      It’s definitely not a trivial process (and there are various gotchas along the way), but it’s doable with enough effort.
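
                                                                                        As one concrete example of those gotchas (a hedged sketch, not code from that repository): Carbon made the old Toolbox structures opaque, so classic code that read fields like window->portRect directly had to be rewritten against accessor functions.

                                                                                        #include <Carbon/Carbon.h>

                                                                                        /* Classic code could read a window's bounds straight out of its
                                                                                           GrafPort; Carbon made these types opaque, so the same information
                                                                                           now goes through accessors. */
                                                                                        static void invalidate_whole_window(WindowRef window)
                                                                                        {
                                                                                            Rect bounds;

                                                                                            GetPortBounds(GetWindowPort(window), &bounds);

                                                                                            /* InvalWindowRect replaces the pre-Carbon InvalRect call. */
                                                                                            InvalWindowRect(window, &bounds);
                                                                                        }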

                                                                                      1. 3

                                                                                          Wow. I had no idea he released his code. You have just made me want to port his glider game to Linux. As someone who dabbled in Mac game programming in the ’90s, I’m really happy that he released his source.

                                                                                        1. 1

                                                                                          The old Mac game I want to do more work with is ZeroGravity, which also has source. I played that incessantly on my friend’s dad’s Mac Plus in 1988.

                                                                                    1. 5

                                                                                      This is really cool. I love how people go out of their way to express their fondness for the classic Mac era.

                                                                                      1. 23

                                                                                        the way i see it,

                                                                                        • there was a bsd-licensed osxfuse project
                                                                                        • fleischer was the maintainer and pretty much sole developer of it
                                                                                        • lots of companies maintained their own private forks of it, bundling it with proprietary apps and not contributing anything back upstream
                                                                                        • fleischer also maintained his own private fork
                                                                                        • his fork happens to work on catalina
                                                                                        • he is offering companies whose forks do not work on catalina the opportunity to access his source code, as a commercial deal

                                                                                          other than the fact that he deliberately concealed that he was no longer developing a bsd version (presumably so that no one else would bother working on catalina support until it was too late), i have no issues with what he did.

                                                                                        1. 27

                                                                                          I have yet to encounter a free software or open source license that guarantees the freedom to be informed of the author’s otherwise private business plans in advance. :-)

                                                                                          osxfuse became de facto non-free software once the Kernel Extension Signing Certificate was required, much like Linux on the TiVo. Sure, you can change it but then you can’t run it. The fact that it’s no longer gratis isn’t a setback for software freedom at all.

                                                                                          1. 10

                                                                                              An interesting fact about tivoisation that shouldn’t be forgotten is that Linus Torvalds is outspokenly okay with the practice.

                                                                                            1. 5

                                                                                              My impression is that opposition to tivoization was mostly from Stallman/FSF.

                                                                                              1. 5

                                                                                                  At times I wonder if the FSF’s push against Tivoization was the beginning of their decline. Linus and the Linux kernel ignored them; big companies got cold feet about “free software” and moved to “open source” (with symptoms like macOS not shipping a modern bash, the rise of LLVM, etc.); and the GPLv3 is seen as overreach by lawyers.

                                                                                            2. 1

                                                                                              I have yet to encounter a free software or open source license that guarantees the freedom to be informed of the author’s otherwise private business plans in advance. :-)

                                                                                              that’s a good point :) and if the companies really cared they could have reached out to him and asked if there would be a catalina version, at which point he could have asked them to pay him to work on it.

                                                                                              1. 1

                                                                                                The fact that it’s no longer gratis isn’t a setback for software freedom at all.

                                                                                                  It’s a kernel extension for an OS that isn’t free (AFAICT only the Mojave source is available, and it’s not like you can build macOS) and that seeks to restrict what you can do with it (via Gatekeeper, which can be turned off, but doing so is harder than before; this is why I say “seeks” rather than “restricts”). I’m not sure there’s that much software freedom to begin with.

                                                                                              2. 12

                                                                                                lots of companies maintained their own private forks of it, bundling it with proprietary apps and not contributing anything back upstream

                                                                                                This is exactly the reason why we have GNU GPL and copyleft.

                                                                                                  1. 16

                                                                                                      Of course. But „private“ means that you are using the software for your own purposes (in your household or at your company/organization). In that case it is OK that you do not have to publish your changes.

                                                                                                      However, if someone is „bundling it with proprietary apps“ (which is the case we are talking about), then he is distributing the software to his customers – and at this point the GNU GPL comes into play and says that he must also share the source code with his customers (not with the public, but the customers can publish it, because the GNU GPL grants them this right).

                                                                                                      P.S. For software that is not distributed to users but is used over a network, there is the GNU Affero GPL. And TiVoization is addressed by GNU GPLv3 (since 2007).

                                                                                                    1. 1

                                                                                                        Yep, and it also does not require upstream commits. Your only obligations to the maintainer are attribution and preserving the freedom of the user.

                                                                                                    2. 0

                                                                                                        There may not have been anything to commit back upstream (i.e. no changes to osxfuse were made), and the source code for the application is probably not very useful to the osxfuse developer.

                                                                                                      1. 2

                                                                                                        “Maintaining a private fork” implies making changes to the source. If you’re just downloading the code but not making changes to it, you aren’t “maintaining” it.

                                                                                                        1. 1

                                                                                                          “Maintaining a private fork”

                                                                                                          Where is that claim being made? All I see being claimed is that companies are “using” it, not modifying it.

                                                                                                          1. 1

                                                                                                            https://lobste.rs/s/2alill/osxfuse_is_no_longer_open_source#c_1mh8hv:

                                                                                                            lots of companies maintained their own private forks of it, bundling it with proprietary apps and not contributing anything back upstream

                                                                                                            This is exactly the reason why we have GNU GPL and copyleft.

                                                                                                            The very comment you replied to discusses it.

                                                                                                            1. 1

                                                                                                              That’s just the summary/paraphrasing of someone (“the way i see it [..]”); I think they were just making assumptions? I read the article and all linked issues, and don’t see a mention of anything like a “fork”, just “using”, but perhaps I missed it?

                                                                                                                As I understand it, forking – private or otherwise – would actually be rather hard (though not impossible) because of the special kernel module signing certificate, which is not so easy to obtain.

                                                                                                  1. 5

                                                                                                    Today is my first performance with the Washington Metropolitan Gamer Symphony Orchestra! I’m singing bass.

                                                                                                    1. 7

                                                                                                        I wish it were just called “web”. One syllable is far more convenient in speech than nine.

                                                                                                      1. 2

                                                                                                          I also wondered whether the English-speaking world would change “double-u” into something shorter now that we say “www” so often. Turns out we got rid of the “www” instead of the “double-u”.

                                                                                                        1. 3

                                                                                                          I shorten it to “dub”. So www is “dub dub dub”.

                                                                                                        2. 1

                                                                                                          Just pronounce it “woooo” :)

                                                                                                          1. 1

                                                                                                              I say “triple double-u”, which is only five syllables.

                                                                                                          1. 2

                                                                                                            One of my longstanding complaints about Firefox has been its CPU usage when idle. This is very welcome news.

                                                                                                            1. 2

                                                                                                              Mine too – hence my disappointment (as a Linux user) when I got to the “on macOS” part.

                                                                                                              1. 2

                                                                                                                  Should be much, much easier to do partial present (damage tracking) with EGL, so I’m going to try implementing that soon :)

                                                                                                                  Do keep in mind that this optimization applies mostly to the “tiny animation on the screen” situation. When idling completely, nothing is redrawn, and when scrolling you have to repaint pretty much the whole window anyway.
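
                                                                                                                  For the curious, the usual mechanism here is the EGL_KHR_swap_buffers_with_damage extension. A minimal sketch in C (the rectangle values and the helper name present_with_damage are made up for illustration):

                                                                                                                  #include <EGL/egl.h>
                                                                                                                  #include <EGL/eglext.h>

                                                                                                                  /* Partial present: tell EGL which region actually changed so the
                                                                                                                     compositor can skip recomposing the rest of the surface. */
                                                                                                                  static void present_with_damage(EGLDisplay dpy, EGLSurface surf)
                                                                                                                  {
                                                                                                                      /* The extension entry point is resolved at runtime. */
                                                                                                                      PFNEGLSWAPBUFFERSWITHDAMAGEKHRPROC swap_with_damage =
                                                                                                                          (PFNEGLSWAPBUFFERSWITHDAMAGEKHRPROC)
                                                                                                                              eglGetProcAddress("eglSwapBuffersWithDamageKHR");

                                                                                                                      /* One damage rect: x, y, width, height, with the origin at the
                                                                                                                         bottom-left of the surface (as in glScissor). */
                                                                                                                      EGLint rect[4] = { 16, 16, 64, 64 };

                                                                                                                      if (swap_with_damage)
                                                                                                                          swap_with_damage(dpy, surf, rect, 1);  /* repaint only the rect */
                                                                                                                      else
                                                                                                                          eglSwapBuffers(dpy, surf);             /* fall back: full swap */
                                                                                                                  }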

                                                                                                                1. 1

                                                                                                                    I will say that WebRender on my Talos II seems to work well in my (so far cursory) testing, so there’s good news in Firefox 70 for Linux too, even though it’s not officially supported yet.