1. 12

    After reading this through, I’m not convinced it isn’t satire.

    1. 1

      It has to be. I’ve never seen someone say they intentionally don’t version any of their software.

    1. 7

There's some value in supporting odd platforms, because it exercises the portability of programs, like the endianness issue mentioned in the post. I'm sad that the endian wars were won by the wrong endian.

      1. 5

        I’m way more happy about the fact that the endian wars are over. I agree it’s a little sad that it is LE that won, just because BE is easier to read when you see it in a hex dump.

        1. 4

Big Endian is easy for us only because we ended up with some weird legacy of using Arabic (right-to-left) numbers in Latin (left-to-right) text. Arabic numbers in Arabic text are least-significant-digit first. There are some tasks in computing that are easier on little-endian values, and none that are easier on big-endian, so I'm very happy that LE won.

          If you want to know the low byte of a little-endian number, you read the first byte. If you want to know the top byte of a little-endian number, you need to know its width. The converse is true of a big-endian number, but if you want to know the top byte of any number and do anything useful with it then you generally do know its width because otherwise ‘top’ doesn’t mean anything meaningful.
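A tiny Rust sketch of that point (the value 0x11223344 is just an arbitrary example):

```rust
fn main() {
    let n: u32 = 0x11223344;

    // Little-endian: the least significant ("low") byte comes first,
    // so it sits at index 0 no matter how wide the number is.
    let le = n.to_le_bytes();
    assert_eq!(le[0], 0x44);

    // Big-endian: the most significant byte comes first, so finding the
    // low byte requires knowing the width (4 bytes here).
    let be = n.to_be_bytes();
    assert_eq!(be[be.len() - 1], 0x44);
}
```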

          1. 2

            Likewise, there are some fun bugs only big endian can expose, like accessing a field with the wrong size. On little endian it’s likely to work with small values, but BE would always break.
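For illustration, a small Rust sketch of that kind of wrong-size read (done safely here via to_ne_bytes, but the same thing happens with a mis-cast pointer):

```rust
fn main() {
    let value: u32 = 5;

    // A buggy "wrong size" access reads only the first byte of a 4-byte field.
    let first_byte = value.to_ne_bytes()[0]; // native byte order

    if cfg!(target_endian = "little") {
        // On little-endian the bug is masked for small values: the first byte is 5.
        assert_eq!(first_byte, 5);
    } else {
        // On big-endian the first byte is the high byte, so it's 0 and the bug shows up immediately.
        assert_eq!(first_byte, 0);
    }
}
```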

        2. 2

Apart from “network byte order” looking more intuitive to me at first sight, could you elaborate on why big endian is better than little endian? I'm genuinely curious (and hope this won't escalate ;)).

          1. 10

My favorite property of big-endian is that lexicographically sorting encoded integers preserves the ordering of the numbers themselves. This can be useful in binary formats. And since you have to use big-endian encoding to get this property, a big-endian system doesn't need to do any byte swapping before using those bytes as an integer.
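A quick Rust sketch of that property (for same-width unsigned integers; the particular values are arbitrary):

```rust
fn main() {
    let numbers: Vec<u32> = vec![3, 256, 1, 70_000, 255];

    // Encode each number as fixed-width big-endian bytes and sort the
    // encodings lexicographically (the default ordering for byte arrays).
    let mut encoded: Vec<[u8; 4]> = numbers.iter().map(|n| n.to_be_bytes()).collect();
    encoded.sort();

    // Decoding back gives the numbers in numeric order: the lexicographic
    // order of the big-endian encodings matches the numeric order.
    let decoded: Vec<u32> = encoded.iter().map(|b| u32::from_be_bytes(*b)).collect();
    assert_eq!(decoded, vec![1, 3, 255, 256, 70_000]);
}
```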

            Also, given that we write numbers with the most significant digits first, it just makes more “sense” to me personally.

            1. 5

              Also, given that we write numbers with the most significant digits first, it just makes more “sense” to me personally.

              A random fact I love: Arabic text is right-to-left, but writes its numbers with the same ordering of digits as Latin texts… so in Arabic, numbers are little-endian.

              1. 3

Speaking of endianness: in Arabic, relationships are described from the end farthest from you to the closest. If you were to naively describe the husband of a second cousin, instead of saying "my mother's cousin's daughter's husband" you would say "the husband of the daughter of the cousin of my mother". It makes it insanely hard to hold in your head without a massive working memory, because you need to reverse it to actually grok the relationship. I always wonder whether that's because I'm not the most fluent Arabic speaker, or whether it's a problem for everyone who speaks it.

                1. 2

                  My guess is that it is harder for native speakers as well, but they don’t notice it because they are used to it. A comparable case I can think of is a friend of mine who is a native German speaker who came to the States for a post-doc. He commented that after speaking English consistently for a while, he realized that German two digit numbers are needlessly complicated. Three and twenty is harder to keep in your head than twenty three for the same reason.

                  1. 2

German has nothing on Danish.

95 is "fem og halvfems" - "five and half-fives", where "halvfems" is short for "halvfemsindstyve": half-fifth (i.e. 4½) times twenty, giving 90.

                    It’s logical once you get the hang of it…

                    In Swedish it’s “nittiofem”.

            2. 3

              Little-endian vs. big-endian has a good summary of the trade-offs.

              1. 3

                I wondered this often and figured everyone just did the wrong thing, because BE seems obviously superior. Just today I’ve been reading RISC-V: An Overview of the Instruction Set Architecture and noted this comment on endianness:

                Notice that with a little endian architecture, the first byte in memory always goes into the same bits in the register, regardless of whether the instruction is moving a byte, halfword, or word. This can result in a simplification of the circuitry.

                It’s the first time I’ve noticed something positive about LE!

                1. 1

                  From what I hear, it mostly impacted smaller/older devices with small buses. The impact isn’t as big nowadays.

                2. 2

                  That was a bit of tongue-in-cheek, so I don’t really want to restart the debate :)

                  1. 2

                    Whichever endianness you prefer, it is the wrong one. ;-)

                    Jokes aside, my understanding is that either endianness makes certain types of circuits/components/wire protocols easier and others harder. It’s just a matter of optimizing for the use case the speaker cares about more.

                  2. 1

                    Having debugged on big-endian for the longest time, I miss “sane” memory dumps on little-endian. It takes a bit more thought to parse them.

                    But I started programming on the 6502, and little-endian clearly makes sense when you’re cascading operations 8 bits at a time. I had a little trouble transitioning to the big-endian 16-bit 9900 as a result.

                  1. 20

                    Python package maintainers rarely use semantic versioning and often break backwards compatibility in minor releases. One of several reasons that dependency management is a nightmare in Python world.

                    1. 18

                      I generally consider semantic versioning to be a well-intentioned falsehood. I don’t think that package vendors can have effective insight into which of their changes break compatibility when they can’t have a full bottom-up consumer graph for everyone who uses it.

                      I don’t think that Python gets this any worse than any other language.

                      1. 20

                        I’ve heard this opinion expressed before… I find it to be either dangerously naive or outright dishonest. There’s a world of difference between a) the rare bug fix release or nominally-orthogonal-feature-add release that unintentionally breaks downstream code and b) intentionally changing and deprecating API’s in “minor” releases.

                        In my view, adopting SemVer is a statement of values and intention. It communicates that you value backwards compatibility and intend to maintain it as much as is reasonably possible, and that you will only knowingly break backwards compatibility on major release increments.

                        1. 18

                          In my view, adopting SemVer is a statement of values and intention. It communicates that you value backwards compatibility and intend to maintain it as much as is reasonably possible, and that you will only knowingly break backwards compatibility on major release increments.

                          A “statement of values and intention” carries no binding commitment. And the fact that you have to hedge with “as much as is reasonably possible” and “only knowingly break” kind of gives away what the real problem is: every change potentially alters the observable behavior of the software in a way that will break someone’s reliance on the previous behavior, and therefore the only way to truly follow SemVer is to increment major on every commit. Which is the same as declaring the version number to be meaningless, since if every change is a compatibility break, there’s no useful information to be gleaned from seeing the version number increment.

                          And that’s without getting into some of my own direct experience. For example, I’ve been on the Django security team for many years, and from time to time someone has found a security issue in Django that cannot be fixed in a backwards-compatible way. Thankfully fewer of those in recent years since many of them related to weird old functionality dating to Django’s days as a newspaper CMS, but they do happen. Anyway, SemVer’s answer to this is “then either don’t fix it, or do but no matter how you fix it you’ve broken SemVer and people on the internet will scream at you and tell you that you ought to be following SemVer”. Not being a fan of no-win situations, I am content that Django has never and likely never will commit to following SemVer.

                          1. 31

                            A “statement of values and intention” carries no binding commitment.

                            A label on a jar carries no binding commitment to the contents of the jar. I still appreciate that my salt and sugar are labelled differently.

                            1. 2

                              Selling the jar with that label on it in many countries is a binding commitment and puts you under the coverage of food safety laws, though.

                            2. 6

                              Anyway, SemVer’s answer to this is “then either don’t fix it, or do but no matter how you fix it you’ve broken SemVer and people on the internet will scream at you and tell you that you ought to be following SemVer”.

                              What do you mean? SemVer’s answer to “this bug can’t be fixed in a backwards-compatible way” is to increment the major version to indicate a breaking change. You probably also want to get the message across to your users by pushing a new release of the old major version which prints some noisy “this version of blah is deprecated and has security issues” messages to the logs.

                              It’s not perfect, I’m not saying SemVer is a silver bullet. I’m especially worried about the effects of basing automated tooling on the assumption that no package would ever push a minor or patch release with a breaking change; it seems to cause ecosystems like the NPM to be highly fragile. But when taken as a statement of intent rather than a guarantee, I think SemVer has value, and I don’t understand why you think your security issue anecdote requires breaking SemVer.

                              1. 7

                                What do you mean? SemVer’s answer to “this bug can’t be fixed in a backwards-compatible way” is to increment the major version to indicate a breaking change.

                                So, let’s consider Django, because I know that well (as mentioned above). Typically Django does a feature release (minor version bump) every 8 months or so, and every third one bumps the major version and completes a deprecation cycle. So right now Django 3.1 is the latest release; next will be 3.2 (every X.2 is an LTS), then 4.0.

                                And the support matrix consists of the most recent feature release (full bugfix and security support), the one before that (security support only), and usually one LTS (but there’s a period at the end of each where two of them overlap). The policy is that if you run on a given LTS with no deprecation warnings issued from your code, you’re good to upgrade to the next (which will be a major version bump; for example, if you’re on 2.2 LTS right now, your next LTS will be 3.2).

                                But… what happens when a bug is found in an LTS that can’t be fixed in a backwards-compatible way? Especially a security issue? “Support for that LTS is cut off effective immediately, everybody upgrade across a major version right now” is a non-starter, but is what you propose as the correct answer. The only option is to break SemVer and do the backwards-incompatible change as a bugfix release of the LTS. Which then leads to “why don’t you follow SemVer” complaints. Well, because following SemVer would actually be worse for users than this option is.

                                1. 3

                                  But… what happens when a bug is found in an LTS that can’t be fixed in a backwards-compatible way?

Why do people run an LTS version, if not to avoid worrying about it as a dependency? If you're making incompatible changes: forget about semver, you're breaking the LTS contract, and you may as well drop the LTS tag and tell people to run the latest.

                                  1. 1

you may as well drop the LTS tag and tell people to run the latest

                                    I can think of only a couple instances in the history of Django where it happened that a security issue couldn’t be fixed in a completely backwards-compatible way. Minimizing the breakage for people – by shipping the fix into supported releases – was the best available option. It’s also completely incompatible with SemVer, and is a great example of why SemVer is at best a “nice in theory, fails in practice” idea.

                                    1. 3

                                      Why not just tell them to upgrade? After all, your argument is essentially that stable APIs are impossible, so why bother with LTS? Every argument against semver also applies against LTS releases.

                                      1. 3

                                        After all, your argument is essentially that stable APIs are impossible

                                        My argument is that absolute perfect 100% binding commitment to never causing a change to observable behavior ever under any circumstance, unless also incrementing the major version at the same time and immediately dropping support for all users of previous versions, is not practicable in the real world, but is what SemVer requires. Not committing to SemVer gives flexibility to do things like long-term support releases, and generally people have been quite happy with them and also accepting of the single-digit number of times something had to change to fix a security issue.

                                  2. 2

                                    “Support for that LTS is cut off effective immediately, everybody upgrade across a major version right now” is a non-starter

                                    If it’s a non-starter then nobody should be getting the critical security patch. You’re upgrading from 2.2 to 3.0 and calling it 2.2.1 instead. That doesn’t change the fact that a breaking change happened and you didn’t bump the major version number.

                                    You can’t issue promises like “2.2.X will have long term support” because that’s akin to knowing the future. Use a codename or something.

                                    1. 7

                                      It’s pretty clear you’re committed to perfect technical adherence to a rule, without really giving consideration to why the rule exists. Especially if you’re at the point of “don’t commit to supporting things, because supporting things leads to breaking SemVer”.

                                      1. 4

                                        They should probably use something like SemVer but with four parts, e.g. Feature.Major.Minor.Patch

                                        • Feature version changes -> We’ve made significant changes / a new release (considered breaking)
                                        • Major version change -> We’ve made breaking changes
                                        • Minor version change -> Non breaking new features
                                        • Patch version change -> Other non-breaking changes

                                        That way 2.*.*.* could be an LTS release, which would only get bug fixes, but if there was an unavoidable breaking change to fix a bug, you’d signal this in the version by e.g. going from 2.0.5.12 to 2.1.0.0. Users will have to deal with the breaking changes required to fix the bug, but they don’t have to deal with all the other major changes which have gone into the next ‘Feature’ release, 3.*.*.*. The promise that 2.*.*.*, as an LTS, will get bug fixes is honored. The promise that the major version must change on a breaking change is also honored.
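A rough Rust sketch of how such a four-part version could be ordered (the type and field names are made up for the example; comparison is simply lexicographic over the four parts):

```rust
// Hypothetical four-part version: Feature.Major.Minor.Patch.
// derive(Ord) compares the fields in declaration order.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    feature: u32, // new release line (considered breaking)
    major: u32,   // unavoidable breaking change within a release line
    minor: u32,   // non-breaking new features
    patch: u32,   // other non-breaking changes
}

fn main() {
    let lts_before = Version { feature: 2, major: 0, minor: 5, patch: 12 };
    let lts_breaking_fix = Version { feature: 2, major: 1, minor: 0, patch: 0 };
    let next_feature_release = Version { feature: 3, major: 0, minor: 0, patch: 0 };

    // The breaking security fix stays on the 2.x LTS line but still signals
    // the break, without pulling in everything from the 3.x release.
    assert!(lts_before < lts_breaking_fix);
    assert!(lts_breaking_fix < next_feature_release);
}
```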

                                        SemVer doesn’t work if you try to imbue the numbers with additional meanings that can contradict the SemVer meanings.

                                        1. 3

                                          This scheme is very similar to Haskell’s Package Versioning Policy (PVP).

                                        2. 1

                                          I’m saying supporting things and adhering to SemVer should be orthogonal.

                                  3. 5

                                    every change potentially alters the observable behavior of the software

                                    This is trivially false. Adding a new helper function to a module, for example, will never break backwards compatibility.

                                    In contrast, changing a function’s input or output type is always a breaking change.

                                    By failing to even attempt to distinguish between non-breaking and breaking changes, you’re offloading work onto the package’s users.

                                    Optimize for what should be the common case: non-breaking changes.

                                    Edit: to expand on this, examples abound in the Python ecosystem of unnecessary and intentional breaking changes in “minor” releases. Take a look at the numpy release notes for plenty of examples.

                                    1. 7

                                      Python’s dynamic nature makes “adding a helper function” a potentially breaking change. What if someone was querying, say, all definitions of a module and relying on the length somehow? I know this is a bit of a stretch, but it is possible that such a change would break code. I still value semver though.

                                      1. 3

                                        The number of definitions in a module is not a public API. SemVer only applies to public APIs.

                                        1. 4

                                          If you can access it at run-time, then someone will depend on it, and it’s a bit late to call it “not public”. Blame Python for exposing stuff like the call stack to introspection.

                                          1. 2

                                            Eh no? SemVer is very clear about this. Public API is whatever software declares it to be. Undeclared things can’t be public API, by definition.

                                            1. 7

                                              Python has no concept of public vs private. It’s all there all the time. As they say in python land, “We’re all consenting adults here”.

                                              I’m sure, by the way, when Hettinger coined that phrase he didn’t purposely leave out those under the age of 18. Language is hard. :P

                                      2. 1

                                        Adding a new helper function to a module, for example, will never break backwards compatibility.

                                        Does this comic describe a violation of SemVer?

                                        You seriously never know what kinds of things people might be relying on, and a mere definition of compatibility in terms of input and output types is woefully insufficient to capture the things people will expect in terms of backwards compatibility.

                                        1. 6

No, it does not describe a violation of SemVer, because spacebar heating is not a public API. SemVer is very clear about this. You are right that people will still complain about backward compatibility even if you are keeping 100% correct SemVer.

                                    2. 6

                                      I would agree if violations were rare. Every time I’ve tried to solve dependency issues on Python, about 75% of the packages I look into have broken semver on some level. Granted, I probably have a biased sampling technique, but I find it extremely hard to believe that it’s a rare issue.

                                      Backwards compatibility is hard to reason about, and the skill is by no means pervasive. Even having a lot of experience looking for compatibility breaks, I still let things slip, because it can be hard to detect. One of my gripes with semver is that it doesn’t scale. It assumes that tens of thousands of open source devs with no common training program or management structure all understand what a backwards breaking change is, and how to fix it.

                                      Testing for compatibility breaks is rare. I can’t think of any Python frameworks that help here. Nor can I think of any other languages that address this (Erlang might, but I haven’t worked with it first-hand). The most likely projects to test for compatibility between releases are those that manage data on disk or network packets. Even among those, many rely on code & design review to spot issues.

                                      It communicates that you value backwards compatibility and intend to maintain it as much as is reasonably possible, and that you will only knowingly break backwards compatibility on major release increments.

It's more likely that current package managers force you into semver regardless of whether you understand how it's supposed to be used. The "statement of values" angle is appealing, but without much evidence. Semver is merely popular.

                                      1. 7

                                        I guess this depends on a specific ecosystem? Rust projects use a lot of dependencies, all those deps use semver, and, in practice, issues rarely arise. This I think is a combination of:

                                        • the fact that semver is the only option in Rust
                                        • the combination of guideline to not commit Cargo.lock for libraries + cargo picking maximal versions by default. This way, accidental incompatibilities are quickly discovered & packages are yanked.
                                        • the guideline to commit Cargo.lock for binaries and otherwise final artifacts: that way folks who use Rust and who have the most of deps are shielded from incompatible updates.
                                        • the fact that “library” is a first-class language construct (crate) and not merely a package manager convention + associated visibility rules makes it easier to distinguish between public & private API.
                                        • Built-in support for writing test from the outside, as-if you are consumer of the library, which also catches semver-incompatible changes.

This is not to say that semver issues do not happen, just that they are rare enough. I've worked with Rust projects with 200-500 different deps, and didn't perceive semver breakage as being a problem.

                                        1. 5

                                          I would add that the Rust type system is expressive enough that many backwards incompatible changes require type signature changes which are much more obvious than violations of some implicit contract.
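A minimal sketch of that: a hypothetical library function whose return type changes between releases. Callers written against the old signature stop compiling, whereas merely adding a new function beside it would not affect them.

```rust
// Version 1 of a hypothetical library exposed:
//
//     pub fn parse(input: &str) -> Option<u32>
//
// Version 2 changes the return type, which is a breaking change:
pub fn parse(input: &str) -> Result<u32, std::num::ParseIntError> {
    input.parse()
}

fn main() {
    // A caller written against version 1 no longer compiles, so the
    // incompatibility shows up in the type checker instead of at runtime:
    //
    //     if let Some(n) = parse("42") { ... } // error: `Some` pattern doesn't match a `Result`
    //
    let _ = parse("42");
}
```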

                                      2. 6

                                        I don’t think I have a naïve view of versioning; putting on my professional hat here, I have a decade of experience dealing with a dependency modeling system that handles the versions of hundreds of thousands of interrelated software artifacts that are versioned more or less independently of each other, across dozens of programming languages and runtimes. So… some experience here.

                                        In all of this time, I’ve seen every single kind of breaking change I could imagine beforehand, and many I could not. They occurred independent of how the vendor of the code thought of it; a vendor of a versioned library might think that their change is minor, or even just a non-impacting patch, but outside of pure README changes, it turns out that they can definitely be wrong. They certainly had good intentions to communicate the nature of the change, but that intention can run hard into reality. In the end, the only way to be sure is to pin your dependencies, all the way down, and to test assiduously. And then upgrade them frequently, intentionally, and on a cadence that you can manage.

                                        1. 1

                                          I don’t think I have a naïve view of versioning; putting on my professional hat here, I have a decade of experience dealing with …

Hear, hear. My experience isn't exactly like @offby1's, but I can vouch for the rest.

                                        2. 4

                                          to be either dangerously naive or outright dishonest

This phrase gets bandied around the internet so much I'm surprised it's not a meme.

SemVer is … okay, but you make it sound like lives depend on it. There's a lot of software running mission-critical systems without using SemVer, and people aren't dying every day because of it. I think we can calm down.

                                      3. 3

That's the problem of the package management being so old. Back then semantic versioning wasn't that common, and it never really caught on. In my opinion the PyPA should make a push to get more packages to use semantic versioning. I'm seeing this trend already, but it's too slow…

                                      1. 5

What is the actual path forward for fixing the problem? Bringing Rust/LLVM support to all of those platforms? I can understand the maintainers' reasoning that C is inherently insecure, but not being able to use the package for the foreseeable future isn't really an option either. Well, it might spark some innovation :D

                                        1. 14

Speaking in realistic terms, and fully acknowledging that it is in some ways a sad state of affairs: most of those platforms are dying, and continuing to keep them alive effectively means volunteering to keep porting advancements from the rest of the world onto the platform. If you want to use third party packages on an AIX box, you sort of just have to expect that you'll have to share the labor of keeping that third party package working on AIX. The maintainers are unlikely to be thinking about you, and for good reason.

                                          For users of Alpine linux the choice to use a distro that is sufficiently outside the mainstream means you are also effectively committing to help port advancements from the rest of the world onto the platform if you want to consistently use them.

                                          For both categories you can avoid that implicit commitment by moving to more current and mainstream platforms.

                                          1. 12

                                            Exactly. And as a long term implication, if Rust is here to stay, the inevitable fate of those platforms is “Gain Rust support or die”. Maintaining a C implementation of everything in the name of backward compatibility is only delaying the inevitable, and is ultimately a waste of time.

                                          2. 6

I see it like this: Rust is not going away. This won't be the last project introducing Rust support.

Either your platform supports Rust or the world will stop supporting your platform.

                                            1. 5

This definitely seems to be true. I helped drive Rust support for a relatively niche platform (illumos), and while it took a while to get it squared away, everybody I spoke with was friendly and helpful along the way. We're a Tier 2 platform now, and lots of Rust-based software just works. We had a similar experience with Go, another critical runtime in 2021, which also required bootstrapping since it's no longer written in C, and so on.

                                            2. 8
                                              1. The package maintainers agree not to break shit, or
                                              2. Someone from among those affected volunteers to maintain a fork.
                                              1. 5

                                                I mean, you can always pin your dependency to the version before this one. No way that could come back and bite you </sarcasm>

                                                1. 2

I think the GCC frontend for Rust, which recently got funded, will solve this problem.

                                                  1. 3

                                                    Rust crates tend to rely on recently added features and/or library functions. Given that GCC releases are far less frequent, I think there will be a lot of friction when using the GCC frontend.

                                                    1. 3

                                                      Maintaining compatibility with a 6/9/12 month old rust compiler is a much smaller ask of the maintainers than maintaining a C library indefinitely.

                                                1. 2

                                                  I once wondered why some Google API returned structures like {'lo': 123, 'hi': 0} instead of just the number 123, and realized that the numbers probably can grow beyond 2**53.
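Presumably it's the classic JSON/JavaScript precision issue; here's a sketch of why a 64-bit count might get split into two 32-bit halves (the lo/hi names mirror the example above):

```rust
fn main() {
    // A value just above 2^53 cannot be represented exactly as an IEEE-754
    // double, which is the only number type JavaScript (and therefore a lot
    // of JSON tooling) has.
    let big: u64 = (1 << 53) + 1;
    assert_ne!(big as f64 as u64, big); // the round-trip through a double is lossy

    // Splitting into two 32-bit halves, roughly what a {'lo': ..., 'hi': ...}
    // encoding does, keeps every bit intact.
    let lo = (big & 0xFFFF_FFFF) as u32;
    let hi = (big >> 32) as u32;
    let reassembled = ((hi as u64) << 32) | lo as u64;
    assert_eq!(reassembled, big);
}
```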

                                                  1. 2

                                                    This is why I’m a big fan of using configuration formats with types. In practice for me, that’s meant protos, but anything where you can trivially typecheck in CI is a big win.

                                                    1. 5

                                                      This seems like a good compromise to me. The tools that provide safety eventually fail, but you need social pressure to avoid devs saying ‘f*** it. We’ll do it live.’ every day.

                                                      1. 9

                                                        I think devs do generally care about correctness, but they get paid not to care. People working on security, availability and correctness all have the problem that it’s really hard to convince the holder of the purse to pay for ensuring something doesn’t happen. How much is every bug that doesn’t make it to production worth?

                                                        1. 22

                                                          tl;dr: I, personally, like Golang more than C++. Ergo, C++ is bad.

                                                          Can we cut it with these fact-free language wars already?

                                                          1. 7

People can have their opinions, right? That doesn't mean we have to read them; that's what the hide button is for.

C++ is changing rapidly (as rapidly as an ISO standard can change); I guess in the future we might even ditch types altogether. We can already store "anything" without explicitly knowing the type (I recently wrote about that: https://raymii.org/s/articles/Store_multiple_types_in_a_single_stdmap_in_cpp_just_like_a_python_dict.html), and recently there was this as well: https://artificial-mind.net/blog/2020/10/10/return-type-overloading and with all the templates and auto everywhere, you can write a dialect of C++ which looks like JavaScript or Python…

Most of the people voicing these "opinions" don't write C++ professionally, but just in university. If you need raw performance or work with anything embedded, C++ is your best bet. But all these youths only know JavaScript and the hip frameworks of today, which is reflected in their opinions. Not to say that is wrong; I do like that they write, and I like reading opinions that differ from my own. However, I think your argument falls apart quickly if your biggest gripe is that "std::" is annoying or, as you say, that it's not Go.

                                                            1. 3

Yes, but headlining your own opinion as absolute truth doesn't deliver on its promise.

                                                              1. 2

                                                                I don’t think that types will ever go away. You can do typeless programming right now with void*. But nobody is doing it, because it’s not useful.

                                                                Python, Ruby, JavaScript (-> TypeScript) all started without types and are heading towards types (at least they’re giving an option to use them).

                                                                My crystal ball tells me that in the future we’re more likely to use conditional types (so, more strict type of types) than to drop types altogether.

                                                              2. 2

Go is a weird comparison. If you're in a problem domain where global garbage collection is acceptable in terms of latency and memory overheads, C++ is definitely the wrong choice, because there are a lot of languages where you can be more productive if you're willing to sacrifice some control. Once you're in that world, there are also a lot of languages that are nicer than Go; the only advantage Go has in that space is that it's easy to build stand-alone statically-linked binaries.

                                                              1. 1

                                                                Does Heroku offer network peering with AWS services on other accounts? Or does AWS magically reroute RDS traffic to public IPs via AWS infrastructure? I’m trying to understand why there isn’t a latency penalty since naively routing from Heroku VMs on AWS to a public IP is going to hit the public internet.

                                                                1. 2

                                                                  Hi, author here. Honestly I don’t know tech details on how it works. From my experience migrating a medium size client’s application from Heroku addon to RDS, none of the performance measuring tools reported any overhead after the switch.

                                                                  Maybe Heroku routing is able to detect that RDS hostname is in the same AWS region and does not go over the public net?

                                                                1. 1

                                                                  With all the discussion on using types to prevent mistakes on here recently, it’s nice to see an actual, practical example.

                                                                  1. 17

I'm working on a 'pretty big' (several kilolines) project in Rust, and two things frustrate me to no end:

                                                                    • All the stuff around strings. Especially in non-systems programming there’s so much with string literals and the like, and Rust requires a lot of fidgeting. Let’s not even get into returning heap-allocated strings cleanly from local functions. I (think) I get why it’s all like this, but it’s still annoying, despite all the aids involved

                                                                    • Refactoring is a massive pain! It’s super hard to “test” different data structures, especially when it comes to stuff involving lifetimes. You have to basically rewrite everything. It doesn’t help that you can’t have “placeholder” lifetimes, so when you try removing a thing you gotta rewrite a bunch of code.

The refactoring point is really important, I think, for people not super proficient in systems design. When you realize you gotta rework your structure, especially when you have a bunch of pattern matching, you're giving yourself a lot of busywork. For me this is a very similar problem to the one other ADT-based languages (Haskell and the like) face. Sure, you're going to check all usages, but sometimes I just want to add a field without changing 3000 lines.

                                                                    I still am really up for using it for systems stuff but it’s super painful, and makes me miss Python a lot. When I finally get a thing working I’m really happy though :)

                                                                    1. 4

                                                                      I would definitely like to learn more about the struggles around refactoring.

                                                                      1. 4

                                                                        Your pain points sound similar to what I disliked about Rust when I was starting. In my case these were symptoms of not “getting” ownership.

                                                                        The difference between &str and String/Box<str> is easy once you know it. If it’s not obvious to you, you will be unable to use Rust productively. The borrow checker will get in your way when you “just” want to return something from a function. A lot of people intuitively equate Rust’s references with returning and storing “by reference” in other languages. That’s totally wrong! They’re almost the opposite of that. Rust references aren’t for “not copying” (there are other types that do that too). They’re for “not owning”, and that has specific uses and serious consequences you have to internalize.

Similarly, if you add a reference (i.e. a temporary scope-limited borrow) to a struct, it blows up the whole program with lifetime annotations. It's hell. <'a> everywhere. That's not because Rust has such crappy syntax, but because it's basically a mistake of using the wrong semantics. It means the struct's data is stored outside of the struct, on the stack in some random place. There's a valid use-case for such stack-bound temporary struct wrappers, but they're far less common than the cases where this is done by mistake. Use Box or other owning types in structs to store by reference.

                                                                        And these aren’t actually Rust-specific problems. In C the difference between &str and Box<str> is whether you must call free() on it, or must not. The <'a> is “be careful, don’t use it after freeing that other thing”. Sometimes C allows both ways, and structs have bool should_free_that_pointer;. That’s Cow<str> in Rust.
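To make that concrete, a small Rust sketch of the three common cases (borrowed, owned, and "maybe owned"); the function names are just examples:

```rust
use std::borrow::Cow;

// Borrowed (&str): the caller keeps ownership; the returned slice
// must not outlive the input, which is what the borrow checker enforces.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// Owned (String): a freshly heap-allocated value the caller now owns;
// the usual way to return a string built inside the function.
fn greeting(name: &str) -> String {
    format!("hello, {}", name)
}

// Cow: borrow when nothing needs to change, allocate only when it does.
// This is the "should_free_that_pointer" flag made explicit in the type.
fn sanitize(s: &str) -> Cow<'_, str> {
    if s.contains(' ') {
        Cow::Owned(s.replace(' ', "_")) // had to allocate a new String
    } else {
        Cow::Borrowed(s) // no allocation, just hand the borrow back
    }
}

fn main() {
    assert_eq!(first_word("hello world"), "hello");
    assert_eq!(greeting("lobsters"), "hello, lobsters");
    assert_eq!(sanitize("a b"), "a_b");
    assert_eq!(sanitize("ok"), "ok");
}
```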

                                                                        1. 4

                                                                          Indeed, but I think this proves the “Complexity” section of TFA. There are several ways to do things including:

                                                                          • References
                                                                          • Boxed pointers
                                                                          • RC pointers
                                                                          • ARC pointers
                                                                          • COW pointers
                                                                          • Cells
                                                                          • RefCells

                                                                          There’s a lot of expressive power there, and these certainly help in allowing memory-safe low-level programming. But it’s a lot of choice. Moreso than C++.
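For what it's worth, a tiny Rust sketch of what a few of those choices look like side by side (just to illustrate the kind of decision being made, not a recommendation):

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    // Box: a single owner; the heap allocation is freed when the box is dropped.
    let unique: Box<String> = Box::new(String::from("one owner"));

    // Rc: shared ownership through reference counting (single-threaded; Arc is the atomic version).
    let shared = Rc::new(String::from("many owners"));
    let another = Rc::clone(&shared);
    assert_eq!(Rc::strong_count(&shared), 2);

    // RefCell: interior mutability, with the borrow rules checked at runtime instead of compile time.
    let mutable = Rc::new(RefCell::new(Vec::<u32>::new()));
    mutable.borrow_mut().push(1);
    assert_eq!(mutable.borrow().len(), 1);

    drop((unique, another));
}
```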

                                                                          1. 2

Absolutely — with a GC all these are the same thing. C++ has all of them, just under different names, or as design patterns (e.g. you'll need to roll your own Rc, because std::shared_ptr will need to use atomics in threaded programs).

                                                                            There are choices, but none of them are Rust-specific. They’re emergent from what is necessary to handle memory management and thread safety at the low level. Even if C or C++ compiler doesn’t force you to choose, you will still need to choose yourself. If you mix up pointers that are like references, with pointers that are like boxes, then you’ll have double-free or use-after-free bugs.

                                                                            1. 2

                                                                              There are choices, but none of them are Rust-specific. They’re emergent from what is necessary to handle memory management and thread safety at the low level.

                                                                              I disagree. ATS and Ada offer a different set of primitives to work with memory safe code. Moreover, some of these pointer types (like Cow) are used a lot less frequently than others. Rust frequently has multiple ways and multiple paradigms to do the same thing. There’s nothing wrong with this approach, of course, but it needs to be acknowledged as a deliberate design decision.

                                                                              1. 1

I'd honestly like to know what Ada brings to the table here. AFAIK Ada doesn't protect from use-after-free in implementations without a GC, and a typical approach is to just stay away from dynamic memory allocation. I see arenas are common, but that's not unique to Ada. I can't find info on what it does about mutable aliasing or iterator invalidation problems.

                                                                              2. 2

                                                                                The set of Boost smart pointers demonstrates some of the inherent complexity in efficient object ownership: https://www.boost.org/doc/libs/1_72_0/libs/smart_ptr/doc/html/smart_ptr.html

                                                                          2. 1

                                                                            It doesn’t help that you can’t have “placeholder” lifetimes

                                                                            I’m not sure what you mean, but maybe this can help? https://doc.rust-lang.org/std/marker/struct.PhantomData.html

                                                                          1. 2

                                                                            I’ve done a lot of evaluating open source, distributed databases over the last couple of years, and something like this would have been very useful. There’s just only so much signal I can get out of running and testing something for a few weeks.

                                                                            1. 2

                                                                              From the examples, I don’t see a compelling case for it. Maybe the actual standards proposal had a good idea where it could fit in.

                                                                              1. 1

                                                                                I could see storing a polymorphic type in a vector without a ton of heap allocations being a compelling use case under the right circumstances. Otherwise, I agree that this seems more like an intellectual exercise than a useful technique in practice.

                                                                              1. 25

                                                                                The article focuses on followers, which is a strange metric to focus on.

                                                                                github primarily allows physically separated developers, sometimes from different organizations, to collaborate on code.

                                                                                As a side effect, github offers the possibility to show a portfolio of work. This is possible for people whose work/study allows them to show the work they did publicly including collaboration.

                                                                                If a person does have material on github and points to it on their resume, you can get an idea of their working style, and some of their personality from this portfolio. It may help you to filter candidates and to think of things to ask them in the interview that will spark a nice discussion.

                                                                                Once, in an interview I had a nice discussion with a total stranger about an orbital simulator I’ve worked on for many years, totally unrelated to the work at hand.

                                                                                If the candidate doesn’t have a github portfolio, it is what it is and you’ll just have to ask them about what they like to code in the interview.

                                                                                Like everything else, a github portfolio is just one piece of evidence to a future colleague’s fit for a job. It’s neither a necessary piece of evidence, nor a sufficient one, to hinge a hiring decision on.

                                                                                1. 4

                                                                                  “a (github) portfolio is just one piece of evidence to a future colleague’s fit for a job. It’s neither a necessary piece of evidence, nor a sufficient one, to hinge a hiring decision on.”

I absolutely agree. It can add a necessary piece of the information, but it certainly does not suffice to draw the whole personality profile of the candidate; it is necessary to evaluate many more factors.

                                                                                  1. 3

                                                                                    They did include this footnote:

                                                                                    There isn’t an API to get the contribution activity in the last year from GitHub. Instead, people seem to get the timeline image as an SVG (like hitting this endpoint https://github.com/users/benfred/contributions), and then parse the SVG to get the number of contributions. I almost went with this hack for this post but hit some problems with CORS restrictions from loading this in the browser. I could have written a proxy to host the requests, but it was getting silly so went with the number of followers instead.

                                                                                    1. 10

One thing I usually did when evaluating a candidate is to search for “site:github.com username commented on”. That gives some idea of the candidate's communication style as you go through their interactions on github discussing bugs, suggesting new features, or debating technical arguments.

                                                                                      1. 2

This is not a bad idea! I just checked some of the people I like to work with, and the results were great. In case a person does not have any interactions, you have to rely on other means.

                                                                                      2. 3

                                                                                        Confused. Why would anyone care about those silly metrics when you can look directly at the code they have written? It’s not GitHub, it’s what you put in there. You could host your code on another website, including one that you own, if you have one.

                                                                                        “Why won’t your CV help you with hiring”…It sure depends on its contents…?

                                                                                        1. 1

                                                                                          99% of the code I have written is not available for review for you. How do we proceed?

                                                                                          1. 4

                                                                                            Certainly not by counting GitHub stars or followers.

You either have the code to show or you don't. It's up to you to prove yourself as a candidate. I would certainly not hire you based on social network metrics.

If you want to know what the most-followed person on GitHub says about that (I learned this in this entry):

                                                                                            Talk is cheap, show me the code.

                                                                                        2. 2

                                                                                          There is an API to get contributions. I keep a log in Vim, and I’ve configured it so that pressing F5 inserts a markdown-formatted list of my GitHub contributions since the last time I pressed F5 (https://github.com/talex5/get-activity). I haven’t tried getting a whole year’s worth, but I don’t see why it wouldn’t work.

                                                                                      1. 7

                                                                                        Interesting how a nicely compiled list of CLI tools & useful animated GIFs of their usage has resulted in flame wars over programming language, color choices, emoji uses, and pipelines.

                                                                                        1. 5

                                                                                          I’m pretty sure lobste.rs is an acronym that means ‘mostly flame wars about rust and go’.

                                                                                        1. 4

                                                                                          If you’re writing a function like unwrap that may panic, you can put this annotation on your functions, and the default panic formatter will use its caller as the location in its error message.

Millennials reinvent exceptions? ;)

                                                                                          1. 8

This isn't about exceptions, but about attribution of the error. Some errors are the fault of the caller (e.g. passing parameters that are forbidden by the contract of the function), and some errors are the fault of the function itself (a bug).

                                                                                            With access to caller’s location you can blame the correct line of code. Exception/panic stack traces are independent of this feature, because Rust didn’t want to pull in dependence on debug info and relatively expensive unwinding machinery.
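For anyone who hasn't seen the attribute, a minimal sketch of how it behaves (the function here is a made-up example):

```rust
// With #[track_caller], a panic inside this function is reported at the
// caller's location, blaming the code that broke the contract rather than
// the body of the helper itself.
#[track_caller]
fn checked_div(a: u32, b: u32) -> u32 {
    if b == 0 {
        panic!("checked_div called with a zero divisor");
    }
    a / b
}

fn main() {
    assert_eq!(checked_div(10, 2), 5);

    // Uncommenting this line would panic, and the location printed by the
    // default panic handler would be this line in main(), not the panic!() above.
    // let _boom = checked_div(1, 0);
}
```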

                                                                                            1. 1

                                                                                              Using a strong type system for error handling predates exceptions.

                                                                                              1. 1

                                                                                                Panic is for errors that the type systems cannot catch. A pragma that adds caller location to a panic message is a symptom of nostalgia for exception traces.

                                                                                                I don’t know what compiler developers’ reasoning is. As a side observer, I can’t help but think “they could as well be implementing full exception traces now”.

                                                                                                1. 2

                                                                                                  They actually do have full exception traces. You turn them on with the RUST_BACKTRACE=1 environment variable, and by making sure you don’t strip debug info.

This is a convenience feature so that you get useful info when you've got traces turned off, or when you're trying to get useful diagnostics from a released executable (if you're stripping the executable, then the whole point of doing so is to ship less data than you would need for full traces, but there might still be a happy medium between nothing and full exception traces).

                                                                                                  1. 1

There's a substantial runtime cost to full exception traces that is much smaller to nonexistent for the #[track_caller] annotation. For the latter, the compiler inserts some static info to be printed on panic. There will be some binary bloat and potentially some instruction density loss, but the performance impact will be very small. To do full exception traces, you have to ~constantly maintain full unwind tables somewhere and update them on every function call/return. You can already get that info by setting RUST_BACKTRACE, but it is off by default.

                                                                                              1. 1

I'm somewhat curious how long it will take for the const fn improvements to trickle out to crates enough to see performance improvements.

                                                                                                1. 2

I love the idea of compile-time computation, but I imagine that it could be fraught for Rust given its notorious compile times. It definitely leans into the existing orientation towards runtime performance over compile-time performance. That said, it's just another arrow in the quiver.

                                                                                                  1. 9

                                                                                                    One unintuitive thing about compilers is that assessing time spent in general is really hard. E.g. early optimisations that remove code may make later optimisation passes or code generation cheaper.

                                                                                                    const functions are a case where you get a guaranteed and fault-free transformation at compile time (const function calls also have an upper runtime limit, compared to e.g. proc macros), without relying on additional optimisations to e.g. catch certain patterns and resolve them at compile time.
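
                                                                                                    A minimal sketch of what that guaranteed transformation looks like (the table and function names are invented for illustration):

                                                                                                        // The loop below runs in the compiler's const evaluator, so the
                                                                                                        // finished table is baked into the binary and LLVM never sees the loop.
                                                                                                        const fn squares() -> [u64; 16] {
                                                                                                            let mut out = [0u64; 16];
                                                                                                            let mut i = 0;
                                                                                                            while i < 16 {
                                                                                                                out[i] = (i as u64) * (i as u64);
                                                                                                                i += 1;
                                                                                                            }
                                                                                                            out
                                                                                                        }

                                                                                                        const SQUARES: [u64; 16] = squares();

                                                                                                        fn main() {
                                                                                                            println!("{}", SQUARES[7]); // prints 49; no runtime computation involved
                                                                                                        }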

                                                                                                    1. 1

                                                                                                      That is an interesting point! Substituting a const function for a macro could lead to an improvement in compile time.

                                                                                                      My initial thought was that having a new compile time facility might lead people to handle new things at compile time. It’s always hard to tell how something shakes out in a dynamic system at first glance.

                                                                                                    2. 9

                                                                                                      My expectation is that const fn will speed up compiling Rust programs. Running snippets of Rust in miri, which is how const fn evaluation works, is not a slow process. Running LLVM to optimize things perfectly is. Any code that LLVM doesn’t have to deal with because it was const-evaluated is a win.

                                                                                                      1. 8

                                                                                                        If it’s used to replace proc macros, it should improve compile times. But there are certainly use cases where it can degrade compile times. E.g. generating a perfect hash function. For cases like that, you’re probably better off using code generation.

                                                                                                        1. 2

                                                                                                          There’s some work on providing tools for profiling compile times. I think some of the larger crates are doing so to manage their compile times and could use that to manage the trade-off.

                                                                                                          1. 1

                                                                                                            That seems like it would help quite a bit. Once you can measure something it becomes much more actionable and you can make informed decisions about trade offs.

                                                                                                      1. 49

                                                                                                        I wonder why you consider unordered maps a failing rather than a reasonable design decision: You want an ordered set of hash keys? Order that list yourself or use a library implementation that does. I like the opinionated decision that Go made from the start, and eventually Perl arrived at, to intentionally produce unpredictable key ordering so naive programmers do not rely on an assumption that isn’t true (and probably incurs overhead).

                                                                                                        1. 13

                                                                                                          perlsec says that Perl uses unpredictable key ordering for resistance to algorithmic complexity attacks.

                                                                                                          1. 12

                                                                                                            JFYI Rust also took the same approach: iteration order is deliberately different between program runs. The motivation was to prevent HashDoS attacks.

                                                                                                            https://doc.rust-lang.org/std/collections/struct.HashMap.html
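
                                                                                                            To make both points concrete, a small sketch (the map contents are arbitrary): the iteration order is unspecified, and if you want an order, you sort the keys yourself.

                                                                                                                use std::collections::HashMap;

                                                                                                                fn main() {
                                                                                                                    let mut scores: HashMap<&str, u32> = HashMap::new();
                                                                                                                    scores.insert("z", 1);
                                                                                                                    scores.insert("x", 12);
                                                                                                                    scores.insert("b", 42);

                                                                                                                    // Iteration order is unspecified, and because the default hasher is
                                                                                                                    // randomly seeded it can differ between runs of the same program.
                                                                                                                    for (key, value) in &scores {
                                                                                                                        println!("{} {}", key, value);
                                                                                                                    }

                                                                                                                    // If a stable order is needed, sort the keys explicitly.
                                                                                                                    let mut keys: Vec<_> = scores.keys().collect();
                                                                                                                    keys.sort();
                                                                                                                    for key in keys {
                                                                                                                        println!("{} {}", key, scores[key]);
                                                                                                                    }
                                                                                                                }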

                                                                                                            1. 6

                                                                                                              Python has different hashing per process invocation for security purposes. They just keep a separate insertion ordered array for stable iteration orders.

                                                                                                            2. 11

                                                                                                              Came here to post exactly that. Ordered maps are significantly more expensive than unordered ones, because hash tables. Making the more expensive collection the default is a bad idea, especially when the benefit is so small — I can only think of a few cases where it’s been needed.

                                                                                                              This is a pet peeve of mine because I’ve run into several issues where some existing system or other decides to assign meaning to the order of keys in a JSON object — despite the standard saying they’re unordered — causing serious problems with processing such JSON in a language whose maps are unordered. (Several examples of this come from CouchDB, such as the way it used to use a JSON object to describe the following multipart MIME bodies.)

                                                                                                              1. 7

                                                                                                                Though one would think “adding the ordering constraint makes it more expensive”, Python landed here because a better dict implementation gave insertion ordering for free.

                                                                                                                Now, sure, maybe a billion years down the line we’ll find some other dict management strategy that is better, but Python is pretty mature and the dict change seemed to align well with a lot of stuff.

                                                                                                                So Python was faced with either:

                                                                                                                • just exposing the key ordering on the standard dict, or
                                                                                                                • keeping the separate OrderedDict object around, despite dict now being able to fulfill its requirements, for what amounts to philosophical reasons.

                                                                                                                I think pragmatism won out here. And now you don’t have to tell beginners “remember, insertion order on dictionaries isn’t kept!”; you can just say that it is (or have beginners assume it, correctly if for the wrong reasons).

                                                                                                                1. 6

                                                                                                                  Ordered maps are significantly more expensive than unordered ones, because hash tables.

                                                                                                                  “Because hash tables” what? :-) A dict in Python was a hash table when it was unordered, and remained a hash table when it suddenly became ordered as a side-effect of them wanting to share the keys (which saves a lot of space, given that instances of the same class would all have the same keys). Here’s a concise explanation I wrote recently: https://softwaremaniacs.org/blog/2020/02/05/dicts-ordered/en/

                                                                                                                  As for “significantly more expensive”, I have no idea what you’re talking about!

                                                                                                                  1. 2

                                                                                                                    As for “significantly more expensive”, I have no idea what you’re talking about!

                                                                                                                    • It adds a separate data structure for the key/value list (or you could say it adds a separate entry-index list.)
                                                                                                                    • It adds a memory indirection to every hash table probe.
                                                                                                                    • When a key is removed, you have to mess with the entry list. Either you remove the entry, which requires sliding the other entries down and renumbering all the indices in the table; or you turn the entry into a tombstone that can never be reused (in which case you eventually need to GC the tombstones and renumber when there are too many.)

                                                                                                                    I’m sure this design is a win for Python, because Python (like JS and Ruby) uses dictionaries to store objects, so it happens to have a ton of dictionaries with identical sets of keys, from which keys are very rarely removed. So this actually saves a significant amount of memory. Without that special circumstance, I doubt the overhead would come close to paying for itself.
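
                                                                                                                    As a rough illustration of the overhead being described, here is a sketch (in Rust, with made-up names) of that index-plus-entries layout: the hash table maps keys to positions in a dense, insertion-ordered entry list, removals leave tombstones, and every operation pays the extra bookkeeping on top of a plain hash map.

                                                                                                                        use std::collections::HashMap;
                                                                                                                        use std::hash::Hash;

                                                                                                                        // Insertion-ordered map built on top of a plain hash map:
                                                                                                                        // the hash table stores only indices into a dense entries list.
                                                                                                                        struct OrderedMap<K, V> {
                                                                                                                            index: HashMap<K, usize>,     // key -> position in `entries`
                                                                                                                            entries: Vec<Option<(K, V)>>, // None marks a tombstone left by a removal
                                                                                                                        }

                                                                                                                        impl<K: Hash + Eq + Clone, V> OrderedMap<K, V> {
                                                                                                                            fn new() -> Self {
                                                                                                                                Self { index: HashMap::new(), entries: Vec::new() }
                                                                                                                            }

                                                                                                                            fn insert(&mut self, key: K, value: V) {
                                                                                                                                let existing = self.index.get(&key).copied();
                                                                                                                                match existing {
                                                                                                                                    Some(i) => self.entries[i] = Some((key, value)),
                                                                                                                                    None => {
                                                                                                                                        self.index.insert(key.clone(), self.entries.len());
                                                                                                                                        self.entries.push(Some((key, value)));
                                                                                                                                    }
                                                                                                                                }
                                                                                                                            }

                                                                                                                            fn remove(&mut self, key: &K) -> Option<V> {
                                                                                                                                // Leaving a tombstone keeps the other indices valid; a real
                                                                                                                                // implementation would compact and renumber once tombstones pile up.
                                                                                                                                let i = self.index.remove(key)?;
                                                                                                                                self.entries[i].take().map(|(_, v)| v)
                                                                                                                            }

                                                                                                                            // Iteration walks the dense entries list, yielding insertion order.
                                                                                                                            fn iter(&self) -> impl Iterator<Item = (&K, &V)> {
                                                                                                                                self.entries.iter().flatten().map(|(k, v)| (k, v))
                                                                                                                            }
                                                                                                                        }

                                                                                                                        fn main() {
                                                                                                                            let mut m = OrderedMap::new();
                                                                                                                            m.insert("b", 42);
                                                                                                                            m.insert("x", 12);
                                                                                                                            m.insert("z", 1);
                                                                                                                            m.remove(&"x");
                                                                                                                            for (k, v) in m.iter() {
                                                                                                                                println!("{} {}", k, v); // prints "b 42" then "z 1"
                                                                                                                            }
                                                                                                                        }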

                                                                                                                    1. 1

                                                                                                                      It’s at least more expensive in memory: you need to keep an additional list structure. I don’t really get the point of ordered maps, but they’re probably fine for Python.

                                                                                                                      1. 7

                                                                                                                        From what I remember, Python’s dicts got smaller in memory, not larger, when they added the ordering feature.

                                                                                                                        It’s worth looking at the notes at the top of https://github.com/python/cpython/blob/v3.8.5/Objects/dictobject.c - the authors went to a lot of trouble to explain their work.

                                                                                                                        In the old unordered implementation, each bucket was an “entry” struct (hash, key, value).

                                                                                                                        In the new ordered implementation, they have a dense array of (hash, key, value) entries, one per value which is in the hash table. The hash buckets are integer indexes into the entries array, and they change the size of the ints depending on how many entries the dict has - 8 bit ints when there are very few entries, growing to 16, 32 or 64 bits as the number of entries goes up.

                                                                                                                        The hash table always has more buckets than it has entries because its load factor can never be 1. In CPython 2 and 3 the load factor is generally between 1/3 and 2/3 because they resize at a 2/3 load factor. So the number of buckets is going to be between 1.5 and 3x the number of entries. Having more int8_t or int16_ts in memory in order to have fewer PyObject*s in memory is a net win.

                                                                                                                        The above applies mainly to the combined dict layout. The combined dict layout is the one in which the dict contains both keys & values. The other layout is the “split” layout, where 1 copy of the keys is shared between N dicts, each of which just has a simple flat array of pointers to values. The split layout saves way more memory in the case that it’s designed for, which is storing the fields for instances of classes (taking advantage of the fact that constructors will typically assign exactly the same fields in exactly the same order for every object of a given class).

                                                                                                                        1. 1

                                                                                                                          For a normal dictionary it won’t apply, right? If the array of (hash, key, value) behaves like a stack, it still consumes more than the number of entries because you don’t want each insertion to re-allocate. The exception for object fields is a good one, but it only works because Python is a dictionary-based language; in general an unordered hashmap is smaller than an ordered one simply because there’s literally less information to store.

                                                                                                                          1. 2

                                                                                                                            Both the entries & buckets arrays grow multiplicatively.

                                                                                                                            https://mail.python.org/pipermail/python-dev/2012-December/123028.html shows the arithmetic. From what I remember, the new combined layout was designed primarily to make dicts use less memory. The fact that it also made preserving ordering feasible was a happy side benefit that the Python devs chose to take advantage of.

                                                                                                                            in general an unordered hashmap is smaller than an ordered one simply because there’s literally less information to store.

                                                                                                                            That would only necessarily be the case if you were using a succinct data structure, which no hash table is. They’re a time/space tradeoff, using more space than necessary in order to save time.

                                                                                                                        2. 2

                                                                                                                          More expensive in memory often turns out to be more expensive in CPU as well. Most, if not all, fast hash maps I’m aware of use some form of open addressing to store the data in a flat array. The only way to maintain ordering in that case would be to bloat each element with pointers and iterate in a much less cache-friendly manner. Your CPU is going to end up spinning waiting for data. For Python (and I assume Ruby), everything is already a PyObject*, so the overhead is much lower than it would be in a more value-oriented language.

                                                                                                                          1. 1

                                                                                                                            and iterate in a much less cache friendly manner

                                                                                                                            Iterating the entries of a python3 dict is not cache-unfriendly in the way iterating a linked list normally is. Think “arraylist” not “linked list”.

                                                                                                                            Keys are stored in a flat array of PyDictKeyEntry structs, each of which is a (hash, key, value) triple. In a split layout dict, the value field is omitted.

                                                                                                                            In a split layout dict, the values are stored in a flat array of PyObject*s. In a combined layout dict, the values are stored in the flat array of PyDictKeyEntry structs.

                                                                                                                            The implementation for iterating both the keys & values at the same time is at https://github.com/python/cpython/blob/v3.8.5/Objects/dictobject.c#L3715 - functions for iterating only-keys or only-values are just above it.

                                                                                                                            1. 1

                                                                                                                              To be clear, I was referring to a hypothetical open addressing hash table design. Python squares that circle by not using open addressing. Since it can inline the PyObject* in the key table, you aren’t paying for an extra memory indirection. In a value-oriented language (both those without GC and languages like Go), that extra memory indirection would be an unacceptable cost on every lookup.

                                                                                                                              1. 1

                                                                                                                                I think the same design with the flat entries array & separate indexes into it could work without the PyObject* indirection. If keys & values are both fixed size, use exactly the same design with a fixed-size entry struct. If not, now the integers in the bucket list would be byte offsets to the starts of entries, rather than indices of entries (so you’d have to use the wider sizes sooner). And each entry would need to have one or two length fields too, depending on whether either the keys or values might happen to be fixed-size.

                                                                                                                                1. 1

                                                                                                                                  It would work in the sense that you can build a functional hash table like that, but that table would still be slower than an open addressing table due to memory indirection. You’re still paying an extra memory indirection on every lookup. In an open addressing table, you have a single memory indirection when looking up the bucket that contains the key. In the ordered table you outlined, you have a memory indirection to look up an integer and a second memory indirection to look up the actual key with that integer.

                                                                                                                                  1. 1

                                                                                                                                    AFAIK the definition of “open addressing” is that the collision resolution mechanism is based on probing other buckets rather than following a linked list or something - not that there isn’t a memory lookup to get the entry.

                                                                                                                                    I’m not aware of anyone in python complaining about a big regression to dict lookup time when 3.6 came out (offhand I see some references to microbenchmarks showing 3% perf differences). The interpreter is pretty slow so maybe it’s just getting buried, but apparently that 1 extra indirection isn’t blowing everything up.

                                                                                                                                    1. 1

                                                                                                                                      Nowhere did I mention values. I’m specifically explaining the number of memory indirections before being able to compare the key. I absolutely believe that Python was able to maintain performance even with ordering. I’m simply trying to explain why that isn’t possible in general.

                                                                                                                                      1. 1

                                                                                                                                        Is that much worse than indirection to the values? I get the impression it’s pretty rare to have to probe very many different buckets before finding the right one or running out

                                                                                                                                        Separate thought: you could steal, say, 2 to 4 bits from the bucket integers and put the top bits of the hash in them. Then when probing you can often reject a given bucket and move on to the next one without having to check the entries.
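
                                                                                                                                        A tiny sketch of that bit-stealing idea (all constants and names here are made up): pack a few high bits of the hash next to the entry index so most non-matching buckets can be rejected without touching the entries array.

                                                                                                                                            // Pack a 4-bit hash tag and a 28-bit entry index into one 32-bit bucket.
                                                                                                                                            fn pack(index: u32, hash: u64) -> u32 {
                                                                                                                                                let tag = (hash >> 60) as u32; // top 4 bits of the hash
                                                                                                                                                (tag << 28) | (index & 0x0FFF_FFFF)
                                                                                                                                            }

                                                                                                                                            // Cheap pre-check during probing: if the tags differ, the bucket cannot
                                                                                                                                            // match, so we skip the indirection into the entries array entirely.
                                                                                                                                            fn might_match(bucket: u32, hash: u64) -> bool {
                                                                                                                                                (bucket >> 28) == (hash >> 60) as u32
                                                                                                                                            }

                                                                                                                                            fn entry_index(bucket: u32) -> usize {
                                                                                                                                                (bucket & 0x0FFF_FFFF) as usize
                                                                                                                                            }

                                                                                                                                            fn main() {
                                                                                                                                                let hash: u64 = 0xDEAD_BEEF_0000_0000;
                                                                                                                                                let bucket = pack(7, hash);
                                                                                                                                                assert!(might_match(bucket, hash));
                                                                                                                                                assert_eq!(entry_index(bucket), 7);
                                                                                                                                                println!("bucket = {:#010x}", bucket);
                                                                                                                                            }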

                                                                                                                                        1. 2

                                                                                                                                          First, there are a number of use cases where you don’t use the values, notably hash sets, but also checking whether a key is in a map. If you store the values next to the keys in the ordered array, that has all the same memory trade-offs as having the keys and values in a single array (i.e. like Abseil’s flat_hash_map), except that the ordered version has more memory indirection and makes deletes substantially more expensive.

                                                                                                                      2. 3

                                                                                                                        This is a pet peeve of mine because I’ve run into several issues where some existing system or other decides to assign meaning to the order of keys in a JSON object

                                                                                                                        You just gave me flashbacks… OAS and the idiotic way it uses the JSON field order to organise the endpoints in the UI. Makes it extremely hard to work on OAS specifications programmatically as you have to fight your libraries all the way.

                                                                                                                        To anyone who’s considering assigning meaning to JSON field order, please switch profession, you weren’t meant to be a programmer…

                                                                                                                      3. 6

                                                                                                                        Perl hashes (maps) have never been ordered as far as I know. I think the feature is from AWK.

                                                                                                                        I don’t believe it’s a conscious decision to have unordered hashes to “keep newbies on their toes”. It’s simply more (machine) efficient not to have to order internally.

                                                                                                                        Edit: I mostly reacted to the statement

                                                                                                                        I like the opinionated decision that Go made from the start, and eventually Perl arrived at

                                                                                                                        (my emphasis). Perl has had unordered hashes since the 1980s, while Go was released in 2009.

                                                                                                                          1. 4

                                                                                                                            I think the quote should be amended to read

                                                                                                                            You use a hash for everything in Perl

                                                                                                                            ;)

                                                                                                                            I had a coworker whose Perl code used the basic data structure of hashes of hashes of hashes … ad infinitum and by Ghod I’m approaching that level myself.

                                                                                                                          2. 3

                                                                                                                            awk doesn’t preserve order, but you can choose from built-in features (might be gawk specific)

                                                                                                                            $ awk 'BEGIN{a["z"]=1; a["x"]=12; a["b"]=42; for(i in a) print i, a[i]}'
                                                                                                                            x 12
                                                                                                                            z 1
                                                                                                                            b 42
                                                                                                                            
                                                                                                                            $ # index sorted in ascending order as strings
                                                                                                                            $ awk 'BEGIN{PROCINFO["sorted_in"] = "@ind_str_asc";
                                                                                                                                   a["z"]=1; a["x"]=12; a["b"]=42; for(i in a) print i, a[i]}'
                                                                                                                            b 42
                                                                                                                            x 12
                                                                                                                            z 1
                                                                                                                            
                                                                                                                            $ # value sorted in ascending order as numbers
                                                                                                                            $ awk 'BEGIN{PROCINFO["sorted_in"] = "@val_num_asc";
                                                                                                                                   a["z"]=1; a["x"]=12; a["b"]=42; for(i in a) print i, a[i]}'
                                                                                                                            z 1
                                                                                                                            x 12
                                                                                                                            b 42
                                                                                                                            
                                                                                                                            1. 4

                                                                                                                              Thanks for expanding on AWK!

                                                                                                                              This is the equivalent Perl code

                                                                                                                              my %hash = ( b => 42, x => 12, z => 1 );
                                                                                                                              say "    ==> dump the hash";
                                                                                                                              foreach my $key ( keys %hash ) {
                                                                                                                                  say "    $key $hash{$key}";
                                                                                                                              }
                                                                                                                              say "    ==> order by key";
                                                                                                                              foreach my $key ( sort { $a cmp $b } keys %hash ) {
                                                                                                                                  # we use 'cmp' here because the keys are strings
                                                                                                                                  say "    $key $hash{$key}";
                                                                                                                              }
                                                                                                                              say "    ==> order by value";
                                                                                                                              foreach my $key ( sort { $hash{$a} <=> $hash{$b} } keys %hash ) {
                                                                                                                                  # we use '<=>' because the values are numbers
                                                                                                                                  say "    $key $hash{$key}";
                                                                                                                              }
                                                                                                                              

                                                                                                                              Output:

                                                                                                                              ==> dump the hash
                                                                                                                              z 1
                                                                                                                              b 42
                                                                                                                              x 12
                                                                                                                              ==> order by key
                                                                                                                              b 42
                                                                                                                              x 12
                                                                                                                              z 1
                                                                                                                              ==> order by value
                                                                                                                              z 1
                                                                                                                              x 12
                                                                                                                              b 42
                                                                                                                              

                                                                                                                              Because the result of the function keys %hash is a list, we can apply all sorts of fancy sorting to it, for example, sorting by value and then by key on a tie.

                                                                                                                              say "    ==> add a new key with the same value as an existing one";
                                                                                                                              $hash{y}=12;
                                                                                                                              foreach my $key (sort { $hash{$a} <=> $hash{$b} || $a cmp $b } keys %hash) {
                                                                                                                                  say "    $key $hash{$key}";
                                                                                                                              }
                                                                                                                              
                                                                                                                              z 1
                                                                                                                              x 12
                                                                                                                              y 12
                                                                                                                              b 42
                                                                                                                              
                                                                                                                            2. 2

                                                                                                                              Perl hashes (maps) have never been ordered as far as I know. I think the feature is from AWK.

                                                                                                                              Maybe I am mistaken. I stopped following Perl circa 2013. I recall that the ordering was an implementation detail and platform-specific but consistent and predictable. So of course, people relied on that and the Perl community said, “No, THAT’S WRONG!” (probably tchrist on something else… but why not reuse a good opener?) but it wasn’t actually fixed for a while.

                                                                                                                              1. 4

                                                                                                                                tchrist on something else: “You are wicked and wrong to have broken inside and peeked at the implementation and then relied upon it.”

                                                                                                                                1. 1

                                                                                                                                  Thanks for expanding. I think I remember something like that. In any case, it’s possible that for some small number of keys, the return order would be deterministic (like if the keys were simply one-character strings) and beginners, without internalizing the documentation, observed this and started to rely on a behavior that broke down in other cases.

                                                                                                                                  Quoting from the link that @dbremner posted:

                                                                                                                                  Perl has never guaranteed any ordering of the hash keys, and the ordering has already changed several times during the lifetime of Perl 5. Also, the ordering of hash keys has always been, and continues to be, affected by the insertion order and the history of changes made to the hash over its lifetime.

                                                                                                                                  (Emphasis in original).

                                                                                                                              2. 6

                                                                                                                                It’s ergonomics, much like the namedtuple collection mentioned in the same breath. The change being referred to removed the extra step of importing OrderedDict from the collections library when you cast a namedtuple into a dict. If that dict shouldn’t be ordered, there’s probably also no reason for namedtuple to exist. Collections also has other silly-but-useful beauties like defaultdict.

                                                                                                                                The choices about many such things in Python seem absurd when you sit down as an engineer to architect an application.

                                                                                                                                When you’re doing something like data science, the percentage of code you write that will never be run again dramatically outweighs even code a software engineer would refer to as prototype code. There are 100x or more the circumstances in which you’d type several dozen keyboard characters, run something best characterized as a “code cell“, then delete it because you were wrong (or start a new cell that doesn’t necessarily follow the previous cell in execution order). It’s an activity with requirements halfway between an interactive shell and an executable file.

                                                                                                                                When 90% of your work is loading arbitrary or novel data inputs and poking at them to see if they have any life, nothing matters more about your tool than your ability to churn through this iteration cycle quickly.

                                                                                                                                Over the past 10 years, the percentage of Python used directly as a human tool (not a language to build human tools) has dramatically shifted the audience for language improvements. Maybe away from what is appropriate for software development, maybe not.

                                                                                                                                I write Python for a profession, and there is no application I would engineer in Python instead of Go. But I also think any professional Go developer who just spent the day e.g. unmarshalling json can appreciate there are other activities we all do besides engineering.

                                                                                                                                Despite its many other failings, there’s no other tool I’d reach for before Python when some new thing comes at me and I say, “now what’s THIS bullshit.” Quirks like stuffing values into a data structure and getting them back in an intuitive way are part of this charm.

                                                                                                                                To put it another way with less fanfare: if you have data in a map that’s less useful to you because that map is ordered, that’s probably already data that shouldn’t be in a Python map. This remains true if you’re already in the middle of writing Python code when this happens (we have non-Python data structures in Python).

                                                                                                                                1. 1

                                                                                                                                  … But I also think any professional Go developer who just spent the day e.g. unmarshalling json can appreciate there are other activities we all do besides engineering.

                                                                                                                                  (emphasis mine) Exploratory programming in Go involving JSON (“nominally curly-braced or bracketed UTF-8 blobs”) is awful. I’ve also felt this a while back with D-Lang’s std.json library and got into the habit of using other libraries; C++ JSON libraries are excellent in terms of the programmer interface. If anyone thinks unordered maps are not ergonomic, they’ll faint when they deal with JSON.

                                                                                                                                2. 3

                                                                                                                                  unordered maps a failing

                                                                                                                                    From my experience, having maps preserve insertion order is so much more convenient that it “deserves” to be the default. Additional “evidence” for that is Ruby and Python switching to do exactly that.

                                                                                                                                  1. 6

                                                                                                                                    I know preserving order is good for job security because I’ve written this genuine line of code for a real project:

                                                                                                                                    FIELDS = list({v: k for k, v in list(FIELD_MAP.items())[::-1]}.values())[::-1]
                                                                                                                                    

                                                                                                                                    But other than that, I can’t think of a time when explicitly using OrderedDict felt like an inconvenience, and there are two obvious benefits: it doesn’t constrain the implementation of dict, and it tells the reader you’re going to do something that cares about the order.

                                                                                                                                    1. 2

                                                                                                                                      OrderedDict

                                                                                                                                      …unordered?

                                                                                                                                      1. 1

                                                                                                                                        I feel like I’m perhaps missing something. But I meant OrderedDict—as in: in the unusual event that I need ordering, it doesn’t bother me to explicitly ask for it.

                                                                                                                                        1. 1

                                                                                                                                          I was confused by your comment.

                                                                                                                                          I can’t think of a time when explicitly using OrderedDict felt like an inconvenience…

                                                                                                                                          “I can’t think of a time when I wanted to use an ordered dict and I felt inconvenienced that the default dict was not ordered.”

                                                                                                                                          …it doesn’t constrain the implementation of dict…

                                                                                                                                          “Because ordering is requested explicitly (as opposed to if dict was ordered by default) the implementation of dict is not constrained.”

                                                                                                                                          …and it tells the reader you’re going to do something that cares about the order.

                                                                                                                                          This part is fine but is confusing if one interpreted the previous clauses of your comment in a different way.

                                                                                                                                    2. 2

                                                                                                                                      Use a list, seriously; arrays’ raison d’etre is to provide you with a collection of ordered items.

                                                                                                                                      1. 1

                                                                                                                                        I can think of exactly two kinds of thing where I cared. One was implementing LRU caches in coding challenges

                                                                                                                                        The other was a dirty hack. I was reinventing protobuf (but for JSON, with simpler syntax and better APIs for the target languages), and the code gen was done by defining a Python class with the appropriate members and later looping over their contents. I used metaclass magic to replace the default class dict with an ordered one, then iterated the members of all classes in the namespaces to do code gen:

                                                                                                                                         class Event(metaclass=msg):
                                                                                                                                               member = int
                                                                                                                                               other = str
                                                                                                                                         ...
                                                                                                                                         for c in scrape_classes():
                                                                                                                                               for m in c.members():
                                                                                                                                                     lang.gen()
                                                                                                                                        

                                                                                                                                        For most other things, I don’t think I wanted insertion order – it’s either been key order or value order.

                                                                                                                                        Where are you using them often enough that it matters?

                                                                                                                                    1. 40

                                                                                                                                      Rust is hard because it draws heavily from two of the hardest other languages to learn: C++ and Haskell. If that’s not enough, it dumps the borrow checker (not too bad if you’re from C++) and then lifetimes on you.

                                                                                                                                      C++

                                                                                                                                      • const/non-const versions of functions (&self, &mut self)
                                                                                                                                      • compile-time directives (#[cfg(...)], etc.)
                                                                                                                                      • references vs values
                                                                                                                                      • moving and copying
                                                                                                                                      • operator overloading
                                                                                                                                      • RAII (c’tors, d’tors)
                                                                                                                                      • monomorphization

                                                                                                                                      Haskell:

                                                                                                                                      • pattern matching
                                                                                                                                      • derive as in “typeclass”, not as in OOP
                                                                                                                                      • traits are like typeclasses, not interfaces in OOP
                                                                                                                                      • enum as sum types
                                                                                                                                      • inferred typing
                                                                                                                                      • using types to indicate usage

                                                                                                                                      In isolation this is a lot of strange things. I’m an experienced C++ dev, so those points weren’t difficult and I was able to get productive in Rust rather quickly. However, it wasn’t until I spent the last month or so learning Haskell that a whole bunch of the parts I thought were quirky started to make considerably more sense.
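
                                                                                                                                      For anyone who hasn’t seen these features together, a small sketch (types and numbers invented) showing a few items from both lists at once: a derived trait, an enum as a sum type, pattern matching, and the &self / &mut self split.

                                                                                                                                          // A sum type with a derived trait; variants carry different payloads.
                                                                                                                                          #[derive(Debug, Clone, PartialEq)]
                                                                                                                                          enum Shape {
                                                                                                                                              Circle { radius: f64 },
                                                                                                                                              Rect { w: f64, h: f64 },
                                                                                                                                          }

                                                                                                                                          impl Shape {
                                                                                                                                              // Read-only access, analogous to a const member function in C++.
                                                                                                                                              fn area(&self) -> f64 {
                                                                                                                                                  match self {
                                                                                                                                                      Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
                                                                                                                                                      Shape::Rect { w, h } => w * h,
                                                                                                                                                  }
                                                                                                                                              }

                                                                                                                                              // Exclusive (&mut self) access is required to mutate.
                                                                                                                                              fn scale(&mut self, factor: f64) {
                                                                                                                                                  match self {
                                                                                                                                                      Shape::Circle { radius } => *radius *= factor,
                                                                                                                                                      Shape::Rect { w, h } => {
                                                                                                                                                          *w *= factor;
                                                                                                                                                          *h *= factor;
                                                                                                                                                      }
                                                                                                                                                  }
                                                                                                                                              }
                                                                                                                                          }

                                                                                                                                          fn main() {
                                                                                                                                              let mut s = Shape::Rect { w: 2.0, h: 3.0 };
                                                                                                                                              s.scale(2.0);
                                                                                                                                              println!("{:?} has area {}", s, s.area());
                                                                                                                                          }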

                                                                                                                                      1. 15

                                                                                                                                        I think this is spot on, and is very similar to my experience (except I knew Haskell before Rust). The reality is Rust is simply not a beginners language, and I don’t think it will ever be one.

                                                                                                                                        In terms of complexity, I do think Rust is up there with Haskell and C++, but I can’t help but think C++ and Haskell have quite a bit of accidental complexity due to legacy reasons. Rust certainly has a bit of that, but overall the language is extremely well designed and I almost think that makes it more daunting. You can’t just point to some complexity and say “oh, that’s just for some silly X, Y and Z”; it’s always “oh, that has some real and important purpose X, Y and Z.”

                                                                                                                                        Rust will certainly get there in terms of accidental complexity (it’s a very young language, and that cruft builds up over time), and I can only imagine how complicated it might become one day. async/await already has some of this: even though I think it is well designed (for now), it is exceedingly complicated, and I have a hard time believing that nothing better will come about (leaving cruft).

                                                                                                                                        1. 4

                                                                                                                                          async/await .. think it is well designed

                                                                                                                                          I already know people calling it poorly designed today, with statements like “optimized for ~no CPU work on the critical path”.

                                                                                                                                          1. 3

                                                                                                                                            At a minimum, it’s fairly awkward to use when you have a mixed I/O and CPU bound workload. The project I’m working on like that splits work between Tokio and Rayon for that reason. All of the client libraries we’re using are based on async, so it would be challenging to choose a different style of concurrency.

                                                                                                                                            1. 2

                                                                                                                                              Wouldn’t that be the executors that are poorly designed? The actual built-in async/await support doesn’t seem specific to any given workload.

                                                                                                                                              1. 2

                                                                                                                                                I think the issue with async/await is not so much the implementation in Rust, but all the different (incompatible) libraries built on top of it. This means that importing a few dependencies can lead to a few different async/await runtimes being used, with all the trouble that may bring.

                                                                                                                                            2. 7

                                                                                                                                              I think the listed features are more from ML than Haskell. Even the first compiler for Rust was written in OCaml. https://github.com/rust-lang/rust/commit/6997adf76342b7a6fe03c4bc370ce5fc5082a869

                                                                                                                                              1. 6

                                                                                                                                                I feel like some of these things took no time at all to learn, such as pattern matching and enums as sum types. It was more of a “well fucking finally” than a “how does this work?”

                                                                                                                                                1. 4

                                                                                                                                                  I think the challenge comes more from identifying uses for these features when you’ve spent your life not using sum types (and instead relying on stuff like OOP to get the same effects). Also, pattern matching does affect things like the flow of ownership, so you can land on some non-intuitive stuff. I “get” this stuff but still ended up losing a good amount of time in a refactor because of pattern matching + implicit reference stuff.

                                                                                                                                                  Rust isn’t Haskell, but it does have the “if you want to avoid problems, write code in a very specific way” quality to it that makes “porting this C code to Rust” non-obvious at times.
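
                                                                                                                                                  A tiny made-up example of what I mean (not the actual refactor): whether a match borrows or moves depends on how the scrutinee and the patterns are written, and that quietly changes what you’re allowed to do with the value afterwards.

                                                                                                                                                  ```rust
                                                                                                                                                  // Made-up example: the same sum type matched by reference and by value.
                                                                                                                                                  enum Message {
                                                                                                                                                      Text(String),
                                                                                                                                                      Ping,
                                                                                                                                                  }

                                                                                                                                                  fn describe(msg: &Message) -> String {
                                                                                                                                                      // Matching through a reference: `s` binds as `&String`,
                                                                                                                                                      // so `msg` is only borrowed and the caller keeps ownership.
                                                                                                                                                      match msg {
                                                                                                                                                          Message::Text(s) => format!("text: {s}"),
                                                                                                                                                          Message::Ping => String::from("ping"),
                                                                                                                                                      }
                                                                                                                                                  }

                                                                                                                                                  fn consume(msg: Message) -> String {
                                                                                                                                                      // Matching by value: `s` binds as `String`, so the match
                                                                                                                                                      // moves the payload out and `msg` is gone afterwards.
                                                                                                                                                      match msg {
                                                                                                                                                          Message::Text(s) => s,
                                                                                                                                                          Message::Ping => String::from("ping"),
                                                                                                                                                      }
                                                                                                                                                  }

                                                                                                                                                  fn main() {
                                                                                                                                                      let m = Message::Text(String::from("hello"));
                                                                                                                                                      println!("{}", describe(&m)); // fine: `m` was only borrowed
                                                                                                                                                      println!("{}", consume(m));   // `m` is moved; using it again would not compile
                                                                                                                                                  }
                                                                                                                                                  ```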

                                                                                                                                                  1. 1

                                                                                                                                                    I first encountered some of this when I poked at OCaml in college, after really only knowing Javascript, Java, and a bit of Scheme. It was immediately obvious to me how much better pattern matching was than the equivalent if-statements.

                                                                                                                                                    On the other hand, I’m also kind of a paranoid coder overall, and was already dissatisfied with how verbose and risky if-statements were, so I suppose I already had a use-case in mind before I saw the feature.

                                                                                                                                                2. 2

                                                                                                                                                  Coming from a C++ background, it’s generics and especially lifetimes that give me a hard time.

                                                                                                                                                  Generics are superficially like templates, but their type-checking is much stronger. Templates let you sneak away from strong typing, a bit like structured macros; generics don’t. I’m somewhat used to this because I’ve used Swift (and a bit of Scala), but sometimes I find it hard to convey to the compiler exactly what the type constraints on T need to be.
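
                                                                                                                                                  As a rough illustration of what conveying those constraints looks like (my own toy example): every operation used on T has to be declared as a bound up front, where a C++ template would just attempt the instantiation and complain later.

                                                                                                                                                  ```rust
                                                                                                                                                  use std::ops::Add;

                                                                                                                                                  // Every capability used in the body must appear in the bounds:
                                                                                                                                                  // `Default` for the starting value, `Add` for `+`, and `Copy` to
                                                                                                                                                  // read elements out of the slice by value.
                                                                                                                                                  fn sum_all<T>(items: &[T]) -> T
                                                                                                                                                  where
                                                                                                                                                      T: Copy + Default + Add<Output = T>,
                                                                                                                                                  {
                                                                                                                                                      items.iter().fold(T::default(), |acc, &x| acc + x)
                                                                                                                                                  }

                                                                                                                                                  fn main() {
                                                                                                                                                      println!("{}", sum_all(&[1, 2, 3]));  // 6
                                                                                                                                                      println!("{}", sum_all(&[1.5, 2.5])); // 4
                                                                                                                                                  }
                                                                                                                                                  ```

                                                                                                                                                  Remove any one of those bounds and the definition itself stops compiling, which is exactly the strictness (and the occasional fight with the compiler) I mean.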

                                                                                                                                                  Lifetimes are something I understand and manage all the time, both manually and by rigging up C++ stuff to enforce them for me with RAII. What I’m not used to is having to explain my logic to the compiler, especially when there are only limited facilities for doing so and the syntax is kind of arcane. This is where I tend to rage-quit and why I haven’t gotten farther.
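
                                                                                                                                                  For the lifetime side, a minimal sketch of that “explain your logic to the compiler” step (again a toy of mine): the 'a annotation tells the compiler that the returned reference borrows from the haystack argument rather than from the needle, so the caller knows how long it can hold onto the result.

                                                                                                                                                  ```rust
                                                                                                                                                  // Toy example: the lifetime ties the output to `haystack` only,
                                                                                                                                                  // so the result may outlive `needle`.
                                                                                                                                                  fn first_word_containing<'a>(haystack: &'a str, needle: &str) -> Option<&'a str> {
                                                                                                                                                      haystack.split_whitespace().find(|word| word.contains(needle))
                                                                                                                                                  }

                                                                                                                                                  fn main() {
                                                                                                                                                      let text = String::from("borrow checker lifetimes");
                                                                                                                                                      let hit = {
                                                                                                                                                          let pattern = String::from("life"); // dropped at the end of this block
                                                                                                                                                          first_word_containing(&text, &pattern)
                                                                                                                                                      };
                                                                                                                                                      // Still fine: `hit` borrows from `text`, not from `pattern`.
                                                                                                                                                      println!("{:?}", hit); // Some("lifetimes")
                                                                                                                                                  }
                                                                                                                                                  ```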

                                                                                                                                                  This may also explain why I’m liking Nim more: it’s kind of Rust-like but comfier. Its generics hew closer to Rust than C++, but it also has things closer to C++ templates (cleverly called “templates”). And its GC/refcounting means I don’t have to sweat lifetimes so much; that comes at a cost, of course, but as I’m not writing device drivers or lightbulb firmware I don’t mind.