1. 12

    You don’t have to use the golden ratio; multiplying by any constant with ones in the top and bottom bits, and about half of those in between, will mix a lot of input bits into the top output bits. One gotcha is that it only mixes less-significant bits towards more-significant ones, so the 2nd bit from the top is never affected by the top bit, the 3rd bit from the top isn’t affected by the top two, and so on. You can do other steps to add the missing dependencies if it matters, like a rotate and another multiply, for instance. (The post touches on a lot of this.)
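
    Here’s a minimal sketch of that idea (my own illustration, not code from the post; the constants are just the usual 2^64/φ value and an arbitrary odd mixer):

      #include <cstdint>

      // Multiplicative ("Fibonacci") hash: multiply by an odd constant with bits
      // spread throughout, then keep the top `bits` bits, which are the best mixed.
      // (bits should be in 1..63.)
      uint64_t fib_hash(uint64_t x, unsigned bits) {
          return (x * 0x9E3779B97F4A7C15ull) >> (64 - bits);  // 2^64 / golden ratio
      }

      // Optional extra step: a rotate and a second multiply, so that high input
      // bits also influence low output bits (the missing dependencies above).
      uint64_t fib_hash_mixed(uint64_t x, unsigned bits) {
          x *= 0x9E3779B97F4A7C15ull;
          x = (x >> 32) | (x << 32);   // rotate by 32
          x *= 0xD6E8FEB86659FD93ull;  // another odd constant
          return x >> (64 - bits);
      }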

    FNV hashing, mentioned in the article, is an old multiplicative hash used in DNS, and the rolling Rabin-Karp hash is multiplicative. Today Yann Collet’s xxHash and LZ4 use multiplication in hashing. There have got to be a bajillion other uses of multiplication for non-cryptographic hashing that I can’t name, since it’s such a cheap way to mix bits.

    It is, as the author says, kind of interesting that something like a multiplicative hash isn’t the default cheap function everyone’s taught. Integer division to calculate a modulus is maybe the most expensive arithmetic operation we commonly do when the modulus isn’t a power of two.

    1. 1

      Nice! About the leftward bit propagation: can you do multiplication modulo a compile time constant fast? If you compute (((x * constant1) % constant2) % (1<<32)) where constant1 is the aforementioned constant with lots of ones, and constant2 is a prime number quite close to 1<<32 then that would get information from the upper bits to propagate into the lower bits too, right? Assuming you’re okay with having just slightly fewer than 1<<32 hash outputs.

      (Replace 1<<32 with 1<<64 above if appropriate of course.)

      1. 1

        You still have to do the divide for the modulus at runtime and you’ll wait 26 cycles for a 32-bit divide on Intel Skylake. You’ll only wait 3 cycles for a 32-bit multiply, and you can start one every cycle. That’s if I’m reading the tables right. Non-cryptographic hashes often do multiply-rotate-multiply to get bits influencing each other faster than a multiply and a modulus would. xxHash arranges them so your CPU can be working on more than one at once.

        (But worrying about all bits influencing each other is just one possible tradeoff, and, e.g. the cheap functions in hashtable-based LZ compressors or Rabin-Karp string search don’t really bother.)

        1. 1

          you’ll wait 26 cycles for a 32-bit divide on Intel Skylake

          And looking at that table, 35-88 cycles for a 64-bit divide. Wow. That’s so many cycles, I didn’t realize. But I should have: on a 2.4 GHz processor 26 cycles is 10.83 ns per op, which is roughly consistent with the author’s measurement of ~9 ns per op.

          1. 1

            That’s not what I asked. I asked a specific question.

            can you do multiplication modulo a compile time constant fast?

            similarly to how you can do division by a constant fast by implementing it as multiplication by the divisor’s multiplicative inverse in the group of integers modulo 2^(word size). clang and gcc perform this optimisation out of the box already for division by a constant. What I was asking is if there’s a similar trick for modulo by a constant. You obviously can do (divide by divisor, multiply by divisor, subtract from original number), but I’m wondering if there’s something quicker with a shorter dependency chain.
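
            For what it’s worth, here’s a minimal sketch of the obvious version (constants purely illustrative): with a compile-time modulus, gcc and clang already turn the % into the reciprocal-multiply sequence, so the real question is whether anything has a shorter dependency chain than what they emit:

              #include <cstdint>

              constexpr uint64_t kMul = 0x9E3779B97F4A7C15ull;  // odd constant with many bits set
              constexpr uint64_t kMod = 1000000007ull;          // illustrative prime modulus

              uint64_t mul_then_mod(uint64_t x) {
                  // The multiply wraps mod 2^64; because kMod is a compile-time constant,
                  // the % compiles to multiply-high + shift + multiply + subtract, not a div.
                  return (x * kMul) % kMod;
              }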

            1. 1

              OK, I get it. Although I knew about the inverse trick for avoiding DIVs for constant divisions, I didn’t know or think of extending that to modulus even in the more obvious way. Mea culpa for replying without getting it.

              I don’t know the concrete answer about the best way to do n*c1%(2^32-5) or such. At least it does intuitively seem like it should be possible to get some win from using the high bits of the multiply result, as the divide-by-multiplying tricks do.

        2. 1

          So does that mean that when the author says Dinkumware’s FNV1-based strategy is too expensive, it’s only more expensive because FNV1 is byte-by-byte and fibonacci hashing multiplying by 2^64 / Φ works on 8 bytes at a time?

          Does that mean you could beat all these implementations by finding a multiplier that produces an even distribution when used as a hash function working on 8 byte words at a time? That is, he says the fibonacci hash doesn’t produce a great distribution, whereas multipliers like the FNV1 prime are chosen to produce good even distributions. So if you found an even-distribution-producing number for an 8 byte word multiplicative hash, would that then work just as well as whatever-hash-then-fibonacci-hash, but be faster because it’s 1 step not 2?

          1. 1

            I think you’re right about FNV and byte- vs. word-wise multiplies.

            Re: 32 vs. 64, it does look like Intel’s latest big cores can crunch through 64-bit multiplies pretty quickly. Things like Murmur and xxHash don’t use them; I don’t know if that’s because perf on current chips is for some reason not as good as it looks to me or if it’s mainly for the sake of older or smaller platforms. The folks that work on this kind of thing surely know.

            Re: getting a good distribution, the limitations on the output quality you’ll get from a single multiply aren’t ones you can address through choice of constant. If you want better performance on the traditional statistical tests, rotates and multiplies like xxHash or MurmurHash are one approach. (Or go straight to SipHash, which prevents hash flooding.) Correct choice depends on what you’re trying to do.

            1. 2

              That makes me wonder what hash algorithm ska::unordered_map uses that was faster than FNV1 in dinkumware, but doesn’t have the desirable property of evenly mixing high bits without multiplying the output by 2^64 / φ. Skimming his code it looks like std::hash.

              On my macOS system, running Apple LLVM version 9.1.0 (clang-902.0.39.2), std::hash for primitive integers is the identity function (i.e. no hash), and for strings it’s murmur2 on 32-bit systems and cityhash64 on 64-bit systems.

              // We use murmur2 when size_t is 32 bits, and cityhash64 when size_t
              // is 64 bits.  This is because cityhash64 uses 64bit x 64bit
              // multiplication, which can be very slow on 32-bit systems.
              

              Looking at CityHash, it also multiplies by large primes (with the first and last bits set of course).

              Assuming then that multiplying by his constant does nothing for string keys—plausible since his benchmarks are only for integer keys—does that mean his benchmark just proves that dinkumware using FNV1 for integer keys is better than no hash, and that multiplying an 8 byte word by a constant is faster than multiplying each integer byte by a constant?

          2. 1

            A fair point that came up over on HN is that people mean really different things by “hash” even in non-cryptographic contexts; I mostly just meant “that thing you use to pick hashtable buckets.”

            In a trivial sense a fixed-size multiply clearly isn’t a drop-in for hashes that take arbitrary-length inputs, though you can use multiplies as a key part of variable-length hashing like xxHash etc. And if you’re judging your hash by checking that outputs look random-ish in a large statistical test suite, not just by how well it works in your hashtable, a multiply also won’t pass muster. A genre of popular non-cryptographic hashes is like popular non-cryptographic PRNGs in that way: traditionally judged by running a bunch of statistical tests.

            That said, these “how random-looking is your not-cryptographically-random function” games annoy me a bit in both cases. Crypto-primitive-based functions (SipHash for hashing, cipher-based PRNGs) are pretty cheap now and are immune not just to common statistical tests, but any practically relevant method for creating pathological input or detecting nonrandomness; if they weren’t, the underlying functions would be broken as crypto primitives. They’re a smart choice more often than you might think given that hashtable-flooding attacks are a thing.

            If you don’t need insurance against all bad inputs, and you’re tuning hard enough that SipHash is intolerable, I’d argue it’s reasonable to look at cheap simple functions that empirically work for your use case. Failing statistical tests doesn’t make your choice wrong if the cheaper hashing saves you more time than any maldistribution in your hashtable costs. You don’t see LZ packers using MurmurHash, for example.

          1. 0

            A list of beliefs about programming that I maintain are misconceptions.

            1. 3

              Small suggestion: use a darker, bigger font. There are likely guidelines somewhere, but I don’t think you can go wrong with #000 for text people are supposed to read for longer than a couple of seconds.

              1. 3

                Current web design seems allergic to any sort of contrast. Even hyper-minimalist web design calls for less contrast for reasons I can’t figure out. Admittedly, I’m a sucker for contrast; I find most programming colorschemes hugely distasteful for the lack of contrast.

                1. 6

                  I think a lot of people find the maximum contrast ratios their screens can produce physically unpleasant to look at when reading text.

                  I believe that people with dyslexia in particular find reading easier with contrast ratios lower than #000-on-#fff. Research on this is a bit of a mixed bag but offhand I think a whole bunch of people report that contrast ratios around 10:1 are more comfortable for them to read.

                  As well as personal preference, I think it’s also quite situational? IME, bright screens in dark rooms make black-on-white headache inducing but charcoal-on-silver or grey-on-black really nice to look at.

                  WCAG AAA asks for a contrast ratio of 7:1 or higher in body text which does leave a nice amount of leeway for producing something that doesn’t look like looking into a laser pointer in the dark every time you hit the edge of a glyph. :)

                  As for the people putting, like, #777-on-#999 on the web, I assume they’re just assholes or something, I dunno.

                  Lobsters is #333-on-#fefefe which is a 12.5:1 contrast ratio and IMHO quite nice with these fairly narrow glyphs.

                  (FWIW, I configure most of my software for contrast ratios around 8:1.)
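
                  (If you want to check figures like that yourself, here’s a rough sketch of the WCAG 2.0 arithmetic: relative luminance per channel, then (L1 + 0.05) / (L2 + 0.05). It’s my own sketch of the published formula, not a reference implementation.)

                    #include <algorithm>
                    #include <cmath>
                    #include <cstdio>

                    double linearize(int c8) {                 // c8 is a 0-255 sRGB channel
                        double c = c8 / 255.0;
                        return c <= 0.03928 ? c / 12.92 : std::pow((c + 0.055) / 1.055, 2.4);
                    }

                    double luminance(int r, int g, int b) {
                        return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
                    }

                    double contrast(double l1, double l2) {
                        double hi = std::max(l1, l2), lo = std::min(l1, l2);
                        return (hi + 0.05) / (lo + 0.05);
                    }

                    int main() {
                        // #333 text on #fefefe background, as mentioned above: prints ~12.5:1
                        std::printf("%.1f:1\n", contrast(luminance(0x33, 0x33, 0x33),
                                                         luminance(0xfe, 0xfe, 0xfe)));
                    }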

                  1. 2

                    Very informative, thank you!

              2. 3

                I think the byte-order argument doesn’t hold, given that you mentioned ntohs and htons, which are exactly where byte-order needs to be accounted for…

                1. 2

                  If you read the byte stream as a byte stream and shift them into position, there’s no need to check endianness of your machine (just need to know endianness of the stream) - the shifts will always do the right thing. That’s the point he was trying to make there.
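
                  A tiny sketch of what that looks like (my own example, not from the post): decoding a 32-bit big-endian field from a byte stream works identically on little- and big-endian hosts, because the shifts operate on values rather than memory layout.

                    #include <cstdint>

                    // Read a 32-bit big-endian value from a byte stream. No host-endianness
                    // check is needed; only the stream's byte order matters.
                    uint32_t read_be32(const unsigned char* p) {
                        return (uint32_t(p[0]) << 24) |
                               (uint32_t(p[1]) << 16) |
                               (uint32_t(p[2]) <<  8) |
                                uint32_t(p[3]);
                    }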

                  1. 2

                    ntohs and htons do that exact thing and you don’t need to check the endianness of your machine, so the comment about not understanding why they exist makes me feel like the author is not quite grokking it. Those functions/macros can be implemented to do the exact thing linked to in the blog post.

              1. 3

                One problem with std::optional, at least at the moment, while it’s relatively new, is that std is opinionated, so you often won’t find library functions that work with a std::optional-based codebase.

                For example, parsing an integer from a string is a classic example of a function which might not succeed. So it would make sense to use std::optional to store the result. However, the standard library provides int stoi(const std::string& str, std::size_t* pos = 0, int base = 10) and friends, which signal failure by throwing exceptions.

                So, in theory, std::optional provides an alternative way to handle failure, somewhat like some haskell or rust code might, making the possibility of failure explicit in the type, and thus forcing you to explicitly handle it or pass it on. However, (unless a library exists which I’m not aware of?) you may need to reimplement large parts of the standard library to make them fit.

                1. 3

                  Right. “This is a feature of the standard library!” means something entirely different in C++ than in other programming languages.

                  1. 1

                    Can you make it much less of a headache by defining a generic function that takes a lambda, calls it in a try/catch, returns the successful value from the try branch, returns nullopt from the catch?
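
                    Something like this rough sketch, I mean (names made up; C++17 for std::optional):

                      #include <optional>
                      #include <string>

                      // Call any callable, return its result in a std::optional,
                      // and map any exception to std::nullopt.
                      template <typename F>
                      auto to_optional(F&& f) -> std::optional<decltype(f())> {
                          try {
                              return f();
                          } catch (...) {
                              return std::nullopt;
                          }
                      }

                      // Usage: wrap std::stoi so a parse failure becomes an empty optional.
                      std::optional<int> parse_int(const std::string& s) {
                          return to_optional([&] { return std::stoi(s); });
                      }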

                    1. 1

                      There are any number of workarounds that obfuscate the code to varying degrees. This same situation arose with Optional in Java 8: it’s there, but not really, so in a lot of places you’d like to use it you have to go through similar contortions. The other problem is interacting with different teams writing different parts of the app; everyone has to be on the same page or you’ll end up wrapping/unwrapping optionals all over. And libraries. In the end I found optionals were a lot of trouble for very little gain.

                      1. 1

                        I did wonder about that. However, blindly catching all different exceptions and effectively discarding the information about which exception it was seems unwise. Of course you could keep the information while still using sum types, but then you don’t really want std::optional, you want an either type which can hold either a valid value or an error code. I’m not sure whether the standard library has one of these or whether you’d have to roll your own.

                        1. 1

                          blindly catching all different exceptions and effectively discarding the information about which exception it was seems unwise

                          Sure, I wouldn’t be very happy with a blind try/catch around something like a database access or RPC call. But if the thing you’re wrapping is something really boring like (say) parsing a string into an integer, the exception when it goes wrong isn’t going to be very interesting anyway.

                    1. 1

                      On the window vs global thing: for the love of compatibility, please put both in, both pointing to exactly the same object.

                      1. 23

                        This is a bit disappointing. It feels a bit like we are walking into the situation OpenGL was built to avoid.

                        1. 7

                          To be honest we are already in that situation.

                          You can’t really use GL on mac, it’s been stuck at D3D10 feature level for years and runs 2-3x slower than the same code under Linux on the same hardware.

                          It always seemed like a weird decision from Apple to have terrible GL support, like if I was going to write a second render backend I’d probably pick DX over Metal.

                          1. 6

                            I remain convinced that nobody really uses a Mac on macOS for anything serious.

                            And why pick DX over Metal when you can pick Vulkan over Metal?

                            1. 3

                              Virtually no gaming or VR is done on a mac. I assume the only devs to use Metal would be making video editors.

                              1. 1

                                This is a bit pedantic, but I play a lot of games on mac (mainly indie stuff built in Unity, since the “porting” is relatively easy), and several coworkers are also mac-only (or mac + console).

                                Granted, none of us are very interested in the AAA stuff, except a couple of games. But there’s definitely a (granted, small) market for this stuff. Luckily stuff like Unity means that even if the game only sells like 1k copies it’ll still be a good amount of money for “provide one extra binary from the engine exporter.”

                                The biggest issue is that Mac hardware isn’t shipping with anything powerful enough to run most games properly, even when you’re willing to spend a huge amount of money. So games like Hitman got ported but you can only run it on the most expensive MBPs or iMac Pros. Meanwhile you have sub-$1k windows laptops which can run the game (albeit not super well)

                              2. 2

                                I think Vulkan might not have been ready when Metal was first sketched out – and Apple does not usually like to compromise on technology ;)

                                1. 2

                                  My recollection is that Metal appeared first (about June 2014), Mantle shipped shortly after (by a couple of months?), DX12 shows up mid-2015 and then Vulkan shows up in February 2016.

                                  I get a vague impression that Mantle never made tremendous headway (because who wants to rewrite their renderer for a super fast graphics API that only works on the less popular GPU?) and DX12 seems to have made surprisingly little (because targeting an API that doesn’t work on Win7 probably doesn’t seem like a great investment right now, I guess? Current Steam survey shows Win10 at ~56% and Win7+8 at about 40% market share among people playing videogames.)

                                  1. 2

                                    Mantle got heavily retooled into Vulkan, IIRC.

                                    1. 1

                                      And there was much rejoicing. ♥

                          1. 5

                            Congratulations to Lua, Zig, and Rust on being in C’s territory. Lua actually beat it. Nim and D are nearly where C++ is but not quite. Hope Nim closes that gap and any others given its benefits over C++, esp readability and compiling to C.

                            1. 1

                              To be clear, and a little pedantic, Lua =/= Luajit.

                              1. 1

                                The only thing I know about Lua is it’s a small, embeddable, JIT’d, scripting language. So, what did you mean by that? Do Lua the language and LuaJIT have separate FFI’s or something?

                                1. 5

                                  I think just that there are two implementations. One is just called “Lua”, is an interpreter written in C, supposedly runs pretty fast for a bytecode interpreter. The other is LuaJIT and runs much faster (and is the one benchmarked here).

                                  1. 1

                                    I didn’t even know that. Things I read on it made me think LuaJIT was the default version everyone was using. Thanks!

                                      1. 2

                                        I waited till I was having a cup of coffee. Wow, this is some impressive stuff. More than I had assumed. There’s a lot of reuse/blending of structures and space. I’m bookmarking the links in case I can use these techniques later.

                                      2. 2

                                       I think people doing comparative benchmarks very often skip over the C Lua implementation because it isn’t so interesting to them.

                                    1. 4

                                     Extra context: LuaJIT isn’t up to date with the latest Lua either, so they’re almost different things, sorta.

                                      LuaJIT is extremely impressive.

                                1. 1

                                  I’m a little surprised the x87 is even involved here - doesn’t targeting “modern” x86 usually involve using the scalar SSE instructions, since they behave more predictably than x87 does?

                                  1. 3

                                    Even if your compiler emits exclusively SSE instructions for actual arithmetic, the de-facto-standard calling conventions on x86 (but not x86-64), cdecl and stdcall, return floating-point values from functions by sticking them onto the x87 FPU stack. So there will still be a handful of x87 instructions emitted solely to push/pop the FPU stack, even if no other x87 features are used, which seems to be what’s happening here. That convention was set ages ago and changing it would break ABI compatibility.
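
                                    A small illustration of that (mine, not the author’s; the assembly in the comments is approximate and depends on compiler and flags, e.g. g++ -m32 -msse2 -mfpmath=sse -O2):

                                      // Arithmetic happens in SSE, but the 32-bit cdecl return convention
                                      // still hands the result back in st(0), so the compiler bounces the
                                      // value through memory and onto the x87 stack just to return it.
                                      float halve(float x) {
                                          return x * 0.5f;
                                          // roughly:
                                          //   movss  xmm0, [esp+4]   ; load the argument
                                          //   mulss  xmm0, [const]   ; the actual math, in SSE
                                          //   movss  [stack], xmm0   ; spill the SSE result to memory
                                          //   fld    [stack]         ; reload as st(0) for the return
                                          //   ret
                                      }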

                                    1. 1

                                      Interesting, thank you!

                                  1. 25

                                    This seems a good time to promote a paper our team published last year (sorry to blow my own trumpet :P ): http://soft-dev.org/pubs/html/barrett_bolz-tereick_killick_mount_tratt__virtual_machine_warmup_blows_hot_and_cold_v6/

                                    We measured not only the warmup, but also the startup of lots of contemporary JIT compilers.

                                     On a quad-core i7-4790 @ 3.6GHz with 32GB of RAM, running Debian 8:

                                    • C was the fastest to start up at 0.00075 secs (+/- 0.000029) – surprise!
                                    • LuaJIT was the next fastest to start up at 0.00389 secs (+/- 0.000442).
                                    • V8 was in 3rd at 0.08727 secs (+/- 0.000239).
                                    • The second slowest to start up was HHVM at 0.75270 secs (+/- 0.002056).
                                    • The slowest overall to start up was JRubyTruffle (now called TruffleRuby) at 2.66179 sec (+/- 0.011864). This is a Ruby implementation built on GraalVM (plain Java on GraalVM did much better in terms of startup).

                                    Table 3 in the linked paper has a full breakdown.

                                    The main outcome of the paper was that few of the VMs we benchmarked reliably achieved a steady state of peak performance after 2000 benchmark iterations, and some slowed down over time.

                                    1. 1

                                      I saw a talk about this. Very cool stuff! It is a good antidote to the thrall of benchmarks.

                                      1. 1

                                       Cool work! You should make that a submission on its own in the morning in case someone misses it due to a filter. For instance, people who don’t care about Python specifically, which the main post is tagged with. Just programming, performance, and compiler tags should do. Good news is a lot of people still saw and enjoyed it per the votes. You definitely deserve an “authored by” submission, though. :)

                                        1. 3

                                          It was on the lobsters front page about six months ago. https://lobste.rs/s/njsxtv/virtual_machine_warmup_blows_hot_cold

                                          It was a very good paper and I personally wouldn’t mind seeing it reposted, but I don’t actually know what the etiquette for that is here.

                                          1. 1

                                            I forgot. My bad. I should probably do a search next time.

                                      1. 6

                                        Elegant! The simplicity is really impressive, a real 80/20 kind of solution.

                                         Maybe you could solve the pipefail thing by having a tss utility that invokes the target program as a subprocess, capturing its stdout+err and then, when it stops, wait4() it and return with the same error code the child did.
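
                                         Roughly this shape, I mean: a bare-bones POSIX sketch (hypothetical, with the output capturing and timestamping left out) just to show propagating the child’s exit status:

                                           #include <sys/wait.h>
                                           #include <unistd.h>
                                           #include <cstdio>

                                           // Run argv[1..] as a child process and exit with the same status it
                                           // did, so `set -e`, && chains, etc. in the calling shell still work.
                                           int main(int argc, char** argv) {
                                               if (argc < 2) {
                                                   std::fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
                                                   return 2;
                                               }
                                               pid_t pid = fork();
                                               if (pid < 0) { std::perror("fork"); return 1; }
                                               if (pid == 0) {                    // child: exec the target program
                                                   execvp(argv[1], &argv[1]);
                                                   std::perror("execvp");
                                                   _exit(127);
                                               }
                                               int status = 0;
                                               waitpid(pid, &status, 0);          // reap the child
                                               if (WIFEXITED(status))   return WEXITSTATUS(status);    // same code
                                               if (WIFSIGNALED(status)) return 128 + WTERMSIG(status); // convention
                                               return 1;
                                           }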

                                        1. 4

                                          Added to rtss - stderr of the child still goes to stderr, so redirection works as you’d expect.

                                          1. 2

                                            Nice. ❤

                                        1. 19

                                          I came expecting to be dazzled, and instead I was only mildly amused. It’s a good tool, and a great idea, but the use of “anything” in the title is only technically correct, not practically correct, and reeks of clickbait. In other words: “for some set of programs that provide verbose printing before and after calls to processes taking time, tss can add timestamps even if the programmer didn’t provide them.”

                                          I’m unlikely to adopt the use of tss, for my own programs, instead of emitting profiling data myself. But, for unfamiliar programs that happen to have a verbose mode, I can see this being a useful first step in finding issues.

                                          1. 4

                                            It’s very clever given the constraint “the SUT might be written in anything”. Many things come with a -v flag.

                                             The workflow the post discusses sounds like it would involve iteratively putting a few printfs in, seeing which ones have inconveniently large gaps between them, adding printfs to whatever happens in between, until the gaps between profile lines leave no particularly difficult mysteries.

                                            I suppose it does implicitly assume a short-ish edit compile run cycle, but that’s true for any workflow based on adding printf calls.

                                          1. 16

                                            Reminded me I’ve had Google Analytics code up on my blog since forever for no benefit for me whatsoever. Off it goes!

                                            1. 2

                                              Kudos for removing it but I am curious how Google Analytics ends up running on so many sites to begin with?

                                              1. 11

                                                It’s free, it’s very easy to setup and understand, and there is a lot of documentation out there on how to integrate it into different popular systems like Wordpress. It’s definitely invasive, but it’s hard to deny that it’s easy to integrate.

                                                1. 1

                                                  not as easy as doing nothing though… it’s free and easy to crawl around on all fours… that can be invasive too if you crawl under someone’s desk… but this still leaves the question why.

                                                  1. 5

                                                    Because a lot of the time when you’ve just made a site you want to see if anyone’s looking at it, or maybe what kind of browsers are hitting it, or how many bots, or whatever, so you set up analytics. Then time passes, you find out what you wanted to find out, and you stop caring if people are looking at the site, but the tracking code is still there.

                                                    1. 2

                                                      I’d compare it to CCTV cameras in shops. You visit the shop (the website) voluntarily so the owner can and will track you. We can agree that this is a bad thing under certain conditions, but as long as it’s technically trivial it will be done. No use arguing what is, you’d need a face mask or TOR to avoid it.

                                                      That said, I’d also prefer if it wasn’t Google Analytics on most pages but something that keeps the data strictly in the owner’s hands. I can wish for it to be deleted after a while all I want but my expectation is that all the laws in the world won’t change that to a 100% certainty.

                                                  2. 8

                                                    End-user-facing SaaS products are one thing. On a site I run on infrastructure that I run myself I can just look at the httpd logs¹ and doing so is way faster than looking at GA², but if I also bought a dozen other random SaaS products then the companies that run those won’t ship me httpd logs, but they will almost always give me a place to copy-paste in a GA tracking <script>. If I have to track usage on microsites and my main website, it’s nice if the same tracking works for all of them.

                                                   It has some useful features. I believe offhand that, if you wire up code to tell it what counts as a “conversion event”, GA can out the box tell you things like “which pages tended to correlate positively and negatively with people subsequently pushing the shiny green BUY NOW button?”

                                                    There’s a populace of people familiar with it. If you hire a head of marketing³, pretty much every single person in your hiring pool has used GA before, but almost none of them have scraped httpd logs with grep or used Piwik. (Though I would be surprised if they didn’t immediately find Piwik easy and pleasant to use.) So when that person says that they require quantitative analysis of visitor patterns in order to do their job⁴, they’re likely to phrase it as “put Google Analytics on the website, please.”

                                                    (¹ GA writes down a bunch of stuff that Apache won’t, out the box. GA won’t immediately write down everything you care about because you have to tell it what counts as a conversion if you want conversion funnel statistics.)

                                                    (² I have seriously no idea whatsoever how anybody manages to cope with using GA’s query interface on a day to day basis. It’s the most frustratingly laggy UI that I’ve ever used, and I’m including “running a shell and text editor inside ssh to a server on literally the opposite side of the planet” in this comparison. I think people who use GA regularly must have their expectations for software UI adjusted downward immensely.)

                                                    (³ or whatever job title you give to the person whose pay is predicated on making the chart titled “Purchases via our website” go up and to the right.)

                                                    (⁴ and they do! If you think they don’t, take it up with Ogilvy. He wrote a whole book and everything, you should read it.)

                                                    1. 1

                                                      what’s that book?

                                                      1. 3

                                                        The book is “Ogilvy on Advertising”. It’s not long, the prose is not boring and there are some nice pictures in it.

                                                       The main thing it’s about is how an iterative approach to advertising can sell a boatload of product. That is, running several different adverts, measuring how well each advert worked, then trying another set of variations based on what worked the first time. For measurement he writes about doing things like putting different adverts for the same product up, each with a different discount code printed on it, and then counting how many customers show up using the discount code that was in each of those adverts. These days you’ll see websites doing things like using tracking cookies to work out what the conversion rate was from each advert they ran.

                                                        Obviously the specific mechanisms they used for measurement back then are mostly obsolete now, but the underlying principle of evolving ad campaigns by putting out variations, measuring, then doubling down on the things you’ve demonstrated to work is timeless.

                                                       Ogilvy also writes a little bit about specific practical things that he’s found worked when he put them in adverts in the past, such as putting large amounts of copy on the advert rather than small amounts, font choice, attention-grabbing wording, how to write a CTA, black text on white backgrounds or vice versa, what kinds of photos to run and so on. Many are probably still accurate because human beings don’t change much.

                                                       Many are plausibly wrong now because the practicalities of staring at a glowing screen aren’t identical to those of staring at a piece of paper. If you’re following the advice in the first bit of the book about actually measuring things, then it won’t matter much to you how much is wrong or right, because you’ll rapidly find out for yourself empirically anyway. :)

                                                        Hypothetically, let’s say you’ve done a lot of little-a agile software development: you might feel that the evolutionary approach to advertising is really, really obvious. Well, congratulations, but not all advertising is done that way, and quite a lot of work is sold on the basis of how fashionable and sophisticated it makes the buyer of the advertising job feel. Ogilvy conveys, in much less harsh words, that the correct response to this is to burn those scrubs to the fucking ground by outselling them a hundred to one.

                                                    2. 6

                                                      For me it was probably ego-stroking to find out how much traffic I was getting. I’ve been blogging for more than a decade and not always from hosts where logs were easily accessible.

                                                      1. 4

                                                        What gets me is why people care about how many hits their blog gets anyway. If I write a blog, the main target is actually myself (and maybe, MAYBE, one or two other people I’ll email individually too), and I put it on the internet just because it is really easy to. Same thing with my open source libraries: I offer them for download with the hopes that they may be useful… but it really means nothing to me if you use it or not, since the reason I wrote it in the first place is for myself (or again, somebody who emailed me or pinged me on irc and I had some time to kill by helping them out).

                                                        As such, I have no interest in analytics. It… really doesn’t matter if one or ten thousand people view the page, since it works for me and the individuals I converse with on email, and that’s my only goal.

                                                        So I think that yes, Google Analytics is easy and that’s why they got the marketshare, but before that, people had to believe analytics mattered and I’m not sure how exactly that happened. Maybe it is every random blogger buying into the “data-driven” hype thinking they’re going to be the next John Rockefeller in the marketplace of ideas… instead of the reality where most blogs are lucky to have two readers. (BTW I think these thoughts also apply to the otherwise baffling popularity of Medium.com.)

                                                        1. 1

                                                          Also, it’s invasive, sure but it’s also fairly high value even at the free level.

                                                          You get a LOT of data about your users from inserting that tracking info into your site.

                                                          Which leads me into my next question - what does all this pro-privacy stuff do to such a blog’s SEO?

                                                          (I know, I know, we’re not supposed to care about SEO - we’re Maverick developers expressing our cultural otherness and doing Maverick-y things…)

                                                          1. 2

                                                            Oh, it totally tanks SEO.

                                                            Alternately, the SEO consultants that get hired by biz request to have GA added anyways and they force you to bring it in. :(

                                                            1. 1

                                                         Google will derank pages that don’t have Google Analytics?

                                                      1. 1

                                                        Cool. I wasn’t aware of PRoot, rootless and the rootless-container project in general. Since there is no mention of fakeroot and fakechroot, do you know how this compares?

                                                        1. 2

                                                           fake{root,chroot} is based on an LD_PRELOAD-like syscall interception. It has the advantage of not depending on the kernel’s namespace implementation, but the disadvantage of a performance penalty.

                                                           proot is a frontend for linux namespaces.

                                                          1. 1

                                                             Thank you for your response, I see. So it’s not possible to run it inside a container then? fakeroot with LD_PRELOAD is a pain; you basically can’t debootstrap Jessie on Stretch because of this.

                                                            1. 1

                                                               I thought one of them did LD_PRELOAD interception, which was fast enough that you don’t notice the performance penalty, but doesn’t work for things (e.g. Go binaries?) that make syscalls directly rather than going through libc’s wrappers, and the other did ptrace() interception, which works on everything, but makes syscalls much slower (though compilers spend a large proportion of their time doing things which aren’t syscalls, so it’s like a 20% perf hit for random C programs last time I tried).

                                                              1. 2

                                                                Both are using LD_PRELOAD. What you are thinking of is fakeroot-ng(1), which is ptrace(2)-based.

                                                                1. 1

                                                                  Thank you.

                                                          1. 5

                                                            In case anyone who worked on this is reading here: oh my goodness this feature is incredibly slick, I am terrifically impressed. Very, very, very well done. ♥

                                                              1. 2

                                                                That looks very useful and I wasn’t aware of it.

                                                                But also looks quite different to me. Indeed she explicitly says:

                                                                “Granted mounting is not a requirement of building docker images. You can always go the route of orca-build and umoci and not mount at all. umoci is also an unprivileged image builder and was made long before I even made mine by the talented Aleksa Sarai who is also responsible for a lot of the rootless containers work upstream in runc.”

                                                                This pursues that approach, and is concerned with raw builds rather than k8s.

                                                                1. 1

                                                                  ^and is^and the OP is^

                                                                  1. 1

                                                                    FWIW you may edit your comments on this site, it’s much nicer than Twitter. ;)

                                                                    edit: oh, there’s a time limit.

                                                              1. 7

                                                                 I was hoping to get some tips I could use, but my use of a CAPTCHA isn’t on the web; it’s to allow legacy Telnet access to my Multics installation. It all started with a MASSIVE amount of automatic cracking attempts, which I linked to the Mirai botnet, but which have simply never slowed down!

                                                                 This issue is affecting many other legacy system providers as well (see their 07-Jun-17, 01-Dec-17, and 04-Dec-17 updates).

                                                                Example of what I’m seeing over a period of about two or three months:

                                                                 » mlt_ust_captcha
                                                                 CAPTCHA: 32 passed, 18632 failed.
                                                                

                                                                 My solution was to present untrusted connections arriving via legacy methods like telnet with a text-based CAPTCHA - I use only low ASCII characters for numbers and the lowercase letters a through f, because at that stage of the connection, I can’t be sure exactly what terminal type the user is connecting with:

                                                                 Please input the following text to enable access from your host.
                                                                 You have 4 chances, 15 seconds each, or else you will be banned.
                                                                   _          _
                                                                  | |__    __| |  __ _   ___
                                                                  | '_ \  / _` | / _` | / __|
                                                                  | |_) || (_| || (_| || (__
                                                                  |_.__/  \__,_| \__,_| \___|
                                                                  
                                                                  >
                                                                

                                                                I tried various methods for turning the tables and lessening the burden of proof on the human to prove they are human, like examining keystroke timing, but everything I tried seemed to increase the false positive rate unacceptably!

                                                                 My biggest complaint with this CAPTCHA system is that, by its nature, it makes my resources inaccessible to computers - which means it also makes things inaccessible to those who depend on computer-based accessibility tools, such as those used by the blind.

                                                                 For my Multics use-case, it’s OK, because there are channels like Mosh or SSH connections that are exempted from the CAPTCHA, so it won’t affect blind or disabled users. As more and more of the web moves to programmed JavaScript-based pages, I worry that it’s becoming less accessible, or that disabled and blind users will be forced to experience second-rate presentation and content.

                                                                1. 5

                                                                  Could this visual ASCII art captcha be replaced by a plain string prompt like “please type the following word: peacock” that would work fine from a screen reader? No bot author is actively trying to break it, if I understand you correctly? The hordes of logins are just from a bot that wants to log into crappy IoT kit with exposed telnet and default passwords?

                                                                  1. 2

                                                                     That would probably work pretty well, I imagine: bots which just target open telnet ports would fail, but computers could still easily be programmed to automatically log in (if the challenge is always of the form “Please type the following word: (.*)”).

                                                                    1. 2

                                                                       I guess it absolutely could - yes.

                                                                       My reason for not doing so originally was a concern that such a trivially solvable challenge would quickly be trivially solved.

                                                                      Of course, my concern might be overblown.

                                                                      1. 1

                                                                        Also, in my case - since I offer connections via SSH, Mosh, and VNC I’m less concerned, but, also, if you solve the CAPTCHA just once, that particular IP is exempted from having to solve it ever again.

                                                                  1. 4

                                                                    I have a headcanon that the real Satoshi is someone who dreamed up a working 2nd-preimage attack on SHA2 that takes about 5 minutes to run and, instead of burning it by telling everyone, dreamed up BTC as a really long-winded way of getting rich from it without needing to do anything that’d attract unwanted attention to themselves personally.

                                                                    1. 16

                                                                      TLDR;

                                                                      • In 2004 Apple, Mozilla and Opera were becoming increasingly concerned about the W3C’s direction with XHTML, lack of interest in HTML, and apparent disregard for the needs of real-world web developers and created WHATWG as a way to get control over the web standards
                                                                       • they threw away a whole stack of powerful web technologies (XHTML, XSLT…) whose purpose was to make the web both machine-readable and useful to humans
                                                                       • they invented Live Standards, a sort of ex-post standard: always-evolving documents, unstable by design, designed by their hands-on committee, that no one else can really implement fully, establishing a dynamic oligopoly
                                                                      • in 2017, Google and Microsoft joined the WHATWG to form a Steering Group for “improving web standards”
                                                                       • meanwhile the W3C realized that their core business is not to help lobbies spread broken DRM technologies, and started working on a new version of the DOM API.
                                                                      • in 2018, after months of political negotiations, they proposed to move the working draft to recommendation
                                                                      • in 2018, Google, Microsoft, Apple and Mozilla felt offended by this lack of lip service.

                                                                     It’s worth noticing that both these groups are centered in the USA, but their decisions affect the whole world.

                                                                      So we could further summarize that we have two groups, one controlled by USA lobbies and the other controlled by the most powerful companies in the world, fighting for the control of the most important infrastructure of the planet.

                                                                      Under Trump’s Presidency.

                                                                      Take this, science fiction! :-D

                                                                      1. 27

                                                                       This is somewhat disingenuous. A web browser’s HTML parser needs to be compatible with the existing web, but the W3C’s HTML4 specification couldn’t be used to build a web-compatible HTML parser, so reverse engineering was required for independent implementation. With WHATWG’s HTML5 specification, for the first time in history, web-compatible HTML parsing got specified, with its adoption agency algorithm and all. This was a great achievement in standard writing.

                                                                       Servo is a beneficiary of this work. Servo’s HTML parser was written directly from the specification without any reverse engineering, and it worked! Contrary to your implication, the WHATWG lowered the barrier to entry for independent implementation of the web. Servo is struggling with CSS because CSS is still ill-specified in the manner of HTML4. For example, the only reasonable specification of table layout is an unofficial draft: https://dbaron.org/css/intrinsic/ For a laugh, count the number of times “does not specify” appears in CSS2’s table chapter.

                                                                        1. 4

                                                                         You say backwards compatibility is necessary, and yet Google managed to get all major sites to adopt AMP in a matter of months. AMP has even stricter validation rules than XHTML.

                                                                          XHTML could have easily been successful, if it hadn’t been torpedoed by the WHATWG.

                                                                          1. 15

                                                                           That’s nothing to do with the AMP technology, but with Google providing CDN and preloading (i.e., IMHO abusing their market position)

                                                                            1. -1

                                                                              abusing their market position

                                                                              Who? Google? The web AI champion?

                                                                              No… they do no evil… they just want to protect their web!

                                                                          2. 2

                                                                            Disingenuous? Me? Really? :-D

                                                                            Who was in the working group that wrote CSS2 specification?

                                                                            I bet a coffee that each of those “does not specify” was the outcome of a political compromise.

                                                                         But again, beyond the technical stuff, don’t you see a huge geopolitical issue?

                                                                          3. 15

                                                                            This is an interesting interpretation, but I’d call it incorrect.

                                                                            • the reason to create whatwg wasn’t about control
                                                                            • XHTML had little traction, because of developers
                                                                            • html5 (a whatwg standard fwiw) was the first meaningful HTML spec because it actually finally explained how to parse it
                                                                            • w3c didn’t “start working on a new Dom”. They copy/backport changes from whatwg hoping to provide stable releases for living standards
                                                                         • this has nothing to do with DRM (or EME). These are completely different people!
                                                                         • this isn’t about lobby groups, nor is it about influencing politics in the US or anywhere.

                                                                            I’m not speaking on behalf of my function in the w3c working group I’m in, nor for Mozilla. But those positions provided me with the understanding and background information to post this comment.

                                                                            1. 8

                                                                              XHTML had little traction, because of developers

                                                                         I remember that in the early 2000s everyone started to write <br/> instead of <br> and it was considered cool and modern. There were 80x15 badges everywhere saying the website was in xhtml. My Motorola C380 phone supported wap and some xhtml websites, but not regular html, in its built-in browser. So I had the impression that xhtml was very popular.

                                                                              1. 6

                                                                                xhtml made testing much easier. For me it changed many tests from using regexps (qr#<title>foo</title>#) to using any old XML parser and XPATH.

                                                                                1. 3

                                                                                  Agreed. Worth noting that, after the html5 parsing algorithm was fully specified and libraries like html5lib became available, it became possible to apply exactly the same approach with html5 parsers outputting a DOM structure and then querying it with xpath expressions.

                                                                              2. -1

                                                                                This is an interesting interpretation, but I’d call it incorrect.

                                                                                You are welcome. But given your arguments, I still stand with my political interpretation.

                                                                                the reason to create whatwg wasn’t about control

                                                                                I was 24 back then, and my reaction was “What? Why?”.

                                                                                My boss commented: “wrong question. You should ask: who?”

                                                                                XHTML had little traction, because of developers

                                                                                Are you sure?

                                                                         I wrote several web sites back then using XML, XSLT and XInclude server-side to produce XHTML and CSS.

                                                                                It was a great technological stack for distributing contents over the web.

                                                                                w3c didn’t “start working on a new Dom”. They copy/backport changes from whatwg hoping to provide stable releases for living standards

                                                                         Well, had I written a technical document about an alternative DOM for the whole planet, without anyone asking me to, I would be glad if the W3C had taken my work into account!

                                                                         In what other way could they NOT waste the WHATWG’s hard work?
                                                                         Well, except by saying: “guys, from now on do whatever Google, Apple, Microsoft and a few other companies from Silicon Valley tell you to do”.

                                                                         But I do not want to take the W3C’s side: to me, they lost their technical authority with EME (different group, but same organisation).

                                                                         The technical point is that we need stable, well-thought-out standards. What you call living standards are… working drafts?

                                                                                The political point is that no oligopoly should be in condition to dictate the architecture of the web to the world.

                                                                                And you know, in a state where strong cryptography is qualified as munitions and is subject to export restrictions.

                                                                                I’m not speaking on behalf of my function in the w3c working group I’m in, nor for Mozilla. But those positions provided me with the understanding and background information to post this comment.

                                                                                I have no doubt about your good faith.

                                                                                But your idealism is probably fooling you.

                                                                                If you try to see these facts from a wider perspective, you will see the problem I describe.

                                                                              3. 4

                                                                                XHTML was fairly clearly a mistake and unworkable in the real world, as shown by how many nominally XHTML sites weren’t, and didn’t validate as XHTML if you forced them to be treated as such. In an ideal world where everyone used tools that always created 100% correct XHTML, maybe it would have worked out, but in this one it didn’t; there are too many people generating too much content in too many sloppy ways for draconian error handling to work well. The whole situation was not helped by the content-type issue, where if you served your ‘XHTML’ as anything other than application/xhtml+xml it wasn’t interpreted as XHTML by browsers (instead it was HTML tag soup). One result was that you could have non-validating ‘XHTML’ that still displayed in browsers because they weren’t interpreting it as XHTML and thus weren’t using strict error handling.

                                                                                (This fact is vividly illustrated through syndication feeds and syndication feed handlers. In theory all syndication feed formats are strict and one of them is strongly XML based, so all syndication feeds should validate and you should be able to consume them with a strictly validating parser. In practice plenty of syndication feeds do not validate and anyone who wants to write a widely usable syndication feed parser that people will like cannot insist on strict error handling.)

                                                                                1. 2

                                                                                  there are too many people generating too much content in too many sloppy ways for draconian error handling to work well.

                                                                                  I do remember this argument was pretty popular back then, but I have never understood why.

                                                                                  I had no issue generating XHTML Strict pages from user content. This real-world company had a couple of hundred customers with quite varied needs (from e-commerce to online magazines to institutional web sites) and thousands of daily visitors.

                                                                                  We used XHTML and CSS to distribute highly accessible content, and we had pretty good results with a prototype based on XSL-FO.

                                                                                  To me, back then, the appeal to real-world issues seemed like a pretext. We literally had no issues. The issues I remember were all from IE.

                                                                                  You are right that a lot of mediocre software was unable to produce proper XHTML. But is this an argument?

                                                                                  Do not fix the software, let’s break the specifications!

                                                                                  It seems a little childish!

                                                                                  XHTML was not perfect, but it was the right direction.

                                                                                  Look at what we have now instead: unparsable content, hundreds of incompatible JavaScript frameworks, subtle bugs, Bootstrap everywhere (aka much less creativity) and so on.

                                                                                  Who gains the most from this unstructured complexity?

                                                                                  The same people who now propose the final lock-in: WebAssembly.

                                                                                  Seeing Linux running inside the browser is not funny anymore.

                                                                                  Catering to incompetent developers was not democratization of the web; it was technological populism.

                                                                                  1. 2

                                                                                    What is possible does not matter; what matters is what actually happens in the real world. With XHTML, the answer is clear. Quite a lot of people spent years pushing XHTML as the way of the future on the web, enough people listened to them to generate a fair amount of ‘XHTML’, and almost none of it was valid and most of it was not being served as XHTML (which conveniently hid this invalidity).

                                                                                    Pragmatically, you can still write XHTML today. What you can’t do is force other people to write XHTML. The collective browser world has decided that one of the ways that people can’t force XHTML is by freezing the development of all other HTML standards, so XHTML is the only way forward and desirable new features appear only in XHTML. The philosophical reason for this decision is pretty clear; browsers ultimately serve users, and in the real world users are clearly not well served by a focus on fully valid XHTML only.

                                                                                    (Users don’t care about validation, they care about seeing web pages, because seeing web pages is their goal. Preventing them from seeing web pages is not serving them well, and draconian XHTML error handling was thus always an unstable situation.)

                                                                                    That the W3C has stopped developing XHTML and related standards is simply acknowledging this reality. There always have been and always will be a great deal of tag soup web pages and far fewer pages that validate, especially reliably (in XHTML or anything else). Handling these tag soup web pages is the reality of the web.

                                                                                    (HTML5 is a step forward for handling tag soup because for the first time it standardizes how to handle errors, so that browsers will theoretically be consistent in the face of them. XHTML could never be this step forward because its entire premise was that invalid web pages wouldn’t exist and if they did exist, browsers would refuse to show them.)

                                                                                    1. 0

                                                                                      Users don’t care about validation, they care about seeing web pages, because seeing web pages is their goal.

                                                                                      Users do not care about the quality of concrete because having a home is their goal.
                                                                                      There will always be incompetent architects, so let them do things their way so that people get what they want.

                                                                                      Users do not care about car safety because what they want is to move from point A to point B.
                                                                                      There will always be incompetent manufacturers, so let them do things their way so that people get what they want.

                                                                                      That’s not how engineering (should) work.

                                                                                      Was XHTML flawless? No.
                                                                                      Was it properly understood by the average web developer that most companies like to hire? No.

                                                                                      Was it possible to improve it? Yes. Was it better than the current JavaScript-driven mess? Yes!

                                                                                      The collective browser world has decided…

                                                                                      Collective browser world? ROTFL!

                                                                                      There’s a huge number of browser implementors that nobody consulted.

                                                                                      Among others, in 2004 the most widely used browser, IE, did not join the WHATWG.

                                                                                      Why didn’t the WHATWG use the IE design, if the goal was to liberate developers from the burden of well-designed tools?

                                                                                      Why have we faced incompatibilities between browsers for years?

                                                                                      WHATWG was turned into one of the weapons in a commercial war for the control of the web.

                                                                                      Microsoft lost such war.

                                                                                      As always, the winners write the history that everybody knows and celebrates.

                                                                                      But those old enough to remember the facts can see the hypocrisy of these manoeuvres pretty well.

                                                                                      There was no technical reason to throw away XHTML. The reasons were political and economical.

                                                                                      How can you sell ads if a tool can easily remove them from the XHTML? How can you sell API access to data if a program can easily consume the same XHTML that users consume? How can you lock in users if they can consume the web without a browser, or with a custom one?

                                                                                      The WHATWG did not serve users’ interests, whatever Mozilla’s intentions were in 2004.

                                                                                      They served some businesses at the expense of users and of all the high-quality web companies that didn’t have many issues with XHTML.

                                                                                      Back then it was possible to disable JavaScript without losing access to the web’s functionality.

                                                                                      Try it now.

                                                                                      Back then people were exploring the concept of the semantic web with the passion people now reserve for the latest JS framework.

                                                                                      I remember experiments with web readers for blind people that could never work with the modern JS-polluted web.

                                                                                      You are right, W3C abandoned its leadership in the engineering of the web back then.

                                                                                      But you can’t sell a web developer bullshit about HTML5.

                                                                                      Beyond a few new elements and a slightly more structured page (which could have been done in XHTML too), all its exciting innovations were… more JavaScript.

                                                                                      Users did not gain anything good from this, just less control over content, more ads, and a huge security hole worldwide.

                                                                                      Because, you know, when you run JavaScript in Spain that was served to you from a server in the USA, who is responsible for that JavaScript running on your computer? Under which law?

                                                                                      Do you really think that such legal issues were not taken into account by the browser vendors that fuelled this regression of the web?

                                                                                      I cannot believe they were so incompetent.

                                                                                      They knew what they were doing, and did it on purpose.

                                                                                      Not to serve their users. To use those who trusted them.

                                                                                2. 0

                                                                                  The mention of Trump is pure trolling—as you yourself point out, the dispute predates Trump.

                                                                                  1. 6

                                                                                      I think it’s more about all of this sounding like a science fiction plot than just taking a jab at the Trump presidency; just a few years ago nobody would have predicted it would happen. So, no, not pure trolling.

                                                                                    1. 2

                                                                                      Fair enough. I’m sorry for the accusation.

                                                                                      Since the author is critical of Apple/Google/Mozilla here, I took it as a sort of guilt by association attack on them (I don’t mind jabs at Trump), but I see that it probably wasn’t that.

                                                                                      1. 2

                                                                                          No problem.

                                                                                          I didn’t see that possible interpretation, or I wouldn’t have written that line. Sorry.

                                                                                    2. 3

                                                                                        After 20 years of Berlusconi, and with our current impasse in forming a Government, no Italian could ever troll an American about his current President.

                                                                                      It was not my intention in any way.

                                                                                      As @olivier said, I was pointing to this surreal situation from an international perspective.

                                                                                        The USA controls most of the internet: most of the root DNS servers, the most powerful web companies, the standards of the web and so on.

                                                                                        Whatever effect Cambridge Analytica had on the election of Trump, it has shown the world that the internet is a common infrastructure that we have to control and protect together, just like we should control the production of oxygen and global warming.

                                                                                        If Cambridge Analytica was able to manipulate a US election (by manipulating Americans), what could Facebook itself do in Italy? Or in Germany?
                                                                                        Or what could Google do in France?

                                                                                      The Internet was a DARPA project. We can see it is a military success beyond any expectation.

                                                                                      I tried to summarize the debacle between W3C and WHATWG with a bit of irony because, in itself, it shows a pretty scary aspect of this infrastructure.

                                                                                        The fact that a group of companies dares to challenge the W3C (which, at least in theory, is an international organisation) is evidence that they do not feel the need to pretend they are working for everybody.

                                                                                        They have too much power to care.

                                                                                      1. 4

                                                                                          The last point is the crux of the issue: are technologists willing to do the legwork of decentralizing power?

                                                                                          Because regular people won’t do this. They don’t care. Thus, they should have less say in the issue, though still some, as they are deeply affected by it too.

                                                                                        1. 0

                                                                                          No. Most won’t.

                                                                                            Technologists are a wide category that etymologically includes everyone who feels entitled to speak about how to do things.

                                                                                            So we have technologists who mislead people into investing in the “blockchain revolution”, technologists who mislead politicians into allowing barely tested AI to kill people on the roads, technologists teaching in universities that neural-network computations cannot be explained and thus must be trusted as superhuman oracles… and technologists who classify any criticism of mainstream wisdom as trolling.

                                                                                          My hope is in hackers: all over the world they have a better understanding of their political role.

                                                                                        2. 2

                                                                                            If anyone wonders about Berlusconi, Cracked has a great article on him that had me calling Trump a pale imitation of Berlusconi and his exploits. Well, until Trump won the US Presidency, which is a bigger achievement than Berlusconi’s. He did that somewhat by accident, though, and he can’t last 20 years either. I still think Berlusconi has him beat as the biggest scumbag of that type.

                                                                                          1. 2

                                                                                            Yeah, the article is funny, but Berlusconi was not. Not for Italians.

                                                                                              His problems with women did not impress us much, until it became clear most of them were underage.

                                                                                              But the damage he did to our laws and (worse) to our public ethics will last for decades.
                                                                                              He did not just change the law to help himself: he destroyed most of the legal tools for fighting organized crime, bribery and corruption.
                                                                                              Worse, he encouraged a whole generation of younger people like him to be proud of their cleverness in working around the law.

                                                                                            I pray for the US and the whole world that Trump is not like him.

                                                                                    1. 1

                                                                                      Staring at the profile page trying to remember if I ever bothered setting up a Gravatar on this email address.

                                                                                      1. 24

                                                                                          MISRA (the automotive applications standard) specifically requires single-exit-point functions. While refactoring some code to satisfy this requirement, I found a couple of bugs related to releasing resources before returning in some rarely taken code paths. With a single return point, we moved the resource release to just before the return. https://spin.atomicobject.com/2011/07/26/in-defence-of-misra/ provides another counterpoint, though it wasn’t convincing when I read it the first time.
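
                                                                                          Not the code from that refactor, of course, but the shape of the bug and of the single-exit fix might look roughly like this; the function names, the file/buffer resources and the sizes are all invented:

                                                                                              #include <stdio.h>
                                                                                              #include <stdlib.h>

                                                                                              /* Hypothetical early-return version: the rarely taken read-failure path
                                                                                               * forgets to free buf, leaking it. */
                                                                                              int process_early_return(const char *path) {
                                                                                                  FILE *f = fopen(path, "rb");
                                                                                                  if (f == NULL)
                                                                                                      return -1;

                                                                                                  char *buf = malloc(4096);
                                                                                                  if (buf == NULL) {
                                                                                                      fclose(f);
                                                                                                      return -1;
                                                                                                  }

                                                                                                  if (fread(buf, 1, 4096, f) == 0) {
                                                                                                      fclose(f);
                                                                                                      return -1;          /* bug: buf is never freed on this path */
                                                                                                  }

                                                                                                  free(buf);
                                                                                                  fclose(f);
                                                                                                  return 0;
                                                                                              }

                                                                                              /* Single-exit version: every path funnels through the one return, so the
                                                                                               * releases sit just before it and cannot be skipped. */
                                                                                              int process_single_exit(const char *path) {
                                                                                                  int rc = -1;
                                                                                                  char *buf = NULL;
                                                                                                  FILE *f = fopen(path, "rb");

                                                                                                  if (f != NULL) {
                                                                                                      buf = malloc(4096);
                                                                                                      if (buf != NULL && fread(buf, 1, 4096, f) > 0)
                                                                                                          rc = 0;
                                                                                                  }

                                                                                                  free(buf);              /* free(NULL) is a no-op */
                                                                                                  if (f != NULL)
                                                                                                      fclose(f);
                                                                                                  return rc;
                                                                                              }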

                                                                                        1. 8

                                                                                            This is probably more relevant for non-GC languages. Alternatively, using labels and goto would work even better!
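
                                                                                            For comparison, a sketch of that pattern applied to the same kind of invented function as above: each failure jumps to exactly the cleanup it needs, and there is still only one literal return.

                                                                                                #include <stdio.h>
                                                                                                #include <stdlib.h>

                                                                                                int process_goto(const char *path) {
                                                                                                    int rc = -1;
                                                                                                    char *buf;

                                                                                                    FILE *f = fopen(path, "rb");
                                                                                                    if (f == NULL)
                                                                                                        goto out;

                                                                                                    buf = malloc(4096);
                                                                                                    if (buf == NULL)
                                                                                                        goto close_file;

                                                                                                    if (fread(buf, 1, 4096, f) == 0)
                                                                                                        goto free_buf;

                                                                                                    rc = 0;                 /* success */

                                                                                                free_buf:
                                                                                                    free(buf);
                                                                                                close_file:
                                                                                                    fclose(f);
                                                                                                out:
                                                                                                    return rc;
                                                                                                }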

                                                                                          1. 2

                                                                                              Maybe even more so for assembly, where before returning you must manually ensure the stack pointer is in the right place and registers are restored. In that case, there are more chances to introduce bugs if there are multiple returns (and it might make the disassembly harder to follow when debugging embedded code).

                                                                                            1. 1

                                                                                              In some sense this is really just playing games with semantics. You still have multiple points of return in your function… just not multiple literal RET instructions. Semantically the upshot is that you have multiple points of return but also a convention for a user-defined function postamble. Which makes sense, of course.

                                                                                            2. 2

                                                                                              Sure, but we do still see labels and gotos working quite well under certain circumstances. :)

                                                                                              For me, I like single-exit-point functions because they’re a bit easier to instrument for debugging, and because I’ve had many cases where a missed return caused some other code to execute unexpectedly. With this style, you’re already in a tracing mindset.

                                                                                              Maybe the biggest complaint I have is that if you properly factor these then you tend towards a bunch of nested functions checking conditions.

                                                                                              1. 2

                                                                                                Remember the big picture when focusing on a small, specific issue. The use of labels and goto might help for this problem. It also might throw off automated analysis tools looking for other problems. These mismatches between what humans and machines understand are why I wanted real, analyzable macros for systems languages. I had one for error handling a long time ago that looked clean in code but generated the tedious, boring form that machines handle well (a possible shape of such a macro is sketched after this comment).

                                                                                                I’m sure there’s more to be gleaned using that method. Even the formal methodists are trying it now with “natural” theorem provers that hide the mechanical stuff a bit.
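
                                                                                                One possible shape of such a macro, purely as a guess at what the commenter describes (the CHECK name and the toy caller are invented): the call site stays a single clean line, while the expansion is the explicit check-and-goto form that analysis tools follow easily.

                                                                                                    #include <stdio.h>
                                                                                                    #include <stdlib.h>

                                                                                                    /* Expands to the tedious, explicit form: test, log, jump to cleanup. */
                                                                                                    #define CHECK(cond, label)                                \
                                                                                                        do {                                                  \
                                                                                                            if (!(cond)) {                                    \
                                                                                                                fprintf(stderr, "check failed: %s\n", #cond); \
                                                                                                                goto label;                                   \
                                                                                                            }                                                 \
                                                                                                        } while (0)

                                                                                                    /* Hypothetical caller: two allocations, one cleanup chain. */
                                                                                                    int build_tables(size_t n) {
                                                                                                        int rc = -1;
                                                                                                        double *weights = NULL, *sums = NULL;

                                                                                                        CHECK((weights = malloc(n * sizeof *weights)) != NULL, out);
                                                                                                        CHECK((sums    = malloc(n * sizeof *sums))    != NULL, free_weights);

                                                                                                        /* ... fill the tables here ... */
                                                                                                        rc = 0;

                                                                                                        free(sums);
                                                                                                    free_weights:
                                                                                                        free(weights);
                                                                                                    out:
                                                                                                        return rc;
                                                                                                    }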

                                                                                                1. 2

                                                                                                  Yes, definitely – I think in general if we were able to create abstractions from within the language directly to denote these specific patterns (in that case, early exits), we gain on all levels: clarity, efficiency and the ability to update the tools to support it. Macros and meta-programming are definitely much better options – or maybe something like an ability to easily script compiler passes and include the scripts as part of the build process, which would push the idea of meta-programming one step further.

                                                                                              2. 5

                                                                                                I have mixed feelings about this. I think in an embedded environment it makes sense because cleaning up resources is so important. But the example presented in that article is awful. The “simpler” example isn’t actually simpler (and it’s actually different).

                                                                                                Overall, I’ve found that forcing a single return in a function often makes the code harder to read. You end up setting and checking state all of the time. Those who say (and I don’t think you’re doing this here) that you should use a single return because MISRA C requires it seem to ignore the fact that there are specific restrictions in the world MISRA is targeting.

                                                                                                1. 4

                                                                                                  Golang gets around this with defer, though that can incur some overhead.

                                                                                                  1. 8

                                                                                                    C++, Rust, etc. have destructors, which do the work for you automatically (the destructor/drop gets called when a value goes out of scope).

                                                                                                    1. 1

                                                                                                      Destructors tie you to using objects instead of just calling a function. They also make cleanup implicit, whereas defer is more explicit.

                                                                                                      The golang authors could have implemented constructors and destructors, but generally the philosophy is to make the zero value useful and not to add to the runtime where you could just call a function.

                                                                                                    2. 4

                                                                                                      defer can be accidentally forgotten, while working around RAII / scoped resource usage in Rust or C++ is harder.

                                                                                                    3. 2

                                                                                                      Firstly, he doesn’t address early return from an error condition at all.

                                                                                                      And secondly, his example of a single return…

                                                                                                          int singleRet(int a, int b, int c) {
                                                                                                              int rt = 0;
                                                                                                              if (a) {
                                                                                                                  if (b && c) {
                                                                                                                      rt = 2;
                                                                                                                  } else {
                                                                                                                      rt = 1;
                                                                                                                  }
                                                                                                              }
                                                                                                              return rt;
                                                                                                          }
                                                                                                      

                                                                                                      Should be simplified to…

                                                                                                      a ? (b && c ? 2 : 1) : 0
                                                                                                      
                                                                                                      1. 1

                                                                                                        Are you sure that wasn’t a result of having closely examined the control flow while refactoring, rather than a benefit of the specific form you normalised the control flow into? Plausibly you might have spotted the same bugs if you’d been changing it all into any other specific control-flow format that involved not-quite-local changes?

                                                                                                      1. 3

                                                                                                        This seems pleasingly elegant. I wonder: if this were a built-in feature, could it be given the ability to renumber occasionally on the fly whenever the numbering gets too dense in some part of the order, so as to completely lift that just-under-ten-million-subdivisions limit? ;) (A toy sketch of that idea follows at the end of this comment.)

                                                                                                        I recently found out that MS SQL server has a data type designed for a similar use case, called hierarchyid. It does paths into user-ordered trees and sounds like it’s meant for comment threading. Quoth the manual:

                                                                                                        By using the GetDescendant method, it is always possible to generate a sibling to the right of any given node, to the left of any given node, or between any two siblings.

                                                                                                        and the amount of data they claim this uses is pretty minimal, at least for trees with low fan-out. Not sure what the space bounds are for high fan-out trees, such as a tree with one parent and thousands of children, though.
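
                                                                                                        To make the renumbering idea above concrete, here is a toy sketch (nothing to do with how hierarchyid is actually encoded, and the constants are arbitrary): keep one integer order key per item, insert new items at the midpoint of the surrounding gap, and spread the keys back out whenever a gap is exhausted.

                                                                                                            #include <stdio.h>

                                                                                                            #define MAX_ITEMS 64
                                                                                                            #define SPACING   1000           /* gap left between keys after a renumber */

                                                                                                            /* Toy model: keys[0..count-1] are the items' order keys, kept sorted. */
                                                                                                            static long keys[MAX_ITEMS];
                                                                                                            static int  count = 0;

                                                                                                            /* Spread the existing keys out evenly again. */
                                                                                                            static void renumber(void) {
                                                                                                                for (int i = 0; i < count; i++)
                                                                                                                    keys[i] = (long)(i + 1) * SPACING;
                                                                                                            }

                                                                                                            /* Insert a new item so that it sorts between positions pos-1 and pos. */
                                                                                                            static void insert_at(int pos) {
                                                                                                                if (count >= MAX_ITEMS)
                                                                                                                    return;                  /* toy code: ignore overflow */

                                                                                                                long lo = (pos == 0)     ? 0 : keys[pos - 1];
                                                                                                                long hi = (pos == count) ? lo + 2 * SPACING : keys[pos];

                                                                                                                if (hi - lo < 2) {           /* gap exhausted: renumber on the fly */
                                                                                                                    renumber();
                                                                                                                    lo = (pos == 0)     ? 0 : keys[pos - 1];
                                                                                                                    hi = (pos == count) ? lo + 2 * SPACING : keys[pos];
                                                                                                                }

                                                                                                                for (int i = count; i > pos; i--)   /* shift to make room */
                                                                                                                    keys[i] = keys[i - 1];
                                                                                                                keys[pos] = lo + (hi - lo) / 2;     /* midpoint of the gap */
                                                                                                                count++;
                                                                                                            }

                                                                                                            int main(void) {
                                                                                                                for (int i = 0; i < 21; i++)     /* keep inserting at the front */
                                                                                                                    insert_at(0);
                                                                                                                for (int i = 0; i < count; i++)  /* keys stay strictly increasing */
                                                                                                                    printf("%ld ", keys[i]);
                                                                                                                printf("\n");
                                                                                                                return 0;
                                                                                                            }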